### efficience vs. simplicity

I had to write a function that would generate a fixed-length sequence of symbols to be used as short, human-readable identifier for some database entities.

The first version I wrote rather quickly used a brute-force algorithm:

It works okay, but I was concerned with it's efficiency: it calls random.choice() function N times, where N is length of identifier. And I assume random.choice() must be computationally expensive. So, after some googling and hacking, I come up with the following:

I didn't profile this code, but IMO it should be much more efficient. But the problem is that it is twice as big and more difficult to comprehend - due to this base36 conversion trick.

And if you take into account that this function is called by the db-access code than you realize than it's costs are negligible compared to interprocess db calls.

Therefore I decided to stick with the simpler version. Nevertheless, it was a nice coding exercise, something like a mini-Kata.

The first version I wrote rather quickly used a brute-force algorithm:

```
```

def idgenerator(length, prefix=None, seq=None):

if prefix:

length = length - len(prefix)

seq = seq or (string.digits + string.ascii_lowercase)

while 1:

bits = []

for _ in range(length):

bits.append(random.choice(seq))

value = ''.join(bits)

if prefix:

value = prefix + value

yield value

It works okay, but I was concerned with it's efficiency: it calls random.choice() function N times, where N is length of identifier. And I assume random.choice() must be computationally expensive. So, after some googling and hacking, I come up with the following:

```
```

def str_base(seq, n, base=10):

results = []

while n:

n, r = divmod(n, base)

results.append(seq[r])

results.reverse()

return ''.join(results)

def idgenerator_fast(length, prefix=None, seq=None):

if prefix:

length = length - len(prefix)

seq = seq or (string.digits + string.ascii_lowercase)

N = len(seq) # a magic number

minvalue = int(seq[0]*(length-1), N)

maxvalue = int(seq[-1]*length, N)

while 1:

n = random.randint(minvalue, maxvalue)

value = str_base(seq, n, N)

if len(value) < length:

pad = seq[0]*(length-len(value))

value = pad + value

if prefix:

value = prefix + value

yield value

I didn't profile this code, but IMO it should be much more efficient. But the problem is that it is twice as big and more difficult to comprehend - due to this base36 conversion trick.

And if you take into account that this function is called by the db-access code than you realize than it's costs are negligible compared to interprocess db calls.

Therefore I decided to stick with the simpler version. Nevertheless, it was a nice coding exercise, something like a mini-Kata.

## 0 Comments:

Post a Comment

<< Home