Thursday, October 21, 2004

efficience vs. simplicity

I had to write a function that would generate a fixed-length sequence of symbols to be used as short, human-readable identifier for some database entities.

The first version I wrote rather quickly used a brute-force algorithm:


def idgenerator(length, prefix=None, seq=None):
if prefix:
length = length - len(prefix)
seq = seq or (string.digits + string.ascii_lowercase)
while 1:
bits = []
for _ in range(length):
bits.append(random.choice(seq))
value = ''.join(bits)
if prefix:
value = prefix + value
yield value


It works okay, but I was concerned with it's efficiency: it calls random.choice() function N times, where N is length of identifier. And I assume random.choice() must be computationally expensive. So, after some googling and hacking, I come up with the following:


def str_base(seq, n, base=10):
results = []
while n:
n, r = divmod(n, base)
results.append(seq[r])
results.reverse()
return ''.join(results)

def idgenerator_fast(length, prefix=None, seq=None):
if prefix:
length = length - len(prefix)
seq = seq or (string.digits + string.ascii_lowercase)
N = len(seq) # a magic number
minvalue = int(seq[0]*(length-1), N)
maxvalue = int(seq[-1]*length, N)
while 1:
n = random.randint(minvalue, maxvalue)
value = str_base(seq, n, N)
if len(value) < length:
pad = seq[0]*(length-len(value))
value = pad + value
if prefix:
value = prefix + value
yield value


I didn't profile this code, but IMO it should be much more efficient. But the problem is that it is twice as big and more difficult to comprehend - due to this base36 conversion trick.
And if you take into account that this function is called by the db-access code than you realize than it's costs are negligible compared to interprocess db calls.

Therefore I decided to stick with the simpler version. Nevertheless, it was a nice coding exercise, something like a mini-Kata.

0 Comments:

Post a Comment

<< Home