python - Why isn't "counting sort" a more widely used algorithm? -

i graphing out letter frequency in large academic documents. part of process, sorting letters large clippings of documents alphabetical order. using python's built in sorted function, , started wonder if make faster. wrote following function:

  def count_sort(l):         items = {'a':0,'b':0,'c':0,'d':0,'e':0,'f':0,'g':0,'h':0,'i':0,'j':0,'k':0,'l':0,'m':  0,'n':0,'o':0,'p':0,'q':0,'r':0,'s':0,'t':0,'u':0,'v':0,'w':0,'x':0,'y':0,'z' :0}         item in l:             items[item] += 1         sort_l = []         key in items:             sort_l += key*items[key]         return sort_l

when testing code versus sorted on 10000 letter long string of text, 20x faster.

with such performance boost, why isn't sorting algorithm in standard libs?

you have rediscovered counting sort algorithm.

to quote wikipedia:

for problem instances in maximum key value smaller number of items, counting sort can highly space-efficient, storage uses other input , output arrays count array uses space o(k).

the counting sort algorithm becomes more , more efficient(relatively) greater difference between total number of items being sorted, , number of unique items being sorted.

you can see why must looking @ own code, or wikipedia example code:

# calculate histogram of key frequencies: x in input:     count[key(x)] += 1  # calculate starting index each key: total = 0 in range(k):   # = 0, 1, ... k-1     oldcount = count[i]     count[i] = total     total += oldcount  # copy output array, preserving order of inputs equal keys: x in input:     output[count[key(x)]] = x     count[key(x)] += 1  return output

you have 2 loops in function: first iterate on letters sorting, , second iterate on items dictionary. posited previously, makes sense of items dictionary considerably smaller list sorting, becomes inefficient if number number of unique elements increases relative number of items being sorted.

like @brenbarn answered, when know characters expect , you're willing ignore other characters. while seems counting sort highly efficient in example gave, problem of sorting letters hardly common sorting problem.

below have fixed function print letters iterating through list rather iterating through keys in dictionary(as python's dictionaries not ordered)

def count_sort(l):     letters = [chr(i) in range(97, 122)]     items = dict()     letter in letters:         items[letter] = 0     item in l:         items[item] += 1     sort_l = list()     letter in letters:         sort_l.extend(letter*items[letter])     return sort_l

Fun enginering

Search This Blog

python - Why isn't "counting sort" a more widely used algorithm? -

Comments

Post a Comment