Efficient way to fill a 2D array in Python


I have 3 arrays: an array "words" of 5,000,000 pairs ["id": "word"], an array "ids" of 13,000 unique ids, and an array "dict" of 500,000 unique words (the dictionary). Code:

    from scipy import sparse

    matrix = sparse.lil_matrix((len(ids), len(dict)))
    for i in words:
        matrix[ids.index(i['id']), dict.index(i['word'])] += 1.0

But it is very slow: I still haven't got the matrix after 15 hours of running. Are there any ideas on how to optimize this code?

First of all, don't name your array dict: it is confusing, and it hides the built-in type dict.
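To make the shadowing concrete, here is a small sketch with toy values (the list contents are hypothetical). It binds dict to a list at module level, as the question's code does, and then tries to call the built-in:

```python
# Binding the name "dict" to a list hides the built-in dict type.
dict = ["apple", "banana"]

try:
    dict(a=1)  # the name now refers to a list, which is not callable
except TypeError as exc:
    print("built-in dict is hidden:", exc)

# Deleting the module-level name makes the built-in visible again.
del dict
print(dict(a=1))
```

Renaming the array (for example to vocab) avoids the problem entirely.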

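The problem is that list.index scans the list from the beginning on every call, so each of the 5,000,000 lookups costs time proportional to the list's length (up to 500,000 for dict). That makes the whole loop effectively quadratic. Convert your arrays dict and ids to dictionaries first, mapping each word or id to its index; a dictionary lookup takes constant time on average.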
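The conversion can be isolated in a small helper. This is a minimal sketch; the function name index_map and the toy vocab list are hypothetical, not from the original answer. It builds the mapping once in linear time and then answers the same questions list.index would, without rescanning the list:

```python
def index_map(items):
    """Return {item: position} for a list of unique items.

    Built once in O(n); each lookup afterwards is O(1) on average,
    unlike list.index, which scans the list on every call.
    """
    return {item: pos for pos, item in enumerate(items)}


# Hypothetical toy data standing in for the question's "dict" array.
vocab = ["apple", "banana", "cherry"]
word_index = index_map(vocab)

# Same position as vocab.index("cherry"), but found by hashing.
print(word_index["cherry"])
```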

    from scipy import sparse

    matrix = sparse.lil_matrix((len(ids), len(dict)))
    # map each word / id to its position once, so lookups are O(1)
    dict_from_dict = {word: ind for ind, word in enumerate(dict)}
    dict_from_id = {id_: ind for ind, id_ in enumerate(ids)}
    for i in words:
        matrix[dict_from_id[i['id']], dict_from_dict[i['word']]] += 1.0
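If the element-by-element loop into lil_matrix is still too slow, a common alternative (a different technique from the loop above, not part of the original answer) is to collect all coordinates first and build the matrix in a single call with scipy.sparse.coo_matrix. COO format allows duplicate (row, col) entries, and converting to CSR sums them, which yields exactly the counts. A minimal sketch, assuming words is a list of dicts with 'id' and 'word' keys; the function name build_count_matrix and the parameter name vocab (standing in for the question's dict) are hypothetical:

```python
import numpy as np
from scipy import sparse


def build_count_matrix(words, ids, vocab):
    """Count (id, word) occurrences into a CSR matrix.

    words: list of {'id': ..., 'word': ...} records (iterated twice,
           so it must be a list, not a generator)
    ids:   unique ids, one per row
    vocab: unique words, one per column (the question's "dict")
    """
    # Build the lookup tables once: O(1) average per lookup afterwards.
    row_of = {id_: r for r, id_ in enumerate(ids)}
    col_of = {w: c for c, w in enumerate(vocab)}

    # One (row, col) coordinate per record; repeated pairs are kept.
    rows = [row_of[rec['id']] for rec in words]
    cols = [col_of[rec['word']] for rec in words]
    data = np.ones(len(rows), dtype=np.float64)

    # Converting COO to CSR sums duplicate coordinates into counts.
    coo = sparse.coo_matrix((data, (rows, cols)), shape=(len(ids), len(vocab)))
    return coo.tocsr()
```

The result is in CSR format, which is efficient for arithmetic and row slicing; call .tolil() on it if you still need to modify individual entries afterwards.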
