numpy - Clustering based on tags in Python


I have a search system in which books are tagged; every book is marked with a set of tags. For example:

book: tags
book1: u'nipu', u'ypam', u'axei', u'wjqt', u'snur', u'fjqv', u'utmq'
book2: u'nkem', u'jaqq', u'efoy', u'dags', u'fjqv'
book3: u'ypam', u'axei', u'wjqt', u'snur', u'fjqv', u'utmq', u'ujha'
...

I have thousands of books with different tags. I am looking for a clustering mechanism that can create a list like the one below, based on the tags. Example:

tag: no. of books
nipu: 12390
fjqv: 2345
...
nipu,fjqv: 1243
snur,ujha: 2343
...
nipu,fjqv,snur: 1290
...
efoy,wjqt,fjqv,utmq: 1894
...
ypam,axei,wjqt,snur,fjqv,utmq,ujha: 1

Any pointers would be helpful. I have spent some time on k-means, but I am not sure how to use it in this scenario.

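I don't think k-means is appropriate in this situation, since you're looking for equalities in the data, not similarities. It looks like you want to find frequent itemsets: sets of tags that appear together in many books. This can be a computationally demanding task depending on the size of the data, because the number of possible tag combinations grows exponentially with the number of tags, but there are tricks to cleverly interrogate the search space.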
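
To make "frequent itemsets" concrete, here is a minimal brute-force sketch that simply counts, for every tag combination up to a given size, how many books contain it. The sample data is copied from the question; the function name count_tag_sets and the max_size cap are my own assumptions for illustration. It enumerates every combination, so it is only practical for small tag sets and small sizes.

```python
from collections import Counter
from itertools import combinations

# Sample data shaped like the books in the question.
books = {
    "book1": ["nipu", "ypam", "axei", "wjqt", "snur", "fjqv", "utmq"],
    "book2": ["nkem", "jaqq", "efoy", "dags", "fjqv"],
    "book3": ["ypam", "axei", "wjqt", "snur", "fjqv", "utmq", "ujha"],
}


def count_tag_sets(books, max_size=3):
    """Count how many books contain each tag combination of up to max_size tags.

    Hypothetical helper: keys are tuples of tags in sorted order.
    """
    counts = Counter()
    for tags in books.values():
        unique = sorted(set(tags))  # sorted so each combination has one canonical key
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return counts


counts = count_tag_sets(books)
print(counts[("fjqv",)])         # 3
print(counts[("fjqv", "snur")])  # 2
```

A book with n tags contributes 2^n - 1 combinations if max_size is not capped, which is why this approach stops scaling quickly.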

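Look into the Apriori principle (every subset of a frequent itemset must itself be frequent) and the F(k-1) x F(k-1) method of candidate generation and pruning. Chapter 6 of this book will guide you: http://www-users.cs.umn.edu/~kumar/dmbook/index.php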
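
A rough sketch of how the Apriori principle and the F(k-1) x F(k-1) merge fit together, assuming the tags fit in memory as plain Python sets. The function name apriori, the min_support parameter, and the sample data layout are my own choices for illustration, not taken from the book. Support counting here is a plain scan over all books, which a real implementation would optimize.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (sorted tuple): number of transactions containing it}."""
    transactions = [frozenset(t) for t in transactions]

    # F1: frequent single tags.
    item_counts = Counter(item for t in transactions for item in t)
    frequent = {(item,): n for item, n in item_counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = sorted(frequent)
        candidates = set()
        # F(k-1) x F(k-1): merge two frequent (k-1)-itemsets sharing their first k-2 items.
        for i, a in enumerate(prev):
            for b in prev[i + 1:]:
                if a[:-1] != b[:-1]:
                    break  # prev is sorted, so no later itemset shares this prefix
                cand = a + (b[-1],)
                # Apriori pruning: every (k-1)-subset of the candidate must be frequent.
                if all(sub in frequent for sub in combinations(cand, k - 1)):
                    candidates.add(cand)

        # Count support of the surviving candidates with one pass over the data.
        counts = Counter()
        for t in transactions:
            for cand in candidates:
                if t.issuperset(cand):
                    counts[cand] += 1

        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Sample data copied from the question.
books = {
    "book1": ["nipu", "ypam", "axei", "wjqt", "snur", "fjqv", "utmq"],
    "book2": ["nkem", "jaqq", "efoy", "dags", "fjqv"],
    "book3": ["ypam", "axei", "wjqt", "snur", "fjqv", "utmq", "ujha"],
}

frequent_sets = apriori(books.values(), min_support=2)
for tags, n in sorted(frequent_sets.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(",".join(tags), n)
```

With min_support=2 on these three books, tags such as nipu or ujha are dropped in the first pass, so no larger combination containing them is ever generated or counted. That pruning is what keeps the search space manageable on thousands of books.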

