I have run into a weird problem. Here is my code:
    #!/usr/bin/python
    # filename: parse_dblp.py
    # author: ivanchou

    import codecs, os
    import xml.etree.ElementTree as ET

    paper_tag = ('article', 'inproceedings', 'proceedings', 'book',
                 'incollection', 'phdthesis', 'mastersthesis', 'www')

    class AllEntities:
        def __getitem__(self, key):
            return key

    print ('----------parse begin----------')
    # the parsed authors are stored in the file 'authors'
    result = codecs.open('authors', 'w', 'utf-8')

    parser = ET.XMLParser()
    parser.parser.UseForeignDTD(True)
    parser.entity = AllEntities()

    for event, article in ET.iterparse('dblp_part.xml', events=("start", "end"), parser=parser):
        for author in article.findall('author'):
            result.write(author.text + u'|')
        if event == 'end' and article.tag in paper_tag:
            result.write(os.linesep)
            article.clear()

    print ('----------parse end----------')
I have uploaded the file dblp_part.xml as a gist here: dblp_part.xml
It contains the first 2336 lines of dblp.xml. The last article element gives me a NoneType error, but if I swap the last two elements, it works fine. Is this a bug in ElementTree?
I am new to Python and would appreciate any help.
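One possible explanation, offered as a guess rather than a confirmed diagnosis: the Python documentation for iterparse says that on a "start" event only the opening tag is guaranteed to have been read, so an element's text and children may not be there yet. That could explain why author.text is sometimes None. Below is a minimal sketch that handles only "end" events and guards against a missing text value. The function name extract_authors and the in-memory sample are hypothetical stand-ins for the script and for dblp_part.xml, and the sketch leaves out the custom parser that the real dblp.xml needs for its DTD entities.

```python
import io
import xml.etree.ElementTree as ET

PAPER_TAGS = ('article', 'inproceedings', 'proceedings', 'book',
              'incollection', 'phdthesis', 'mastersthesis', 'www')


def extract_authors(source):
    """Return one '|'-joined author string per publication record."""
    rows = []
    # Listen for "end" events only: by then the element and its
    # children have been read completely.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in PAPER_TAGS:
            # An empty <author/> has text None, so fall back to ''.
            names = [a.text or u'' for a in elem.findall('author')]
            rows.append(u'|'.join(names))
            elem.clear()  # release the finished record's contents
    return rows


# Hypothetical sample data standing in for dblp_part.xml.
SAMPLE = b"""<dblp>
<article key="a1"><author>Alice</author><author>Bob</author><title>T1</title></article>
<inproceedings key="p1"><author>Carol</author><title>T2</title></inproceedings>
</dblp>"""

rows = extract_authors(io.BytesIO(SAMPLE))
print(rows)
```

Unlike the original loop, this joins the names with '|' instead of writing a trailing '|' after each author, and it returns a list rather than writing to the authors file.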