I'm trying to get the href URLs from nested HTML code:
...
<li class="dropdown">
  <a href="#" class="dropdown-toggle wide-nav-link" data-toggle="dropdown">text_1 <b class="caret"></b></a>
  <ul class="dropdown-menu">
    <li class="class_a"><a title="title_1" href="http://www.customurl_1.com">title_1</a></li>
    <li class="class_b"><a title="title_2" href="http://www.customurl_2.com">title_2</a></li>
    ...
    <li class="class_a"><a title="title_x" href="http://www.customurl_x.com">title_x</a></li>
  </ul>
</li>
...
<li class="dropdown">
  <a href="#" class="dropdown-toggle wide-nav-link" data-toggle="dropdown">text_2 <b class="caret"></b></a>
  <ul class="dropdown-menu">
    <li class="class_a"><a title="title_1" href="http://www.customurl_1.com">title_1</a></li>
    <li class="class_b"><a title="title_2" href="http://www.customurl_2.com">title_2</a></li>
    ...
    <li class="class_a"><a title="title_x" href="http://www.customurl_x.com">title_x</a></li>
  </ul>
</li>
...
In the original HTML code there are 15 "li" blocks with class "dropdown", but I only want the URLs from the block whose text is text_1. Is it possible to grab these nested URLs with BeautifulSoup?
Thanks for your help.
An example with lxml and XPath:
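from io import StringIO

from lxml import etree

# Parse the HTML string with lxml's lenient HTML parser.
parser = etree.HTMLParser()
tree = etree.parse(StringIO(html), parser)

# Select the hrefs under the "dropdown" li whose link text starts with "text_1".
hrefs = tree.xpath('//li[@class="dropdown" and a[starts-with(.,"text_1")]]/ul[@class="dropdown-menu"]/li/a/@href')
print(hrefs)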
where html is a unicode string with the HTML content. Result:
['http://www.customurl_1.com', 'http://www.customurl_2.com', 'http://www.customurl_x.com']
Note: I use the starts-with function to be more precise in the XPath query; you can use contains in the same way if text_1 is not at the beginning of the text node.
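To illustrate the difference between the two functions, here is a minimal sketch. The sample markup is hypothetical: it puts the word "Menu" in front of text_1, so starts-with no longer matches while contains still does. It assumes lxml is installed.

```python
from io import StringIO

from lxml import etree

# Hypothetical sample: "text_1" is NOT at the beginning of the link text.
html = """
<ul>
  <li class="dropdown">
    <a href="#" class="dropdown-toggle wide-nav-link" data-toggle="dropdown">Menu text_1 <b class="caret"></b></a>
    <ul class="dropdown-menu">
      <li class="class_a"><a title="title_1" href="http://www.customurl_1.com">title_1</a></li>
      <li class="class_b"><a title="title_2" href="http://www.customurl_2.com">title_2</a></li>
    </ul>
  </li>
  <li class="dropdown">
    <a href="#" class="dropdown-toggle wide-nav-link" data-toggle="dropdown">Menu text_2 <b class="caret"></b></a>
    <ul class="dropdown-menu">
      <li class="class_a"><a title="title_3" href="http://www.customurl_3.com">title_3</a></li>
    </ul>
  </li>
</ul>
"""

tree = etree.parse(StringIO(html), etree.HTMLParser())

# contains() matches "text_1" anywhere in the link text.
hrefs_contains = tree.xpath(
    '//li[@class="dropdown" and a[contains(., "text_1")]]'
    '/ul[@class="dropdown-menu"]/li/a/@href'
)

# starts-with() only matches when the link text begins with "text_1".
hrefs_starts_with = tree.xpath(
    '//li[@class="dropdown" and a[starts-with(., "text_1")]]'
    '/ul[@class="dropdown-menu"]/li/a/@href'
)

print(hrefs_contains)
print(hrefs_starts_with)
```

Keep in mind that both functions do substring matching, so a label such as text_10 would also match "text_1".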
Query details:
//                              # anywhere in the DOM tree
li                              # an li tag matching the following conditions:
[                               # (open the condition bracket for li)
  @class="dropdown"             # the li has a class attribute equal to "dropdown"
  and                           # and
  a                             # a child tag "a"
  [                             # (open the condition for "a")
    starts-with(.,"text_1")     # whose text starts with "text_1"
  ]                             # (close the condition for "a")
]                               # (close the condition for li)
/                               # the li's child (/ stands for immediate descendant)
ul[@class="dropdown-menu"]      # a "ul" with class equal to "dropdown-menu"
/li                             # "li" children of the "ul"
/a                              # "a" children of the "li"
/@href                          # href attributes of the "a"
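Since the question asked about BeautifulSoup, the same selection could probably be written along these lines. This is only a sketch, not part of the lxml answer above: it assumes the bs4 package is installed, and the sample markup is a shortened, hypothetical version of the HTML in the question.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the markup from the question.
html = """
<ul>
  <li class="dropdown">
    <a href="#" class="dropdown-toggle wide-nav-link" data-toggle="dropdown">text_1 <b class="caret"></b></a>
    <ul class="dropdown-menu">
      <li class="class_a"><a title="title_1" href="http://www.customurl_1.com">title_1</a></li>
      <li class="class_b"><a title="title_2" href="http://www.customurl_2.com">title_2</a></li>
    </ul>
  </li>
  <li class="dropdown">
    <a href="#" class="dropdown-toggle wide-nav-link" data-toggle="dropdown">text_2 <b class="caret"></b></a>
    <ul class="dropdown-menu">
      <li class="class_a"><a title="title_3" href="http://www.customurl_3.com">title_3</a></li>
    </ul>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

hrefs = []
# Note: class_="dropdown" matches any li that has "dropdown" among its
# classes, which is slightly looser than the XPath test @class="dropdown".
for block in soup.find_all("li", class_="dropdown"):
    # Only look at the li's direct child link (the dropdown toggle).
    toggle = block.find("a", recursive=False)
    if toggle is None or not toggle.get_text().strip().startswith("text_1"):
        continue
    menu = block.find("ul", class_="dropdown-menu", recursive=False)
    if menu is None:
        continue
    # Walk ul > li > a, mirroring the /li/a/@href steps of the XPath query.
    for item in menu.find_all("li", recursive=False):
        link = item.find("a", recursive=False)
        if link is not None and link.has_attr("href"):
            hrefs.append(link["href"])

print(hrefs)
```

The recursive=False arguments keep each step restricted to immediate children, which is what the single / does in the XPath version.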