Python: nested HTML tags with BeautifulSoup


I'm trying to get the href URLs from nested HTML code:

    ...
    <li class="dropdown">
      <a href="#" class="dropdown-toggle wide-nav-link" data-toggle="dropdown">text_1 <b class="caret"></b></a>
      <ul class="dropdown-menu">
        <li class="class_a"><a title="title_1" href="http://www.customurl_1.com">title_1</a></li>
        <li class="class_b"><a title="title_2" href="http://www.customurl_2.com">title_2</a></li>
        ...
        <li class="class_a"><a title="title_x" href="http://www.customurl_x.com">title_x</a></li>
      </ul>
    </li>
    ...
    <li class="dropdown">
      <a href="#" class="dropdown-toggle wide-nav-link" data-toggle="dropdown">text_2 <b class="caret"></b></a>
      <ul class="dropdown-menu">
        <li class="class_a"><a title="title_1" href="http://www.customurl_1.com">title_1</a></li>
        <li class="class_b"><a title="title_2" href="http://www.customurl_2.com">title_2</a></li>
        ...
        <li class="class_a"><a title="title_x" href="http://www.customurl_x.com">title_x</a></li>
      </ul>
    </li>
    ...

In the original HTML code there are 15 "li" blocks with class "dropdown", but I only want the URLs from the block whose text is text_1. Is it possible to grab these nested URLs with BeautifulSoup?

Thanks for your help.

An example with lxml and XPath:

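    from io import StringIO
    from lxml import etree

    # html is a str holding the page source
    parser = etree.HTMLParser()
    tree = etree.parse(StringIO(html), parser)

    # href of every link in the dropdown whose toggle text starts with "text_1"
    hrefs = tree.xpath('//li[@class="dropdown" and a[starts-with(.,"text_1")]]/ul[@class="dropdown-menu"]/li/a/@href')

    print(hrefs)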

where html is a Unicode string containing the HTML content. Result:

['http://www.customurl_1.com', 'http://www.customurl_2.com', 'http://www.customurl_x.com'] 

Note: I use the starts-with function to be more precise in the XPath query, but you can use contains in the same way if text_1 is not at the beginning of the text node.
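To illustrate the difference between the two functions, here is a small sketch. The sample markup is hypothetical: it assumes the toggle text is "Menu text_1" rather than "text_1", so the label no longer sits at the start of the text node. Only the XPath function name changes between the two queries.

```python
from io import StringIO
from lxml import etree

# Hypothetical markup: "text_1" is preceded by the word "Menu".
html = """
<ul>
  <li class="dropdown">
    <a href="#" class="dropdown-toggle">Menu text_1 <b class="caret"></b></a>
    <ul class="dropdown-menu">
      <li class="class_a"><a title="title_1" href="http://www.customurl_1.com">title_1</a></li>
    </ul>
  </li>
  <li class="dropdown">
    <a href="#" class="dropdown-toggle">Menu text_2 <b class="caret"></b></a>
    <ul class="dropdown-menu">
      <li class="class_b"><a title="title_2" href="http://www.customurl_2.com">title_2</a></li>
    </ul>
  </li>
</ul>
"""

tree = etree.parse(StringIO(html), etree.HTMLParser())

# Same query as in the answer, with the string function left as a placeholder.
query = ('//li[@class="dropdown" and a[{}(., "text_1")]]'
         '/ul[@class="dropdown-menu"]/li/a/@href')

starts = tree.xpath(query.format("starts-with"))  # label must be at the start
contains = tree.xpath(query.format("contains"))   # label may appear anywhere

print(starts)    # []
print(contains)  # ['http://www.customurl_1.com']
```

Keep in mind that contains is looser: it would also match a label such as "text_10", so starts-with (or an exact comparison) is safer when the labels share a prefix.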

Query details:

    //                              # anywhere in the DOM tree
    li                              # an li tag with the following conditions:
    [                               # (opening condition bracket for li)
        @class="dropdown"           # li has a class attribute equal to "dropdown"
        and                         # and
        a                           # a child tag "a"
        [                           # (open condition for "a")
            starts-with(.,"text_1") # whose text starts with "text_1"
        ]                           # (close condition for "a")
    ]                               # (close condition for li)
    /                               # li's child (/ stands for immediate descendant)
    ul[@class="dropdown-menu"]      # "ul" with class equal to "dropdown-menu"
    /li                             # "li" children of the "ul"
    /a                              # "a" children of the "li"
    /@href                          # href attributes of the "a"
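Since the question asks about BeautifulSoup specifically, the same steps can be sketched with it as well. This is only a rough equivalent, not part of the lxml answer: it assumes the bs4 package is installed and uses the standard-library "html.parser" backend. The sample markup is a trimmed copy of the question's HTML, and the URLs in the second block are made up so that the filtering is visible.

```python
from bs4 import BeautifulSoup

# Trimmed sample; the "other_*" URLs are hypothetical.
html = """
<ul>
  <li class="dropdown">
    <a href="#" class="dropdown-toggle wide-nav-link" data-toggle="dropdown">text_1 <b class="caret"></b></a>
    <ul class="dropdown-menu">
      <li class="class_a"><a title="title_1" href="http://www.customurl_1.com">title_1</a></li>
      <li class="class_b"><a title="title_2" href="http://www.customurl_2.com">title_2</a></li>
      <li class="class_a"><a title="title_x" href="http://www.customurl_x.com">title_x</a></li>
    </ul>
  </li>
  <li class="dropdown">
    <a href="#" class="dropdown-toggle wide-nav-link" data-toggle="dropdown">text_2 <b class="caret"></b></a>
    <ul class="dropdown-menu">
      <li class="class_a"><a title="title_1" href="http://www.other_1.com">title_1</a></li>
      <li class="class_b"><a title="title_2" href="http://www.other_2.com">title_2</a></li>
    </ul>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

hrefs = []
# class_="dropdown" matches any li whose class list contains "dropdown",
# which is slightly looser than the exact @class="dropdown" test in XPath.
for li in soup.find_all("li", class_="dropdown"):
    # Only direct child <a> tags count, mirroring the a[...] condition.
    toggles = li.find_all("a", recursive=False)
    if not any(a.get_text().startswith("text_1") for a in toggles):
        continue
    # li / ul[@class="dropdown-menu"] / li / a / @href
    for menu in li.find_all("ul", class_="dropdown-menu", recursive=False):
        for item in menu.find_all("li", recursive=False):
            for link in item.find_all("a", href=True, recursive=False):
                hrefs.append(link["href"])

print(hrefs)
# ['http://www.customurl_1.com', 'http://www.customurl_2.com', 'http://www.customurl_x.com']
```

The recursive=False arguments play the role of the single / steps in the XPath query; dropping them would behave more like //, which also works here but searches deeper than necessary.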
