Python RegEx - How do I form a regex that contains a hyphen inside a word?

I need a regular expression that matches "/page-2" or "/page-3" as part of a bigger URL such as http://domain.com/articles/page-number

So far, I have tried these combinations: '/page-\d' and '\b/page-\d\b'

Please note that I am using the regex as part of the rules for the start_urls in a Scrapy project. Any suggestions are appreciated. Here's the code snippet:

    class ndtvxolonewsitem(CrawlSpider):
        name = "ndtvxolonews"
        allowed_domains = ["http://gadgets.ndtv.com/tags/"]
        start_urls = ["http://gadgets.ndtv.com/tags/xolo/articles"]
        rules = [Rule(LinkExtractor(allow=['\b/page\-\d\b']))]

allowed_domains should contain only the domain name. You can filter for a specific path by including the start of the URL in the regex:

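    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule

    class ndtvxolonewsitem(CrawlSpider):
        name = "ndtvxolonews"
        # Domain name only: no scheme and no path.
        allowed_domains = ["gadgets.ndtv.com"]
        start_urls = ["http://gadgets.ndtv.com/tags/xolo/articles"]
        # Raw string, so \d reaches the regex engine unchanged; the hyphen needs no escaping.
        rules = [Rule(LinkExtractor(allow=[r'http://gadgets.ndtv.com/tags/.*/page-\d+']))]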
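To check a pattern like this outside of Scrapy, a minimal sketch using only the standard re module may help. The sample URLs are hypothetical, modeled on the site in the question. The sketch assumes the pattern is applied with a search over the whole URL. It also shows one likely reason the '\b...' attempts found nothing: in a normal (non-raw) string literal, "\b" is the backspace character and not a word boundary.

```python
import re

# Hypothetical sample URLs, modeled on the site in the question.
urls = [
    "http://gadgets.ndtv.com/tags/xolo/articles",
    "http://gadgets.ndtv.com/tags/xolo/articles/page-2",
    "http://gadgets.ndtv.com/tags/xolo/articles/page-13",
]

# In a non-raw string literal, "\b" is the backspace character (0x08),
# so this pattern looks for a literal backspace in front of "/page-".
backspace_pattern = re.compile("\b/page-[0-9]")

# Raw string: \d reaches the regex engine as "any digit". A hyphen outside
# a character class is an ordinary character and needs no escaping.
page_pattern = re.compile(r"/page-\d+")

broken_hits = [u for u in urls if backspace_pattern.search(u)]
page_hits = [u for u in urls if page_pattern.search(u)]

print(broken_hits)
print(page_hits)
```

Using \d+ instead of a single \d also lets the pattern cover page numbers with more than one digit, such as page-13.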
