html - Python, limiting the search to a specific hyperlink on a webpage


I have found a way to download .pdf files through the hyperlinks on a webpage.

Here it is:

    import lxml.html, urllib2, urlparse

    base_url = 'http://www.renderx.com/demos/examples.html'
    res = urllib2.urlopen(base_url)
    tree = lxml.html.fromstring(res.read())

    # EXSLT regular-expression namespace, needed for re:test() in XPath
    ns = {'re': 'http://exslt.org/regular-expressions'}

    # Select every <a> whose href ends in ".pdf" (case-insensitive)
    for node in tree.xpath(r'//a[re:test(@href, "\.pdf$", "i")]', namespaces=ns):
        # Resolve relative hrefs against the page URL
        print urlparse.urljoin(base_url, node.attrib['href'])

The question is: instead of listing all the PDFs on the webpage, how can I get only the PDFs under a specific hyperlink?

One way is to check the file name, like:

    if 'CA-Personal.pdf' in node.attrib['href']:

But what happens if the .pdf file is renamed? And what if I just want to limit the search to the "application" hyperlinks on the webpage? Thank you.

OK, this is not the best way, but it does no harm:

    from bs4 import BeautifulSoup
    import urllib2

    domain = 'http://www.renderx.com'
    url = 'http://www.renderx.com/demos/examples.html'

    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())

    # Keep only the <a> elements whose visible text is "application"
    for aa in soup.find_all('a', text="application"):
        # Assumes the href is root-relative (starts with "/")
        print domain + aa['href']
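For anyone on Python 3, where `urllib2` no longer exists, the same idea (select links by their visible text, so a renamed file does not matter) could be sketched with only the standard library. This is a rough sketch, not the code above: the `LinkTextCollector` class, the `links_with_text` helper and the sample markup are all made up for illustration, and the real page's link text may differ. It also uses `urljoin` instead of concatenating the domain, so relative hrefs resolve too.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTextCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (text, href) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((''.join(self._text).strip(), self._href))
            self._href = None


def links_with_text(html, base_url, wanted):
    """Return absolute URLs of links whose text equals `wanted`, ignoring case."""
    parser = LinkTextCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for text, href in parser.links
            if text.lower() == wanted.lower()]


# Hypothetical markup for illustration only; not copied from the real page.
sample = '''
<a href="/files/demos/CA-Personal.pdf">PDF</a>
<a href="apps/renamed-file.pdf">Application</a>
<a href="/about.html">About</a>
'''
base_url = 'http://www.renderx.com/demos/examples.html'
print(links_with_text(sample, base_url, 'application'))
```

To run it against the live page, the `sample` string would be replaced by the decoded body returned from `urllib.request.urlopen(base_url).read()`.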
