html - Python, Limiting search at a specific hyperlink on webpage -

- May 15, 2014

I found a way to get the .pdf files that a webpage links to.

Here it is:

    # Python 2 (urllib2 and urlparse were reorganized in Python 3)
    import lxml.html, urllib2, urlparse

    base_url = 'http://www.renderx.com/demos/examples.html'
    res = urllib2.urlopen(base_url)
    tree = lxml.html.fromstring(res.read())

    # EXSLT namespace that provides re:test() inside XPath
    ns = {'re': 'http://exslt.org/regular-expressions'}

    # Select every <a> whose href ends in .pdf (case-insensitive)
    for node in tree.xpath(r'//a[re:test(@href, "\.pdf$", "i")]', namespaces=ns):
        print urlparse.urljoin(base_url, node.attrib['href'])

The question is: instead of listing all the PDFs on the webpage, how can I get only the PDF behind a specific hyperlink?

One way would be a check like:

    if 'CA-Personal PDF' in node:

But what happens if the .pdf file is renamed? Or what if I just want to limit the search to the "application" hyperlinks on the webpage? Thank you.

OK, this is not the best way, but it does no harm:

    # Python 2, with Beautiful Soup 4
    from bs4 import BeautifulSoup
    import urllib2

    domain = 'http://www.renderx.com'
    url = 'http://www.renderx.com/demos/examples.html'

    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())

    # Keep only the <a> tags whose text is exactly "application"
    app = soup.find_all('a', text="application")

    for aa in app:
        print domain + aa['href']
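
The same idea (select links by their visible text rather than by file name, so a renamed .pdf is still found) can be sketched for Python 3 with only the standard library. This is a minimal sketch, not the method from the thread: the `LinkTextFilter` class, the `links_with_text` helper, and the sample markup with its paths are all hypothetical, and matching the link text case-insensitively is an assumption. It resolves each href with `urljoin`, as in the question, so relative links work as well as absolute ones.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTextFilter(HTMLParser):
    """Collect resolved hrefs of <a> tags whose visible text equals `wanted`."""

    def __init__(self, base_url, wanted):
        super().__init__()
        self.base_url = base_url
        self.wanted = wanted.lower()   # assumption: compare case-insensitively
        self.matches = []
        self._href = None              # href of the <a> currently open, if any
        self._text = []                # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = "".join(self._text).strip().lower()
            if label == self.wanted:
                # Resolve relative hrefs against the page URL
                self.matches.append(urljoin(self.base_url, self._href))
            self._href = None


def links_with_text(html, base_url, wanted):
    """Return absolute URLs of links whose text matches `wanted`."""
    parser = LinkTextFilter(base_url, wanted)
    parser.feed(html)
    parser.close()
    return parser.matches


# Hypothetical markup standing in for the downloaded page.
SAMPLE = """
<ul>
  <li><a href="/files/demos/app.pdf">Application</a></li>
  <li><a href="chess.pdf">Chess board</a></li>
  <li><a href="/files/demos/renamed.pdf"> application </a></li>
</ul>
"""

BASE = "http://www.renderx.com/demos/examples.html"
result = links_with_text(SAMPLE, BASE, "application")
for link in result:
    print(link)
```

To run it against the live page, the sample string could be replaced by the decoded body returned from `urllib.request.urlopen(BASE).read()`; whether the page's link text is spelled exactly "application" would need checking.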
