Scrapy no such host crawler -


I am using this crawler as the base for my crawler.

This crawler is designed to catch domains that return a 404 error and save them. I wanted to modify it a bit so that it also catches the "no such host" error, which is error 12002.

However, with this code, Scrapy does not receive any response (because there is no host to send a response back), and this is what Scrapy gives when it encounters such a domain:

  not found: [Errno 11001] getaddrinfo failed.

How can I catch this error and save the domain name?

Exceptions raised during the processing of a request can be handled through a downloader middleware's process_exception() method, which receives the request and the exception objects.

Something like the following will log all exceptions (where an IgnoreRequest hasn't been raised) to a log file:

  class ExceptionLog(object):
      def process_exception(self, request, exception, spider):
          with open('exceptions.log', 'a') as f:
              f.write(str(exception) + "\n")
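
To save just the domain names when DNS resolution fails, as the question asks, you can filter on the exception type inside the same hook. A minimal sketch, assuming the failed lookups surface as Twisted's DNSLookupError (the "getaddrinfo failed" error quoted above is what that exception wraps); the class name and the no_such_host.log path are illustrative:

  from urllib.parse import urlparse

  from twisted.internet.error import DNSLookupError

  class NoSuchHostLog(object):
      def process_exception(self, request, exception, spider):
          # Record only requests that failed because the host could not be
          # resolved, and save the domain rather than the full error text.
          if isinstance(exception, DNSLookupError):
              domain = urlparse(request.url).netloc
              with open('no_such_host.log', 'a') as f:
                  f.write(domain + "\n")
          # Return None so other middlewares can still handle the exception.
          return None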

Extend it to use signals to call spider_opened() and spider_closed() for better file handling, or to pull in settings from your settings.py file (such as a custom EXCEPTIONS_LOG = ...).
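
A sketch of that extension; EXCEPTIONS_LOG is not a built-in Scrapy setting, so the name is whatever you define, while from_crawler() and the spider_opened/spider_closed signals are standard Scrapy plumbing:

  from scrapy import signals

  class ExceptionLog(object):
      def __init__(self, log_path):
          self.log_path = log_path
          self.file = None

      @classmethod
      def from_crawler(cls, crawler):
          # Read the log path from settings.py, falling back to a default.
          mw = cls(crawler.settings.get('EXCEPTIONS_LOG', 'exceptions.log'))
          # Open and close the file once per crawl instead of once per exception.
          crawler.signals.connect(mw.spider_opened, signal=signals.spider_opened)
          crawler.signals.connect(mw.spider_closed, signal=signals.spider_closed)
          return mw

      def spider_opened(self, spider):
          self.file = open(self.log_path, 'a')

      def spider_closed(self, spider):
          if self.file:
              self.file.close()

      def process_exception(self, request, exception, spider):
          self.file.write(str(exception) + "\n")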

Add this to DOWNLOADER_MIDDLEWARES in your settings file. Take care where you place it in the middleware ordering, though! Too close to the engine, and you may miss exceptions that are handled elsewhere; too far from the engine, and you may log exceptions that are later retried or otherwise resolved. Where you put it will depend on where you need it.
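
For example (the myproject.middlewares path and the 880 priority are placeholders for your own project):

  DOWNLOADER_MIDDLEWARES = {
      'myproject.middlewares.ExceptionLog': 880,
  }

Higher numbers sit closer to the downloader: an order above the built-in RetryMiddleware (550 by default) sees exceptions before any retry happens, while a lower order only sees the ones retries failed to resolve.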

