Fetching a lot of URLs in Python with Google App Engine
In my subclass of RequestHandler, I am trying to fetch a range of URLs:

    import urllib2
    import webapp2

    class GetStats(webapp2.RequestHandler):
        def post(self):
            lastpage = 50
            heap = []
            for page in range(1, lastpage):
                tmpurl = url + str(page)
                response = urllib2.urlopen(tmpurl, timeout=5)
                html = response.read()
                # some parsing of the html
                heap.append(result_of_parsing)
            self.response.write(heap)
It works with ~30 URLs (the page takes longer to load, but it works). With more than 30 I get an error:
    Error: Server Error
    The server encountered an error and could not complete your request.
    Please try again in 30 seconds.
Is there any way to fetch that many URLs? Perhaps more optimally, or somehow else? Up to hundreds of pages?
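(As an aside: one way to make the fetching itself faster on App Engine is the asynchronous urlfetch API, which starts the requests in parallel instead of one at a time. This is a minimal sketch, not from the original question, assuming url and lastpage as in the snippet above:

    from google.appengine.api import urlfetch

    # Start all the fetches in parallel, then collect the results.
    rpcs = []
    for page in range(1, lastpage):
        rpc = urlfetch.create_rpc(deadline=5)
        urlfetch.make_fetch_call(rpc, url + str(page))
        rpcs.append(rpc)

    heap = []
    for rpc in rpcs:
        result = rpc.get_result()
        if result.status_code == 200:
            heap.append(result.content)  # parse as needed

Note that this still has to finish inside the request deadline, so on its own it only stretches how many pages fit.)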
Update:
I'm using BeautifulSoup to parse every single page. I found this traceback in the GAE logs:
    Traceback (most recent call last):
      File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 267, in Handle
        result = handler(dict(self._environ), self._StartResponse)
      File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1529, in __call__
        rv = self.router.dispatch(request, response)
      File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
        return route.handler_adapter(request, response)
      File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1102, in __call__
        return handler.dispatch()
      File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
        return method(*args, **kwargs)
      File "/base/data/home/apps/s~gae/1.379703839015039430/main.py", line 68, in post
        heap = get_times(tmp_url, 160)
      File "/base/data/home/apps/s~gae/1.379703839015039430/main.py", line 106, in get_times
        soup = BeautifulSoup(html)
      File "libs/bs4/__init__.py", line 168, in __init__
        self._feed()
      File "libs/bs4/__init__.py", line 181, in _feed
        self.builder.feed(self.markup)
      File "libs/bs4/builder/_htmlparser.py", line 4, in feed
        super(HTMLParserTreeBuilder, self).feed(markup)
      File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/HTMLParser.py", line 114, in feed
        self.goahead(0)
      File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/HTMLParser.py", line 155, in goahead
        startswith = rawdata.startswith
    DeadlineExceededError
It's failing because you only have 60 seconds to return a response to the user, and I would guess fetching and parsing all the URLs is taking longer than that.
You will want to use this:

There is a 10-minute time limit for a task to do a piece of work. You can then return to the user immediately, and they can "pick up" the results later through another handler (that you create). If collecting all the URLs takes longer than 10 minutes, you'll have to split them up into further tasks (a sketch of this pattern follows).
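A minimal sketch of that pattern using App Engine's deferred library, with memcache to hand the results back. The handler names, fetch_pages, BASE_URL, and the memcache key are hypothetical, not from the answer:

    import urllib2
    import webapp2
    from google.appengine.api import memcache
    from google.appengine.ext import deferred

    BASE_URL = 'http://example.com/stats?page='  # placeholder for the real URL

    def fetch_pages(first, last):
        # Runs on the task queue, where a single task gets up to 10 minutes.
        heap = []
        for page in range(first, last):
            response = urllib2.urlopen(BASE_URL + str(page), timeout=5)
            heap.append(response.read())  # parse here rather than storing raw html
        memcache.set('stats_result', heap)

    class StartStats(webapp2.RequestHandler):
        def post(self):
            # Enqueue the work and return to the user immediately.
            # (Requires the 'deferred' builtin enabled in app.yaml.)
            deferred.defer(fetch_pages, 1, 50)
            self.response.write('started')

    class StatsResult(webapp2.RequestHandler):
        def get(self):
            # The user "picks up" the results later through this handler.
            result = memcache.get('stats_result')
            self.response.write(str(result) if result else 'not ready yet')

If the full range of pages risks exceeding 10 minutes, split it by calling deferred.defer once per chunk of pages.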
See this to understand why you cannot run for longer than 60 seconds.