Fetching a lot of URLs in Python with Google App Engine
In my subclass of RequestHandler, I am trying to fetch a range of URLs:

    import urllib2
    import webapp2

    class GetStats(webapp2.RequestHandler):
        def post(self):
            lastpage = 50
            heap = []
            for page in range(1, lastpage):
                tmpurl = url + str(page)
                response = urllib2.urlopen(tmpurl, timeout=5)
                html = response.read()
                # some parsing of the html
                heap.append(result_of_parsing)
            self.response.write(heap)
It works with ~30 URLs (the page takes longer to load, but it works). With more than 30 I get an error:
    Error: Server Error
    The server encountered an error and could not complete your request.
    Please try again in 30 seconds.
Is there any way to fetch that many URLs? Perhaps more optimally, or somehow else? Up to hundreds of pages?
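(As an aside: one way to make the fetching itself faster on App Engine is the asynchronous urlfetch API, which starts the requests in parallel instead of one at a time. This is a minimal sketch, not from the original question, assuming url and lastpage as in the snippet above:

    from google.appengine.api import urlfetch

    # Start all the fetches in parallel, then collect the results.
    rpcs = []
    for page in range(1, lastpage):
        rpc = urlfetch.create_rpc(deadline=5)
        urlfetch.make_fetch_call(rpc, url + str(page))
        rpcs.append(rpc)

    heap = []
    for rpc in rpcs:
        result = rpc.get_result()
        if result.status_code == 200:
            heap.append(result.content)  # parse as needed

Note that this still has to finish inside the request deadline, so on its own it only stretches how many pages fit.)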
Update:
I'm using BeautifulSoup to parse every single page. I found this traceback in the GAE logs:
    Traceback (most recent call last):
      File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 267, in Handle
        result = handler(dict(self._environ), self._StartResponse)
      File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1529, in __call__
        rv = self.router.dispatch(request, response)
      File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
        return route.handler_adapter(request, response)
      File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1102, in __call__
        return handler.dispatch()
      File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
        return method(*args, **kwargs)
      File "/base/data/home/apps/s~gae/1.379703839015039430/main.py", line 68, in post
        heap = get_times(tmp_url, 160)
      File "/base/data/home/apps/s~gae/1.379703839015039430/main.py", line 106, in get_times
        soup = BeautifulSoup(html)
      File "libs/bs4/__init__.py", line 168, in __init__
        self._feed()
      File "libs/bs4/__init__.py", line 181, in _feed
        self.builder.feed(self.markup)
      File "libs/bs4/builder/_htmlparser.py", line 4, in feed
        super(HTMLParserTreeBuilder, self).feed(markup)
      File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/HTMLParser.py", line 114, in feed
        self.goahead(0)
      File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/HTMLParser.py", line 155, in goahead
        startswith = rawdata.startswith
    DeadlineExceededError
It's failing because you only have 60 seconds to return a response to the user, and I would guess fetching and parsing all the URLs is taking longer than that.
You will want to use this:

There is a 10-minute time limit for a task to do a piece of work. You can then return to the user immediately, and they can "pick up" the results later through another handler (that you create). If collecting all the URLs takes longer than 10 minutes, you'll have to split them up into further tasks (a sketch of this pattern follows).
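A minimal sketch of that pattern using App Engine's deferred library, with memcache to hand the results back. The handler names, fetch_pages, BASE_URL, and the memcache key are hypothetical, not from the answer:

    import urllib2
    import webapp2
    from google.appengine.api import memcache
    from google.appengine.ext import deferred

    BASE_URL = 'http://example.com/stats?page='  # placeholder for the real URL

    def fetch_pages(first, last):
        # Runs on the task queue, where a single task gets up to 10 minutes.
        heap = []
        for page in range(first, last):
            response = urllib2.urlopen(BASE_URL + str(page), timeout=5)
            heap.append(response.read())  # parse here rather than storing raw html
        memcache.set('stats_result', heap)

    class StartStats(webapp2.RequestHandler):
        def post(self):
            # Enqueue the work and return to the user immediately.
            # (Requires the 'deferred' builtin enabled in app.yaml.)
            deferred.defer(fetch_pages, 1, 50)
            self.response.write('started')

    class StatsResult(webapp2.RequestHandler):
        def get(self):
            # The user "picks up" the results later through this handler.
            result = memcache.get('stats_result')
            self.response.write(str(result) if result else 'not ready yet')

If the full range of pages risks exceeding 10 minutes, split it by calling deferred.defer once per chunk of pages.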
See this to understand why you cannot run for longer than 60 seconds.