python - Scrapy: trying to crawl information from internal links of a webpage -
I am trying to crawl a page on JobStreet.
I'm able to crawl the information on the main page, but the issue comes when I try to crawl the internal links of the page, for example the first posting here.
This is a snippet of the code:
import scrapy
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy import Item, Field

class It(scrapy.Spider):
    name = 'it'
    allowed_domains = ["www.jobstreet.com.sg"]
    start_urls = [
        'https://www.jobstreet.com.sg/en/job-search/job-vacancy.php?key=&specialization=191%2C192%2C193&area=&salary=&ojs=3&src=12',
    ]

    rules = (
        Rule(SgmlLinkExtractor(allow=[r'/en/job/*.'],
                               restrict_xpaths=('//*[(@class = "position-title-link")]',)),
             callback='parse_info', follow=True),
    )

    def parse_info(self, response):
        self.logger.info('response.url=%s' % response.url)
I am not able to get any sort of response in parse_info.
You may change
scrapy.Spider
to
CrawlSpider
The rules attribute is only processed by CrawlSpider; a plain scrapy.Spider ignores it, which is why parse_info is never called.