python - Scrapy: trying to crawl information from internal links of a webpage -
I am trying to crawl a page on JobStreet.
I'm able to crawl the information on the main page, but the issue comes when I try to crawl the internal links of the page, for example the first posting here.
This is a snippet of the code:
import scrapy
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy import Item, Field

class It(scrapy.Spider):
    name = 'it'
    allowed_domains = ["www.jobstreet.com.sg"]
    start_urls = [
        'https://www.jobstreet.com.sg/en/job-search/job-vacancy.php?key=&specialization=191%2C192%2C193&area=&salary=&ojs=3&src=12',
    ]

    rules = (
        Rule(SgmlLinkExtractor(allow=[r'/en/job/*.'],
                               restrict_xpaths=('//*[(@class = "position-title-link")]',)),
             callback='parse_info', follow=True),
    )

    def parse_info(self, response):
        self.logger.info('response.url=%s' % response.url)
I am not able to get any sort of response in parse_info.
You may change
scrapy.Spider
to
CrawlSpider
The rules attribute is only processed by CrawlSpider; a plain scrapy.Spider ignores it, which is why parse_info is never called.