node.js - Image URLs change while scraping in Node (works in browser console)
I'm using artoo.js for web scraping, but for some reason the scraped image URLs change when I work with cheerio in Node. For example, the original image URL is:
"https://images-na.ssl-images-amazon.com/images/m/mv5bnwu4nmy3mtmtmtbmmi00njfjltkwmmitywzhzwuwndg5m2exxkeyxkfqcgdeqxvynduyotg3njg@._v1_sx300.jpg"
However, after scraping, the URL turns into this one:
"http://ia.media-imdb.com/images/g/01/imdb/images/nopicture/156x231/tv-3797070466._cb522736147_.png@._v1_sx300.jpg"
If I scrape from the Chrome browser console using the artoo.js bookmarklet, the URL stays the same as the original. Why does it change when I run the scraper in Node? Any suggestions?
Update: I think I found the issue, but not the solution. It seems the scraper method runs before the correct images have loaded on the page, so the URL I get is that of a placeholder image. How can I wait until the entire page has loaded?
It may be caused by JS code. If you are using request + cheerio to scrape the page, then when you make the request in Node the page's JS code does nothing (it's not interpreted). You are getting the original URL, before a library or some piece of code changes it. Try looking at the source code of the page in your browser (Ctrl+U). If the image URL there is "http://ia.media-imdb.com/images/g/01/imdb/images/nopicture/156x231/tv-3797070466._cb522736147_.png@._v1_sx300.jpg"
then you know a piece of code is doing the change.
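To illustrate the point, here is a minimal sketch of what a plain HTTP client sees. The HTML string, the example.com URLs and the `extractImgSrcs` helper are all hypothetical, and a regex stands in for cheerio so the snippet runs with nothing but Node. The script inside the HTML is never executed, so the `src` you read is still the placeholder.

```javascript
// Hypothetical raw HTML as a plain HTTP client would receive it: the <img>
// still points at the placeholder because no script has run yet.
const rawHtml = `
<div class="poster">
  <img id="poster" src="http://example.com/nopicture.png" alt="Poster">
</div>
<script>
  document.getElementById('poster').src = 'https://example.com/real-poster.jpg';
</script>`;

// Collect the src attribute of every <img> tag in a string of HTML.
// A regex is enough for this illustration; a real scraper would use a
// proper parser such as cheerio.
function extractImgSrcs(html) {
  const srcs = [];
  const imgTag = /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["'][^>]*>/gi;
  let match;
  while ((match = imgTag.exec(html)) !== null) {
    srcs.push(match[1]);
  }
  return srcs;
}

// Only the placeholder shows up; the URL assigned by the script is never seen.
const srcs = extractImgSrcs(rawHtml);
console.log(srcs);
```

A browser would run the script and swap the `src`, which is why the same scrape from the Chrome console returns the real image URL.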
Edit
If you absolutely need to run the JS to obtain the URL, you should use PhantomJS. It's a headless browser, so the images will load. You can use it directly from Node.js.
Or, if you want a simpler way, go with CasperJS. I assume you're not used to scraping complicated web apps. If that's the case, go with CasperJS. It's easy and does the job. It's not as fast as using request + cheerio, but it works, and you can put the code to run on a server.
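As for waiting until the real image is there: inside a headless browser this usually comes down to polling until the `src` is no longer the placeholder (CasperJS has a `waitFor` step for this kind of thing). Below is a rough sketch of that polling logic in plain Node, not actual PhantomJS or CasperJS code. `isPlaceholder`, `waitFor`, the `fakeImg` object and the example.com URLs are hypothetical, and the "nopicture" check is just an assumption based on the placeholder URL in the question.

```javascript
// Hypothetical check: treat any URL containing "nopicture" as the placeholder,
// based on the placeholder URL quoted in the question.
function isPlaceholder(url) {
  return url.includes('nopicture');
}

// Poll getValue() every intervalMs until isReady(value) is true,
// or reject once timeoutMs has elapsed.
function waitFor(getValue, isReady, { intervalMs = 50, timeoutMs = 2000 } = {}) {
  return new Promise((resolve, reject) => {
    const started = Date.now();
    const tick = () => {
      const value = getValue();
      if (isReady(value)) {
        resolve(value);
      } else if (Date.now() - started >= timeoutMs) {
        reject(new Error('Timed out waiting for condition'));
      } else {
        setTimeout(tick, intervalMs);
      }
    };
    tick();
  });
}

// Simulated page: the src starts as the placeholder and is swapped after
// 100 ms, standing in for the page's own script.
const fakeImg = { src: 'http://example.com/nopicture/tv.png' };
setTimeout(() => { fakeImg.src = 'https://example.com/real-poster.jpg'; }, 100);

waitFor(() => fakeImg.src, (src) => !isPlaceholder(src))
  .then((src) => console.log('Real image URL:', src))
  .catch((err) => console.error(err.message));
```

With a real headless browser, `getValue` would read the image's `src` from the loaded page instead of from `fakeImg`, and you would scrape only after the promise resolves.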