Download a picture from a Java webpage with login using R
For the record: I have never scraped a webpage before and I don't have any HTML knowledge.
I need to download thousands of pictures from URLs that require a login. Six months ago I was able to download a single image simply with:
    url <- "https://customer.roamler.com/image/71138bc0-0ff3-405d-9406-5e643b018f7f"
    download.file(url = url, destfile = "myimage.jpg", mode = "wb", method = "libcurl")
I don't know if my computer had a different configuration at the time, but the fact is that the code above doesn't work anymore. When I use it, myimage.jpg is not a picture; it's the page's source code (saved as a .jpg). I thought, "maybe I need to scrape the page in order to get the image". That's why I tried rvest forms and sessions in order to log in.
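A quick way to confirm that symptom without opening the file is to look at its first bytes: a real JPEG starts with the bytes FF D8 FF, while a saved HTML page starts with plain text. The following is only a minimal sketch; `is_jpeg` is a hypothetical helper (not from any package), and the two temporary files merely stand in for real downloads.

```r
# Hypothetical helper (not part of any package): TRUE if the file starts
# with the JPEG signature bytes FF D8 FF.
is_jpeg <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 3L)
  length(header) == 3L && identical(header, as.raw(c(0xFF, 0xD8, 0xFF)))
}

# Self-contained demo: two temporary files standing in for real downloads.
fake_jpeg <- tempfile(fileext = ".jpg")
writeBin(as.raw(c(0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10)), fake_jpeg)

fake_html <- tempfile(fileext = ".jpg")
writeLines("<!DOCTYPE html><html><body>Loading...</body></html>", fake_html)

is_jpeg(fake_jpeg)  # TRUE
is_jpeg(fake_html)  # FALSE
```

Running `is_jpeg("myimage.jpg")` after `download.file()` would then tell a real image apart from a login or loading page saved under a .jpg name, which matters when thousands of files are downloaded in a loop.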
This is what I ran:
    require("rvest")
    session <- html_session(url)
    form <- html_form(session)[[1]]
but I got an error:
    > form <- html_form(session)[[1]]
    Error in html_form(session)[[1]] : subscript out of bounds
and I realized the session had a different URL from the one I had provided:
    > session
    <session> https://customer.roamler.com/
      Status: 200
      Type:   text/html; charset=utf-8
      Size:   10043
When I read the source code of that URL before logging into my account, I don't find a login form in it. The funny thing is that when the login page has loaded and I inspect it (Ctrl+Shift+I on Windows + Chrome), I do find the login form (xpath='//*[@id="login-form"]/form'). My conclusion was that I needed to wait for the page to load before getting the login node and passing my credentials. So I tried RSelenium:
    require("RSelenium")
    driver <- rsDriver(browser = "phantomjs")
    remDr <- driver[["client"]]
    remDr$open()
    remDr$setImplicitWaitTimeout(milliseconds = 10000)
    remDr$navigate(url)
Now the URL seems to be kept correctly, but when I look at the page source it is still the source of the loading page (the one without the login form):
    > remDr$getCurrentUrl()
    [[1]]
    [1] "https://customer.roamler.com/#/login?returnurl=%2fimage%2f71138bc0-0ff3-405d-9406-5e643b018f7f"

    > remDr$findElements(using = "xpath", '//*[@id="login-form"]/form')
    list()
Can anyone help me out? I don't know what to try next, and I feel I'm losing time on something that should be relatively easy (downloading a picture from a URL).
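One thing that might be worth trying, given that the form only shows up in the inspector after the page has finished loading, is to poll for the element instead of querying once. The sketch below is an assumption-laden illustration, not a confirmed fix: `wait_for_elements` is a hypothetical helper, and the demo uses a stub in place of a live browser session so that it runs offline.

```r
# Hypothetical helper: call `find_fn` repeatedly until it returns a non-empty
# list or `timeout` seconds have passed. Returns the last result either way.
wait_for_elements <- function(find_fn, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    found <- find_fn()
    if (length(found) > 0 || Sys.time() >= deadline) {
      return(found)
    }
    Sys.sleep(interval)
  }
}

# Offline demo: a stub that returns an empty list twice and then one "element",
# mimicking a form that only appears once the page has finished rendering.
calls <- 0
stub_find <- function() {
  calls <<- calls + 1
  if (calls < 3) list() else list("login-form")
}

result <- wait_for_elements(stub_find, timeout = 5, interval = 0.1)
length(result)  # 1
calls           # 3

# With a live session it would presumably be used like this (untested here):
# forms <- wait_for_elements(function() {
#   remDr$findElements(using = "xpath", '//*[@id="login-form"]/form')
# })
```

If the list is still empty after the timeout, the form is probably not reachable with that XPath in the driver's browser at all, which would point to the browser choice rather than to timing.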