php - Parsing with Goutte - how to target an element after one containing a text string
I am using https://github.com/friendsofphp/goutte to parse and extract data, and it is going well...
But I have stumbled upon an unfriendly spot:
<tr>
  <th>website:</th>
  <td> <a href="http://www.adres.com" target="_blank">http://www.adres.com</a> </td>
</tr>
I am trying to get the text of the td element that follows the th element containing a specific string, website: in this case.
My PHP looks like this:
$client3 = new \Goutte\Client();
$crawler3 = $client3->request('GET', $supplierurl . 'contactinfo.html');
if ($crawler3->filter('th:contains("+website+") + td a')->count() > 0) {
    $parsed_company_website_url = $crawler3->filter('th:contains("website:") + td')->text();
} else {
    $parsed_company_website_url = null;
}
return $parsed_company_website_url;
Problem
My code doesn't work.
Attempts
- I tried using both "+website+" and "website:" as the search string.
- I tried smart targeting by counting the rows of the table, but each DB entry on the target site arranges its items differently, so there is no reliable pattern.
To do
Make the script extract the text of the a element.
It seems that contains() is a jQuery feature, not a standard CSS selector. With CSS you can inspect an attribute value, but not the text node inside the markup.
So in this case, use an XPath selector, in particular the following-sibling axis (see https://stackoverflow.com/a/29380551/1997849).
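The following-sibling approach can be sketched roughly as follows. This is a minimal illustration, not the linked answer's code: it uses PHP's built-in DOMDocument and DOMXPath so that it runs without Goutte or a network request. The helper name textAfterHeader, the wrapping table element, and the sample markup variable are assumptions made for the example.

```php
<?php
// Sample markup from the question, wrapped in a <table> (an assumption)
// so that the fragment parses as a well-formed table.
$html = <<<HTML
<table>
  <tr>
    <th>website:</th>
    <td> <a href="http://www.adres.com" target="_blank">http://www.adres.com</a> </td>
  </tr>
</table>
HTML;

/**
 * Hypothetical helper: returns the trimmed text of the first <td> that is a
 * following sibling of a <th> whose text contains $label, or null if no
 * such cell exists.
 */
function textAfterHeader(string $html, string $label): ?string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // contains() is case-sensitive, so $label must match the page's casing.
    // The label is assumed not to contain double quotes.
    $query = sprintf('//th[contains(., "%s")]/following-sibling::td[1]', $label);
    $nodes = $xpath->query($query);

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return trim($nodes->item(0)->textContent);
}

$website = textAfterHeader($html, 'website:');
$missing = textAfterHeader($html, 'phone:');
```

With Goutte, the same expression should be usable through the crawler's filterXPath() method, for example $crawler3->filterXPath('//th[contains(., "website:")]/following-sibling::td[1]'), with count() checked before text() is called, as in the question's code.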