Python - set returning duplicates?
I'm pulling a list of URLs off the Census website and putting them in a set to make sure I don't end up with duplicates, then exporting the list of non-duplicate URLs to a .csv file. However, the set
continues to return duplicate values, which shouldn't be possible. Here's my code:
    import bs4
    from bs4 import BeautifulSoup
    import requests
    import csv

    source_link = "https://www.census.gov/data/tables/2016/demo/popest/state-total.html"
    s = requests.get(source_link)
    usable_html = s.text
    setupsoup = BeautifulSoup(usable_html, 'lxml')
    silver = csv.writer(open("wgucsv.csv", "r+"))
    silver.writerow(["URL"])
    for set(gold) in setupsoup.findAll('a', href=True):
        gold.add['href']
        print (gold)
        silver.writerow(gold)
As a bonus question, I need a way to convert the resulting relative URLs to absolute URLs, preferably before sorting them into the non-duplicated list. I thought adding them to a set
would filter out the duplicates on its own.
If you want to add the results to a set, try:

    gold = set()
    for x in setupsoup.findAll('a', href=True):
        gold.add(x)

Or, more simply:

    gold = set(setupsoup.findAll('a', href=True))
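For the bonus question, here is a minimal sketch of one way to get absolute, non-duplicated URLs into the CSV. It assumes the set should hold the href strings themselves rather than the tag objects, since two links with different text can point at the same URL, and a relative and an absolute form of the same link are different strings until they are resolved. The helper names `unique_absolute_urls` and `write_urls` and the sample hrefs are made up for illustration; only `urljoin` and `csv.writer` come from the standard library.

```python
import csv
import io
from urllib.parse import urljoin


def unique_absolute_urls(hrefs, base_url):
    """Resolve each href against base_url, then drop duplicates."""
    # A set of plain strings deduplicates by the resolved URL text.
    return sorted({urljoin(base_url, href) for href in hrefs})


def write_urls(urls, out_file):
    """Write a "URL" header followed by one URL per row."""
    writer = csv.writer(out_file)
    writer.writerow(["URL"])
    for url in urls:
        writer.writerow([url])  # wrap in a list so the URL is a single cell


# Hypothetical sample hrefs. With BeautifulSoup they could come from
# (a["href"] for a in setupsoup.findAll("a", href=True)).
sample_hrefs = [
    "/data.html",
    "/data.html",
    "https://www.census.gov/data.html",
    "about.html",
]
base = "https://www.census.gov/data/tables/2016/demo/popest/state-total.html"
urls = unique_absolute_urls(sample_hrefs, base)

# An in-memory buffer stands in for the output file in this sketch.
buffer = io.StringIO()
write_urls(urls, buffer)
print(buffer.getvalue())
```

Writing the rows only after the loop has finished also matters: calling `writerow(gold)` inside the loop writes the whole set as one row on every iteration, which may be why the file looked like it contained duplicates. When writing to a real file, opening it with `open("wgucsv.csv", "w", newline="")` is the mode the csv module documentation recommends.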