When I tried to run the scraper this evening, it failed with an error indicating that this 2018 US Census Excel file is no longer available:
https://www2.census.gov/programs-surveys/popest/geographies/2018/all-geocodes-v2018.xlsx
The file also fails to load when I paste the URL directly into a web browser.
Unfortunately this means we must halt daily scraper runs until this is resolved.
Do we have a local copy saved? Alternatively, could we modify the scraper to skip the unavailable Census file and continue pulling the rest of the data?
The error message is provided below.
2020-10-14 21:35:36,120 INFO covid19_scrapers.web_cache: Connecting web cache to DB: work/web_cache.db
Traceback (most recent call last):
  File "run_scrapers.py", line 189, in <module>
    main()
  File "run_scrapers.py", line 165, in main
    registry_args=dict(enable_beta_scrapers=opts.enable_beta_scrapers),
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/__init__.py", line 61, in make_scraper_registry
    census_api = CensusApi(census_api_key)
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/census/census_api.py", line 31, in __init__
    self.fips = FipsLookup()
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/census/fips_lookup.py", line 22, in __init__
    df = pd.read_excel(get_content_as_file(self.CODES_URL), skiprows=4)
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/utils/http.py", line 100, in get_content_as_file
    return BytesIO(get_content(url, **kwargs))
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/utils/http.py", line 94, in get_content
    r = get_cached_url(url, **kwargs)
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/utils/http.py", line 59, in get_cached_url
    return UTILS_WEB_CACHE.fetch(url, **kwargs)
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/web_cache.py", line 263, in fetch
    response.raise_for_status()
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/covid19_data_test_003/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://www2.census.gov/programs-surveys/popest/geographies/2018/all-geocodes-v2018.xlsx
Update: A few moments after I created the issue, subsequent refreshes of the website revealed a message saying that the system is down due to maintenance. Perhaps that explains why the file was unavailable. After a few additional moments, the file appeared to be back online and the scraper run resumed without incident.
I will leave this up so that we can work toward a solution that caches the 2018 data table and stores it in the repo for later reference.
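One way to do this would be a fetch helper with a repo-local fallback: try the live Census URL first, refresh the committed copy on success, and serve the committed copy when the server is down. This is a minimal sketch only; `get_content_with_fallback`, `fetch_codes`, and the `data/` path are illustrative names, not the project's actual API (the real scraper routes requests through `get_content`/`UTILS_WEB_CACHE`, which this does not replicate).

```python
from io import BytesIO
from pathlib import Path

import requests

# URL from the traceback; the local path is an illustrative choice.
CODES_URL = ('https://www2.census.gov/programs-surveys/popest/'
             'geographies/2018/all-geocodes-v2018.xlsx')
LOCAL_COPY = Path('data/all-geocodes-v2018.xlsx')


def get_content_with_fallback(fetch, local_path):
    """Return file contents as BytesIO, maintaining a repo-cached copy.

    `fetch` is a zero-argument callable returning the file's bytes
    (e.g. a thin wrapper around the project's get_content()). If it
    raises a requests error, fall back to the copy stored in the repo;
    re-raise only when no local copy exists either.
    """
    try:
        data = fetch()
    except requests.RequestException:
        if local_path.exists():
            return BytesIO(local_path.read_bytes())
        raise
    # Success: refresh the committed copy so future outages are covered.
    local_path.parent.mkdir(parents=True, exist_ok=True)
    local_path.write_bytes(data)
    return BytesIO(data)


def fetch_codes():
    r = requests.get(CODES_URL, timeout=30)
    r.raise_for_status()
    return r.content
```

With this in place, `FipsLookup` could call `pd.read_excel(get_content_with_fallback(fetch_codes, LOCAL_COPY), skiprows=4)` and a Census outage would no longer abort the whole run, as long as the cached workbook is committed.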
@nkrishnaswami