
Scraper doesn't run due to Census data unavailability #157

Open
sydeaka opened this issue Oct 15, 2020 · 1 comment
sydeaka commented Oct 15, 2020

@nkrishnaswami

When I tried to run the scraper this evening, I got an error indicating that this 2018 US Census Excel file is no longer available. The file also fails to load when I paste the URL directly into a web browser.

https://www2.census.gov/programs-surveys/popest/geographies/2018/all-geocodes-v2018.xlsx

Unfortunately, this means we must halt daily scraper runs until this is resolved.

Do we have a local copy saved? Alternatively, could we modify the scraper so that it continues pulling the other data while skipping the unavailable Census file?
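One way to do the latter, sketched below under assumed names (`get_geocodes_content` and the `data/` path are hypothetical, not the repo's actual helpers), is to fall back to a repo-local copy of the file whenever the remote fetch fails:

```python
import os
from io import BytesIO

import requests

CODES_URL = ('https://www2.census.gov/programs-surveys/popest/'
             'geographies/2018/all-geocodes-v2018.xlsx')
# Hypothetical path for a repo-local fallback copy of the geocodes table.
LOCAL_COPY = 'data/all-geocodes-v2018.xlsx'


def get_geocodes_content(url=CODES_URL, local_copy=LOCAL_COPY):
    """Fetch the Census geocodes file, falling back to a cached copy.

    Returns a BytesIO suitable for pd.read_excel(). If the remote fetch
    fails (e.g. a 500 during Census maintenance) and a local copy exists,
    load that instead of aborting the whole run.
    """
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        return BytesIO(r.content)
    except requests.exceptions.RequestException:
        if os.path.exists(local_copy):
            with open(local_copy, 'rb') as f:
                return BytesIO(f.read())
        raise  # no fallback available; surface the original error
```

FipsLookup could then call something like this instead of `get_content_as_file(self.CODES_URL)`, so a transient Census outage degrades to the cached table rather than killing the run.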

The error message is provided below.

2020-10-14 21:35:36,120 INFO covid19_scrapers.web_cache:  Connecting web cache to DB: work/web_cache.db
Traceback (most recent call last):
  File "run_scrapers.py", line 189, in <module>
    main()
  File "run_scrapers.py", line 165, in main
    registry_args=dict(enable_beta_scrapers=opts.enable_beta_scrapers),
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/__init__.py", line 61, in make_scraper_registry
    census_api = CensusApi(census_api_key)
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/census/census_api.py", line 31, in __init__
    self.fips = FipsLookup()
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/census/fips_lookup.py", line 22, in __init__
    df = pd.read_excel(get_content_as_file(self.CODES_URL), skiprows=4)
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/utils/http.py", line 100, in get_content_as_file
    return BytesIO(get_content(url, **kwargs))
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/utils/http.py", line 94, in get_content
    r = get_cached_url(url, **kwargs)
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/utils/http.py", line 59, in get_cached_url
    return UTILS_WEB_CACHE.fetch(url, **kwargs)
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/workflow/python/covid19_scrapers/web_cache.py", line 263, in fetch
    response.raise_for_status()
  File "/Users/poisson/Documents/GitHub/COVID19_tracker_data_extraction/covid19_data_test_003/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://www2.census.gov/programs-surveys/popest/geographies/2018/all-geocodes-v2018.xlsx
@sydeaka sydeaka added the bug Something isn't working label Oct 15, 2020
sydeaka commented Oct 15, 2020

Update: Shortly after I created the issue, refreshing the site revealed a message saying the system was down for maintenance, which likely explains why the file was unavailable. A short while later, the file was back online and the scraper run completed without incident.

I will leave this issue open so that we can work toward a solution that caches the 2018 data table and stores it in the repo for later reference.
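A minimal sketch of that snapshot step (the destination path and function name here are assumptions, not existing code in the repo):

```python
import os

import requests

CODES_URL = ('https://www2.census.gov/programs-surveys/popest/'
             'geographies/2018/all-geocodes-v2018.xlsx')
# Hypothetical destination; the repo's actual data layout may differ.
LOCAL_COPY = 'workflow/python/data/all-geocodes-v2018.xlsx'


def snapshot_geocodes(url=CODES_URL, dest=LOCAL_COPY):
    """Download the geocodes table and save it for later fallback use."""
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    resp = requests.get(url, timeout=30)
    # Don't overwrite a good snapshot with an error page.
    resp.raise_for_status()
    with open(dest, 'wb') as f:
        f.write(resp.content)
    return dest
```

Committing the resulting file would give every scraper run a stable fallback even when www2.census.gov is down for maintenance.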
