NLS scraper has inaccurate number of records calculation #259

Open
JackGilmore opened this issue Nov 10, 2023 · 0 comments
Labels
bug Something isn't working data engineering Things related to data: scraping, cleaning, labelling, transformation

Comments

@JackGilmore
Member

Describe the bug
The NLS scraper appears to calculate the number of records field on opendata.scot incorrectly. It derives the value from the file contents description, which only gives the number of files in a zip rather than the number of records. It also bases this on the first file upload on a page and doesn't take multiple file uploads into account.

To Reproduce
See the fetch_num_recs() method in nls_scraper.py.

Expected behavior
The number of records field should reflect the actual number of records in a dataset (e.g. the number of rows in a CSV), not the number of files in a zip.
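As a rough illustration of the expected behaviour, here is a minimal sketch that counts CSV rows inside each zip and sums them across every file upload, instead of counting files. The function names (count_csv_rows, count_records_in_zip, count_records) are hypothetical and do not come from nls_scraper.py. It also assumes each upload is a zip already downloaded as bytes, that the data files are UTF-8 CSVs with one header row, and that non-CSV files can be ignored. None of that has been checked against the real NLS datasets.

```python
import csv
import io
import zipfile


def count_csv_rows(csv_bytes: bytes, has_header: bool = True) -> int:
    """Count data rows in one CSV file (assumption: UTF-8, optional header row)."""
    text = csv_bytes.decode("utf-8-sig", errors="replace")
    reader = csv.reader(io.StringIO(text, newline=""))
    # Blank lines come back as empty lists, so they are not counted.
    rows = sum(1 for row in reader if row)
    return max(rows - 1, 0) if has_header else rows


def count_records_in_zip(zip_bytes: bytes) -> int:
    """Sum the data rows of every CSV inside a single zip upload."""
    total = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Assumption: only .csv members hold records; everything else is skipped.
            if name.lower().endswith(".csv"):
                total += count_csv_rows(archive.read(name))
    return total


def count_records(uploads: list[bytes]) -> int:
    """Sum records across all file uploads on a page, not just the first one."""
    return sum(count_records_in_zip(upload) for upload in uploads)
```

With this approach a zip containing two files, one of them a CSV with two data rows, would report 2 records rather than 2 files, and the totals from every upload on the page are added together. Datasets that are not CSV-based would need their own counting rule, or the field could be left empty.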

Screenshots
[2 screenshots attached]

Hardware and software used
N/A

Additional context
This functionality was patched out in 59dca44, but the function itself still remains in the code.

JackGilmore added the bug and data engineering labels on Nov 10, 2023