Move /data to a repo of its own #228

Open
5 tasks
KarenJewell opened this issue Feb 18, 2023 · 0 comments
Labels
data engineering Things related to data: scraping, cleaning, labelling, transformation

Comments

@KarenJewell
Member

Is your feature request related to a problem? Please describe.
Currently, all data is stored in /data and updated by the weekly pipeline refreshes. This directory is both the output location for all scrapers and the input for the eventual dataset listings on opendata.scot.

However, the constant churn in /data causes unnecessary merge conflicts during development, even when the code base itself hasn't changed.

Describe the solution you'd like

  • Create a new repo for /data and use it as the storage location
  • Update all source scrapers to write to the new repo
  • Update merge_data.py to read from the new repo
  • Update the pipeline to write to the new repo
  • Delete /data in the_od_bods
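
One possible way to approach the steps above is to have the scrapers and merge_data.py resolve the data location through a single shared helper, so that pointing everything at the new repo is a one-line configuration change. The following is only a minimal sketch under stated assumptions: the environment variable `OD_BODS_DATA_DIR`, the default checkout location `../od_bods_data`, and the helpers `get_data_dir` and `data_path` are all hypothetical and do not exist in the current code base.

```python
import os
from pathlib import Path

# Hypothetical environment variable naming the local checkout of the new data repo.
DATA_DIR_ENV = "OD_BODS_DATA_DIR"

# Assumed default: the data repo is cloned next to the_od_bods.
DEFAULT_DATA_DIR = Path("..") / "od_bods_data"


def get_data_dir() -> Path:
    """Return the root of the external data checkout (env var wins over the default)."""
    return Path(os.environ.get(DATA_DIR_ENV, DEFAULT_DATA_DIR))


def data_path(*parts: str) -> Path:
    """Build a path inside the data checkout from the given path components."""
    return get_data_dir().joinpath(*parts)
```

With something like this in place, a scraper would write to `data_path(...)` and merge_data.py would read from the same helper instead of a hard-coded /data path, and the weekly pipeline could set the environment variable to wherever it checks out the data repo.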

Describe alternatives you've considered
The alternative is to not write or store any data in the intermediary steps. However, this would be a big change to the way we process data, would make debugging harder, and the loss of the intermediary outputs may be unhelpful for new contributors.

Additional context
None

@KarenJewell KarenJewell moved this from Backlog to Todo in Open Data Scotland 2024 Feb 18, 2023
@KarenJewell KarenJewell added data engineering Things related to data: scraping, cleaning, labelling, transformation and removed back end labels Sep 16, 2023