Add Diabetes in Scotland as a source #230

Open
JackGilmore opened this issue Mar 11, 2023 · 1 comment
Labels
data engineering: things related to data (scraping, cleaning, labelling, transformation)
new source: adding a new data source to the pipeline

Comments

@JackGilmore
Member

Raising on the back of the following thread: https://twitter.com/NiVZ/status/1634199070548828168

The focus for this source would be to create dataset(s) for the annual Scottish Diabetes Survey.

Link to the data source
https://www.diabetesinscotland.org.uk/publications/

Data source type (if known)
PDFs linked to from a static web page. File listing is in an HTML table.
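
As a rough starting point for the scraping step, the sketch below collects PDF links that sit inside an HTML table, using only the Python standard library. The page's real markup has not been checked here, so the sample HTML, the `/files/example-survey.pdf` path, and the names `PdfLinkParser` and `extract_pdf_links` are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.diabetesinscotland.org.uk/publications/"


class PdfLinkParser(HTMLParser):
    """Collect (link text, absolute URL) pairs for PDF links inside tables."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.table_depth = 0      # > 0 while we are inside a <table>
        self.current_href = None  # set while inside a PDF <a> element
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth > 0:
            href = dict(attrs).get("href") or ""
            # Ignore any query string when checking the extension.
            if href.lower().split("?")[0].endswith(".pdf"):
                self.current_href = urljoin(self.base_url, href)
                self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1
        elif tag == "a" and self.current_href is not None:
            title = " ".join("".join(self.current_text).split())
            self.links.append((title, self.current_href))
            self.current_href = None


def extract_pdf_links(html, base_url=BASE_URL):
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical markup, only to exercise the parser; not copied from the site.
SAMPLE_HTML = """
<table>
  <tr><td><a href="/files/example-survey.pdf">Example survey report</a></td></tr>
  <tr><td><a href="about.html">About</a></td></tr>
</table>
<a href="/other.pdf">Outside table</a>
"""

if __name__ == "__main__":
    for title, url in extract_pdf_links(SAMPLE_HTML):
        print(title, url)
```

In a pipeline, the HTML passed to `extract_pdf_links` would be the downloaded publications page rather than the sample string, and the link filter would likely need adjusting once the real table structure is known.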

Organization(s) the data belongs to
Diabetes in Scotland

What licences are applied to the data being published? (if known)
None specified 😢

@JackGilmore added the data engineering, back end, and new source labels on Mar 11, 2023
@paul-bradbeer

Could possibly use AWS Textract https://aws.amazon.com/textract/ to pull tabular data from PDFs
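
If Textract were used, its table analysis returns a flat list of blocks (`TABLE`, `CELL`, `WORD`) linked by `CHILD` relationships, so the tables have to be reassembled afterwards. The sketch below shows one way to do that reassembly in plain Python. It assumes the block list comes from a Textract response requested with the `TABLES` feature type (for example via boto3's `analyze_document`), which is not called here because it needs AWS credentials. The function name `tables_from_blocks` and the sample blocks are hypothetical, and the sample values are made up rather than taken from any survey.

```python
def tables_from_blocks(blocks):
    """Rebuild each TABLE block as a list of rows, each a list of cell strings."""
    by_id = {block["Id"]: block for block in blocks}

    def children(block, block_type):
        # Follow CHILD relationships and keep only blocks of the wanted type.
        return [
            by_id[child_id]
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for child_id in rel["Ids"]
            if child_id in by_id and by_id[child_id]["BlockType"] == block_type
        ]

    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = children(block, "CELL")
        if not cells:
            continue
        n_rows = max(cell["RowIndex"] for cell in cells)
        n_cols = max(cell["ColumnIndex"] for cell in cells)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for cell in cells:
            text = " ".join(word["Text"] for word in children(cell, "WORD"))
            # RowIndex and ColumnIndex are 1-based in Textract responses.
            grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = text
        tables.append(grid)
    return tables


def _cell(cell_id, row, col, word_ids):
    return {"Id": cell_id, "BlockType": "CELL", "RowIndex": row, "ColumnIndex": col,
            "Relationships": [{"Type": "CHILD", "Ids": word_ids}]}


def _word(word_id, text):
    return {"Id": word_id, "BlockType": "WORD", "Text": text}


# Made-up blocks shaped like a Textract response, for illustration only.
SAMPLE_BLOCKS = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    _cell("c1", 1, 1, ["w1", "w2"]),
    _cell("c2", 1, 2, ["w3"]),
    _cell("c3", 2, 1, ["w4"]),
    _cell("c4", 2, 2, ["w5"]),
    _word("w1", "Health"), _word("w2", "board"), _word("w3", "Count"),
    _word("w4", "Example"), _word("w5", "123"),
]

if __name__ == "__main__":
    for table in tables_from_blocks(SAMPLE_BLOCKS):
        for row in table:
            print(row)
```

Each rebuilt table is a list of rows, so it could be written out with the `csv` module or loaded into a dataframe for cleaning. Merged cells and multi-page PDFs are not handled in this sketch.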
