Cell #12 should have a better presentation of the tabular data #68

Open
shansen5 opened this issue Jul 22, 2020 · 2 comments
Labels
enhancement: New feature or request; good first issue: Good for newcomers; research-needed: For issues that require some research (not just code)

Comments

@shansen5
Collaborator

No description provided.

@Frijol
Contributor

Frijol commented Oct 12, 2021

Curious what you have in mind here. Here's what I'm noticing:

  • Would be nice to have Markdown that spells out some codes, like what NPDES_ID means
  • Would be really nice to change or spell out some of these mysterious column names & contents, like what does "V" mean under "HLRNC"?
  • Split out Quarter into a separate column from Year for clarity
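
To make the last two points concrete, here is a rough sketch of what a friendlier presentation step could look like. It is only an illustration: the combined column name `YEARQTR`, its year-plus-one-digit-quarter format, the sample row, and the label text are all assumptions, and the real meanings would have to come from EPA's documentation.

```python
# Hypothetical labels: the real wording must come from EPA's documentation.
COLUMN_LABELS = {
    "NPDES_ID": "NPDES permit ID",
    "HLRNC": "HLRNC (meaning to be confirmed)",
}


def split_year_quarter(value):
    """Split a combined value such as 20201 into (2020, 1).

    Assumes the last digit is the quarter (1-4) and the rest is the year.
    """
    text = str(value).strip()
    year, quarter = int(text[:-1]), int(text[-1])
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"unexpected quarter in {value!r}")
    return year, quarter


def present_row(row, year_quarter_key="YEARQTR"):
    """Return a copy of row with readable labels and separate Year/Quarter."""
    out = {}
    for key, value in row.items():
        if key == year_quarter_key:
            out["Year"], out["Quarter"] = split_year_quarter(value)
        else:
            # Fall back to the raw column name when no label is known yet.
            out[COLUMN_LABELS.get(key, key)] = value
    return out


# Made-up sample row, not real data.
row = {"NPDES_ID": "XX0000000", "YEARQTR": 20201, "HLRNC": "V"}
print(present_row(row))
```

The same idea carries over to a DataFrame in the notebook; plain dictionaries are used here only to keep the sketch free of dependencies.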

Frijol added the enhancement, good first issue, and research-needed labels Oct 12, 2021
@skybristol

What I'm working on in the new ETL process (pulling data from EPA's downloads, transforming it a little for use, and loading it elsewhere, such as Postgres) should help with this. Each of the datasets we are tapping is documented slightly differently across web pages and PDF files, and I'm working through that variation to bring back the full descriptions of field names. I'm putting this into a technical encoding called JSON Schema, which includes some extra technical details about the properties that will let us better validate the data values when we pull a fresh file. It seems like we should review all of EPA's data documentation and then decide whether we have some value-added annotation we could layer on to help people make better sense of the data. A lot of things, like the various codes that tie things together, are hard to figure out without putting several pieces of information together, so we can probably shed some better light on this.
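
As a rough illustration of the idea (not the actual schema from the ETL work), a schema fragment can carry both the human-readable description and the constraints used to check a fresh file. The descriptions, the `pattern`, and the `maxLength` below are placeholders, and `check_record` is a hand-rolled check of a few keywords rather than a full JSON Schema validator.

```python
import re

# Illustrative schema fragment. Descriptions and constraints are placeholders,
# not EPA's wording; real values would come from the source documentation.
SCHEMA = {
    "type": "object",
    "properties": {
        "NPDES_ID": {
            "type": "string",
            "description": "Placeholder: full field description from EPA docs.",
            "pattern": "^[A-Z0-9]+$",
        },
        "HLRNC": {
            "type": "string",
            "description": "Placeholder: meaning of each code goes here.",
            "maxLength": 1,
        },
    },
    "required": ["NPDES_ID"],
}

_PY_TYPES = {"string": str, "integer": int, "number": (int, float)}


def check_record(record, schema=SCHEMA):
    """Return a list of problems found in one record (empty if none).

    Handles only `required`, `type`, `pattern`, and `maxLength`.
    """
    problems = []
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"{name}: missing required field")
    for name, rules in schema["properties"].items():
        if name not in record:
            continue
        value = record[name]
        expected = _PY_TYPES.get(rules.get("type"))
        if expected and not isinstance(value, expected):
            problems.append(f"{name}: expected {rules['type']}")
            continue
        if isinstance(value, str):
            # String-only keywords are skipped for other value types.
            if "pattern" in rules and not re.search(rules["pattern"], value):
                problems.append(f"{name}: does not match {rules['pattern']}")
            if "maxLength" in rules and len(value) > rules["maxLength"]:
                problems.append(f"{name}: longer than {rules['maxLength']}")
    return problems
```

Running `check_record` over each row of a freshly downloaded file would flag values that drift from the documented shape, and the same `description` strings could feed the column explanations in the notebook.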

Technically, each distinct logical property in our transformation of the data will have an @id value in the JSON Schema structure. This will help drive things like primary and foreign key relationships across the data in a SQL context like Postgres. I think it would be cool to layer extra annotations on top of this structure that refer to the @id values. We might use something as simple as yet another Google Sheet to store and manage this information and then pull it into the various presentations of the data we put online. If we end up with more complex information than fits in a few sentences, we can look at referencing Markdown files. It would be good to keep the dots connected between the source documentation and our own additions.
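
A minimal sketch of that overlay, assuming the sheet is exported as CSV with `id` and `annotation` columns. The `example:` identifiers, the column names, and the text are all hypothetical stand-ins.

```python
import csv
import io

# Hypothetical schema properties, each carrying an "@id" as described above.
PROPERTIES = {
    "NPDES_ID": {"@id": "example:npdes_id", "description": "Placeholder from EPA docs."},
    "HLRNC": {"@id": "example:hlrnc", "description": "Placeholder from EPA docs."},
}

# Stand-in for a CSV export of the annotation sheet (columns are assumptions).
SHEET_CSV = """id,annotation
example:hlrnc,Placeholder note explaining the codes in plain language.
"""


def load_annotations(csv_text):
    """Map each @id in the sheet to its annotation text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["id"]: row["annotation"] for row in reader}


def annotate(properties, annotations):
    """Return a copy of properties with our annotation added where one exists."""
    merged = {}
    for name, prop in properties.items():
        extra = annotations.get(prop["@id"])
        merged[name] = {**prop, "annotation": extra} if extra else dict(prop)
    return merged


annotated = annotate(PROPERTIES, load_annotations(SHEET_CSV))
```

Keeping the source `description` and our own `annotation` in separate keys preserves the link back to EPA's wording while letting the added notes change independently.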
