Add state vs federal actions #25

Open
ericnost opened this issue Jun 3, 2020 · 5 comments
Assignees
Labels
enhancement New feature or request hacktoberfest

Comments

@ericnost
Member

ericnost commented Jun 3, 2020

See here for relevant fields: https://docs.google.com/document/d/1EWVy52R1Eqy5dc5mjXK7m5EOdVyBVXbI9hkL-ps4BlI/edit

Related to edgi-govdata-archiving/Environmental_Enforcement_Watch/issues/98

@ericnost ericnost self-assigned this Jun 3, 2020
@ericnost
Member Author

ericnost commented Jun 4, 2020

Here is one way to do it:

Add an "agency_field" property to the data class. Add the appropriate agency_field for each of the tables (using the above doc). Then, for the aggregation and charts:

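import numpy as np
import pandas as pd

# Flag each record as a state ("S") or federal ("E") action
program_data['State'] = np.where(program_data[program.agency_field] == "S", 1, 0)
program_data['Federal'] = np.where(program_data[program.agency_field] == "E", 1, 0)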

g = program_data.groupby([pd.to_datetime(program_data[program.date_field], format=program.date_format)])[["State", "Federal"]].sum() #for each date, take the sum of state inspections and federal inspections
g = g.resample("Y").sum() #sum of inspections for the year
g.index = g.index.strftime('%Y') #pretty format
g

ax = g.plot(kind='bar', stacked=True, title = program.name, figsize=(20, 10), fontsize=16)
ax

Example output: [image: stacked bar chart]

This feels rather hacky, creating new columns to count State inspections and Federal inspections. One thing to note here - and to always pay attention to - is whether to aggregate with sum or count. In this case, I think we want to sum? Because the new columns are 0/1 flags, summing them gives us the number of inspections, NOT the number of facilities inspected. If we wanted that, I think we would need to count distinct facilities rather than rows.
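A quick check of the sum vs count question on made-up data. The `facility_id`, `agency`, and `year` column names are hypothetical stand-ins, not actual ECHO field names. On 0/1 flag columns, `sum()` gives the number of flagged rows while `count()` gives the number of non-null rows in every column, so counting facilities appears to need something like `nunique()` on a facility identifier:

```python
import numpy as np
import pandas as pd

# Made-up inspections: two facilities, four inspections in one year.
inspections = pd.DataFrame({
    "facility_id": ["A", "A", "B", "B"],  # hypothetical column name
    "agency": ["S", "S", "S", "E"],       # "S" = state, "E" = federal
    "year": [2019, 2019, 2019, 2019],
})
inspections["State"] = np.where(inspections["agency"] == "S", 1, 0)
inspections["Federal"] = np.where(inspections["agency"] == "E", 1, 0)

by_year = inspections.groupby("year")[["State", "Federal"]]
flag_sums = by_year.sum()      # inspections per agency
flag_counts = by_year.count()  # non-null rows, identical for both columns

# Distinct facilities inspected by each agency.
facilities = inspections.groupby(["year", "agency"])["facility_id"].nunique()
```

Here `flag_sums` separates state from federal inspections, `flag_counts` reports the same total for both columns, and `facilities` counts each facility once per agency.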

@ericnost
Member Author

ericnost commented Jun 4, 2020

@shansen5 One thing I just noticed in the Cross-Program notebook. An error I probably made!

For the charts, we currently do this:

d = program_data.groupby(pd.to_datetime(program_data[program.date_field], format=program.date_format))[[program.date_field]].count()
d = d.resample("Y").count()
d.index = d.index.strftime('%Y')
        
ax = d.plot(kind='bar', title=chart_title, figsize=(20, 10), legend=False, fontsize=16)
ax

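But I think d = d.resample("Y").count() should be d = d.resample("Y").sum(). The groupby already gives us the number of records on each date, so count() at the resample step only counts how many distinct dates have records in each year. We want to sum the number of inspections, enforcements, violations that occurred within the year.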
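A minimal sketch of the difference, using made-up dates and a hypothetical `id` column. It uses the `"YS"` (year start) alias rather than `"Y"` only because `"Y"` is deprecated in recent pandas releases; the year labels come out the same after formatting:

```python
import pandas as pd

# Made-up records: one row per inspection.
records = pd.DataFrame({
    "id": [1, 2, 3, 4],  # hypothetical column
    "date": ["01/05/2019", "01/05/2019", "03/10/2019", "07/01/2020"],
})
dates = pd.to_datetime(records["date"], format="%m/%d/%Y")

# Step 1: number of records on each distinct date.
per_day = records.groupby(dates)[["id"]].count()

# Step 2a: count() counts the daily rows, i.e. distinct dates per year.
days_per_year = per_day.resample("YS").count()
# Step 2b: sum() adds up the daily counts, i.e. records per year.
records_per_year = per_day.resample("YS").sum()

days_per_year.index = days_per_year.index.strftime("%Y")
records_per_year.index = records_per_year.index.strftime("%Y")
```

Two of the 2019 inspections share a date, so the two results differ for that year: `count()` sees two dates, `sum()` sees three inspections.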

Might have some relation to #22 though I think there we are seeing the problem before grouping and charting?

^ I believe this separate issue has been addressed

@ericnost ericnost changed the title from "Add state vs federal enforcement" to "Add state vs federal actions" Jun 4, 2020
@ericnost ericnost added the enhancement New feature or request label Jun 6, 2020
@ericnost
Member Author

Challenges: for some tables it makes sense to chart state vs federal actions (inspections, enforcement actions, penalties), but for others (violations), not so much. This nuance adds another layer of logic to the notebook: if the table is inspections/actions/penalties, chart state vs federal; otherwise don't.
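One way that branch could look, as a rough sketch rather than the notebook's actual code. The function name `yearly_counts` is hypothetical, and it assumes the "S"/"E" labels mentioned earlier in this thread; a table without an agency column simply passes `agency_field=None`:

```python
import pandas as pd

def yearly_counts(data, date_field, date_format, agency_field=None):
    """Yearly totals, split into State/Federal only when agency_field is given."""
    dates = pd.to_datetime(data[date_field], format=date_format)
    years = dates.dt.year.rename("year")
    if agency_field is None:
        # e.g. violations: no state/federal split, just one total per year
        return data.groupby(years).size().to_frame("Total")
    flags = pd.DataFrame({
        "State": (data[agency_field] == "S").astype(int),
        "Federal": (data[agency_field] == "E").astype(int),
    })
    return flags.groupby(years).sum()

# Made-up data for illustration only.
violations = pd.DataFrame({"d": ["2019-01-01", "2019-02-01", "2020-01-01"]})
inspections = violations.assign(agency=["S", "E", "S"])

totals = yearly_counts(violations, "d", "%Y-%m-%d")
split = yearly_counts(inspections, "d", "%Y-%m-%d", agency_field="agency")
```

Either result can be passed to `.plot(kind='bar', stacked=True)`. Unlike the resample approach, grouping by year this way leaves out years that have no records at all.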

@ericnost
Member Author

Since this information isn't in the same place for each program, it would be helpful for someone to list out the following for the nine data tables:

  • The column in the CWA violations table that differentiates state vs federal actions (this may not exist)
  • The column in the CAA violations table that differentiates state vs federal actions (this may not exist)
  • The column in the RCRA violations table that differentiates state vs federal actions (this may not exist)
  • The column in the CWA inspections table that differentiates state vs federal actions (this may not exist) and the labels used for state vs federal actions (e.g. "S" and "E")
  • The column in the CAA inspections table that differentiates state vs federal actions (this may not exist) and the labels used for state vs federal actions (e.g. "S" and "E")
  • The column in the RCRA inspections table that differentiates state vs federal actions (this may not exist) and the labels used for state vs federal actions (e.g. "S" and "E")
  • The column in the CWA penalties table that differentiates state vs federal actions (this may not exist) and the labels used for state vs federal actions (e.g. "S" and "E")
  • The column in the CAA penalties table that differentiates state vs federal actions (this may not exist) and the labels used for state vs federal actions (e.g. "S" and "E")
  • The column in the RCRA penalties table that differentiates state vs federal actions (this may not exist) and the labels used for state vs federal actions (e.g. "S" and "E")

Some if not all of this is here: https://docs.google.com/document/d/1EWVy52R1Eqy5dc5mjXK7m5EOdVyBVXbI9hkL-ps4BlI/edit. But it would be helpful to compile it based on my bullet points above.
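The compiled answers could live in a small lookup keyed by program and table. This is only a sketch of the shape: `AgencyInfo`, `AGENCY_INFO`, and `missing_entries` are hypothetical names, and every entry is left as `None` because the real column names and labels still have to come from the doc above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AgencyInfo:
    agency_field: Optional[str] = None   # None = not identified yet, or does not exist
    state_label: Optional[str] = None    # e.g. "S" once confirmed
    federal_label: Optional[str] = None  # e.g. "E" once confirmed

PROGRAMS = ("CWA", "CAA", "RCRA")
TABLES = ("violations", "inspections", "penalties")

# One placeholder entry per table; fill these in as the columns are confirmed.
AGENCY_INFO = {(p, t): AgencyInfo() for p in PROGRAMS for t in TABLES}

def missing_entries(info):
    """Return the (program, table) pairs that still lack an agency column."""
    return sorted(key for key, value in info.items() if value.agency_field is None)
```

Calling `missing_entries(AGENCY_INFO)` then doubles as a to-do list for the nine tables.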

Another thing to keep in mind is that it's more than just state vs federal - there are often local actions and tribal actions listed. How do we handle those? Do we just not count them?
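One option, sketched under assumptions: keep local and tribal actions but fold them into an "Other" bucket instead of dropping them. "S" and "E" are the labels mentioned above; "L" and "T" are hypothetical codes made up for this example, and `agency_counts` is a hypothetical helper:

```python
import pandas as pd

# Known labels; anything else (local, tribal, missing) falls into "Other".
AGENCY_LABELS = {"S": "State", "E": "Federal"}

def agency_counts(data, agency_field, year_field):
    """Cross-tabulate records per year by State / Federal / Other."""
    labels = data[agency_field].map(AGENCY_LABELS).fillna("Other")
    return pd.crosstab(data[year_field], labels)

# Made-up data; "L" and "T" are hypothetical local/tribal codes.
actions = pd.DataFrame({
    "agency": ["S", "E", "L", "T", "S"],
    "year": [2019, 2019, 2019, 2020, 2020],
})
counts = agency_counts(actions, "agency", "year")
```

This also avoids creating the extra 0/1 columns, since `pd.crosstab` counts the rows per year and label directly.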

What do we do about SDWA? That's another 3 or 4 tables. Maybe we should disable that because we don't have capacity to support it?

@ericnost
Member Author

Specifically, we would need to modify at least the show_chart() function here: https://github.com/edgi-govdata-archiving/ECHO_modules/blob/b0a288e4cca2d3a46e5378fb51986c034d455c2c/DataSetResults.py#L22

If we wanted to be able to export CSVs that distinguish between state and federal actions, that would require something else...
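For the yearly summary at least, the export could be as simple as writing the aggregated table out. This sketch does not touch the existing export code in ECHO_modules; the numbers are made up, and an in-memory buffer stands in for a real file path:

```python
import io
import pandas as pd

# Made-up yearly summary with one column per agency type.
yearly = pd.DataFrame(
    {"State": [3, 1], "Federal": [1, 0]},
    index=pd.Index(["2019", "2020"], name="Year"),
)

buffer = io.StringIO()  # stand-in for a file path such as "summary.csv"
yearly.to_csv(buffer)
csv_text = buffer.getvalue()
```

Exporting the underlying records with a state/federal column would be a separate change.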
