v0.1.0 #50
Conversation
Aggregate a set of records by facility IDs, using sum or count operations. Enables point symbol mapping. Other facilities in the selection (e.g. facilities in Snohomish County *without* reported CWA violations) can be identified and retrieved when the diff flag is True.

Parameters
----------
records : DataSetResults object
    The records to aggregate. `records` should be a DataSetResults object created from a database query. For example:

    ```python
    # Create a DataSet for handling the data
    ds = make_data_sets(["CWA Violations"])
    # Store results for this DataSet as a DataSetResults object
    snohomish_cwa_violations = ds["CWA Violations"].store_results(
        region_type="County", region_value=("SNOHOMISH",), state="WA"
    )
    ```
program : String
    The name of the program, usually available from records.dataset.name
other_records : Boolean
    When True, will retrieve other facilities in the selection (e.g. facilities in Snohomish County *without* reported CWA violations)

Returns
-------
A dictionary containing:
- the aggregated results
- active facilities regulated under this program, but without recorded violations, inspections, or whatever the metric is (e.g. violations)
- the name of the new field that counts or sums up the relevant metric (e.g. violations)
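The docstring above describes a dictionary bundling three things: the aggregated records, the "other" facilities, and the aggregation field name. A minimal sketch of assembling such a result (the function and key names here are illustrative assumptions, not the actual ECHO_modules keys):

```python
def package_results(aggregated, other_facilities, aggregator):
    """Bundle the three pieces the docstring describes into one dict.

    Hypothetical helper; key names are assumptions for illustration.
    """
    return {
        "data": aggregated,          # the aggregated results
        "diff": other_facilities,    # active facilities without the metric
        "aggregator": aggregator,    # name of the count/sum field
    }

# Toy usage with placeholder records
result = package_results(
    [{"FAC_NAME": "Plant A", "count": 3}],  # aggregated rows
    [],                                      # no "other" facilities here
    "count",                                 # metric field name
)
```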
minor fixes to variable names in example code
more minor fixes to example code
more minor changes to example code
add air emissions to `aggregate_by_facility`

```python
# Air emissions
elif (program == "Greenhouse Gas Emissions" or program == "Toxic Releases"):
    data = data.groupby(
        [records.dataset.idx_field, "FAC_NAME", "FAC_LAT", "FAC_LONG"]
    ).agg({records.dataset.agg_col: 'sum'})
    data['sum'] = data[records.dataset.agg_col]
    data = data.reset_index()
    # keep track of which field we use to aggregate data, which may differ from the preset
    aggregator = "sum"
```
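The commit above relies on pandas' groupby/agg/reset_index pattern. A self-contained sketch of that same pattern on toy data (the column names are illustrative stand-ins, not the real ECHO schema):

```python
import pandas as pd

# Toy stand-in for emissions records; in ECHO_modules the grouping key
# and metric column come from the DataSet presets (idx_field, agg_col).
records = pd.DataFrame({
    "REGISTRY_ID": [1, 1, 2],
    "FAC_NAME": ["Plant A", "Plant A", "Plant B"],
    "ANNUAL_EMISSION": [10.0, 5.0, 7.5],
})

# Sum the metric per facility, mirroring
# data.groupby([...]).agg({agg_col: 'sum'}) in the commit.
data = records.groupby(["REGISTRY_ID", "FAC_NAME"]).agg(
    {"ANNUAL_EMISSION": "sum"}
)
data["sum"] = data["ANNUAL_EMISSION"]  # copy into a generic 'sum' column
data = data.reset_index()              # turn the group keys back into columns
aggregator = "sum"                     # track which field was used to aggregate
```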
add `agg_col`, `agg_type` and `units` to DMRs
Fixes #57 ?
Fixes #53
attempt to deal with tabs/spaces issue
fix indentation?
debug sql and test choropleth (#55)
add aggregation to toxic releases table (this is tricky because of differing pollutants, but if the data are filtered to a specific pollutant, this makes sense); fix a problem with the units of GHG and TRI charts
fix error where `unit` was `units` in DMR data set presets
fix map in `choropleth()`
packaging branch is deleted; point to main instead
units -> unit
add key_id argument to bring back together data separated for the choropleth
delete old choropleth variables
Currently, when it is a list we smush it together to make it appear on charts. But that has some downstream consequences for DataSet.region_value. Instead, only smush together the multi-selections when we make charts or store DataSet.results
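The design change described above can be sketched in a few lines: keep the multi-selection as a tuple internally, and join it only when rendering a chart label. The helper name here is hypothetical, for illustration only:

```python
# Multi-county selection kept as a tuple, as DataSet.region_value would be
region_value = ("SNOHOMISH", "KING")

def chart_title(regions):
    """Join a multi-selection into one string only at chart-rendering time."""
    return ", ".join(str(r) for r in regions)
```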
fix `get_active_facilities()` format string
zc_str
add the Jupyter Notebook for the tutorials
```python
def differ(input, program):
    '''
    Helper function to sort facilities in this program (input) from the
    full list of facilities regulated under the program (active)
    '''
    active = get_active_facilities(records.state, records.region_type, records.region_value)

    diff = list(
        set(active[records.dataset.echo_type + "_IDS"]) - set(input[records.dataset.idx_field])
    )

    # get rid of NaNs - probably no program IDs
    diff = [x for x in diff if str(x) != 'nan']

    # ^ Not perfect given that some facilities have multiple NPDES_IDs
    # Below return the full ECHO_EXPORTER details for facilities without
    # violations, penalties, or inspections
    diff = active.loc[active[records.dataset.echo_type + "_IDS"].isin(diff)]
    return diff
```
This isn't a robust way of differentiating facilities because of the way EPA stores program IDs.

The `differ` function is meant to take data like:

```
all_facilities = [A, B, C, D, E]
facilities_with_inspections = [C, D]
```

and calculate:

```
facilities_without_inspections = [A, B, E]
```

But in reality, facility/program IDs are more like:

```
all_facilities = [A X, B Y, C, D Z, E]
facilities_with_inspections = [C, D]
```

So the resulting list of facilities without inspections would incorrectly be `A X, B Y, D Z`, even though D does have an inspection.

Just need a way to parse apart program IDs, ideally without having to call up the database to look at the EXP_PGM table.
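The failure mode described above can be reproduced without the database. In this sketch, facilities carry space-separated program-ID strings (mimicking EPA's `*_IDS` columns); a naive set difference mislabels facility D, while splitting each IDS string first gives the correct answer. The data and variable names are illustrative, not the actual ECHO schema:

```python
# Facilities keyed by (possibly space-separated) program-ID strings,
# mimicking EPA's *_IDS columns: "D Z" means facility D also carries ID Z.
all_facilities = ["A X", "B Y", "C", "D Z", "E"]
facilities_with_inspections = ["C", "D"]

# Naive set difference: "D Z" != "D", so D is wrongly kept.
naive = set(all_facilities) - set(facilities_with_inspections)
assert "D Z" in naive  # D incorrectly listed as uninspected

# Fix sketch: split each IDS string and drop a facility if ANY of its
# individual IDs appears in the inspected set.
inspected = set(facilities_with_inspections)
without_inspections = [
    ids for ids in all_facilities
    if not inspected.intersection(ids.split())
]
# without_inspections is now ["A X", "B Y", "E"], as intended
```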
add detailed notes about data interpretation
formatting quote/bullets
format example code
update tutorial notebook with interpretation and background sections of README
deletes `selector()` fixes #58
creates `choropleth()` fixes #55 This will break the missing data notebook, but we will make a note to fix that later
update zip code definition to match what's in the database
correct zip codes fields
change logic in `choropleth()`
fix `choropleth()` parameters
add tooltip to `choropleth()`
fixes #51 ?
quick fix on `choropleth()` tooltip
change numpy dependency to 1.23.5, which is Google Colab's default
These all look like good and important improvements to the package.
Fixes #53, fixes #57, fixes #48, fixes #49, fixes #55, fixes #27, fixes #51, fixes #60, fixes #58

I'm proposing to repackage ECHO_modules as something that can be delivered through PyPI (pip). I don't think this will break much^ because the repo will still be installable as a package through `pip install git+url`.

^ Replacing `state_choropleth_mapper()` with `choropleth()` will break the missing data notebook, but I can fix that afterwards.

Other updates:
- `aggregate_by_facility()` function to support `point_mapper()`
- `state_choropleth_mapper()` replaced with `choropleth()`
- `reorganization` branch

After merging, I will set this up with PyPI (#49) and also change some settings here on GitHub to automate the process of sending packages to PyPI.