v0.1.0 #50

ericnost · 2024-01-24T04:25:30Z

Fixes #53, fixes #57, fixes #48, fixes #49, fixes #55, fixes #27, fixes #51, fixes #60, fixes #58

I'm proposing to repackage ECHO_modules so that it can be distributed through PyPI and installed with pip.

I don't think this will break much^ because the repo will still be installable as a package through `pip install git+url`.

^ Replacing `state_choropleth_mapper()` with `choropleth()` will break the missing data notebook, but I can fix that afterwards.

Other updates:

- adds the `aggregate_by_facility()` function to support `point_mapper()`
- replaces `state_choropleth_mapper()` with `choropleth()`
- expands the README
- provides a notebook full of tutorials, example code, and queries
- incorporates useful tools from the work I did on the reorganization branch

After merging, I will set this up with PyPI (#49) and also change some settings here on GitHub to automate the process of publishing packages to PyPI.

Aggregate a set of records by facility IDs, using sum or count operations. Enables point symbol mapping. Other facilities in the selection (e.g. facilities in Snohomish County *without* reported CWA violations) can be identified and retrieved when the diff flag is True.

Parameters
----------
records : DataSetResults object
    The records to aggregate. records should be a DataSetResults object created from a database query. For example:
        ds = make_data_sets(["CWA Violations"]) # Create a DataSet for handling the data
        snohomish_cwa_violations = ds["CWA Violations"].store_results(region_type="County", region_value=("SNOHOMISH",), state="WA") # Store results for this DataSet as a DataSetResults object
program : String
    The name of the program, usually available from records.dataset.name
other_records : Boolean
    When True, will retrieve other facilities in the selection (e.g. facilities in Snohomish County *without* reported CWA violations)

Returns
-------
A dictionary containing:
    the aggregated results
    active facilities regulated under this program, but without recorded violations, inspections, or whatever the metric is (e.g. violations)
    the name of the new field that counts or sums up the relevant metric (e.g. violations)

add air emissions to `aggregate_by_facility`

    # Air emissions
    elif (program == "Greenhouse Gas Emissions" or program == "Toxic Releases"):
        data = data.groupby([records.dataset.idx_field, "FAC_NAME", "FAC_LAT", "FAC_LONG"]).agg({records.dataset.agg_col:'sum'})
        data['sum'] = data[records.dataset.agg_col]
        data = data.reset_index()
        aggregator = "sum" # keep track of which field we use to aggregate data, which may differ from the preset
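The aggregation step in this commit can be tried on toy data. The following is only a minimal sketch of the same groupby pattern, not the package's implementation: `aggregate_toy`, the `REGISTRY_ID` and `ANNUAL_EMISSION` column names, and the sample values are all assumptions made up for illustration.

```python
import pandas as pd


def aggregate_toy(data, id_field, agg_col, agg_type):
    """Aggregate toy records per facility with 'sum' or 'count' (hypothetical helper)."""
    if agg_type not in ("sum", "count"):
        raise ValueError("agg_type must be 'sum' or 'count'")
    keys = [id_field, "FAC_NAME", "FAC_LAT", "FAC_LONG"]
    out = data.groupby(keys).agg({agg_col: agg_type})
    # Mirror the commit: copy the aggregated values into a column named after the operation
    out[agg_type] = out[agg_col]
    return out.reset_index()


# Made-up records: two rows for facility A, one row for facility B
records = pd.DataFrame({
    "REGISTRY_ID": ["A", "A", "B"],
    "FAC_NAME": ["Plant A", "Plant A", "Plant B"],
    "FAC_LAT": [47.9, 47.9, 48.1],
    "FAC_LONG": [-122.1, -122.1, -122.3],
    "ANNUAL_EMISSION": [10.0, 5.0, 2.5],
})

summed = aggregate_toy(records, "REGISTRY_ID", "ANNUAL_EMISSION", "sum")
counted = aggregate_toy(records, "REGISTRY_ID", "ANNUAL_EMISSION", "count")
print(summed[["REGISTRY_ID", "sum"]])
print(counted[["REGISTRY_ID", "count"]])
```

Summing only makes sense when the rows share a unit, which is why the toxic releases aggregation is described elsewhere in this PR as meaningful once the data are filtered to a specific pollutant.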

ericnost · 2024-01-25T17:03:06Z

ECHO_modules/utilities.py

+  def differ(input, program):
+    '''
+    Helper function to sort facilities in this program (input) from the full list of facilities regulated under the program (active)
+    '''
+    active = get_active_facilities(records.state, records.region_type, records.region_value )
+
+    diff = list(
+        set(active[records.dataset.echo_type + "_IDS"]) - set(input[records.dataset.idx_field])
+        ) 
+
+    # get rid of NaNs - probably no program IDs
+    diff = [x for x in diff if str(x) != 'nan']
+
+    # ^ Not perfect given that some facilities have multiple NPDES_IDs
+    # Below return the full ECHO_EXPORTER details for facilities without violations, penalties, or inspections
+    diff = active.loc[active[records.dataset.echo_type + "_IDS"].isin(diff)] 
+    return diff


This isn't a robust way of differentiating facilities because of the way EPA stores program IDs.

The differ function is meant to take data like:
all_facilities = [A, B, C, D, E]
facilities_with_inspections = [C, D]
and calculate:
facilities_without_inspections = [A, B, E]

But in reality, a facility can have several program IDs stored together in one field, so the data look more like:
all_facilities = [A X, B Y, C, D Z, E]
facilities_with_inspections = [C, D]
Because "D Z" never matches "D" exactly, the resulting list of facilities without inspections would incorrectly be:
A X, B Y, D Z, E
even though D does have an inspection.

We just need a way to parse apart program IDs, ideally without having to query the database's EXP_PGM table.
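One possible direction, sketched here under the assumption that multiple program IDs in a field are separated by whitespace (as in the "A X" notation above), is to split each field before comparing. `split_ids` and `facilities_without_records` are hypothetical helpers, not functions in ECHO_modules.

```python
def split_ids(id_field):
    """Split a whitespace-separated program-ID field into individual IDs.

    Non-string values (e.g. a NaN for a facility with no program ID) yield [].
    """
    if not isinstance(id_field, str):
        return []
    return id_field.split()


def facilities_without_records(active_id_fields, ids_with_records):
    """Return ID fields for which none of the individual IDs has a record."""
    seen = set(ids_with_records)
    return [
        field
        for field in active_id_fields
        if split_ids(field) and seen.isdisjoint(split_ids(field))
    ]


all_facilities = ["A X", "B Y", "C", "D Z", "E"]
facilities_with_inspections = ["C", "D"]

# Naive exact-match set difference: "D Z" is wrongly kept
naive = sorted(set(all_facilities) - set(facilities_with_inspections))

# Splitting first: "D Z" is dropped because D has an inspection
robust = facilities_without_records(all_facilities, facilities_with_inspections)

print(naive)
print(robust)
```

This avoids a database call, but it only works if the separator assumption holds for every program's ID field.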

shansen5 · 2024-01-30T05:12:23Z

These all look like good and important improvements to the package.

ericnost added 3 commits January 23, 2024 17:40

ready for package distribution

545d18d

Update utilities.py

c1a224b

ericnost mentioned this pull request Jan 24, 2024

test flexible dependencies #51

Closed

ericnost added 3 commits January 24, 2024 13:30

Update README.md

5279ece

minor fixes to variable names in example code

Update README.md

5adf09d

more minor fixes to example code

Update README.md

9fcb3bb

more minor changes to example code

ericnost mentioned this pull request Jan 24, 2024

Add bounds to point mapper #27

Closed

ericnost and others added 22 commits January 24, 2024 18:15

Update data_set_presets.py

24c5048

add `agg_col`, `agg_type` and `units` to DMRs

Update get_data.py

a505fd5

Fixes #57 ?

Update utilities.py

ab2e0f1

Fixes #53

Update utilities.py

adc5d2d

attempt to deal with tabs/spaces issue

Update utilities.py

5492474

fix indentation?

Update utilities.py

1315f60

debug sql and test choropleth (#55)

update GHG and TRI

e7a71af

add aggregation to toxic releases table (this is tricky because of differing pollutants, but if the data are filtered to a specific pollutant, this makes sense); fix a problem with the units of GHG and TRI charts

Update data_set_presets.py

172bead

fix error where unit was units in DMR data set presets

Update utilities.py

93fbf06

fix map in `choropleth()`

Update utilities.py

e023828

packaging branch is deleted; point to main instead

Update DataSetResults.py

d84332b

units -> unit

fix choropleth()

1338219

add key_id argument to bring back together data separated for the choropleth

Update utilities.py

c6a5419

delete old choropleth variables

temporary debugging

5b4c20c

reorder the flow of region_value variable

ca5787b

Currently, when region_value is a list, we mush it together to make it appear on charts. But that has some downstream consequences for DataSet.region_value. Instead, only smush together the multi-selections when we make charts or store DataSet.results

Update utilities.py

afd8290

fix `get_active_facilities()` format string

Update DataSetResults.py

fb40dff

Update utilities.py

693da83

zc_str

format string in get_active_facilities...

9c8fa78

de-debug

c9f3367

Create ECHO_modules_Tutorials.ipynb

90b352f

add the Jupyter Notebook for the tutorials
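The region_value change in commit ca5787b (keep the list intact and join it only for display) can be illustrated with a small sketch. `region_label` is a hypothetical helper, and the comma separator and the chart title wording are assumptions; the package may join the values differently.

```python
def region_label(region_value):
    """Build a display string from region_value without modifying it (hypothetical helper)."""
    if isinstance(region_value, (list, tuple)):
        return ", ".join(str(value) for value in region_value)
    return str(region_value)


# A multi-selection stays a list so later queries can still use each value
region_value = ["SNOHOMISH", "KING"]

# Only the chart title gets the joined form
title = f"CWA Violations in {region_label(region_value)}"
print(title)
```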

ericnost commented Jan 25, 2024

ericnost mentioned this pull request Jan 25, 2024

Differ() isn't a robust way of differentiating facilities because of the way EPA stores program IDs. #62

Open

ericnost added this to the v0.1.0 milestone Jan 25, 2024

ericnost added 17 commits January 28, 2024 11:39

Update README.md

8582fd9

add detailed notes about data interpretation

Update README.md

54d8115

formatting quote/bullets

Update README.md

7f25798

format example code

Update ECHO_modules_Tutorials.ipynb

e0685ec

update tutorial notebook with interpretation and background sections of README

Update get_data.py

150900a

deletes `selector()`; fixes #58

Update utilities.py

3658434

creates `choropleth()`; fixes #55. This will break the missing data notebook, but we will make a note to fix that later

Update data_set_presets.py

769ee53

update zip code definition to match what's in the database

Update geographies.py

f340874

correct zip codes fields

Update utilities.py

9c15f48

change logic in `choropleth()`

Update utilities.py

eab4d01

fix `choropleth()` parameters

Update utilities.py

a9e71f9

add tooltip to `choropleth()`

re-package

265480a

Create .gitignore

cca1058

update requirements to >=

779b91c

fixes #51 ?

Update utilities.py

c5ad01a

quick fix on `choropleth()` tooltip

Update pyproject.toml

a3bf506

change numpy dependency to 1.23.5, which is Google Colab's default

distributions

65d50b5

ericnost marked this pull request as ready for review January 28, 2024 19:02

ericnost requested a review from shansen5 January 28, 2024 19:02

shansen5 merged commit 200c9bd into main Jan 30, 2024
