-
Notifications
You must be signed in to change notification settings - Fork 31
Interface: Add DataFrame to NestedDataDict conversion #876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@hmgaudecker Also, could you have a look at the type hints? I tried many versions but can't get it to work... When I run ERROR: while parsing the following warning configuration:
ignore::ttsim.compute_taxes_and_transfers.FunctionsAndColumnsOverlapWarning
This error occurred:
Traceback (most recent call last):
File "/Users/marvin/GitHub/gettsim/.pixi/envs/default/lib/python3.12/site-packages/_pytest/config/__init__.py", line 1918, in parse_warning_filter
category: type[Warning] = _resolve_warning_category(category_)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marvin/GitHub/gettsim/.pixi/envs/default/lib/python3.12/site-packages/_pytest/config/__init__.py", line 1956, in _resolve_warning_category
m = __import__(module, None, None, [klass])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marvin/GitHub/gettsim/src/ttsim/__init__.py", line 15, in <module>
from ttsim.prepare_data import create_data_tree_from_df
File "/Users/marvin/GitHub/gettsim/src/ttsim/prepare_data.py", line 5, in <module>
from ttsim.typing import NestedDataDict, NestedInputToSeriesNameDict
ImportError: cannot import name 'NestedDataDict' from 'ttsim.typing' (/Users/marvin/GitHub/gettsim/src/ttsim/typing.py)Edit: (I removed the types from |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## collect-components-of-namespaces #876 +/- ##
====================================================================
- Coverage 83.07% 82.84% -0.24%
====================================================================
Files 145 148 +3
Lines 5713 5787 +74
====================================================================
+ Hits 4746 4794 +48
- Misses 967 993 +26 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the input template: I think it makes sense to wait with this until we have broken up
compute_taxes_and_transfers. We basically need to calldags.tree.create_input_structure_treeon the functions tree with partialled parameters.
Yes!
Also, could you have a look at the type hints? I tried many versions but can't get it to work...
The trick is to use from __future__ import annotations. Here is what you.com came up with:
- Use from future import annotations
If you are using Python 3.7+ and want to keep the import inside the TYPE_CHECKING block, you can enable deferred evaluation of type annotations using from future import annotations. This makes all type hints lazy (i.e., they are not evaluated at runtime), which can help avoid runtime import issues. For example:
src/_gettsim/interface.py
Outdated
| from ttsim.typing import NestedDataDict, NestedInputToSeriesNameDict | ||
|
|
||
|
|
||
| def quickrun( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def quickrun( | |
| def oss( |
For "one stop shop"? Not married to that, but it most definitely will be a very slow way of running (GE)TTSIM, so the name will have to change. (I know you imply "getting it to run quickly", but there is too much ambiguity IMO)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with oss; seems to be a common abbreviation: https://en.wikipedia.org/wiki/One-stop_shop
hmgaudecker
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, looks great!!! And apologies for the piecemeal review, I pressed the wrong buttons in Cursor's interface.
Just a remark explaining my changes to imports:
- Within the ttsim package, always use imports from modules themselves. I.e., never use
from ttsim import ... - From tests/ttsim, use
from ttsim import ...whenever possible, i.e., use module-specific imports only for objects not exported via the mainttsimnamespace. - From anything that is
gettsim, only ever usefrom ttsim import .... If tempted to use something else, we'll need to adjust the global namespace ofttsim. Exception:ttsim.typing.
|
|
||
| # Specialise from dags' NestedInputDict to GETTSIM's types. | ||
| NestedInputToSeriesNameDict = Mapping[str, Any | "NestedInputToSeriesNameDict"] | ||
| NestedDataDict = Mapping[str, pd.Series | "NestedDataDict"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a heads-up that the current type NestedDataDict is nonsense (we never use series) and will change via #879.
hmgaudecker
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, thank you!
7ff0b24
into
collect-components-of-namespaces
|
(just to be sure, I think the two to-dos left in the original post should be tackled elsewhere) |
What problem do you want to solve?
Users can easily create a NestedDataDict by providing a mapper from the paths used in the TTSIM instance to a column in the DataFrame or a pandas Series or a single value.
Todo:
df_to_data_treesuch that it callscompute_taxes_and_transfersdirectly.pd.Seriesinputs and which types of data we want to allow for broadcasting. Probably we want to be way more restrictive than I currently am.