Collaborator

@MImmesberger MImmesberger commented Apr 16, 2025

What problem do you want to solve?

Make it easy for users to create a NestedDataDict by providing a mapper from the paths used in the TTSIM instance to a DataFrame column, a pandas Series, or a single value.
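
A rough sketch of the idea, not this PR's implementation: the helper name `build_data_tree`, the paths, and the column names are hypothetical, and a plain dict of lists stands in for the DataFrame so that the example has no third-party dependency. String leaves that name a column are looked up; any other leaf is treated as a single value and broadcast to the number of rows.

```python
from collections.abc import Mapping
from typing import Any


def build_data_tree(
    mapper: Mapping[str, Any],
    columns: Mapping[str, list],
    n_rows: int,
) -> dict:
    """Replace each leaf of a nested mapper by a full-length column.

    Hypothetical helper: `columns` stands in for a DataFrame.
    """
    tree = {}
    for key, leaf in mapper.items():
        if isinstance(leaf, Mapping):
            # Inner node: recurse into the sub-tree.
            tree[key] = build_data_tree(leaf, columns, n_rows)
        elif isinstance(leaf, str) and leaf in columns:
            # Leaf names a column: copy that column.
            tree[key] = list(columns[leaf])
        else:
            # Any other leaf is a single value: broadcast it to every row.
            tree[key] = [leaf] * n_rows
    return tree


# Hypothetical paths and column names, for illustration only.
columns = {"age_col": [30, 45], "wage_col": [2000.0, 3500.0]}
mapper = {
    "age": "age_col",
    "income": {"wage": "wage_col", "has_children": False},
}
data_tree = build_data_tree(mapper, columns, n_rows=2)
print(data_tree)
```

With real pandas inputs, the lookup would presumably be `df[leaf]` and the broadcast a Series built on `df.index`; which leaf types to accept is exactly the open question in the to-do list below.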

Todo:

  • Add input template for users. Args: date, NestedTargetDict
  • Refactor df_to_data_tree such that it calls compute_taxes_and_transfers directly.
  • Decide whether we want to allow pd.Series inputs and which types of data we want to allow for broadcasting. Probably we want to be way more restrictive than I currently am.

Collaborator Author

MImmesberger commented Apr 25, 2025

@hmgaudecker
Regarding the input template: I think it makes sense to wait with this until we have broken up compute_taxes_and_transfers. We basically need to call dags.tree.create_input_structure_tree on the functions tree with partialled parameters.

Also, could you have a look at the type hints? I tried many versions but can't get it to work... When I run pytest, I get the error below. If it helps, you can look at the diff of 1accf2e (version without the type hints, works) and d8b859d (version with type hints).

ERROR: while parsing the following warning configuration:

  ignore::ttsim.compute_taxes_and_transfers.FunctionsAndColumnsOverlapWarning

This error occurred:

Traceback (most recent call last):
  File "/Users/marvin/GitHub/gettsim/.pixi/envs/default/lib/python3.12/site-packages/_pytest/config/__init__.py", line 1918, in parse_warning_filter
    category: type[Warning] = _resolve_warning_category(category_)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marvin/GitHub/gettsim/.pixi/envs/default/lib/python3.12/site-packages/_pytest/config/__init__.py", line 1956, in _resolve_warning_category
    m = __import__(module, None, None, [klass])
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marvin/GitHub/gettsim/src/ttsim/__init__.py", line 15, in <module>
    from ttsim.prepare_data import create_data_tree_from_df
  File "/Users/marvin/GitHub/gettsim/src/ttsim/prepare_data.py", line 5, in <module>
    from ttsim.typing import NestedDataDict, NestedInputToSeriesNameDict
ImportError: cannot import name 'NestedDataDict' from 'ttsim.typing' (/Users/marvin/GitHub/gettsim/src/ttsim/typing.py)

Edit: I removed the types from ttsim/__init__.py in c3d8656 because I thought the problem might be a circular import, but still no luck.

codecov bot commented Apr 25, 2025

Codecov Report

Attention: Patch coverage is 74.76636% with 27 lines in your changes missing coverage. Please review.

Project coverage is 82.84%. Comparing base (908e272) to head (5838bba).
Report is 1 commits behind head on collect-components-of-namespaces.

Files with missing lines     Patch %   Lines
src/ttsim/prepare_data.py    71.05%    11 Missing ⚠️
tests/ttsim/utils.py         41.17%    10 Missing ⚠️
src/_gettsim/interface.py    50.00%    6 Missing ⚠️
Additional details and impacted files
@@                         Coverage Diff                          @@
##           collect-components-of-namespaces     #876      +/-   ##
====================================================================
- Coverage                             83.07%   82.84%   -0.24%     
====================================================================
  Files                                   145      148       +3     
  Lines                                  5713     5787      +74     
====================================================================
+ Hits                                   4746     4794      +48     
- Misses                                  967      993      +26     


Collaborator

@hmgaudecker hmgaudecker left a comment

> Regarding the input template: I think it makes sense to wait with this until we have broken up compute_taxes_and_transfers. We basically need to call dags.tree.create_input_structure_tree on the functions tree with partialled parameters.

Yes!

> Also, could you have a look at the type hints? I tried many versions but can't get it to work...

The trick is to use from __future__ import annotations. Here is what you.com came up with:

  1. Use from __future__ import annotations

If you are using Python 3.7+ and want to keep the import inside the TYPE_CHECKING block, you can enable deferred evaluation of type annotations using from __future__ import annotations. This makes all type hints lazy (i.e., they are stored as strings and not evaluated at runtime), which can help avoid runtime import issues.
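
A minimal sketch of this pattern, under the assumption that the names in question exist only for the type checker. A standard-library import stands in for the import from ttsim.typing so that the snippet runs on its own; `count_leaves` is a hypothetical function used only for illustration.

```python
from __future__ import annotations  # must come before any other code in the module

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Executed by static type checkers only, never at runtime.
    # Stand-in for an import such as the one from ttsim.typing.
    from collections.abc import Mapping


def count_leaves(tree: Mapping[str, object]) -> int:
    """Count the non-dict leaves of a nested dict (hypothetical example)."""
    # `Mapping` is not defined at runtime, so the body must not use it.
    return sum(
        count_leaves(value) if isinstance(value, dict) else 1
        for value in tree.values()
    )


# The annotations are kept as plain strings and never evaluated.
print(count_leaves.__annotations__)
print(count_leaves({"a": 1, "b": {"c": 2, "d": 3}}))
```

Without the `__future__` import, defining `count_leaves` would raise a NameError, because `Mapping` is never imported at runtime.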

from ttsim.typing import NestedDataDict, NestedInputToSeriesNameDict


def quickrun(
Collaborator

Suggested change
- def quickrun(
+ def oss(

For "one stop shop"? Not married to that, but it most definitely will be a very slow way of running (GE)TTSIM, so the name will have to change. (I know you imply "getting it to run quickly", but there is too much ambiguity IMO)

Collaborator Author

I'm fine with oss; seems to be a common abbreviation: https://en.wikipedia.org/wiki/One-stop_shop

Collaborator

@hmgaudecker hmgaudecker left a comment

Thanks a lot, looks great!!! And apologies for the piecemeal review, I pressed the wrong buttons in Cursor's interface.

Just a remark explaining my changes to imports:

  • Within the ttsim package, always use imports from modules themselves. I.e., never use from ttsim import ...
  • From tests/ttsim, use from ttsim import ... whenever possible, i.e., use module-specific imports only for objects not exported via the main ttsim namespace.
  • From anything that is gettsim, only ever use from ttsim import .... If tempted to use something else, we'll need to adjust the global namespace of ttsim. Exception: ttsim.typing.


# Specialise from dags' NestedInputDict to GETTSIM's types.
NestedInputToSeriesNameDict = Mapping[str, Any | "NestedInputToSeriesNameDict"]
NestedDataDict = Mapping[str, pd.Series | "NestedDataDict"]
Collaborator

Just a heads-up that the current type NestedDataDict is nonsense (we never use series) and will change via #879.

Collaborator

@hmgaudecker hmgaudecker left a comment

Excellent, thank you!

@hmgaudecker hmgaudecker merged commit 7ff0b24 into collect-components-of-namespaces Apr 26, 2025
5 of 9 checks passed
@hmgaudecker hmgaudecker deleted the df-to-tree branch April 26, 2025 17:10
@hmgaudecker
Collaborator

(just to be sure, I think the two to-dos left in the original post should be tackled elsewhere)
