Namespaces #804
Merged
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #804      +/-   ##
==========================================
- Coverage   87.82%   78.03%   -9.80%
==========================================
  Files          56      180     +124
  Lines        3976     8039    +4063
==========================================
+ Hits         3492     6273    +2781
- Misses        484     1766    +1282
This was referenced Jan 1, 2025
This PR adds the namespace infrastructure to GETTSIM.

- [x] Write `policy_function` decorator (rename `policy_info` and change behavior so that a `PolicyFunction` instance is returned). ~Apply to all TT functions.~ (that should be part of renamings)
- [x] Check that functions in module with same simple_name have the correct start_date, end_date specs (this was removed from the policy_info decorator).
- [x] Remove doubled levels in the functions tree automatically (to avoid writing functions in `__init__.py`).
- [x] Go over type hints for aggregation functions.
- [x] Refactor interface module.
- [x] Implement some safety checks
  - [x] No function should have the same name as a module in the same directory
  - [x] No trailing underscores in module names (for [DAGS PR](OpenSourceEconomics/dags#17))

---------

Co-authored-by: Marvin Immesberger <[email protected]>
Co-authored-by: Tim Mensinger <[email protected]>
Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
The way we implemented the loading of namespaces in #780 does not quite work. We want to have them at the directory level to balance use of namespaces and reducing the amount of qualified names. Additionally, we had to change the order of the upsert operations in `combine_policy_functions_and_derived_functions`. Doesn't affect the happy path, but in case of conflicts the previous behaviour did not make sense. --------- Co-authored-by: Marvin Immesberger <[email protected]>
2 tasks
This reverts commit fd2d696.
### What problem do you want to solve? Uses the qualified name instead of the leaf name to look for rounding specs in the params file. This is a temporary solution until we have tackled #823.
### What problem do you want to solve? This PR provides the necessary renamings of taxes and transfers functions for #804. ToDo: - [x] Create new directory structure - [x] Rename all function arguments - [x] Set namespace of basic input variables - [x] Update `pyproject.toml` to reflect new file structure - [x] Make sure tests run (#841) - [x] `kinderfreibetragempfänger` $\rightarrow$ `kinderfreibetragsempfänger` - [x] Link issue #842 in relevant docstrings --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
### What problem do you want to solve?

This PR implements the distinction between the TTSIM (basically the infrastructure) and DE (the German taxes and transfers) components of GETTSIM. This was discussed [here](#780 (comment)). In particular, I

- Move modules from `_gettsim` to `ttsim/` or leave them in `_gettsim`
- Remove the `taxes` and `transfers` subdirs
- Split up `config.py` into a TTSIM and a DE part
- Adjust the loader accordingly
- Also split up tests in TTSIM and DE parts.
- Introduce quarters

For tests, the distinction is not always super sharp. There are some tests that test a specific feature of the infrastructure (e.g. vectorization), but do this by loading the functions tree from the DE part. Still, I chose to label those tests as `ttsim`. Similarly, we don't test `aggregate_by_p_id` directly in the `ttsim` part, but do it by testing specific components of the TT system. I put them in the `de` dir.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
Co-authored-by: Tim Mensinger <[email protected]>
### What problem do you want to solve? Will close #852. Adapts tests to match GETTSIM src structure.
### What problem do you want to solve? This PR makes a step towards separating TTSIM and GETTSIM by testing the TTSIM infrastructure with its own instance of a fictitious taxes and transfers system that makes use of all features. --------- Co-authored-by: Hans-Martin von Gaudecker <[email protected]> Co-authored-by: Tim Mensinger <[email protected]>
`fg_id` creation did not work correctly for some orderings of adults (#801). Now adds fg_id for both the einstandspartner and his children at the same time. - [x] Fix loop - [x] Add test case for special case mentioned in #801 --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This is a huge PR, which started innocently as a fix to #833. In the end, it turned out to be very difficult to change things locally, so in the process of the intense sprint during the week 7-12 April 2025, this ended up including the following:

- Updated type hierarchy (`TTSIMObject` as basic building block, `PolicyInput` and `TTSIMFunction` inheriting from that, `TTSIMFunction` has further subclasses for policy, aggregation, ...).
- Further separation of tests in ttsim / _gettsim. This includes the Middle Earth Taxes and Transfers SIMulator METTSIM as a tiny example for the ttsim side of tests (#856) and a sensible structure for `_gettsim_tests` (#858)
- Sensible treatment of Einnahmen / Einkünfte (#862)
- Specify rounding in a dataclass to be provided in the decorators rather than referencing the yaml files from there (#859)
- Improve structure for AggregationSpecs, including an Enum for the type of Aggregations (#860)

---------

Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
Co-authored-by: Marvin Immesberger <[email protected]>
Co-authored-by: Marvin Immesberger <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
# What problem do you want to solve? Unify handling of dates to remove ambiguity and code duplication. --------- Co-authored-by: Marvin Immesberger <[email protected]>
We put some effort into trying to convert types. However, the code was a mess and it would be a pain to maintain it. What Python/Pandas/Numpy/Jax do is more than good enough for GETTSIM, too. Now that we have the explicitly annotated `policy_inputs`, it will be easy to check and throw errors if users want to be strict. This PR removes the code which has been stale for the last week, anyhow. --------- Co-authored-by: Marvin Immesberger <[email protected]>
### What problem do you want to solve? Fix #870 and related things. In particular, defer some checks so that they are only done for variables that are present / set start/end dates of explicit aggregation functions so they are derived from source object. --------- Co-authored-by: Marvin Immesberger <[email protected]> Co-authored-by: Tim Mensinger <[email protected]> Co-authored-by: Max Jahn <[email protected]>
### What problem do you want to solve?

Tests in `test_jax_jit_kindergeld.py` were failing because policy functions were not jittable.

### Problems and Solutions

#### Non-Hashable Function in `jit`

The policy functions were non-jittable because the dataclasses were non-frozen and had the equality argument set to True. This implies that the dataclass gets an equality method which compares the fields. To not break the equality/hash contract (a == b implies hash(a) == hash(b)), a dataclass with an equality method that is not frozen has its hash deactivated. This does not work with `jax.jit`, because JAX requires a hash of the object for caching. By freezing the dataclasses they get their hash back, and everything works nicely again with JAX.

> [!NOTE]
> Frozen dataclasses cannot have standard assignments in the post init method. For this I had to implement a `frozen_safe_update_wrapper`.

### Todo

- [x] Freeze ttsim_objects dataclasses and update post init of `TTSIMFunction` to be compatible
- [x] Understand why list `single_test` in `kindergeld_policy_test` fixture has only one entry, although the yaml file says there are two outputs
- [x] Fix `test_compute_taxes_and_transfers_kindergeld`

---------

Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
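The hashing behaviour described above can be reproduced with the standard library alone. The following is a minimal sketch, assuming two hypothetical classes `MutableSpec` and `FrozenSpec` (neither is part of TTSIM). The `object.__setattr__` call is the common idiom for setting a derived field in the `__post_init__` of a frozen dataclass; it only illustrates the kind of problem `frozen_safe_update_wrapper` has to work around, not how that wrapper is written.

```python
from dataclasses import dataclass, field


@dataclass(eq=True)  # eq without frozen: dataclass sets __hash__ to None
class MutableSpec:
    name: str


@dataclass(frozen=True)  # eq (default) plus frozen: __hash__ is generated from the fields
class FrozenSpec:
    name: str
    upper_name: str = field(init=False, default="")

    def __post_init__(self) -> None:
        # A plain `self.upper_name = ...` would raise FrozenInstanceError here,
        # so the derived field is set through object.__setattr__ instead.
        object.__setattr__(self, "upper_name", self.name.upper())


def is_hashable(obj: object) -> bool:
    """Return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True
```

Anything that must serve as a cache key, which is the role the description above assigns to the hash under `jax.jit`, needs `is_hashable` to return `True`; only the frozen variant satisfies that.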
In a limited set of experiments, it produced exactly the same result. `ast.unparse` is available since Python 3.9, so it's fine to use.
- [x] Add a json (yaml) schema based on GEP-03 - [x] Make sure manual validation of parameters passes - [x] make a pre-commit hook out of this
### What problem do you want to solve? Closes #1025 ### Todo Add tests via `main` for - [x] input_data_tree_is_invalid - [x] environment_is_invalid - [x] input_df_mapper_columns_missing_in_df - [x] targets_tree_is_invalid --------- Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
### What problem do you want to solve? Enable - [x] `INP001` (implicit namespace packages without init.) - [x] `PLR2004` (Magic values used in comparison) - [x] `PT006` (Allows only lists of tuples in parametrize, even if single argument) - [x] `PT007` (wrong type in parametrize) - [x] `S101` (use of asserts outside of tests) - [x] some more checks on individual files --------- Co-authored-by: Marvin Immesberger <[email protected]>
### What problem do you want to solve? Closes #893 Changes: - Change namespace to Einkommensteuer/Einkünfte/Sonstige/Renten - Add three types of private pension income: gefördert / betrieblich / regulär - Implement the current state of law regarding their treatment for SV contributions and taxation Issue for historical support of TT rules: #1030
…lts are requested (#1031) ### What problem do you want to solve? Closes #1006 --------- Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
### What problem do you want to solve?

`processed_data` uses an $O(n^2)$ approach to link original and internal IDs. This PR implements an $O(n\cdot \log(n))$ approach.

## Benchmarks

### On `gep-07` (3525917):

```cmd
====================================================================
SUMMARY TABLE
====================================================================
Dataset            numpy_time  numpy_hash  jax_time  jax_hash
--------------------------------------------------------------------
df_5000.parquet    1.2681      13106402    15.5897   bf85cb3d
df_10000.parquet   4.6791      308ca129    30.7932   57ba7579
df_20000.parquet   15.7451     51e8d0b4    62.4070   21636ea4
df_40000.parquet   54.0340     6ae704d8    137.1975  30bbf3ea
```

### This PR:

**[EDIT: updated results after cf37b75]**

```cmd
====================================================================
SUMMARY TABLE
====================================================================
Dataset            numpy_time  numpy_hash  jax_time  jax_hash
--------------------------------------------------------------------
df_5000.parquet    0.0378      13106402    0.8950    bf85cb3d
df_10000.parquet   0.0402      308ca129    0.8108    57ba7579
df_20000.parquet   0.1107      51e8d0b4    1.1354    21636ea4
df_40000.parquet   0.0853      6ae704d8    1.8208    30bbf3ea
```

The benchmark essentially runs

```python
result = main(
    date_str=None,
    input_data=InputData.df_and_mapper(
        df=data,
        mapper=MAPPER,
    ),
    main_targets=[MainTarget.processed_data],
    tt_targets=TTTargets(tree=TT_TARGETS),
    backend=backend,
)
```

on the targets defined in `interface_playground.ipynb` with differently sized datasets that replicate the example household from the same notebook `N` times (i.e., `N*3` persons in each dataset). The hashes demonstrate that this PR creates `result` objects that are identical to the ones created with the $O(n^2)$ approach.

To reproduce the benchmarks:

- Run `make_data.py` (see attached .zip) to create example datasets
- Run `benchmark_comparison.py` to create tables above

[benchmark.zip](https://github.com/user-attachments/files/21327575/benchmark.zip)

---------

Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
Co-authored-by: mj023 <[email protected]>
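The difference between the two complexity classes can be illustrated with a small, standard-library-only sketch. It is not the code from this PR: the function names are hypothetical, internal IDs are assumed to be the rank of each original ID in sorted order, and every reference is assumed to point at an existing ID.

```python
from bisect import bisect_left


def link_ids_quadratic(original_ids: list[int], references: list[int]) -> list[int]:
    """Map each reference to an internal ID with a linear scan per lookup: O(n^2)."""
    unique_ids = sorted(set(original_ids))
    return [unique_ids.index(ref) for ref in references]


def link_ids_sorted(original_ids: list[int], references: list[int]) -> list[int]:
    """Same mapping via binary search on the sorted IDs: O(n log n) overall."""
    unique_ids = sorted(set(original_ids))
    return [bisect_left(unique_ids, ref) for ref in references]
```

Both functions return the same mapping, which mirrors the role of the hash columns in the benchmark tables: the faster variant must not change the result.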
### What problem do you want to solve? Clarifies the meaning of `ist_selbstständig` by renaming to `ist_hauptberuflich_selbstständig` as discussed in #892.
… exemptions to social insurance contributions (#1032)
#1035) Following feedback on GEP 7, we got rid of the `date` / `date_str` inputs to main. Instead: - `policy_date` / `policy_date_str` is required to set up the policy environment and will be stored in there. - `evaluation_date` / `evaluation_date_str` is an optional input to `main` The evaluation date will be used from the following sources: 1. If present in the input data, that will be used 2. Unless 1., the variable passed to `main` will be used 3. Unless 2., the policy date will be used If more than one option is specified, a warning will be issued. --------- Co-authored-by: Marvin Immesberger <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
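The precedence rules above can be written down as a short sketch. The function `resolve_evaluation_date` and its parameter names are hypothetical and not part of the GETTSIM interface; the sketch also assumes that "more than one option" refers to the evaluation date being given both in the input data and as an argument to `main`.

```python
import warnings
from datetime import date
from typing import Optional


def resolve_evaluation_date(
    policy_date: date,
    from_input_data: Optional[date] = None,
    from_main_argument: Optional[date] = None,
) -> date:
    """Pick the evaluation date: input data first, then the argument, then the policy date."""
    if from_input_data is not None and from_main_argument is not None:
        # Assumption: this is the situation in which a warning is issued.
        warnings.warn(
            "Evaluation date found in the input data and passed as an argument; "
            "using the one from the input data.",
            stacklevel=2,
        )
    if from_input_data is not None:
        return from_input_data
    if from_main_argument is not None:
        return from_main_argument
    return policy_date
```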
### What problem do you want to solve? This PR gets the input names (+docstrings) of "vorjahr" and similar inputs straight and puts them in the correct namespaces.
### What problem do you want to solve?

Closes #757

In fact, any occurrences of automatically created cycles reported in #757 have already been solved via the namespaces structure. In ALG2 we have

```python
@policy_function(start_date="2005-01-01")
def wohnfläche(
    wohnen__wohnfläche_hh: float,
    anzahl_personen_hh: int,
) -> float:
    """Share of household's dwelling size attributed to a single person."""
    return wohnen__wohnfläche_hh / anzahl_personen_hh
```

So we don't create a cycle anymore as we have `wohnen__wohnfläche -> wohnen__wohnfläche_hh` but `wohnen__wohnfläche_hh -> arbeitslosengeld_2__wohnfläche`.

Still, thanks to the work done by Lars in the past, a general TTSIM solution was easy to implement because it just copies the logic done for time-conversion functions.
*Leaving the almost-unedited stuff from Claude Code here for demonstration purposes*

## Summary

- Implements `copy_environment` function to address issue #1008
- Provides proper copying of policy environments containing unpickleable function objects
- Available in both `ttsim` and `gettsim` namespaces with full type safety

## Implementation Details

- **Function**: Uses `optree.tree_map(copy, tree)` for shallow copying of tree leaves while recreating structure
- **Type Safety**: Proper type hints with `@overload` decorators for `PolicyEnvironment` and specialized environment types
- **Error Handling**: Solves `copy.deepcopy` failures on policy environments containing function objects

## Key Features

- ✅ **Type-safe copying** with specific return types for each environment type
- ✅ **Independence guarantee** - modifications to copy don't affect original
- ✅ **Performance optimized** using optree for tree operations
- ✅ **Comprehensive testing** with human-readable test functions
- ✅ **Full documentation** with examples and technical explanations

## Usage Example

```python
from gettsim import main, copy_environment, MainTarget
from ttsim.tt_dag_elements.param_objects import ScalarParam

# Load and copy policy environment
policy_env = main(date_str="2025-01-01", main_target=MainTarget.policy_environment)
copied_env = copy_environment(policy_env)

# Modify copy without affecting original
copied_env["sozialversicherung"]["rente"]["beitrag"]["beitragssatz"] = ScalarParam(value=0.3)
```

## Test Coverage

- ✅ Single parameter copying
- ✅ Nested dictionary structures
- ✅ Full policy environment integration
- ✅ Error conditions and edge cases
- ✅ Type inference verification
- ✅ Independence testing

Closes #1008

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Marvin Immesberger <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
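The idea of recreating the tree structure while shallow-copying only the leaves can be sketched without `optree`. The helper `copy_tree` below is a hypothetical stand-in that handles nested dictionaries only; it is not the `copy_environment` implementation, and the example tree uses plain floats instead of TTSIM parameter objects.

```python
from copy import copy
from typing import Any


def copy_tree(tree: Any) -> Any:
    """Rebuild every nested dict and shallow-copy each leaf."""
    if isinstance(tree, dict):
        return {key: copy_tree(value) for key, value in tree.items()}
    return copy(tree)


# Illustrative nested tree; the keys mirror the usage example above.
original = {"sozialversicherung": {"rente": {"beitragssatz": 0.186}}}
copied = copy_tree(original)
copied["sozialversicherung"]["rente"]["beitragssatz"] = 0.3
```

Because every dictionary level is rebuilt, replacing a leaf in `copied` leaves `original` untouched, which is the independence property listed above.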
cb927be to cbf95c1
This reverts commit 6f12a59.
### What problem do you want to solve? Closes #999 Fails if any `param_function` depends on a `ColumnObject`, with the exception being `evaluation_x` and `policy_x` (they are `PolicyInputs`). --------- Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
Fixes #869 - Raise an error when type-conversion calls like float, int, etc. are encountered - Raise an error when augmented assignment (+= and -= and *= and /= and friends) are encountered - Remove some tests that actually relied on that behavior, make test numbering consistent.
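A check of this kind can be sketched with the standard `ast` module. The function name `find_unsupported_nodes` and the set `FORBIDDEN_CALLS` are assumptions made for illustration; the actual set of rejected calls and the error messages in TTSIM may differ.

```python
import ast

# Assumption: which builtin conversions count as forbidden is illustrative only.
FORBIDDEN_CALLS = {"float", "int", "bool", "complex"}


def find_unsupported_nodes(source: str) -> list[str]:
    """Report type-conversion calls and augmented assignments, sorted by line."""
    found: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.AugAssign):
            found.append((node.lineno, "augmented assignment"))
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in FORBIDDEN_CALLS
        ):
            found.append((node.lineno, f"call to {node.func.id}()"))
    return [f"line {lineno}: {message}" for lineno, message in sorted(found)]


SAMPLE = "def f(x):\n    y = float(x)\n    y += 1\n    return y\n"
problems = find_unsupported_nodes(SAMPLE)
```

A caller that wants the behaviour described above would raise an error whenever the returned list is non-empty.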
Add `fail_if.backend_has_changed`. Lessons learned:

- Numpy can handle Jax arrays (see test)
- Jax can handle NumPy arrays that are passed as the processed data (see test)
- The problematic case is parameters that are partialled into functions. Unfortunately, these are typically custom objects. We need to loop over them and check whether any of them happens to be a numpy array
(#1048) Check whether the structure of the paths matches. E.g.: - `input_data={"df_and_mapper": None}`: Fails because there needs to be a dict below "df_and_mapper" - `input_data={"not_around": None}`: Fails because `not_around` is not a valid child of `input_data` - `not_around=None`: Fails because not around is not a valid root node (already taken care of by Python itself when calling `main`, but let's be pedantic...)
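The three failure cases above can be reproduced with a small recursive comparison against a template tree. This is a sketch only: `invalid_paths` and `TEMPLATE` are hypothetical, and the `df` and `mapper` children are assumed from the `InputData.df_and_mapper(df=..., mapper=...)` call used elsewhere in this conversation.

```python
from typing import Any

# Assumption: a tiny stand-in for the allowed structure of the arguments to `main`.
TEMPLATE: dict[str, Any] = {
    "input_data": {"df_and_mapper": {"df": None, "mapper": None}},
}


def invalid_paths(
    tree: dict[str, Any],
    template: dict[str, Any],
    prefix: tuple[str, ...] = (),
) -> list[str]:
    """Collect messages for paths in `tree` that do not fit `template`."""
    errors: list[str] = []
    for key, value in tree.items():
        path = ".".join(prefix + (key,))
        if key not in template:
            errors.append(f"{path}: not a valid node")
        elif isinstance(template[key], dict):
            if not isinstance(value, dict):
                errors.append(f"{path}: expected a dict below this node")
            else:
                errors.extend(invalid_paths(value, template[key], prefix + (key,)))
    return errors
```

Collecting all messages before failing lets a user see every structural problem at once instead of fixing them one by one.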
…omatically created function (#1050) ### What problem do you want to solve? Closes #1049 --------- Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
### What problem do you want to solve? - [x] Add a GEP for the revamped interface - [x] Update earlier GEPs to reflect the changes that have become necessary after GEP 6 (since our documentation is small, it does not make sense to keep outdated things around). - [x] Add the finalised schema from #880 as an appendix to GEP 3 [Resolution on Zulip.](https://gettsim.zulipchat.com/#narrow/channel/309998-GEPs/topic/GEP.2007/near/530389224) --------- Co-authored-by: Marvin Immesberger <[email protected]>
In sync with [TTSIM PR 1](ttsim-dev/ttsim#1), this leaves just GETTSIM in here. Also includes the renamings in [TTSIM PR 3](ttsim-dev/ttsim#3), which are on PyPI as 1.0a1 Fixes #1003.
This PR collects the different components of changing GETTSIM's internal DAG from a flat namespace to a nested one. This is a very large change that will be made in multiple PRs. Each PR individually should not change the main branch, however.
The different PRs are (to be updated)
- `_gettsim` imports from `ttsim` and `tests/ttsim` #853
- `test_policy` infrastructure cannot handle tests with only a single output column #918
- Harmonize tests across ttsim/gettsim #883