Conversation

@hmgaudecker
Collaborator

@hmgaudecker hmgaudecker commented Dec 12, 2024

This PR collects the different components of changing GETTSIM's internal DAG from a flat namespace to a nested one. This is a very large change that will be made in multiple PRs. Each PR individually should not change the main branch, however.

The different PRs are (to be updated)

@codecov

codecov bot commented Dec 12, 2024

Codecov Report

Attention: Patch coverage is 61.94462% with 591 lines in your changes missing coverage. Please review.

Project coverage is 78.03%. Comparing base (0e21352) to head (ad1d40e).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/_gettsim/arbeitslosengeld_2/regelbedarf.py | 51.09% | 67 Missing ⚠️ |
| src/_gettsim/erziehungsgeld/erziehungsgeld.py | 44.44% | 55 Missing ⚠️ |
| src/_gettsim/einkommensteuer/abzüge/vorsorge.py | 43.75% | 36 Missing ⚠️ |
| src/_gettsim/lohnsteuer/lohnsteuer.py | 47.82% | 36 Missing ⚠️ |
| src/_gettsim/kinderzuschlag/einkommen.py | 55.40% | 33 Missing ⚠️ |
| src/_gettsim/elterngeld/elterngeld.py | 52.45% | 29 Missing ⚠️ |
| src/_gettsim/kinderzuschlag/kinderzuschlag.py | 52.63% | 27 Missing ⚠️ |
| ...gettsim/arbeitslosengeld_2/freibeträge_vermögen.py | 32.43% | 25 Missing ⚠️ |
| src/_gettsim/lohnsteuer/einkommen.py | 45.65% | 25 Missing ⚠️ |
| src/_gettsim/einkommensteuer/einkommensteuer.py | 62.29% | 23 Missing ⚠️ |

... and 35 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #804      +/-   ##
==========================================
- Coverage   87.82%   78.03%   -9.80%     
==========================================
  Files          56      180     +124     
  Lines        3976     8039    +4063     
==========================================
+ Hits         3492     6273    +2781     
- Misses        484     1766    +1282     


@MImmesberger MImmesberger mentioned this pull request Dec 19, 2024
7 tasks
@hmgaudecker hmgaudecker mentioned this pull request Dec 21, 2024
8 tasks
@timmens timmens mentioned this pull request Jan 23, 2025
18 tasks
lars-reimann and others added 3 commits February 15, 2025 20:32
This PR adds the namespace infrastructure to GETTSIM.

- [x] Write `policy_function` decorator (rename `policy_info` and change
behavior so that a `PolicyFunction` instance is returned). ~Apply to all
TT functions.~ (that should be part of renamings)
- [x] Check that functions in module with same simple_name have the
correct start_date, end_date specs (this was removed from the
policy_info decorator).
- [x] Remove doubled levels in the functions tree automatically (to
avoid writing functions in `__init__.py`).
- [x] Go over type hints for aggregation functions.
- [x] Refactor interface module.
- [x] Implement some safety checks 
- [x] No function should have the same name as a module in the same
directory
- [x] No trailing underscores in module names (for [DAGS
PR](OpenSourceEconomics/dags#17))

---------

Co-authored-by: Marvin Immesberger <[email protected]>
Co-authored-by: Tim Mensinger <[email protected]>
Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
The way we implemented the loading of namespaces in #780 does not quite
work.

We want to have them at the directory level to balance use of namespaces
and reducing the amount of qualified names.

Additionally, we had to change the order of the upsert operations in `combine_policy_functions_and_derived_functions`. Doesn't affect the happy path, but in case of conflicts the previous behaviour did not make sense.

---------

Co-authored-by: Marvin Immesberger <[email protected]>
MImmesberger and others added 11 commits March 11, 2025 11:34
### What problem do you want to solve?

Uses the qualified name instead of the leaf name to look for rounding
specs in the params file. This is a temporary solution until we have
tackled #823.
### What problem do you want to solve?

This PR provides the necessary renamings of taxes and transfers
functions for #804.

ToDo:
- [x] Create new directory structure
- [x] Rename all function arguments
- [x] Set namespace of basic input variables
- [x] Update `pyproject.toml` to reflect new file structure
- [x] Make sure tests run (#841)
- [x] `kinderfreibetragempfänger` $\rightarrow$
`kinderfreibetragsempfänger`
- [x] Link issue #842 in relevant docstrings

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
### What problem do you want to solve?

This PR implements the distinction between TTSIM (basically the
infrastructure) and DE (the German taxes and transfers) components of
GETTSIM. This was discussed
[here](#780 (comment)).

In particular, I

- Move modules from `_gettsim` to `ttsim/` or leave them in `_gettsim`
- Remove the `taxes` and `transfers` subdirs
- Split up `config.py` into a TTSIM and a DE part
- Adjust the loader accordingly
- Also split up tests in TTSIM and DE parts.
- Introduce quarters

For tests, the distinction is not always super sharp. There are some
tests that test a specific feature of the infrastructure (e.g.
vectorization), but do this by loading the functions tree from the DE
part. Still, I chose to label those tests as `ttsim`.

Similarly, we don't test `aggregate_by_p_id` directly in the `ttsim`
part, but do it by testing specific components of the TT system. I put
them in the `de` dir.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
Co-authored-by: Tim Mensinger <[email protected]>
### What problem do you want to solve?

Will close #852. Adapts tests to match GETTSIM src structure.
AggregationType-s instead of strings.
### What problem do you want to solve?

This PR makes a step towards separating TTSIM and GETTSIM by testing the
TTSIM infrastructure with its own instance of a fictitious taxes and
transfers system that makes use of all features.

---------

Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
Co-authored-by: Tim Mensinger <[email protected]>
`fg_id` creation did not work correctly for some orderings of adults
(#801). Now adds fg_id for both the einstandspartner and his children at
the same time.

- [x] Fix loop
- [x] Add test case for special case mentioned in #801

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
timmens and others added 7 commits April 12, 2025 09:10
This is a huge PR, which started innocently as a fix to #833.

In the end, it turned out to be very difficult to change things locally,
so in the process of the intense sprint during the week 7-12 April 2025, 
this ended up including the following:
- Updated type hierarchy (`TTSIMObject` as basic building block,
`PolicyInput` and `TTSIMFunction` inheriting from that, `TTSIMFunction`
has further subclasses for policy, aggregation, ...).
- Further separation of tests in ttsim / _gettsim. Including Middle
Earth Taxes and Transfers SIMulator METTSIM as tiny example for the
ttsim-side of tests (#856) and sensible structure for `_gettsim_tests`
(#858)
- Sensible treatment of Einnahmen / Einkünfte (#862)
- Specify rounding in a dataclass to be provided in the decorators
rather than referencing the yaml files from there (#859)
- Improve structure for AggregationSpecs, including an Enum for the type
of Aggregations (#860)

---------

Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
Co-authored-by: Marvin Immesberger <[email protected]>
Co-authored-by: Marvin Immesberger <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
# What problem do you want to solve?

Unify handling of dates to remove ambiguity and code duplication.

---------

Co-authored-by: Marvin Immesberger <[email protected]>
We put some effort into trying to convert types. However, the code was a
mess and it would be a pain to maintain it. What Python/Pandas/Numpy/Jax
do is more than good enough for GETTSIM, too. Now that we have the
explicitly annotated `policy_inputs`, it will be easy to check and throw
errors if users want to be strict.

This PR removes the code which has been stale for the last week, anyhow.

---------

Co-authored-by: Marvin Immesberger <[email protected]>
### What problem do you want to solve?

Fix #870 and related things. In particular, defer some checks so that
they only run for variables that are present, and set the start/end
dates of explicit aggregation functions so that they are derived from
the source object.

---------

Co-authored-by: Marvin Immesberger <[email protected]>
Co-authored-by: Tim Mensinger <[email protected]>
Co-authored-by: Max Jahn <[email protected]>
### What problem do you want to solve?

Tests in `test_jax_jit_kindergeld.py` were failing because policy
functions were not jittable.

### Problems and Solutions

#### Non-Hashable Function in `jit`

The policy functions were not jittable because the dataclasses were
not frozen and had `eq=True`. This means that the dataclass gets an
equality method which compares the fields. To not break the
equality/hash contract (a == b implies hash(a) == hash(b)), a
dataclass with an equality method that is not frozen has its hash
deactivated. This does not work with `jax.jit`, because JAX requires a
hash of the object for caching. Freezing the dataclasses gives them
their hash back, and everything works nicely again with JAX.

> [!NOTE]
> Frozen dataclasses cannot have standard assignments in the post init
method. For this I had to implement a `frozen_safe_update_wrapper`.
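
For illustration, a minimal sketch of the hashing behaviour described above, using only the standard library. `MutableSpec`, `FrozenSpec` and `is_hashable` are hypothetical names, and the `object.__setattr__` call is one common workaround for assignments in `__post_init__`; it is not necessarily what `frozen_safe_update_wrapper` does.

```python
from dataclasses import dataclass, field


@dataclass  # eq=True and frozen=False: Python sets __hash__ to None
class MutableSpec:
    name: str


@dataclass(frozen=True)  # eq=True and frozen=True: __hash__ is derived from the fields
class FrozenSpec:
    name: str
    qualified_name: str = field(init=False)

    def __post_init__(self) -> None:
        # A plain `self.qualified_name = ...` raises FrozenInstanceError here,
        # so the frozen __setattr__ is bypassed explicitly.
        object.__setattr__(self, "qualified_name", f"demo__{self.name}")


def is_hashable(obj: object) -> bool:
    """Return True if hash(obj) works, which caching by hash relies on."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True
```

With these definitions, `is_hashable(MutableSpec("a"))` is false and `is_hashable(FrozenSpec("a"))` is true, while two `FrozenSpec("a")` instances still compare equal.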

### Todo

- [x] Freeze ttsim_objects dataclasses and update post init of
`TTSIMFunction` to be compatible
- [x] Understand why list `single_test` in `kindergeld_policy_test`
fixture has only one entry, although the yaml file says there are two
outputs
- [x] Fix `test_compute_taxes_and_transfers_kindergeld`

---------

Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
In a limited set of experiments, it produced exactly the same result.
`ast.unparse` is available since Python 3.9, so it's fine to use.
- [x] Add a json (yaml) schema based on GEP-03 
- [x] Make sure manual validation of parameters passes
- [x] make a pre-commit hook out of this
MImmesberger and others added 16 commits July 17, 2025 11:17
### What problem do you want to solve?

Closes #1025 

### Todo

Add tests via `main` for

- [x] input_data_tree_is_invalid
- [x] environment_is_invalid
- [x] input_df_mapper_columns_missing_in_df
- [x] targets_tree_is_invalid

---------

Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
### What problem do you want to solve?

Enable

- [x] `INP001` (implicit namespace packages without init.)
- [x] `PLR2004` (Magic values used in comparison)
- [x] `PT006` (Allows only lists of tuples in parametrize, even if
single argument)
- [x] `PT007` (wrong type in parametrize)
- [x] `S101` (use of asserts outside of tests)
- [x] some more checks on individual files

---------

Co-authored-by: Marvin Immesberger <[email protected]>
### What problem do you want to solve?

Closes #893 

Changes:
- Change namespace to Einkommensteuer/Einkünfte/Sonstige/Renten
- Add three types of private pension income: gefördert / betrieblich /
regulär
- Implement the current state of law regarding their treatment for SV
contributions and taxation

Issue for historical support of TT rules: #1030
…lts are requested (#1031)

### What problem do you want to solve?

Closes #1006

---------

Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
### What problem do you want to solve?

`processed_data` uses an $O(n^2)$ approach to link original and internal
IDs. This PR implements an $O(n\cdot \log(n))$ approach.
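
A rough sketch of the difference between the two approaches, in plain Python. This is not the code from the PR: the function names are hypothetical, original IDs are assumed to be unique, the internal ID is assumed to be the position in the input, and `-1` is an assumed marker for "no match".

```python
from bisect import bisect_left


def to_internal_quadratic(original_ids: list[int], refs: list[int]) -> list[int]:
    """Scan all original IDs for every reference: O(n^2) comparisons."""
    out = []
    for ref in refs:
        internal = -1
        for position, original in enumerate(original_ids):
            if original == ref:
                internal = position
                break
        out.append(internal)
    return out


def to_internal_sorted(original_ids: list[int], refs: list[int]) -> list[int]:
    """Sort once, then binary-search each reference: O(n log n) overall."""
    # Positions of the original IDs, ordered by ID value.
    order = sorted(range(len(original_ids)), key=original_ids.__getitem__)
    sorted_ids = [original_ids[i] for i in order]
    out = []
    for ref in refs:
        k = bisect_left(sorted_ids, ref)
        if k < len(sorted_ids) and sorted_ids[k] == ref:
            out.append(order[k])  # map back to the position in the input
        else:
            out.append(-1)
    return out
```

Both functions return the same mapping; only the number of comparisons differs.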

## Benchmarks

### On `gep-07` (3525917):

```cmd
====================================================================
SUMMARY TABLE
====================================================================
Dataset             numpy_time  numpy_hash  jax_time    jax_hash
--------------------------------------------------------------------
df_5000.parquet     1.2681      13106402    15.5897     bf85cb3d
df_10000.parquet    4.6791      308ca129    30.7932     57ba7579
df_20000.parquet    15.7451     51e8d0b4    62.4070     21636ea4
df_40000.parquet    54.0340     6ae704d8    137.1975    30bbf3ea
```

### This PR:

**[EDIT: updated results after cf37b75]**
```cmd
====================================================================
SUMMARY TABLE
====================================================================
Dataset             numpy_time  numpy_hash  jax_time    jax_hash
--------------------------------------------------------------------
df_5000.parquet     0.0378      13106402    0.8950      bf85cb3d
df_10000.parquet    0.0402      308ca129    0.8108      57ba7579
df_20000.parquet    0.1107      51e8d0b4    1.1354      21636ea4
df_40000.parquet    0.0853      6ae704d8    1.8208      30bbf3ea

```

The benchmark essentially runs

```python
result = main(
    date_str=None,
    input_data=InputData.df_and_mapper(
        df=data,
        mapper=MAPPER,
    ),
    main_targets=[MainTarget.processed_data],
    tt_targets=TTTargets(tree=TT_TARGETS),
    backend=backend,
)
```

on the targets defined in `interface_playground.ipynb` with differently
sized datasets that replicate the example household from the same
notebook `N` times (i.e., `N*3` persons in each dataset). The hashes
demonstrate that this PR creates `result` objects that are identical to
the ones created with the $O(n^2)$ approach.

To reproduce the benchmarks:
- Run `make_data.py` (see attached .zip) to create example datasets
- Run `benchmark_comparison.py` to create tables above


[benchmark.zip](https://github.com/user-attachments/files/21327575/benchmark.zip)

---------

Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
Co-authored-by: mj023 <[email protected]>
### What problem do you want to solve?

Clarifies the meaning of `ist_selbstständig` by renaming to
`ist_hauptberuflich_selbstständig` as discussed in #892.
… exemptions to social insurance contributions (#1032)
#1035)

Following feedback on GEP 7, we got rid of the `date` / `date_str` inputs to main. Instead:

- `policy_date` / `policy_date_str` is required to set up the policy environment and will be stored in there.
- `evaluation_date` / `evaluation_date_str` is an optional input to `main`

The evaluation date is taken from the first available of the following sources:
1. The input data, if it contains an evaluation date
2. Otherwise, the value passed to `main`
3. Otherwise, the policy date

If more than one option is specified, a warning will be issued.
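
The precedence can be sketched as follows. This is only an illustration: `resolve_evaluation_date` and its parameter names are hypothetical, and "more than one option" is interpreted as "an evaluation date is present both in the input data and in the call to `main`", since the policy date is always set.

```python
from __future__ import annotations

import datetime
import warnings


def resolve_evaluation_date(
    policy_date: datetime.date,
    evaluation_date: datetime.date | None = None,
    evaluation_date_in_data: datetime.date | None = None,
) -> datetime.date:
    """Pick the evaluation date: input data first, then argument, then policy date."""
    if evaluation_date_in_data is not None and evaluation_date is not None:
        # Assumed interpretation of "more than one option is specified".
        warnings.warn(
            "Evaluation date given in the input data and as an argument; "
            "using the one from the input data.",
            stacklevel=2,
        )
    if evaluation_date_in_data is not None:
        return evaluation_date_in_data
    if evaluation_date is not None:
        return evaluation_date
    return policy_date
```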

---------

Co-authored-by: Marvin Immesberger <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
### What problem do you want to solve?

This PR gets the input names (+docstrings) of "vorjahr" and similar inputs straight
and puts them in the correct namespaces.
### What problem do you want to solve?

Closes #757

In fact, any occurrences of automatically created cycles reported in #757
have already been solved via the namespaces structure.

In ALG2 we have

```python
@policy_function(start_date="2005-01-01")
def wohnfläche(
    wohnen__wohnfläche_hh: float,
    anzahl_personen_hh: int,
) -> float:
    """Share of household's dwelling size attributed to a single person."""
    return wohnen__wohnfläche_hh / anzahl_personen_hh
```

So we don't create a cycle anymore as we have `wohnen__wohnfläche ->
wohnen__wohnfläche_hh` but `wohnen__wohnfläche_hh ->
arbeitslosengeld_2__wohnfläche`.

Still, thanks to the work done by Lars in the past, a general TTSIM
solution was easy to implement because it just copies the logic done for
time-conversion functions.
*Leaving the almost-unedited stuff from Claude Code here for
demonstration purposes*

## Summary

- Implements `copy_environment` function to address issue #1008
- Provides proper copying of policy environments containing unpickleable
function objects
- Available in both `ttsim` and `gettsim` namespaces with full type
safety

## Implementation Details
- **Function**: Uses `optree.tree_map(copy, tree)` for shallow copying
of tree leaves while recreating structure
- **Type Safety**: Proper type hints with `@overload` decorators for
`PolicyEnvironment` and specialized environment types
- **Error Handling**: Solves `copy.deepcopy` failures on policy
environments containing function objects
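
The idea of recreating the tree structure while shallow-copying the leaves can be sketched without `optree` as below. This is a hand-written stand-in for illustration, not the PR's implementation; `copy_tree` and `Param` are hypothetical names, and only nested dicts are treated as tree structure.

```python
from copy import copy
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a parameter object stored at a leaf."""

    value: float


def copy_tree(tree):
    """Rebuild nested dicts and shallow-copy every leaf."""
    if isinstance(tree, dict):
        return {key: copy_tree(child) for key, child in tree.items()}
    return copy(tree)


original = {"rente": {"beitragssatz": Param(0.186)}, "helper": len}
copied = copy_tree(original)
# Replacing a leaf in the copy leaves the original tree untouched.
copied["rente"]["beitragssatz"] = Param(0.3)
```

Because leaves are only shallow-copied, mutable objects nested inside a leaf would still be shared between the original and the copy.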

## Key Features
- ✅ **Type-safe copying** with specific return types for each
environment type
- ✅ **Independence guarantee** - modifications to copy don't affect
original
- ✅ **Performance optimized** using optree for tree operations
- ✅ **Comprehensive testing** with human-readable test functions
- ✅ **Full documentation** with examples and technical explanations

## Usage Example
```python
from gettsim import main, copy_environment, MainTarget
from ttsim.tt_dag_elements.param_objects import ScalarParam

# Load and copy policy environment
policy_env = main(date_str="2025-01-01", main_target=MainTarget.policy_environment)
copied_env = copy_environment(policy_env)

# Modify copy without affecting original
copied_env["sozialversicherung"]["rente"]["beitrag"]["beitragssatz"] = ScalarParam(value=0.3)
```

## Test Coverage
- ✅ Single parameter copying
- ✅ Nested dictionary structures  
- ✅ Full policy environment integration
- ✅ Error conditions and edge cases
- ✅ Type inference verification
- ✅ Independence testing

Closes #1008

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Marvin Immesberger <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
@hmgaudecker hmgaudecker force-pushed the collect-components-of-namespaces branch from cb927be to cbf95c1 Compare July 22, 2025 14:02
MImmesberger and others added 11 commits July 22, 2025 17:44
### What problem do you want to solve?

Closes #999 

Fails if any `param_function` depends on a `ColumnObject`, with the
exceptions of `evaluation_x` and `policy_x` (they are `PolicyInputs`).

---------

Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
Fixes #869

- Raise an error when type-conversion calls like `float`, `int`, etc. are encountered
- Raise an error when augmented assignments (`+=`, `-=`, `*=`, `/=`, and friends) are encountered
- Remove some tests that actually relied on that behavior, make test numbering consistent.
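
A check of this kind could look roughly like the sketch below, which walks the syntax tree with the standard `ast` module. The function name, the set of conversion names, and the return format are assumptions made for illustration; the actual checks may be structured differently and raise errors instead of returning a list.

```python
import ast

# Assumed set of built-in conversions to flag.
TYPE_CONVERSIONS = {"bool", "complex", "float", "int"}


def find_unsupported_constructs(source: str) -> list[tuple[int, str]]:
    """Return (line number, description) for each flagged construct."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.AugAssign):
            problems.append((node.lineno, "augmented assignment"))
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in TYPE_CONVERSIONS
        ):
            problems.append((node.lineno, f"call to {node.func.id}()"))
    return sorted(problems)


EXAMPLE = """def f(x):
    y = float(x)
    y += 1
    return y
"""
found = find_unsupported_constructs(EXAMPLE)
```
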
Add `fail_if.backend_has_changed`. 

Lessons learned:
- Numpy can handle Jax arrays (see test)
- Jax can handle NumPy arrays that are passed as the processed data (see
test)
- The problematic case is parameters that are partialled to functions.
Unfortunately, these are typically custom objects. We need to loop over
them and check whether any of them happens to be a numpy array
 (#1048)

Check whether the structure of the paths matches. E.g.:

- `input_data={"df_and_mapper": None}`: Fails because there needs to be
a dict below "df_and_mapper"
- `input_data={"not_around": None}`: Fails because `not_around` is not a
valid child of `input_data`
- `not_around=None`: Fails because `not_around` is not a valid root node
(already taken care of by Python itself when calling `main`, but let's
be pedantic...)
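
A minimal sketch of such a structural check, assuming the allowed layout is given as a nested dict. `TEMPLATE` and `find_invalid_paths` are hypothetical, and the template only contains the nodes mentioned above plus the `df` and `mapper` children of `df_and_mapper`.

```python
# Hypothetical template: dicts are inner nodes, None marks a leaf.
TEMPLATE = {"input_data": {"df_and_mapper": {"df": None, "mapper": None}}}


def find_invalid_paths(
    user_tree: dict, template: dict, prefix: tuple[str, ...] = ()
) -> list[tuple[str, ...]]:
    """Return every path in user_tree that does not fit the template."""
    invalid = []
    for key, value in user_tree.items():
        path = (*prefix, key)
        if key not in template:
            invalid.append(path)  # not a valid child at this level
        elif isinstance(template[key], dict):
            if not isinstance(value, dict):
                invalid.append(path)  # a dict is required below this node
            else:
                invalid.extend(find_invalid_paths(value, template[key], path))
    return invalid
```

Applied to the three cases above, the function reports `("input_data", "df_and_mapper")`, `("input_data", "not_around")` and `("not_around",)`, respectively.
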
…omatically created function (#1050)

### What problem do you want to solve?

Closes #1049

---------

Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
### What problem do you want to solve?

- [x] Add a GEP for the revamped interface
- [x] Update earlier GEPs to reflect the changes that have become
necessary after GEP 6 (since our documentation is small, it does not
make sense to keep outdated things around).
- [x] Add the finalised schema from #880 as an appendix to GEP 3

[Resolution on Zulip.](https://gettsim.zulipchat.com/#narrow/channel/309998-GEPs/topic/GEP.2007/near/530389224)

---------

Co-authored-by: Marvin Immesberger <[email protected]>
In sync with [TTSIM PR 1](ttsim-dev/ttsim#1),
this leaves just GETTSIM in here. Also includes the renamings in 
[TTSIM PR 3](ttsim-dev/ttsim#3), which are on
PyPI as 1.0a1

Fixes #1003.
@hmgaudecker hmgaudecker merged commit 6319e73 into main Jul 24, 2025
13 checks passed
@hmgaudecker hmgaudecker deleted the collect-components-of-namespaces branch July 24, 2025 14:32
Development

Successfully merging this pull request may close these issues.

ENH: Interface, 2024 edition

8 participants