Merged
353 commits
f674987
UV.
MImmesberger May 12, 2025
d0b5e34
Unterhalt.
MImmesberger May 12, 2025
d3423e9
Merge branch 'rename-gettsim-params-fix-yaml-validation' of https://g…
MImmesberger May 12, 2025
f394912
Typos.
MImmesberger May 12, 2025
677e8df
Add parameters for aRW calculation back in.
MImmesberger May 12, 2025
e4986e4
Fix reference.
MImmesberger May 12, 2025
dbb956a
Make unit and reference period required (#904)
hmgaudecker May 13, 2025
449873f
Went through changes, fixed inconsistencies and typos.
hmgaudecker May 13, 2025
9e1fec0
Merge branch 'rename-gettsim-params-fix-yaml-validation' of github.co…
hmgaudecker May 13, 2025
9d8ae7a
Remaining files.
hmgaudecker May 13, 2025
a881e94
Remove 'scalar' as a possible key in the params files, use 'value' in…
hmgaudecker May 13, 2025
91615e0
Abgeltungssteuer.
hmgaudecker May 13, 2025
8877ff8
Rename GETTSIM params and fix yaml validation issues (#900)
hmgaudecker May 13, 2025
5be2f4e
Merge branch 'collect-unify-parsing-of-params' into move-gettsim-para…
hmgaudecker May 13, 2025
a131227
Add a few safety checks and modifications to behavior.
hmgaudecker May 13, 2025
355d4d0
Kinderfreibetrag.
hmgaudecker May 13, 2025
76a06e6
Kindergeld.
hmgaudecker May 13, 2025
930ba79
A bit of Einkommensteuer / Abzüge. Not working, but switching machine…
hmgaudecker May 13, 2025
3715950
Make tests pass by adding a somewhat ad-hoc Evaluationsjahr.
hmgaudecker May 13, 2025
c7a077b
Moved on with Abzügen von Einkünften/Einnahmen.
hmgaudecker May 13, 2025
7197b6e
Altersfreibetrag.
hmgaudecker May 14, 2025
20eebcb
Alleinerziehendenfreibetrag.
hmgaudecker May 14, 2025
76ee21b
Behindertenpauschbetrag.
hmgaudecker May 14, 2025
4af3204
Finish converting eink_st_abzuege.yaml.
hmgaudecker May 14, 2025
734720a
Simplify calculation of Lohnsteuer / Vorsorgeaufwendungen.
hmgaudecker May 14, 2025
4a13dbd
Be explicit in name.
hmgaudecker May 14, 2025
2dbdc72
Einkommensteuer parameters.
hmgaudecker May 14, 2025
a43fe85
Solidaritätszuschlag parameters.
hmgaudecker May 14, 2025
491e58c
AV.
MImmesberger May 16, 2025
23541af
PV.
MImmesberger May 16, 2025
bfb0a88
Add add_jahresanfang keyword to params-schema.
MImmesberger May 16, 2025
c836db2
Fix jahresanfang in AV.2
MImmesberger May 16, 2025
52899ba
Fix wrong namespace for AV.
MImmesberger May 16, 2025
a369bd2
KV.
MImmesberger May 16, 2025
989f876
Split KV params into multiple files.
MImmesberger May 16, 2025
4084081
Rename PV params to beitragssatz.
MImmesberger May 16, 2025
5ab7d04
Fix Soli implementation.
MImmesberger May 16, 2025
f622f48
RV beitrag params.
MImmesberger May 16, 2025
0b9b259
Forogt to add type hints in commit.
MImmesberger May 16, 2025
1eb63d6
Move ALG 1 + 2 params to namespace (#912)
MImmesberger May 17, 2025
fe046aa
Merge branch 'move-gettsim-params-files' into move-sozialversicherung…
MImmesberger May 17, 2025
4904bac
Merge branch 'move-gettsim-params-files' into move-sozialversicherung…
MImmesberger May 17, 2025
8f74316
Split up parameters if their type changes.
MImmesberger May 17, 2025
a8f05ae
Split up BBM params for RV.
MImmesberger May 17, 2025
5918731
Stopover. Need to continue with Rente Altersgrenzen.
MImmesberger May 17, 2025
99ba5af
Fix some small bugs.
MImmesberger May 17, 2025
55afbd8
Some more small bugs
MImmesberger May 17, 2025
48220d5
Some docstrings.
MImmesberger May 17, 2025
13bdffa
Everything but Altersgrenzen should be done.
MImmesberger May 17, 2025
32451f8
Improve some docstrings and get rid of duplicate policy functions.
MImmesberger May 17, 2025
3904012
Don't duplicate params_functions with policy_functions.
MImmesberger May 17, 2025
18061ed
Remove params function from top of files.
MImmesberger May 17, 2025
1f362ee
Bug fixes and typos.
MImmesberger May 17, 2025
05d3f81
Some review comments. Stopped because of ConflictingNamesError.
MImmesberger May 18, 2025
c320e49
Merge branch collect-unify-parsing-of-params.
hmgaudecker May 19, 2025
47e8ac8
Merge branch 'move-gettsim-params-files' into gep-07
hmgaudecker May 19, 2025
c9d8330
Merge branch 'move-sozialversicherung-params' into gep-07
hmgaudecker May 19, 2025
1e9cb45
Move on with updating GEP 3.
hmgaudecker May 19, 2025
b5888a9
Active periods for params (#916)
hmgaudecker May 19, 2025
62bae60
More review comments.
MImmesberger May 19, 2025
f41601d
Merge branch 'move-sozialversicherung-params' of https://github.com/i…
MImmesberger May 19, 2025
177e578
Some updates to GEP 2
hmgaudecker May 19, 2025
f0b793a
Unskip minijobgrenze tests and use parameter directly from yaml.
MImmesberger May 19, 2025
675ea9b
Rente: parameter_beitragssatz -> beitragssatz.
MImmesberger May 19, 2025
410c850
arbeitslosen: parameter_beitragssatz -> beitragssatz.
MImmesberger May 19, 2025
3e3dbd6
Consistent file names.
hmgaudecker May 19, 2025
114e5ff
Replace misleading function name.
hmgaudecker May 19, 2025
f0fd6dd
kranken: partly parameter_beitragssatz -> beitragssatz.
MImmesberger May 19, 2025
80e7934
Regression test Grundrente married couples.
MImmesberger May 19, 2025
e8694fe
Remove references to gettsim from almost all parts of TTSIM (just par…
hmgaudecker May 19, 2025
a9c5676
Style.
MImmesberger May 19, 2025
757b252
Merge branch 'move-sozialversicherung-params' of https://github.com/i…
MImmesberger May 19, 2025
b9d81f8
Merge branch 'move-sozialversicherung-params' into gep-07
hmgaudecker May 19, 2025
5befc78
More examples for GEP 3.
hmgaudecker May 19, 2025
b8039b4
Simplify.
hmgaudecker May 19, 2025
2625bf9
Remove midijobgrenze from minijob.yaml.
hmgaudecker May 19, 2025
ec8c4a7
Rest of review comments.
MImmesberger May 19, 2025
0767a64
Regelaltersgrenze.
MImmesberger May 19, 2025
7049c30
Rewrite Altersrente wg. AL yaml.
MImmesberger May 19, 2025
29edc99
Move Vertrauensschutz out of regular Altersgrenzen dicts.
MImmesberger May 19, 2025
4ad271e
Fix some typos, make params in .yaml file irrelevant for now.
MImmesberger May 19, 2025
f142427
Allow params as targets (#922)
hmgaudecker May 20, 2025
11a83ad
Convert policy functions to params functions.
hmgaudecker May 20, 2025
c6b4f1e
Simplify.
hmgaudecker May 20, 2025
6dde751
Simplify Pflegeversicherung.
hmgaudecker May 20, 2025
c73e052
Clarify parameter name.
hmgaudecker May 20, 2025
72c77f4
Language, harmonise.
hmgaudecker May 20, 2025
ed66e19
Enforce active periods in Rente für Frauen / wegen Arbeitslosigkeit.
hmgaudecker May 20, 2025
2c2fe7f
Fix typo in file name.
hmgaudecker May 20, 2025
f76935e
Merge branch 'move-sozialversicherung-params' into gep-07
hmgaudecker May 20, 2025
5ab4fb5
Update GEPs 3 & 4.
hmgaudecker May 20, 2025
4ce0b2c
Tiny updates to GEP 5
hmgaudecker May 20, 2025
7136e00
Tiny updates to GEP 6 after the fact, mostly name changes.
hmgaudecker May 20, 2025
1956bda
More drafting of GEP 7.
hmgaudecker May 20, 2025
e0db8be
Fix name/description.
hmgaudecker May 20, 2025
8f3d109
Suggested change for oss interface; better docs of type hints (and sm…
hmgaudecker May 21, 2025
cf8a544
Restructure types (#923)
hmgaudecker May 21, 2025
2e82570
Some more renamings that should have really gone into #923. Add 'type…
hmgaudecker May 21, 2025
e545a38
Renamings.
hmgaudecker May 22, 2025
623d3cb
Add LookUpTableParam.
hmgaudecker May 22, 2025
a385995
Simplify.
hmgaudecker May 22, 2025
68d794d
Use dags with annotations (#909)
hmgaudecker May 22, 2025
b1aa1af
Merge branch 'collect-unify-parsing-of-params' into move-gettsim-para…
hmgaudecker May 22, 2025
db4a80c
Merge branch 'move-gettsim-params-files' into move-sozialversicherung…
hmgaudecker May 22, 2025
a5bd0ad
Merge branch 'move-sozialversicherung-params' into gep-07
hmgaudecker May 22, 2025
b2b08bb
Move Altersgrenze besonders langjährig Versicherte.
hmgaudecker May 22, 2025
161034c
Start with birth month-specific phase-in, switching machines.
hmgaudecker May 22, 2025
9f99056
Further clarification to typing.
hmgaudecker May 22, 2025
c7d5cf5
Make ID creation functions jittable (#905)
mj023 May 22, 2025
7c1e855
Trivial change that did not make it into #905.
hmgaudecker May 22, 2025
57ff343
Merge branch 'move-gettsim-params-files' into move-sozialversicherung…
hmgaudecker May 22, 2025
bc3f00f
Get tests from #905 back, which just changed outputs, not inputs.
hmgaudecker May 23, 2025
66f8d74
Merge branch 'move-sozialversicherung-params' of github.com:iza-insti…
hmgaudecker May 23, 2025
48599c4
Add 'type: birth_month_based_phase_in', not fully working yet.
hmgaudecker May 23, 2025
25994d5
Fix mypy errors.
hmgaudecker May 23, 2025
11120f2
Remove superfluous `updates_previous`.
hmgaudecker May 23, 2025
9e02a73
altersgrenze_abschlagsfrei -> altersgrenze everywhere.
hmgaudecker May 23, 2025
6b2cfe7
Small changes in EM Rente and Minijob based on review comments.
MImmesberger May 23, 2025
74d71c9
Merge branch 'move-sozialversicherung-params' of https://github.com/i…
MImmesberger May 23, 2025
19105e0
Forgot to change namespace in last commit.
MImmesberger May 23, 2025
683ffe7
Convert Arbeitslosenversicherung.
hmgaudecker May 23, 2025
78139f8
Add todos regarding Ost/West difference in EP.
MImmesberger May 23, 2025
0f58adf
Finish Rente wg. AL.
MImmesberger May 23, 2025
26596d0
Converted Pflege, but looks like I simplified too much.
hmgaudecker May 23, 2025
00d2dc0
Rewrite Altersgrenze f. langj. Vers yaml file.
MImmesberger May 23, 2025
197d241
Re-order, fix an omission of dividing by 2.
hmgaudecker May 23, 2025
3cc68bc
Rente f langj Vers.
MImmesberger May 23, 2025
0e6ceb0
Add DATEV test cases.
MImmesberger May 23, 2025
e83a923
Fix Pflegeversicherung 2023, but test failures for 2025.
hmgaudecker May 23, 2025
1df50e8
Fix sign error.
hmgaudecker May 23, 2025
f762c25
Merge branch 'move-sozialversicherung-params' into fix-contribution-r…
hmgaudecker May 23, 2025
7724399
Convert Rente.
hmgaudecker May 23, 2025
3cc54e6
Merge branch 'fix-contribution-rates' into gep-07
hmgaudecker May 24, 2025
cb6a3e8
Get rid of vectorization_strategy='loop' in kinderzuschlag.
hmgaudecker May 24, 2025
8be248c
Merge branch 'fix-contribution-rates' into gep-07
hmgaudecker May 24, 2025
32ee7c8
Add running example, first complete draft of GEP 7.
hmgaudecker May 24, 2025
bdb79f1
Include GEP 7 in doc build.
hmgaudecker May 24, 2025
1bdb0c0
Make some (hopefully) uncontroversial changes to GEPs.
MImmesberger May 25, 2025
387622f
Fix some docstrings and remove duplicate parameters.
MImmesberger May 25, 2025
b0a249b
Move test file to 2024, update for 2025.
MImmesberger May 25, 2025
94d77c5
Fix PV AG formula after October 2022. Unskip 2024 test.
MImmesberger May 25, 2025
bd8e202
Use beitragssatz[standard] instead of beitragssatz_arbeitgeber * 2.
MImmesberger May 25, 2025
484361d
Apply suggestions from code review
hmgaudecker May 26, 2025
423dd93
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 26, 2025
0845d94
Missed a suggestion in previous commit.
hmgaudecker May 26, 2025
0abfbaf
Fix phase-in / phase-out to incorporate both in years / months.
hmgaudecker May 26, 2025
d4fc83a
Merge branch 'fix-contribution-rates' into gep-07
hmgaudecker May 26, 2025
5725868
Improve a sentence in GEP 2.
hmgaudecker May 26, 2025
85dfbb4
Add paragraphs based on review comments.
hmgaudecker May 26, 2025
fd5eb1a
Add json schema to GEP 3.
hmgaudecker May 26, 2025
f34d3a0
Unify contribution rates to Sozialversicherung, fixes #921 (#926)
hmgaudecker May 26, 2025
29467a9
Merge branch 'move-sozialversicherung-params' into gep-07
hmgaudecker May 26, 2025
74caeed
Move Sozialversicherung params to namespace (#914)
MImmesberger May 26, 2025
b9a2351
Merge branch 'move-gettsim-params-files' into gep-07
hmgaudecker May 26, 2025
ad769a3
Move GETTSIM params files, use new machinery (#908)
hmgaudecker May 28, 2025
9cb283f
Transfer parameter_behindertenpauschbetrag to piecewise_constant.
MImmesberger May 28, 2025
a9aebd3
Apply phase in parameters for EM Rente (#940)
MImmesberger May 29, 2025
f5e12f8
Apply `ConsecutiveInt`-type to EstG Abzüge and Kindergeld (#939)
MImmesberger May 29, 2025
39794db
Single tree in policy environment (#941)
hmgaudecker May 29, 2025
2289d30
Add nested_data_to_dataframe function.
MImmesberger May 31, 2025
84b833e
Main adjustments are made, backup commit before letting Cursor go wild.
hmgaudecker Jun 2, 2025
a6da9a8
Fix missing imports / renamings.
hmgaudecker Jun 2, 2025
cf12e47
Adjust test_compute_taxes_and_transfers.
hmgaudecker Jun 2, 2025
257d07b
Port METTSIM and GETTSIM tests, except for oss. Fixes most of #883 (n…
hmgaudecker Jun 2, 2025
fe2b150
Port test_policy_environment.
hmgaudecker Jun 2, 2025
d3ab045
Port test_rounding.
hmgaudecker Jun 2, 2025
b5170fa
Merge branch 'collect-components-of-namespaces' into new-interface
hmgaudecker Jun 3, 2025
7408977
Merge branch 'collect-components-of-namespaces' into new-interface
hmgaudecker Jun 3, 2025
67eb995
Merge branch 'new-interface' into gep-07
hmgaudecker Jun 3, 2025
e8a7d96
Merge branch 'collect-components-of-namespaces' into gep-07
hmgaudecker Jun 5, 2025
4e0f7d5
Revert some accidental changes made here, which I noted in #953.
hmgaudecker Jun 5, 2025
8f0116a
Add a test whether GETTSIM functions can be jitted.
hmgaudecker Jun 27, 2025
7409034
Correct date.
hmgaudecker Jun 27, 2025
934bc34
Update dags branch after merge over there.
hmgaudecker Jun 27, 2025
4b1039c
Add KV Beitragssatz for early 2005.
MImmesberger Jun 28, 2025
6d2ea17
Fix missing Jahresanfang and Sonderbeitrag params for KV.
MImmesberger Jun 28, 2025
0436e40
Remove dummy param, add AG _jahresanfang function for 2005.
MImmesberger Jun 29, 2025
2ecb6e8
Skip NotImplementedError.
MImmesberger Jun 29, 2025
2053d2f
Make funcs jittable
mj023 Jun 29, 2025
719ffc2
Change Input Types
mj023 Jun 29, 2025
bbc3ee7
Update environment, include Python 3.13 also in GHA, add jax to GHA.
hmgaudecker Jun 30, 2025
841eec0
Remove unused 'geburtsdatum' and exclude jax-datetime for now.
hmgaudecker Jun 30, 2025
8fe60a5
Merge branch 'collect-components-of-namespaces' into test-jittability
hmgaudecker Jun 30, 2025
12fbcf2
Merge branch 'collect-components-of-namespaces' into gep-07
hmgaudecker Jun 30, 2025
d62f2d6
Go back to dags / main after lexsort branch has been merged.
hmgaudecker Jun 30, 2025
54b788f
Fix copy/paste error.
hmgaudecker Jun 30, 2025
34142f9
Get rid of xnp.squeeze in piecewise_polynomial, which caused issues w…
hmgaudecker Jul 1, 2025
ed3c9e7
Simplify code.
hmgaudecker Jul 1, 2025
0ae9a98
Make sure we have the correct Kaleido version everywhere, but do not …
hmgaudecker Jul 1, 2025
9058e41
Fix remaining tests by making sure subtraction operations on ints ret…
hmgaudecker Jul 1, 2025
dcd8539
Enable running Jax tests on Windows and MacOS.
hmgaudecker Jul 1, 2025
51b5d20
Improve readability.
hmgaudecker Jul 1, 2025
6a16792
Get rid of dtype truncation warnings, not an issue for lookup tables.
hmgaudecker Jul 1, 2025
9178019
Get rid of outdated parameter.
hmgaudecker Jul 1, 2025
22e42db
Revert "Fix remaining tests by making sure subtraction operations on …
mj023 Jul 1, 2025
fc3b135
Fix dtype of grouped count in jax
mj023 Jul 1, 2025
2e368dc
Add CUDA environment and run tests there.
hmgaudecker Jul 1, 2025
4e053b7
Merge branch 'test-jittability' into gep-07
hmgaudecker Jul 2, 2025
14c26c6
Use pixi for rtd build.
hmgaudecker Jul 2, 2025
b2e0aa7
Rename
hmgaudecker Jul 2, 2025
a416677
Restructure, generalise.
hmgaudecker Jul 2, 2025
009aa71
Rename and spell out.
hmgaudecker Jul 2, 2025
9753800
Backup, switching machines.
hmgaudecker Jul 2, 2025
74b57ca
Adjust dates in GEPs 1-3, first step updatin GEP 7. Switching machines.
hmgaudecker Jul 3, 2025
6ee8258
Add remaining components of interface. Checkpoint before using differ…
hmgaudecker Jul 3, 2025
92d5cb1
Change strategy: Use a single file for main_ars and dataclasses for i…
hmgaudecker Jul 3, 2025
c134108
Get rid of _InterfaceDAGElement object.
hmgaudecker Jul 3, 2025
cb0a85a
Get rid of _InterfaceDAGElement object.
hmgaudecker Jul 3, 2025
f21c854
Merge branch 'collect-components-of-namespaces' into remaining-interf…
hmgaudecker Jul 3, 2025
f691a71
Updated main namespace.
hmgaudecker Jul 3, 2025
d857889
Updated GETTSIM interface.
hmgaudecker Jul 3, 2025
057c6d0
Update example.
hmgaudecker Jul 3, 2025
d7193c0
Merge branch 'remaining-interface-components' into gep-07
hmgaudecker Jul 3, 2025
72d5334
Update example and first bullet.
hmgaudecker Jul 3, 2025
e1b3f14
Attempt to fix rtd build
hmgaudecker Jul 3, 2025
1395a13
Rename targets -> tt_targets
hmgaudecker Jul 7, 2025
7a37dd3
Use main_target / main_targets instead of output + name / names.
hmgaudecker Jul 7, 2025
e2c39c2
Merge branch 'main_targets' into gep-07
hmgaudecker Jul 7, 2025
d3bb45e
Backup.
hmgaudecker Jul 7, 2025
39558de
Forgot to expose MainTarget in gettsim namespace.
hmgaudecker Jul 7, 2025
e247c48
Merge branch 'main_targets' into gep-07
hmgaudecker Jul 7, 2025
0122531
Backup
hmgaudecker Jul 7, 2025
f536f28
Draft almost ready for vote.
hmgaudecker Jul 7, 2025
b5ed61b
Merge branch 'collect-components-of-namespaces' into gep-07
hmgaudecker Jul 7, 2025
da336cc
Merge branch 'collect-components-of-namespaces' into gep-07
hmgaudecker Jul 7, 2025
a3201f4
Merge branch 'collect-components-of-namespaces' into gep-07
hmgaudecker Jul 8, 2025
a77866c
Merge branch 'collect-components-of-namespaces' into gep-07
hmgaudecker Jul 8, 2025
a559a99
Typos.
MImmesberger Jul 8, 2025
808621c
Add html figure to gep 7.
MImmesberger Jul 8, 2025
b1663e4
Update notebook.
MImmesberger Jul 8, 2025
42c3695
Update notebook.
MImmesberger Jul 8, 2025
6ffeac9
Merge branch 'collect-components-of-namespaces' into gep-07
hmgaudecker Jul 8, 2025
f4734e1
Move and rename example used in GEP 7.
hmgaudecker Jul 8, 2025
23c218d
Update the playground notebook.
hmgaudecker Jul 8, 2025
57f39f3
Incorporate review suggestions.
hmgaudecker Jul 9, 2025
074ea18
Do not make a copy of the policy environment in the example.
hmgaudecker Jul 11, 2025
ed11a86
Merge branch 'collect-components-of-namespaces' into gep-07
hmgaudecker Jul 15, 2025
cdcdb2c
Update GEP 7 with new model for policy/evaluation dates.
hmgaudecker Jul 15, 2025
90a9487
Use markdown tables instead of html.
hmgaudecker Jul 15, 2025
985b6fc
Merge branch 'collect-components-of-namespaces' into gep-07
hmgaudecker Jul 16, 2025
bc3de05
Modify examples so that they should work after #1026.
hmgaudecker Jul 16, 2025
022bec9
Merge branch 'collect-components-of-namespaces' into gep-07
MImmesberger Jul 16, 2025
3525917
Add new interface dag html.
MImmesberger Jul 16, 2025
864d5d7
Improve performance of `processed_data` (#1037)
JuergenWiemers Jul 20, 2025
3eada4c
Merge branch 'collect-components-of-namespaces' into gep-07
hmgaudecker Jul 21, 2025
98e133e
Merge branch 'collect-components-of-namespaces' into gep-07
hmgaudecker Jul 23, 2025
adf3b56
Merge branch 'ocollect-components-of-namespaces' into gep-07
hmgaudecker Jul 23, 2025
18c403c
Add dates and link to resolution.
hmgaudecker Jul 23, 2025
4a93e5b
Add pixi task to build docs.
hmgaudecker Jul 23, 2025
6d12d31
Move interface playground into sandbox directory.
hmgaudecker Jul 23, 2025
1 change: 1 addition & 0 deletions docs/geps.md
@@ -20,5 +20,6 @@ maxdepth: 1
../geps/gep-04
../geps/gep-05
../geps/gep-06
../geps/gep-07
../geps/gep-x
```
111 changes: 25 additions & 86 deletions docs/geps/gep-01.md
@@ -14,7 +14,7 @@
- * Created
* 2019-11-04
- * Updated
* 2022-03-28
* 2025-07-XX
- * Resolution
* [Accepted](https://gettsim.zulipchat.com/#narrow/stream/309998-GEPs/topic/GEP.2001)
```
@@ -26,35 +26,17 @@ columns, parameters, Python identifiers (functions, variables), etc. should be n
a nutshell and without explanations, these conventions are:

1. Names follow standard Python conventions (`lowercase_with_underscores`).
Abbreviations of words that form a part of these names are always followed by an
underscore, unless it is the last word.

1. Names should be long enough to be readable. However, we impose limits in order to
make GETTSIM usable in languages, which place limits on characters (Stata, in
particular).

- Column names that are typically user-facing have a hard limit of 20 characters.
These columns are documented in `DEFAULT_TARGETS` in `gettsim/config.py`.
- Other column names that users might potentially be interested in have a hard limit
of 32 characters.
- Columns geared at internal use (e.g., helper variables before applying a
favorability check) start with an underscore and there are no restrictions.
Internal variables should be used sparingly.

1. If names need to be concatenated for making clear what a column name refers to (e.g.,
`arbeitslosengeld_2__vermögensfreibetrag_bg` vs.
`grundsicherung__im_alter__vermögensfreibetrag_eg`), the group (i.e., the tax or
transfer) that a variable refers to appears first.

1. Because of the necessity of concatenated column names, there will be conflicts
between readability (1.) and variable length (2.). If such conflicts arise, they need
to be solved on a case-by-case basis. Consistency across different variants of a
variable name always has to be kept.

1. The language should generally be English in all coding efforts and documentation.
German should be used for all institutional features and directly corresponding
names.

1. The hierarchical naming convention (see {ref}`GEP 6 <gep-6>`) means that
abbreviations should be used only very sparingly.

An abbreviation is always followed by an underscore (unless it is the last word).
Underscores must not be used to separate German words that are pulled together.

1. German identifiers use correct spelling even if it is non-ASCII (this mostly concerns
the letters ä, ö, ü, ß).
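As an illustration of how the non-ASCII rule can coexist with a pure ASCII fallback, the sketch below transliterates a German identifier. The replacement table (ä → ae, ö → oe, ü → ue, ß → ss) is the common German convention and is assumed here; `to_ascii_identifier` is a hypothetical helper, not part of GETTSIM.

```python
# Hypothetical helper; the mapping is the common German transliteration,
# assumed for illustration and not taken from GETTSIM itself.
_UMLAUT_MAP = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def to_ascii_identifier(name: str) -> str:
    """Return an ASCII-only variant of a lowercase German identifier."""
    ascii_name = name.translate(_UMLAUT_MAP)
    # Fail loudly if characters outside the mapping remain.
    if not ascii_name.isascii():
        raise ValueError(f"Cannot transliterate {name!r} to ASCII.")
    return ascii_name


print(to_ascii_identifier("vermögensfreibetrag"))  # vermoegensfreibetrag
print(to_ascii_identifier("p_id_empfänger"))  # p_id_empfaenger
```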

@@ -93,32 +75,17 @@ in English. For column names, we always allow a pure ASCII option, see the next

(gep-1-column-names)=

## Column names (a.k.a. "variables" in Stata)

We impose a hard limit of 20 characters for all column names that are typically user-facing.
This is for the benefit of Stata users, who face a strict limit of 32 characters for
their column names. Furthermore, where developers using other languages may store
different experiments in different variables, Stata users' only chance to distinguish
them is to append characters to the column names.

For the same reason, there is a hard limit of 32 characters for variables that users may
reasonably request.

If a column is only present for internal use, it starts with an underscore and there is
no restriction on the number of characters. Internal columns should be used sparingly.
## Column / Policy Function names (a.k.a. "variables" in Stata)

Across variations that include the same identifier, this identifier should not be
changed, even if it leads to long variable names (e.g., `kinderfreib`,
`einkommensteuer__gesamteinkommen_y`). This makes searching for identifiers easier and
less error-prone.

If names need to be concatenated for making clear what a column name refers to (e.g.,
`arbeitslosengeld_2__vermögensfreibetrag_bg` vs.
`grundsicherung__im_alter__vermögensfreibetrag_eg`), the group (i.e., the tax or
transfer) that a variable refers to appears first.
The hierarchical naming convention (see {ref}`GEP 6 <gep-6>`) means that the
highest-level identifier is the type of the programme (e.g., `einkommensteuer` or
`kindergeld`). Very few variables live in the global namespace (e.g., the person
identifier `p_id` or `alter`). A special case is the namespace `familie`, which lives in
the global namespace.
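The examples in this GEP suggest that the levels of the hierarchy are joined by a double underscore (e.g., `grundsicherung__im_alter__vermögensfreibetrag_eg`). Under that assumption, a flat name can be split back into its namespace path and leaf name as sketched below; `split_qualified_name` is a hypothetical helper for illustration only.

```python
def split_qualified_name(qname: str) -> tuple[tuple[str, ...], str]:
    """Split a flat name such as 'a__b__c' into (('a', 'b'), 'c').

    Assumes '__' separates hierarchy levels; names in the global
    namespace (e.g., 'p_id') yield an empty namespace path.
    """
    parts = qname.split("__")
    if any(part == "" for part in parts):
        raise ValueError(f"Malformed qualified name: {qname!r}")
    return tuple(parts[:-1]), parts[-1]


print(split_qualified_name("arbeitslosengeld_2__vermögensfreibetrag_bg"))
# (('arbeitslosengeld_2',), 'vermögensfreibetrag_bg')
print(split_qualified_name("p_id"))
# ((), 'p_id')
```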

If a column has a reference to a time unit (i.e., any flow variable like earnings or
transfers), a column is indicated by an underscore plus one of {`y`, `m`, `w`, `d`}.
transfers), the time unit is indicated by an underscore plus one of {`y`, `q`, `m`,
`w`, `d`}.

The default unit a column refers to is an individual. In case of groupings of
individuals, an underscore plus one of {`sn`, `hh`, `fg`, `bg`, `eg`, `ehe`} will
@@ -152,11 +119,6 @@ Note that households do not include flat shares etc.. Such broader definition ar
currently not relevant in GETTSIM but may be added in the future (e.g., capping rules
for costs of dwelling in SGB II depend on this).

Open questions:

- Can we use `arbeitslosengeld_2__bg_id` for both SGB II and SGB XII at the same time or
do we need to differentiate once we add serious support for SGB XII?

Time unit identifiers always appear before unit identifiers (e.g.,
`arbeitslosengeld_2__betrag_m_bg`).
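The suffix rules (time unit first, then grouping) make a leaf name mechanically parseable. The sketch below strips them from the end of a name. It is a heuristic written for illustration: `parse_leaf_name` is hypothetical, and the suffix sets (the grouping identifiers listed above plus `wthh`, which appears in GEP 2) are assumptions rather than GETTSIM code.

```python
from __future__ import annotations

TIME_UNITS = {"y", "q", "m", "w", "d"}
GROUPINGS = {"sn", "hh", "fg", "bg", "eg", "ehe", "wthh"}


def parse_leaf_name(qname: str) -> tuple[str, str | None, str | None]:
    """Return (base, time_unit, grouping) for the leaf of a qualified name.

    Suffixes are only recognised in the order <base>_<time unit>_<grouping>,
    mirroring the rule that time units precede grouping identifiers.
    """
    leaf = qname.split("__")[-1]
    base, time_unit, grouping = leaf, None, None
    head, sep, tail = base.rpartition("_")
    if sep and tail in GROUPINGS:
        base, grouping = head, tail
        head, sep, tail = base.rpartition("_")
    if sep and tail in TIME_UNITS:
        base, time_unit = head, tail
    return base, time_unit, grouping


print(parse_leaf_name("arbeitslosengeld_2__betrag_m_bg"))  # ('betrag', 'm', 'bg')
print(parse_leaf_name("einkommensteuer__gesamteinkommen_y"))  # ('gesamteinkommen', 'y', None)
```

Because the parser only looks at trailing tokens, a base name whose last word coincides with one of the suffixes would be misread; the sketch does not guard against that.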

@@ -168,12 +130,9 @@ general naming considerations here.
- There is a hierarchical structure to these parameters in that each of them is
associated with a group (e.g., `arbeitslosengeld`, `kinderzuschlag`). These groups or
abbreviations thereof do not re-appear in the name of the parameter.
- Parameter names should be generally be aligned with relevant column names. However,
since the group is not repeated for the parameter, it is often better not to
abbreviate them (e.g., `wohngeld_params["vermögensgrundfreibetrag"]` for the parameter
and `wohngeld__anspruchshöhe_m_wthh` for a column derived from it).
- Parameter names should generally be aligned with relevant column names.

## Other Python identifiers (Functions, Variables)
## Other Python identifiers

Python identifiers should generally be in English, unless they refer to a specific law
or set of laws, which is where the same reasoning applies as above.
@@ -183,40 +142,15 @@ comprehension or a short loop, `i` might be an acceptable name for the running v
A function that is used in many different places should have a descriptive name.

The name of variables should reflect the content or meaning of the variable and not the
type (i.e., float, int, dict, list, df, array ...). As for column names and parameters,
in some cases it might be useful to append an underscore plus one of {`m`, `w`, `d`} to
indicate the time unit and one of {`sn`, `hh`, `fg`, `bg`, `eg`, `ehe`} to indicate the
unit of aggregation.

## Examples

As an example we can consider the naming of the parameter group `arbeitsl_geld`. The
original name for this group of parameters was the abbreviation `alg`. This will seem
like a suitable candidate for native speakers who are familiar with the German social
security system; the abbreviation is commonly used to refer to this type of unemployment
benefit. However, acronyms are generally not self-explanatory and users unfamiliar with
them will thus not be able to guess their meaning without looking them up.

More meaningful alternatives could be `alo_geld` or `arb_los_geld`. These names use
abbreviations of the compounds of the term "Arbeitslosengeld", which the group name is
supposed to reflect, and connect them in a Pythonic manner through underscores. However,
`alo_geld` still leaves much room for interpretation and `arb_los_geld` separates the
term "Arbeitslosen" in an odd way.

The final choice `arbeitsl_geld` avoids all the disadvantages of the other options as it
is an unambivalent, natural, and minimal abbreviation of the original term it is
supposed to represent.
type (i.e., float, int, dict, list, df, array ...).

## Alternatives

- We worked with abbreviations before, but this hit limits and led to never-ending
discussions (see {ref}`GEP 6 <gep-6>` for some history).
- We considered using more English identifiers, but opted against it because of the lack
of precision and uniqueness (see the example above: How to distinguish between
Erziehungsgeld, Elterngeld, and Elterngeld Plus in English?).
- Use one of the standards for column identifiers. They are not precise enough and
sometimes rather cryptic.
- Do something like EUROMOD and include some hierarchy in column names (e.g., start
with `d_` for demographics). This should not be necessary if columns have sufficiently
clear names. If anything, we would achieve this via a MultiIndex for the columns.

## A final note

@@ -249,6 +183,9 @@ for that. Quoting from there:

## Discussion

The discussion below refers to older versions of this GEP, which has been updated
because {ref}`GEP 6 <gep-6>` made much of the original content obsolete.

- GitHub PR: <https://github.com/iza-institute-of-labor-economics/gettsim/pull/60>
- Discussion on provisional acceptance:
<https://gettsim.zulipchat.com/#narrow/stream/309998-GEPs/topic/GEP.2001/near/189539859>
@@ -257,6 +194,8 @@ for that. Quoting from there:
- GitHub PR for second update (concatenated column names, dealing with conflicting
objectives, names for columns vs parameters):
<https://github.com/iza-institute-of-labor-economics/gettsim/pull/342>
- GitHub PR for third update (changes because of {ref}`GEP 6 <gep-6>`):
<https://github.com/iza-institute-of-labor-economics/gettsim/pull/855>

## Copyright

87 changes: 42 additions & 45 deletions docs/geps/gep-02.md
@@ -11,13 +11,15 @@
* Standards Track
- * Created
* 2022-03-28
- * Updated
* 2025-07-xx
- * Resolution
* [Accepted](https://gettsim.zulipchat.com/#narrow/stream/309998-GEPs/topic/GEP.2002)
```

## Abstract

This GEP lays out how GETTSIM stores the user-provided data (be it from the SOEP, EVS,
This GEP lays out how GETTSIM stores user-provided data (be it from the SOEP, EVS,
example individuals, ...) and passes it around to the functions calculating taxes and
transfers.

@@ -26,20 +28,24 @@ in the data provided by the user (if it comes in the form of a DataFrame) or cal
by GETTSIM. All these arrays have the same length. This length corresponds to the number
of individuals. Functions operate on a single row of data.

If a column name is `[x]_id` with `x` {math}`\in \{` `_hh`, `_bg`, `_fg`, `_ehe`, `_eg`,
`_sn` {math}`\}`, it will be the same for all households, Bedarfsgemeinschaften, or any
Arrays are stored in a nested dictionary (a pytree). Each level of the dictionary is
called a *namespace*. The keys at its innermost level are called *leaf names*; the data
columns they map to are called *leaves*.
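A minimal sketch of such a nested dictionary, with made-up values for three individuals, shows how namespaces, leaf names, and leaves relate. Flattening it recovers each leaf together with its path of namespaces. The helper `iter_leaves`, the data, and the use of `-1` to mean "no linked individual" are assumptions for illustration; when JAX is available, its `jax.tree_util` module offers general-purpose pytree utilities instead.

```python
import numpy as np

# Made-up data for three individuals; every leaf has the same length.
data = {
    "p_id": np.array([0, 1, 2]),
    "alter": np.array([40, 38, 9]),
    "familie": {
        "p_id_elternteil_1": np.array([-1, -1, 0]),
        "p_id_elternteil_2": np.array([-1, -1, 1]),
    },
    "kindergeld": {"p_id_empfänger": np.array([-1, -1, 0])},
}


def iter_leaves(tree, path=()):
    """Yield (path, leaf) pairs; path holds the namespaces plus the leaf name."""
    for key, value in tree.items():
        if isinstance(value, dict):
            yield from iter_leaves(value, (*path, key))
        else:
            yield (*path, key), value


leaves = dict(iter_leaves(data))
# All leaves must have one entry per individual.
assert {len(leaf) for leaf in leaves.values()} == {3}
print(sorted(leaves))
```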

If a leaf name is `[x]_id` with `x` {math}`\in \{` `hh`, `bg`, `fg`, `ehe`, `eg`, `sn`,
`wthh` {math}`\}`, its value is the same for all members of a household,
Bedarfsgemeinschaft, or any other grouping of individuals specified in
{ref}`GEP 1 <gep-1-column-names>`.
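Because a group identifier takes the same value for all members of a group, it can be mapped to consecutive integers starting at 0, which is convenient for indexing and counting. A minimal NumPy sketch with made-up household identifiers; `dense_group_index` is a hypothetical helper:

```python
import numpy as np


def dense_group_index(group_id: np.ndarray) -> np.ndarray:
    """Map arbitrary group ids to consecutive integers 0, 1, ..., k - 1.

    Rows with the same group id receive the same integer; the integers
    follow the sort order of the original ids.
    """
    _, index = np.unique(group_id, return_inverse=True)
    return index.reshape(-1)


hh_id = np.array([17, 17, 3, 42, 3])
index = dense_group_index(hh_id)
print(index)  # [1 1 0 2 0]
print(np.bincount(index))  # household sizes: [2 2 1]
```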

Any other column name ending in `_id` indicates a link to a different individual (e.g.,
child-parent relations could be `parent_0_ind_id`, `parent_1_ind_id`; receiver of child
benefits would be `kindergeldempf_id`).
Any leaf name `p_id_[y]` indicates a link to a different individual (e.g., child-parent
relations are specified via `(familie, p_id_elternteil_1)` and
`(familie, p_id_elternteil_2)`; the recipient of child benefits is specified via
`(kindergeld, p_id_empfänger)`).
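Resolving such a link amounts to finding, for every row, the position of the individual whose `p_id` it refers to. The sketch below does this with a sorted search so that `p_id` need not be consecutive. Treating any value not present in `p_id` (such as `-1`) as "no link" is an assumption, and `link_positions` as well as the data are hypothetical.

```python
import numpy as np


def link_positions(p_id: np.ndarray, linked_p_id: np.ndarray) -> np.ndarray:
    """Return the row position of each linked individual, or -1 if absent.

    Assumes p_id is non-empty and contains unique values.
    """
    order = np.argsort(p_id)
    sorted_ids = p_id[order]
    pos = np.clip(np.searchsorted(sorted_ids, linked_p_id), 0, len(sorted_ids) - 1)
    found = sorted_ids[pos] == linked_p_id
    return np.where(found, order[pos], -1)


p_id = np.array([5, 2, 9])
alter = np.array([40, 38, 9])
empfaenger = np.array([9, -1, 5])  # made-up links; -1 means no linked individual
pos = link_positions(p_id, empfaenger)  # positions 2, -1 and 0
# Look up an attribute of the linked individual; -1 where there is no link.
alter_empfaenger = np.where(pos >= 0, alter[pos], -1)
```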

## Motivation and Scope

Taxes and transfers are calculated at different levels of aggregation: Individuals,
couples, families, households. Sometimes, relations between individuals are important:
parents and children, payors/receivers of alimony payments, which parent receives the
`kindergeld` payments, etc..
parents and children, payors/recipients of alimony payments, which parent receives
Kindergeld payments, etc.

Potentially, there are many ways of storing these data: Long form, wide form,
collections of tables adhering to
@@ -54,7 +60,7 @@ N-dimensional arrays, etc.. As usual, everything involves trade-offs, for exampl
- Almost all functions are much easier to implement when working with a single row. This
matters most for the typical user and for growing the number of developers.

- Modern tools for vectorization (e.g., Jax) scale best when working with single rows of
- Modern tools for vectorization (e.g., JAX) scale best when working with single rows of
data.
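To make the trade-off concrete, the sketch below writes a made-up rule once as a single-row function and once as hand-vectorised array code. `np.vectorize` serves only as a simple stand-in for turning the row-wise version into an array function; it loops in Python and says nothing about how GETTSIM or JAX vectorise in practice. All names and numbers are hypothetical.

```python
import numpy as np


def betrag_m(einkommen_m: float, freibetrag_m: float, satz: float) -> float:
    """Single-row version: a rate applied to income above an allowance."""
    if einkommen_m <= freibetrag_m:
        return 0.0
    return (einkommen_m - freibetrag_m) * satz


def betrag_m_vectorised(einkommen_m, freibetrag_m, satz):
    """Hand-vectorised version: the if/else branch becomes np.where."""
    return np.where(
        einkommen_m <= freibetrag_m, 0.0, (einkommen_m - freibetrag_m) * satz
    )


# Turn the row-wise function into one that accepts arrays (loops internally).
betrag_m_rows = np.vectorize(betrag_m, otypes=[float])

einkommen_m = np.array([500.0, 1_000.0, 1_500.0])
print(betrag_m_rows(einkommen_m, 1_000.0, 0.2))
print(betrag_m_vectorised(einkommen_m, 1_000.0, 0.2))
```

Both versions yield the same result; the row-wise one reads like the legal rule, while the array version requires thinking about masks.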

Aggregation to groups of individuals (households, Bedarfsgemeinschaften,...) or
@@ -67,56 +73,45 @@ This is primarily internal, i.e., only relevant for developers as the highest-le
interface can be easily adjusted. The default way to receive data will be one Pandas
DataFrame.

Users are affected only via the interface of lower-level functions. Functions will
always work on single rows of data. Many alternatives would require users to write
vectorized code, making filtering operations more cumbersome. For aggregation or
referencing other individuals' data, GETTSIM will provide functions that allow
abstracting from implementation details, see {ref}`below <gep-2-aggregation-functions>`.
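The following toy example illustrates the difference using a hypothetical eligibility rule with invented thresholds. The single-row version uses ordinary control flow, while the hand-vectorized version has to turn every branch into a boolean mask. `np.vectorize` is only a convenience loop here; tools such as JAX's `jax.vmap` play the analogous role by transforming a row-wise function into batched array operations.

```python
import numpy as np


def eligible_row(age: int, in_training: bool) -> bool:
    # Hypothetical single-row rule: plain if/else, no array logic needed.
    if age < 18:
        return True
    return age < 25 and in_training


def eligible_vectorized(age: np.ndarray, in_training: np.ndarray) -> np.ndarray:
    # The same rule written by hand for whole arrays: branches become masks.
    return (age < 18) | ((age < 25) & in_training)


# np.vectorize lifts the row-wise function to arrays (a Python loop internally).
eligible = np.vectorize(eligible_row, otypes=[bool])

age = np.array([10, 20, 20, 30])
in_training = np.array([False, True, False, True])
result = eligible(age, in_training)
```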

## Detailed description

The following discussion assumes that data is passed in as a Pandas DataFrame. It will
be possible to pass data directly in the form that GETTSIM requires it internally. In
that case, only the relevant steps apply.
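The sketch below assumes the input DataFrame's columns form a MultiIndex of (namespace, leaf name). Under that assumption, it converts the DataFrame to a nested collection of 1-d arrays and back. GETTSIM's actual input format and converters may differ, and `df_to_tree` and `tree_to_df` are hypothetical names. Keeping the user's index aside is what allows returning results with the same row identifiers.

```python
import pandas as pd


def df_to_tree(df: pd.DataFrame) -> dict:
    """Turn (namespace, ..., leaf name) columns into a nested dict of 1-d arrays."""
    tree: dict = {}
    for col in df.columns:
        # Drop the "" padding pandas uses for shorter column paths.
        path = tuple(k for k in col if k != "") if isinstance(col, tuple) else (col,)
        node = tree
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = df[col].to_numpy()
    return tree


def _leaf_paths(tree: dict, prefix: tuple = ()):
    for key, value in tree.items():
        if isinstance(value, dict):
            yield from _leaf_paths(value, prefix + (key,))
        else:
            yield prefix + (key,), value


def tree_to_df(tree: dict, index: pd.Index) -> pd.DataFrame:
    """Inverse of df_to_tree; reattaches the user's original row index."""
    items = list(_leaf_paths(tree))
    depth = max(len(path) for path, _ in items)
    out = pd.DataFrame({i: arr for i, (_, arr) in enumerate(items)}, index=index)
    # Pad shorter paths with "" so all column tuples have equal length.
    out.columns = pd.MultiIndex.from_tuples(
        [path + ("",) * (depth - len(path)) for path, _ in items]
    )
    return out
```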

- GETTSIM may make a check that all identifiers pointing to other individuals (e.g.,
`(kindergeld, p_id_empfänger)`) are valid.

- GETTSIM may make a check that there is no variation within a group of individuals if
the column name indicates that there must not be (e.g., all members sharing the same
`hh_id` must have the same `anzahl_personen_hh` in case the variable is provided as an
input column).

- Because groups of individuals are not necessarily nested (e.g., joint taxation during
separation phase but living in different households), they cannot be sorted in
general.

- The core of GETTSIM works with a collection of 1-d arrays, all of which have the same
length as the number of individuals.

These arrays form the nodes of its DAG computation engine (see {ref}`GEP 4 <gep-4>`).

- GETTSIM returns an object of the same type and with the same row identifiers as the
one passed by the user.

- GETTSIM strives to show errors along with the original indices, but this may not
always be possible.
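The following sketches a validity check for links between individuals. It assumes unique `p_id` values and `-1` for missing links, and `check_links` is hypothetical. Reporting row positions is a simplification; mapping them back to the user's original index would be a further step.

```python
import numpy as np


def check_links(p_id, links):
    """Raise if p_id is not unique or a p_id_[y] leaf points to an unknown individual.

    `links` maps a (hypothetical) leaf name to its array; -1 is assumed to
    mean "no link".
    """
    if np.unique(p_id).size != p_id.size:
        raise ValueError("p_id must be unique.")
    for name, link in links.items():
        invalid = ~((link == -1) | np.isin(link, p_id))
        if invalid.any():
            rows = np.flatnonzero(invalid).tolist()
            raise ValueError(f"{name}: links to unknown p_id in rows {rows}.")
```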

(gep-2-aggregation-functions)=

### Grouped values and aggregation functions

Often columns refer to groups of individuals. Such columns have a suffix indicating the
group (see {ref}`GEP 1 <gep-1-column-names>`). These columns' values will be repeated
for all individuals who form part of a group.

By default, GETTSIM will check consistency on input columns in this respect. Users will
be able to turn this check off.
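One possible way to implement that consistency check for numeric columns is to compute the per-group minimum and maximum and flag groups where they differ. The helper name and the restriction to numeric values are assumptions of this sketch.

```python
import numpy as np


def varies_within_group(group_id, values):
    """Return the ids of groups whose members do not all share the same value."""
    ids, inverse = np.unique(group_id, return_inverse=True)
    lo = np.full(ids.size, np.inf)
    hi = np.full(ids.size, -np.inf)
    # Per-group minimum and maximum via unbuffered in-place ufunc.at calls.
    np.minimum.at(lo, inverse, values)
    np.maximum.at(hi, inverse, values)
    return ids[lo != hi]


# Hypothetical input column: household 0 reports inconsistent sizes.
hh_id = np.array([0, 0, 0, 1, 1])
anzahl_personen_hh = np.array([3, 3, 2, 2, 2])
inconsistent = varies_within_group(hh_id, anzahl_personen_hh)
```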

Aggregation functions will be provided by GETTSIM.

- As outlined in {ref}`GEP 4 <gep-4-aggregation-by-group-functions>`, users will need to
specify:

- The name of the aggregated variable. This **must** end with a feasible unit of
aggregation, e.g., `_hh` or `_ehe`.
- The type of aggregation {math}`\in \{` `count`, `sum`, `mean`, `max`, `min`, `any`,
`all` {math}`\}`
- The name of the original variable (not relevant for `count`)

Note that as per {ref}`GEP 4 <gep-4-aggregation-by-group-functions>`, sums will be
calculated implicitly if the graph contains a column `my_col` and an aggregate such as
`my_col_hh` is requested somewhere.
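The aggregation types listed above can be sketched with NumPy as segment reductions over group identifiers. The result is broadcast back so that every member carries the group value. `aggregate_by_group` is hypothetical and not GETTSIM's API; in GETTSIM, the suffix of the aggregated variable's name determines the grouping.

```python
import numpy as np


def aggregate_by_group(group_id, values=None, how="sum"):
    """Aggregate a per-individual column within groups and repeat the result
    for every group member, so the output has one entry per individual."""
    ids, inverse = np.unique(group_id, return_inverse=True)
    n = ids.size
    if how == "count":
        out = np.bincount(inverse, minlength=n)
    elif how in ("sum", "mean"):
        out = np.bincount(inverse, weights=values, minlength=n)
        if how == "mean":
            out = out / np.bincount(inverse, minlength=n)
    elif how in ("max", "min"):
        ufunc, start = (np.maximum, -np.inf) if how == "max" else (np.minimum, np.inf)
        out = np.full(n, start)
        ufunc.at(out, inverse, values)
    elif how in ("any", "all"):
        # Count True entries (any) or False entries (all) per group.
        flags = values.astype(bool) if how == "any" else ~values.astype(bool)
        hits = np.bincount(inverse, weights=flags.astype(float), minlength=n) > 0
        out = hits if how == "any" else ~hits
    else:
        raise ValueError(f"Unknown aggregation: {how!r}")
    return out[inverse]


# Hypothetical example: `my_col_hh` as the household sum of `my_col`.
hh_id = np.array([0, 0, 1])
my_col = np.array([100.0, 50.0, 30.0])
my_col_hh = aggregate_by_group(hh_id, my_col, how="sum")
```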

## Alternatives

Versions 0.3 -- 0.4 of GETTSIM used a collection of pandas Series. This proved to be
… households like
[here](https://www.tensorflow.org/api_docs/python/tf/math/segment_sum) would have
led to many merge-like operations in user functions.

Versions 0.5 -- 0.7 of GETTSIM used flat collections of pandas Series. As the scope and
detail of GETTSIM grew, maintaining uniqueness of column names across different areas of
taxes and transfers became too difficult.

## Discussion

- Some
[discussion on Zulip](https://gettsim.zulipchat.com/#narrow/stream/224837-High-Level-Architecture/topic/Update.20Data.20Structures/near/180917151)
re data structures.
- Zulip stream for
[GEP 2](https://gettsim.zulipchat.com/#narrow/stream/309998-GEPs/topic/GEP.2001/near/189539859).
- GitHub PR for the update (changes due to {ref}`GEP 6 <gep-6>`):
<https://github.com/iza-institute-of-labor-economics/gettsim/pull/855>

## Copyright
