diff --git a/docs/geps.md b/docs/geps.md index fb21657a0..903d38eff 100644 --- a/docs/geps.md +++ b/docs/geps.md @@ -20,5 +20,6 @@ maxdepth: 1 ../geps/gep-04 ../geps/gep-05 ../geps/gep-06 +../geps/gep-07 ../geps/gep-x ``` diff --git a/docs/geps/gep-01.md b/docs/geps/gep-01.md index 1c524812a..793e842ef 100644 --- a/docs/geps/gep-01.md +++ b/docs/geps/gep-01.md @@ -14,7 +14,7 @@ - * Created * 2019-11-04 - * Updated - * 2022-03-28 + * 2025-07-23 - * Resolution * [Accepted](https://gettsim.zulipchat.com/#narrow/stream/309998-GEPs/topic/GEP.2001) ``` @@ -26,35 +26,17 @@ columns, parameters, Python identifiers (functions, variables), etc. should be n a nutshell and without explanations, these conventions are: 1. Names follow standard Python conventions (`lowercase_with_underscores`). - Abbreviations of words that form a part of these names are always followed by an - underscore, unless it is the last word. - -1. Names should be long enough to be readable. However, we impose limits in order to - make GETTSIM usable in languages, which place limits on characters (Stata, in - particular). - - - Column names that are typically user-facing have a hard limit of 20 characters. - These columns are documented in `DEFAULT_TARGETS` in `gettsim/config.py`. - - Other column names that users might potentially be interested in have a hard limit - of 32 characters. - - Columns geared at internal use (e.g., helper variables before applying a - favorability check) start with an underscore and there are no restrictions. - Internal variables should be used sparingly. - -1. If names need to be concatenated for making clear what a column name refers to (e.g., - `arbeitslosengeld_2__vermögensfreibetrag_bg` vs. - `grundsicherung__im_alter__vermögensfreibetrag_eg`), the group (i.e., the tax or - transfer) that a variable refers to appears first. - -1. Because of the necessity of concatenated column names, there will be conflicts - between readability (1.) and variable length (2.). 
If such conflicts arise, they need - to be solved on a case by case basis. Consistency across different variants of a - variable names always has to be kept. 1. The language should generally be English in all coding efforts and documentation. German should be used for all institutional features and directly corresponding names. +1. The hierarchical naming convention (see {ref}`GEP 6 `) means that + abbreviations should be used only very sparingly. + + An abbreviation is always followed by an underscore (unless it is the last word). + Underscores must not be used to separate German words that are pulled together. + 1. German identifiers use correct spelling even if it is non-ASCII (this mostly concerns the letters ä, ö, ü, ß). @@ -93,32 +75,17 @@ in English. For column names, we always allow a pure ASCII option, see the next (gep-1-column-names)= -## Column names (a.k.a. "variables" in Stata) - -We impose a hard limit of 20 characters for all column names that typically user-facing. -This is for the benefit of Stata users, who face a strict limit of 32 characters for -their column names. Furthermore, where developers using other languages may store -different experiments in different variables, Stata users' only chance to distinguish -them is to append characters to the column names. - -For the same reason, there is a hard limit of 32 characters for variables that users may -reasonably request. - -If a column is only present for internal use, it starts with an underscore and there is -no restriction on the number of characters. Internal columns should be used sparingly. +## Column / Policy Function names (a.k.a. "variables" in Stata) -Across variations that include the same identifier, this identifier should not be -changed, even if it leads to long variable names (e.g., `kinderfreib`, -`einkommensteuer__gesamteinkommen_y`). This makes searching for identifiers easier and -less error-prone. 
- -If names need to be concatenated for making clear what a column name refers to (e.g., -`arbeitslosengeld_2__vermögensfreibetrag_bg` vs. -`grundsicherung__im_alter__vermögensfreibetrag_eg`), the group (i.e., the tax or -transfer) that a variable refers to appears first. +The hierarchical naming convention (see {ref}`GEP 6 `) means that the +highest-level identifier is the type of the programme (e.g., `einkommensteuer` or +`kindergeld`). Very few variables live in the global namespace (e.g., the person +identifier `p_id` or `alter`). A special case is the namespace `familie`, which lives in +the global namespace. If a column has a reference to a time unit (i.e., any flow variable like earnings or -transfers), a column is indicated by an underscore plus one of {`y`, `m`, `w`, `d`}. +transfers), a column is indicated by an underscore plus one of {`y`, `q`, `m`, `w`, +`d`}. The default unit a column refers to is an individual. In case of groupings of individuals, an underscore plus one of {`sn`, `hh`, `fg`, `bg`, `eg`, `ehe`} will @@ -152,11 +119,6 @@ Note that households do not include flat shares etc.. Such broader definition ar currently not relevant in GETTSIM but may be added in the future (e.g., capping rules for costs of dwelling in SGB II depend on this). -Open questions: - -- Can we use `arbeitslosengeld_2__bg_id` for both SGB II and SGB XII at the same time or - do we need to differentiate once we add serious support for SGB XII? - Time unit identifiers always appear before unit identifiers (e.g., `arbeitslosengeld_2__betrag_m_bg`). @@ -168,12 +130,9 @@ general naming considerations here. - There is a hierarchical structure to these parameters in that each of them is associated with a group (e.g., `arbeitslosengeld`, `kinderzuschlag`). These groups or abbreviations thereof do not re-appear in the name of the parameter. -- Parameter names should be generally be aligned with relevant column names. 
However, - since the group is not repeated for the parameter, it is often better not to - abbreviate them (e.g., `wohngeld_params["vermögensgrundfreibetrag"]` for the parameter - and `wohngeld__anspruchshöhe_m_wthh` for a column derived from it). +- Parameter names should generally be aligned with relevant column names. -## Other Python identifiers (Functions, Variables) +## Other Python identifiers Python identifiers should generally be in English, unless they refer to a specific law or set of laws, which is where the same reasoning applies as above. @@ -183,40 +142,15 @@ comprehension or a short loop, `i` might be an acceptable name for the running v A function that is used in many different places should have a descriptive name. The name of variables should reflect the content or meaning of the variable and not the -type (i.e., float, int, dict, list, df, array ...). As for column names and parameters, -in some cases it might be useful to append an underscore plus one of {`m`, `w`, `d`} to -indicate the time unit and one of {`sn`, `hh`, `fg`, `bg`, `eg`, `ehe`} to indicate the -unit of aggregation. - -## Examples - -As an example we can consider the naming of the parameter group `arbeitsl_geld`. The -original name for this group of parameters was the abbreviation `alg`. This will seem -like a suitable candidate for native speakers who are familiar with the German social -security system; the abbreviation is commonly used to refer to this type of unemployment -benefit. However, acronyms are generally not self-explanatory and users unfamiliar with -them will thus not be able to guess their meaning without looking them up. - -More meaningful alternatives could be `alo_geld` or `arb_los_geld`. These names use -abbreviations of the compounds of the term "Arbeitslosengeld", which the group name is -supposed to reflect, and connect them in a Pythonic manner through underscores. 
However, -`alo_geld` still leaves much room for interpretation and `arb_los_geld` separates the -term "Arbeitslosen" in an odd way. - -The final choice `arbeitsl_geld` avoids all the disadvantages of the other options as it -is an unambivalent, natural, and minimal abbreviation of the original term it is -supposed to represent. +type (i.e., float, int, dict, list, df, array ...). ## Alternatives +- We worked with abbreviations before, but this hit limits and it led to never-ending + discussions (see `GEP-6 ` for some history). - We considered using more English identifiers, but opted against it because of the lack of precision and uniqueness (see the example above: How to distinguish between Erziehungsgeld, Elterngeld, and Elterngeld Plus in English?). -- Use one of the standards for column identifiers. They are not precise enough and - sometimes rather cryptic. -- Do something like EUROMOD and include some hierarchy in column names (e.g. start with - `d_` for demographics). Should not be necessary if column names have clear enough - names. If anything, we would achieve this via a MultiIndex for the columns. ## A final note @@ -249,6 +183,9 @@ for that. Quoting from there: ## Discussion +The below refers to older versions of the GEP; it has been updated because +`GEP-6 ` made much of the original content obsolete. + - GitHub PR: - Discussion on provisional acceptance: @@ -257,6 +194,8 @@ for that. 
Quoting from there: - GitHub PR for second update (concatenated column names, dealing with conflicting objectives, names for columns vs parameters): +- GitHub PR for third update (changes because of `GEP-6 `): + ## Copyright diff --git a/docs/geps/gep-02.md b/docs/geps/gep-02.md index ad012675e..573c5bc16 100644 --- a/docs/geps/gep-02.md +++ b/docs/geps/gep-02.md @@ -11,13 +11,15 @@ * Standards Track - * Created * 2022-03-28 +- * Updated + * 2025-07-23 - * Resolution * [Accepted](https://gettsim.zulipchat.com/#narrow/stream/309998-GEPs/topic/GEP.2002) ``` ## Abstract -This GEP lays out how GETTSIM stores the user-provided data (be it from the SOEP, EVS, +This GEP lays out how GETTSIM stores user-provided data (be it from the SOEP, EVS, example individuals, ...) and passes it around to the functions calculating taxes and transfers. @@ -26,20 +28,24 @@ in the data provided by the user (if it comes in the form of a DataFrame) or cal by GETTSIM. All these arrays have the same length. This length corresponds to the number of individuals. Functions operate on a single row of data. -If a column name is `[x]_id` with `x` {math}`\in \{` `_hh`, `_bg`, `_fg`, `_ehe`, `_eg`, -`_sn` {math}`\}`, it will be the same for all households, Bedarfsgemeinschaften, or any +Arrays are stored in a nested dictionary (a pytree). One level of the dictionary is +called a *namespace*. Its innermost level is called a *leaf name*. The data columns are +called *leaves*. + +If a leaf name is `[x]_id` with `x` {math}`\in \{` `hh`, `bg`, `fg`, `ehe`, `eg`, `sn`, +`wthh` {math}`\}`, it will be the same for all households, Bedarfsgemeinschaften, or any other grouping of individuals specified in {ref}`GEP 1 `. -Any other column name ending in `_id` indicates a link to a different individual (e.g., -child-parent relations could be `parent_0_ind_id`, `parent_1_ind_id`; receiver of child -benefits would be `kindergeldempf_id`). 
+Any leaf name `p_id_[y]` indicates a link to a different individual (e.g., child-parent +relations are specified via `(familie, p_id_elternteil_1)`, `(familie, p_id_elternteil_2)`; the +recipient of child benefits would be `(kindergeld, p_id_empfänger)`). ## Motivation and Scope Taxes and transfers are calculated at different levels of aggregation: Individuals, couples, families, households. Sometimes, relations between individuals are important: -parents and children, payors/receivers of alimony payments, which parent receives the -`kindergeld` payments, etc.. +parents and children, payors/recipients of alimony payments, which parent receives +Kindergeld payments, etc.. Potentially, there are many ways of storing these data: Long form, wide form, collections of tables adhering to @@ -54,7 +60,7 @@ N-dimensional arrays, etc.. As usual, everything involves trade-offs, for exampl - Almost all functions are much easier to implement when working with a single row. This is most important for the typical user and increasing the number of developers. -- Modern tools for vectorization (e.g., Jax) scale best when working with single rows of +- Modern tools for vectorization (e.g., JAX) scale best when working with single rows of data. Aggregation to groups of individuals (households, Bedarfsgemeinschaften,...) or @@ -67,12 +73,11 @@ This is primarily internal, i.e., only relevant for developers as the highest-le interface can be easily adjusted. The default way to receive data will be one Pandas DataFrame. -Users are affected only via the interface of lower-level functions. Under the proposed -implementation, they will always work on single rows of data. Many alternatives would -require users to write vectorised code, making filtering operations more cumbersome. For -aggregation or referencing other individuals' data, GETTSIM will provide functions that -allow abstracting from implementation details, see -{ref}`below `. 
+Users are affected only via the interface of lower-level functions. Functions will +always work on single rows of data. Many alternatives would require users to write +vectorised code, making filtering operations more cumbersome. For aggregation or +referencing other individuals' data, GETTSIM will provide functions that allow +abstracting from implementation details, see {ref}`below `. ## Detailed description @@ -80,43 +85,33 @@ The following discussion assumes that data is passed in as a Pandas DataFrame. I be possible to pass data directly in the form that GETTSIM requires it internally. In that case, only the relevant steps apply. -- GETTSIM will first make a check that all identifiers pointing to other individuals - (e.g., `kindergeldempf_id`) are valid. - -- GETTSIM will then create internal identifiers for individuals, households, and tax - units. GETTSIM will also generate appropriate columns with identifiers pointing to - other individuals. Columns with the original values are stored. +- GETTSIM may make a check that all identifiers pointing to other individuals (e.g., + `(kindergeld, p_id_empfänger)`) are valid. - All internal identifiers are integers starting at 0 and counting in increments of 1. - For individuals, they are sorted, implying they can be used to index into the arrays. - It also means that identifiers pointing to other individuals can be used directly for - indexing. +- GETTSIM may make a check that there is no variation within a group of individuals if + the column name indicates that there must not be (e.g., all members sharing the same + `hh_id` must have the same `anzahl_personen_hh` in case the variable is provided as an + input column). - Because groups of individuals are not necessarily nested (e.g., joint taxation during +- Because groups of individuals are not necessarily nested (e.g., joint taxation during separation phase but living in different households), they cannot be sorted in - general. 
In case users know their data allows sorting on all groups (i.e., all groups - have a nesting structure), they will be able to provide a `data_is_sorted` flag, which - defaults to `False`. + general. - The core of GETTSIM works with a collection of 1-d arrays, all of which have the same length as the number of individuals. These arrays form the nodes of its DAG computation engine (see {ref}`GEP 4 `). -- GETTSIM returns an object of the same type and with the same identifiers that was +- GETTSIM returns an object of the same type and with the same row identifiers that was passed by the user. -- GETTSIM strives to show errors along with the original indices, but this may not - always be possible. - (gep-2-aggregation-functions)= ### Grouped values and aggregation functions Often columns refer to groups of individuals. Such columns have a suffix indicating the -group (see {ref}`GEP 1 `, currently `_hh`, `_bg`, `_fg`, `_ehe`, -`_eg`, and `_sn`). These columns' values will be repeated for all individuals who form -part of a group. +group (see {ref}`GEP 1 `). These columns' values will be repeated +for all individuals who form part of a group. By default, GETTSIM will check consistency on input columns in this respect. Users will be able to turn this check off. @@ -132,20 +127,16 @@ Aggregation functions will be provided by GETTSIM. - As outlined in {ref}`GEP 4 ` users will need to specify: - - The stringified name of the aggregated variable. This **must** end with a feasible - unit of aggregation, i.e., `_hh`, `_bg`, `_fg`, `_ehe`, `_eg`, or `_sn` - - The stringified name of the original variable. - - The type of aggregation {math}`\in \{` `sum`, `mean`, `max`, `min`, `any` {math}`\}` + - The name of the aggregated variable. This **must** end with a feasible unit of + aggregation, e.g., `_hh` or `_ehe`. 
+ - The type of aggregation {math}`\in \{` `count`, `sum`, `mean`, `max`, `min`, `any`, + `all`, {math}`\}` + - The name of the original variable (not relevant for `count`) Note that as per {ref}`GEP 4 `, sums will be calculated implicitly if the graph contains a column `my_col` and an aggregate such as `my_col_hh` is requested somewhere. -Note that the groups `tu` and `hh` may change in the future. Some might also be -calculated via relations between household members, see -[discussion](https://gettsim.zulipchat.com/#narrow/stream/224837-High-Level-Architecture/topic/Update.20Data.20Structures/near/180917151) -on Zulip in this respect. - ## Alternatives Versions 0.3 -- 0.4 of GETTSIM used a collection of pandas Series. This proved to be @@ -157,6 +148,10 @@ households like \[here\]() would have led to many merge-like operations in user functions. +Versions 0.5 -- 0.7 of GETTSIM used flat collections of pandas Series. As the scope and +detail of GETTSIM grew, maintaining uniqueness of column names across different areas of +taxes and transfers became too difficult. + ## Discussion - Some @@ -164,6 +159,8 @@ led to many merge-like operations in user functions. re data structures. - Zulip stream for [GEP 2](https://gettsim.zulipchat.com/#narrow/stream/309998-GEPs/topic/GEP.2001/near/189539859). +- GitHub PR for update (changes because of `GEP-6 `): + ## Copyright diff --git a/docs/geps/gep-03.md b/docs/geps/gep-03.md index 11b439705..d7a5dcbe3 100644 --- a/docs/geps/gep-03.md +++ b/docs/geps/gep-03.md @@ -12,7 +12,7 @@ - * Created * 2022-03-28 - * Updated - * 2024-11-21 + * 2025-07-23 - * Resolution * [Accepted](https://gettsim.zulipchat.com/#narrow/stream/309998-GEPs/topic/GEP.2003) ``` @@ -33,21 +33,20 @@ sources of these parameters requires particular care. ## Usage and Impact GETTSIM developers should closely look at the Section {ref}`gep-3-structure-yaml-files` -before adding new parameters. +before adding new parameters. 
Some validation happens via the pre-commit hooks, but that +cannot catch all inconsistencies. (gep-3-structure-yaml-files)= ## Structure of the YAML files Each YAML file contains a number of parameters at the outermost level of indentation. -Each of these parameters in turn is a dictionary with at least three keys: `name`, -`description`, and the `YYYY-MM-DD`-formatted date on which it first took effect. Values -usually change over time; each time a value is changed, another `YYYY-MM-DD` entry is -added. +Each of these parameters is a dictionary with at least 6 keys: `name`, `description`, +`unit`, `reference_period`, `type` and the `YYYY-MM-DD`-formatted date on which it first +took effect. -Some keys at the outermost level refer to functions of the taxes and transfers system. -These work differently and they are -{ref}`treated separately below `. +Values usually change over time; each time a value is changed, another `YYYY-MM-DD` +entry is added. Beyond that, no additional keys are allowed. 1. The `name` key has two sub-keys `de` and `en`, which are @@ -56,13 +55,13 @@ These work differently and they are - not sentences; - correctly capitalised. - Example (from `arbeitsl_geld_2`): + Example (from `kindergeld`): ```yaml - parameter_anrechnungsfreies_einkommen_ohne_kinder_in_bg: + altersgrenze: name: - de: Anrechnungsfreies Einkommen - en: Income shares not subject to transfer withdrawal + de: Alter, ab dem Kindergeld nicht mehr gezahlt wird. + en: Age at which child benefit is no longer paid. ``` 1. The `description` key has two sub-keys `de` and `en`, which @@ -78,89 +77,119 @@ These work differently and they are Example: ```yaml - parameter_anrechnungsfreies_einkommen_ohne_kinder_in_bg: - + altersgrenze: description: de: >- - Einkommensanteile, die anrechnungsfrei bleiben. § 30 SGB II. Seit 01.10.2005 zudem - definiert durch Freibetrag in § 11 SGB II, siehe auch § 67 SGB II. Seit 01.04.2011 - § 11b (2) SGB II (neugefasst durch B. v. 13.05.2011 BGBl. I S. 850. 
Artikel 2 - G. v. 24.03.2011 BGBl. I S. 453). + § 32 Art. 2-4 EStG. + Für minderjährige Kinder besteht ohne Bedingungen ein Anspruch auf Kindergeld. + Auch für erwachsene Kinder kann bis zu einer Altersgrenze unter bestimmten + Bedingungen ein Anspruch auf Kindergeld bestehen. en: >- - Income shares which do not lead to tapering of benefits. + § 32 Art. 2-4 EStG. + Underage children are entitled to child benefit without any conditions. Also adult + children up to a specified age are entitled to child benefit under certain + conditions. ``` 1. The `unit` key informs on the unit of the values (Euro or DM if monetary, share of some other value, ...). - - In rare cases (e.g. child benefit age threshold), it might be omitted. + - In some cases (e.g., factor for the calculation of the marginal employment + threshold), there is no unit. + - It should be capitalised. - - Some values used at this point: `Euro`, `DM`, `Share`, `Percent`, `Factor`, `Year`, - `Month`, `Hour`, `Square Meter`, `Euro / Square Meter`. - - The `unit` key may be overridden at lower levels. For example, the unit will - typically be `Euro` for monetary quantities. For the years prior to its - introduction, it may be specified as `DM`. + + - Possible values: + + - `Euros`, + - `DM`, + - `Share`, + - `Percent`, + - `Years`, + - `Months`, + - `Hours`, + - `Square Meters`, + - `Euros / Square Meter`, + - *None*. Example: ```yaml - kindergeld: + altersgrenze: name: - de: Kindergeld, Betrag je nach Reihenfolge der Kinder. - unit: Euros + de: Alter, ab dem Kindergeld nicht mehr gezahlt wird. + unit: Years ``` -1. The (optional) `type` key may contain a reference to a particular function that is - implemented. Examples are `piecewise_linear` or `piecewise_quadratic` +1. The `type` key signals to GETTSIM how the parameter is to be interpreted. It must be + specified as one of: -1. 
The (optional) `reference_period` key informs on the reference period of the values, - if applicable + - `scalar`, + - `dict`, + - `piecewise_constant`, + - `piecewise_linear`, + - `piecewise_quadratic`, + - `piecewise_cubic`, + - `birth_month_based_phase_inout`, + - `birth_year_based_phase_inout`, + - `require_converter`. - Possible values: - `Year` - `Month` - `Week` - `Day` + `scalar` is self-explanatory; `dict` must be a homogeneous dictionary with string or + integer keys and scalar values (int, float, bool). - Example: + `piecewise_constant`, `piecewise_linear`, `piecewise_quadratic`, `piecewise_cubic` + will be converted automatically to be used with the `piecewise_polynomial` function. - ```yaml - kindergeld_stundengrenze: - name: - de: Wochenstundengrenze für Kindergeldanspruch - [...] - reference_period: Week - ``` + `birth_month_based_phase_inout` and `birth_year_based_phase_inout` are used to phase + in or out a parameter based on the birth month or birth year of the individual. They are + automatically converted to be used as `ConsecutiveIntLookupTableParamValue` objects. -(gep-3-access_prior_parameters)= + `require_converter` can be anything. However, there must be a converter function in + the codebase. -6. The (optional) `access_prior_parameters` can be used to make the parameter of a - previous point in time (relative to the date specified in - {func}`set_up_policy_environment `) - available within GETTSIM functions. It requires the `reference_period` (one of - `Year`, `Month`, `Week`, `Day`) and the `number_of_lags`. +1. The `reference_period` key informs on the reference period of the values, if + applicable. Possible values: - Example: + - `Year`, + - `Quarter`, + - `Month`, + - `Week`, + - `Day`, + - `Hour`, + - *None* + +1. The optional `add_jahresanfang` key can be used to make the parameter that is relevant at + the start of the year (relative to the date for which the policy environment is set + up) available to GETTSIM functions. 
+ + If specified, two parameters will be available: + + ``` + ("path", "to", "parameter") + ("path", "to", "parameter_jahresanfang") + ``` + + Example from `sozialversicherung` / `arbeitslosen` / `beitragssatz.yaml`: ```yaml - rentenwert: + beitragssatz: name: - de: Rentenwerte alte und neue Bundesländer. - [...] - access_prior_parameters: - - reference_period: Year - - number_of_lags: 1 + de: Beitragssatz zur Arbeitslosenversicherung + unit: Share + reference_period: null + type: scalar + add_jahresanfang: true ``` 1. The YYYY-MM-DD key(s) - - hold all historical values for a specific parameter or set of parameters in the - `value` subkey; - - is present with `value: null` if a parameter ceases to exist starting on a - particular date; + - hold all historical values for a specific parameter or set of parameters in + dictionaries - contain a precise reference to the law in the `reference` subkey; - may add additional descriptions in the `note` key; - - may give hints towards the type of function they refer to via the `type` subkey; - - may include formulas if the law does; - - may reference other parameters as described below. - - may contain a `unit` subkey, which overrides the `unit` key mentioned in 3. (mostly - relevant for DM / Euro) + - is present with a note or reference only if a parameter ceases to exist starting on + a particular date; + - in case of a `scalar` type, the key of the scalar is `value`. The remainder of this section explains this element in much more detail. @@ -181,11 +210,12 @@ These work differently and they are Example: ```yaml -parameter_anrechnungsfreies_einkommen_ohne_kinder_in_bg: +beitragssatz: name: - de: Anrechnungsfreie Einkommensanteile - 2005-01-01: - reference: Artikel 1. G. v. 24.12.2003 BGBl. I S. 2954. + de: Beitragssatz zur Arbeitslosenversicherung + 2019-01-01: + value: 0.0125 + reference: V. v. 21.12.2018 BGBl. I S. 
2663 ``` ### The `note` key of [YYYY-MM-DD] @@ -193,37 +223,41 @@ parameter_anrechnungsfreies_einkommen_ohne_kinder_in_bg: This optional key may contain a free-form note holding any information that may be relevant for the interpretation of the parameter, the implementer, user, ... -(gep-3-deviation_from)= +```yaml +beitragssatz: + name: + de: Beitragssatz zur Arbeitslosenversicherung + 2019-01-01: + value: 0.0125 + reference: V. v. 21.12.2018 BGBl. I S. 2663 + note: >- + Set to 0.013 in Art. 2 Nr. 15 G. v. 18.12.2018 BGBl. I S. 2651. Temporarily + reduced to 0.0125 in BeiSaV 2019. +``` -### The `deviation_from` key of [YYYY-MM-DD] +### The `updates_previous` key of [YYYY-MM-DD] Often laws change only part of a parameter. To avoid error-prone code duplication, we -allow for such cases via the `deviation_from` key. This is the reason why lists are to -be avoided in the value key (see the `piecewise_linear` function above). +allow for such cases via the `updates_previous` key. -The key could either reference another value explicitly: +This must not be used with a scalar parameter type. Furthermore, it cannot be used in +the first period a parameter is defined. -```yaml -parameter_anrechnungsfreies_einkommen_mit_kindern_in_bg: - name: - de: Abweichende anrechnungsfreie Einkommensanteile falls Kinder im Haushalt - 2005-10-01: - deviation_from: arbeitsl_geld_2.parameter_anrechnungsfreies_einkommen_ohne_kinder_in_bg - 3: - upper_threshold: 1500 -``` - -A special keyword is `previous`, which just refers to the set of values in the previous -law change. 
+Example from `sozialversicherung` / `minijob.yaml`: ```yaml -parameter_anrechnungsfreies_einkommen_ohne_kinder_in_bg: +minijobgrenze_ost_west_unterschied: name: - de: Anrechnungsfreie Einkommensanteile - 2011-04-01: - deviation_from: previous - 2: - upper_threshold: 1000 + de: Minijobgrenze + unit: Euros + reference_period: Month + type: dict + 1997-01-01: + west: 312 + ost: 266 + 1998-01-01: + updates_previous: true + west: 317 ``` ### The values of [YYYY-MM-DD] @@ -237,121 +271,401 @@ The following walks through several cases. - The simplest case is a single parameter, which should be specified as: ```yaml - kindergeld_stundengrenze: + minijobgrenze: name: - de: Wochenstundengrenze für Kindergeldanspruch - 2012-01-01: - value: 20 + de: Minijobgrenze + en: Thresholds for marginal employment (minijobs) + description: + de: Minijob § 8 (1) Nr. 1 SGB IV + en: Minijob § 8 (1) Nr. 1 SGB IV + unit: Euros + reference_period: Month + type: scalar + 1984-01-01: + value: 199 + 1985-01-01: + value: 205 + 1986-01-01: + value: 210 + 1987-01-01: + value: 220 + 1988-01-01: + value: 225 + 1989-01-01: + value: 230 + 1990-01-01: + note: >- + Minijobgrenze differs between West and East Germany. See + ``parameter_minijobgrenze_ost_west_unterschied``. + 2000-01-01: + value: 322 + 2002-01-01: + value: 325 + 2003-04-01: + value: 400 + 2013-01-01: + value: 450 + 2022-10-01: + note: Minijob thresholds now calculated based on statutory minimum wage + reference: Art. 7 G. v. 28.06.2022 BGBl. I S. 969 ``` -- There could be a dictionary, potentially nested: + Note that there are different "active periods" for this parameter. The first one lasts + from 1984-01-01 to 1989-12-31, after which there were different values in East and + West Germany. From 2000-01-01 until 2022-10-01, the parameter is active again. After + that, it is superseded by a formula based on the statutory minimum wage. 
+ +- There could be a dictionary, which has to be homogeneous in the keys (integers or + strings) and values (scalar floating point numbers, integers, or Booleans): ```yaml - exmin: + minijobgrenze_ost_west_unterschied: name: - de: Höhen des Existenzminimums, festgelegt im Existenzminimumsbericht der Bundesregierung. - 2005-01-01: - regelsatz: - single: 4164 - paare: 7488 - kinder: 2688 - kosten_der_unterkunft: - single: 2592 - paare: 3984 - kinder: 804 - heizkosten: - single: 600 - paare: 768 - kinder: 156 + de: Minijobgrenze, unterschiedlich in Ost und West + en: Thresholds for marginal employment (minijobs), different in East and West + description: + de: Minijob § 8 (1) Nr. 1 SGB IV + en: Minijob § 8 (1) Nr. 1 SGB IV + unit: Euros + reference_period: Month + type: dict + 1990-01-01: + west: 240 + ost: 102 + 1991-01-01: + west: 245 + ost: 120 + 1992-01-01: + west: 256 + ost: 153 + 1993-01-01: + west: 271 + ost: 199 + 1994-01-01: + west: 286 + ost: 225 + 1995-01-01: + west: 297 + ost: 240 + 1996-01-01: + west: 302 + ost: 256 + 1997-01-01: + west: 312 + ost: 266 + 1998-01-01: + updates_previous: true + west: 317 + 1999-01-01: + west: 322 + ost: 271 + 2000-01-01: + note: >- + Minijob thresholds do not differ between West and East Germany. See + `minijobgrenze_m`. ``` - In some cases, a dictionary with numbered keys makes sense. It is important to use - these, not lists! + these, not lists! The reason is that we always allow for the `note` and `reference` + keys to be present. ```yaml - kindergeld: + satz_gestaffelt: name: - de: Kindergeld, Betrag je nach Reihenfolge der Kinder. - 1975-01-01: - 1: 26 - 2: 36 - 3: 61 - 4: 61 + de: Kindergeld pro Kind, Betrag je nach Reihenfolge der Kinder. + en: Child benefit amount, depending on succession of children. + description: + de: >- + § 66 (1) EStG. Identische Werte in §6 (1) BKGG, diese sind aber nur für beschränkt + Steuerpflichtige relevant (d.h. Ausländer mit Erwerbstätigkeit in Deutschland). 
+ Für Werte vor 2002, siehe 'BMF - Datensammlung zur Steuerpolitik' + en: null + unit: Euros + reference_period: Month + type: dict + 2002-01-01: + 1: 154 + 2: 154 + 3: 154 + 4: 179 + 2009-01-01: + reference: Art. 1 G. v. 22.12.2008 BGBl. I S. 2955 + 1: 164 + 2: 164 + 3: 170 + 4: 195 ``` - Another example would be referring to the parameters of a piecewise linear function: - > ```yaml - > parameter_anrechnungsfreies_einkommen_ohne_kinder_in_bg: - > name: - > de: Anrechnungsfreie Einkommensanteile - > en: Income shares not subject to transfer withdrawal - > type: piecewise_linear - > 2005-01-01: - > 0: - > lower_threshold: -inf - > upper_threshold: 0 - > rate: 0 - > intercept_at_lower_threshold: 0 - > ``` + ```yaml + parameter_solidaritätszuschlag: + name: + de: Solidaritätszuschlag + en: null + description: + de: >- + Ab 1995, der upper threshold im Intervall 1 ist nach der Formel + transition_threshold in soli_st.py berechnet. + en: null + unit: Euros + reference_period: Year + type: piecewise_linear + 1991-01-01: + reference: Artikel 1 G. v. 24.06.1991 BGBl. I S. 1318. + 0: + lower_threshold: -inf + rate_linear: 0 + intercept_at_lower_threshold: 0 + upper_threshold: 0 + 1: + lower_threshold: 0 + rate_linear: 0.0375 + upper_threshold: inf + ``` + +- Phase-in or phase-out of age thresholds based on the birth year of the individual + (e.g. increasing statutory retirement age thresholds) should be specified as type + `birth_year_based_phase_inout`. The parameter specification is converted to a lookup + table that maps a birth year to the age threshold. The conversion requires the + following structure after the `YYYY-MM-DD` key: -- In general, a parameter should appear for the first time that it is mentioned in a - law, becomes relevant, etc.. + - `first_birthyear_to_consider`: The birth year at which the lookup table starts (just + choose some birthyear that is far enough in the past). 
+ - `last_birthyear_to_consider`: The birth year at which the lookup table ends (just + choose some birthyear that is far enough in the future). + - `YYYY` entries with the following structure: + - `years`: The age threshold in years. + - `months`: The age threshold in months. - Only in exceptional cases it might be useful to set a parameter to some value - (typically zero) even if it does not exist yet. + Example from `sozialversicherung` / `rente` / `altersrente` / `regelaltersrente` / + `altersgrenze.yaml`: -- If a parameter ceases to be relevant, is superseded by something else, etc., there - must be a `YYYY-MM-DD` key with a note on this. + ```yaml + altersgrenze_gestaffelt: + name: + de: Gestaffeltes Eintrittsalter für Regelaltersrente nach Geburtsjahr + en: Staggered normal retirement age (NRA) for Regelaltersrente by birth year + description: + de: >- + § 35 Satz 2 SGB VI + Regelaltersgrenze ab der Renteneintritt möglich ist. Wenn früher oder später in + Rente gegangen wird, wird der Zugangsfaktor und damit der Rentenanspruch höher + oder niedriger, sofern keine Sonderregelungen gelten. + en: >- + § 35 Satz 2 SGB VI + Normal retirement age from which pension can be received. If retirement benefits + are claimed earlier or later, the Zugangsfaktor and thus the pension entitlement + is higher or lower unless special regulations apply. + unit: Years + reference_period: null + type: birth_year_based_phase_inout + 2007-04-20: + reference: RV-Altersgrenzenanpassungsgesetz 20.04.2007. BGBl. I S. 554 + note: >- + Increase of the early retirement age from 65 to 67 for birth cohort 1947-1964. + Vertrauensschutz (Art. 56) applies for birth cohorts before 1955 who were in + Altersteilzeit before January 1st, 2007 or received "Anpassungsgeld für + entlassene Arbeitnehmer des Bergbaus". 
+ first_birthyear_to_consider: 1900 + last_birthyear_to_consider: 2031 + 1946: + years: 65 + months: 0 + 1947: + years: 65 + months: 1 + 1948: + years: 65 + months: 2 + 1949: + years: 65 + months: 3 + 1950: + years: 65 + months: 4 + 1951: + years: 65 + months: 5 + 1952: + years: 65 + months: 6 + 1953: + years: 65 + months: 7 + 1954: + years: 65 + months: 8 + 1955: + years: 65 + months: 9 + 1956: + years: 65 + months: 10 + 1957: + years: 65 + months: 11 + 1958: + years: 66 + months: 0 + 1959: + years: 66 + months: 2 + 1960: + years: 66 + months: 4 + 1961: + years: 66 + months: 6 + 1962: + years: 66 + months: 8 + 1963: + years: 66 + months: 10 + 1964: + years: 67 + months: 0 + ``` + +- Phase-in or phase-out of age thresholds based on the birth month of the individual + should be specified as type `birth_month_based_phase_inout`. The parameter + specification is the same as for `birth_year_based_phase_inout`, except that the + `YYYY` entries are followed by `MM` keys. The `MM` keys have the following + structure: + + - `first_birthmonth_to_consider`: The birth month at which the lookup table starts + (just choose some birthmonth that is far enough in the past). + - `last_birthmonth_to_consider`: The birth month at which the lookup table ends (just + choose some birthmonth that is far enough in the future). + - `years`: The age threshold in years. + - `months`: The age threshold in months. + + Excerpt from `sozialversicherung` / `rente` / `altersrente` / `langjährig` / + `altersgrenze.yaml`: + + ```yaml + ... + 1989-12-18: + reference: Rentenreformgesetz 1992. BGBl. I S. 2261 1989 § 41 + note: Increase of full retirement age from 63 to 65 for birth cohort 1938-1943. + first_birthyear_to_consider: 1900 + last_birthyear_to_consider: 2100 + 1937: + 12: + years: 63 + months: 0 + 1938: + 1: + years: 63 + months: 1 + ... + ``` - Generally, this `YYYY-MM-DD` key will have an entry `value: null` regardless of the - previous structure.
Ideally, there would be a `reference` and potentially a `note` - key. Example: +- Finally, there are parameters that have a more complex structure, which is not as + common as `piecewise_linear` etc. These need to be specified as `require_converter`. + + Example from `arbeitslosengeld_2` / `bedarfe.yaml`: ```yaml - value: null - note: arbeitsl_hilfe is superseded by arbeitsl_geld_2 + parameter_regelsatz_nach_regelbedarfsstufen: + name: + de: Regelsatz mit direkter Angabe für Regelbedarfsstufen + en: Standard rate with direct specification of "Regelbedarfsstufen" + description: + de: >- + § 20 V SGB II. Neufassung SGB II § 20 (1a) und (2) durch + Artikel 6 G. v. 22.12.2016 BGBl. I S. 3159. + Regelbedarfsstufen: + 1: Alleinstehender Erwachsener + 2: Erwachsene in Partnerschaft + 3: Erwachsene unter 25 im Haushalt der Eltern + 4: Jugendliche + 5: Ältere Kinder + 6: Jüngste Kinder + en: >- + Regelbedarfsstufen: + 1: Single Adult + 2: Adults in a partner relationship + 3: Adults under 25 in the household of their parents + 4: Adolescents + 5: Older children + 6: Youngest children + unit: Euros + reference_period: Month + type: require_converter + 2011-01-01: + 1: 364 + 2: 328 + 3: 291 + 4: + min_alter: 14 + max_alter: 17 + betrag: 287 + 5: + min_alter: 6 + max_alter: 13 + betrag: 251 + 6: + min_alter: 0 + max_alter: 5 + betrag: 215 + reference: Artikel 1 G. v. 24.03.2011 BGBl. I S. 453. ``` - Only in exceptional cases it might be useful to set a parameter to some value - (typically zero) even if it is not relevant any more. +- In general, a parameter should appear for the first time that it is mentioned in a + law, becomes relevant, etc. - In any case, it **must** be the case that it is obvious from the `YYYY-MM-DD` entry - that the (set of) parameter(s) is not relevant any more, else the previous ones will - linger on. + Do not set parameters to some value if they are not relevant yet.
-(gep-3-storage-of-parameters)= +- If a parameter ceases to be relevant, is superseded by something else, etc., there + must be a `YYYY-MM-DD` key with a `note` and/or `reference` key. There must not be + other entries except for these two. -## Storage of parameters + Example: -The contents of the YAML files become part of the `policy_params` dictionary. Its keys -correspond to the names of the YAML files. Each value will be a dictionary that follows -the structure of the YAML file. These values can be used in policy functions as -`[key]_params`. + ```yaml + parameter_regelsatz_anteilsbasiert: + name: + de: Berechnungsgrundlagen für den Regelsatz + 2011-01-01: + note: Calculation method changed, see regelsatz_nach_regelbedarfsstufen. + ``` + +(gep-3-handling-of-parameters-in-the-codebase)= + +## Handling of parameters in the codebase -The contents mostly follow the content of the YAML files. The main difference is that -all parameters are present in their required format; no further parsing shall be -necessary inside the functions. The important changes include: +The contents of the YAML files are processed into a pytree-like structure, similar to +the functions. That is, they can be used directly in their namespace (=path to the yaml +file excluding the file name) and accessed by absolute paths otherwise. -- In the YAML files, parameters may be specified as deviations from other values, - {ref}`see above `. All these are converted so that the relevant - values are part of the dictionary. -- Similarly, values from other points in time (via `access_prior_parameters`, - {ref}`see above `) of `[param]` will be available as: - `[param]_t_minus_[number_of_lags]_[reference_period[0].lower()]`. -- Parameters for piecewise polynomials are parsed. -- Parameters that are derived from other parameters are calculated (examples include - `kinderzuschlag_max` starting in 2021 or calculating the phasing in of - `vorsorgeaufwendungen_alter` over the 2005-2025 period).
+In this tree, they are specialised to the relevant policy date. Depending on the type of +the parameter (see the previous section), the following types are possible: -These functions will be avaiable to users en bloque or one-by-one so they can specify -parameters as in the YAML file for their own policy parameters. +- `scalar` parameters are just floats / ints / Booleans; i.e., simply the `value` key of + the yaml file. +- `dict` parameters are homogenous dictionaries with all contents of the `YYYY-MM-DD` + entries except for the `note` and `reference` keys. +- `piecewise_constant` / `piecewise_linear` / `piecewise_quadratic` / `piecewise_cubic` + parameters are converted to `PiecewisePolynomialParameter` objects. +- `birth_month_based_phase_inout` and `birth_year_based_phase_inout` are converted to + `ConsecutiveIntLookupTableParamValue` objects. +- `require_converter` must have a `params_function` that converts the `YYYY-MM-DD` + entries to a clear type. ## Discussion - - +- GitHub PR for update (changes because of `GEP-6 `): + ## Copyright This document has been placed in the public domain. + +## Appendix: json-schema for the yaml files + +```{literalinclude} ../../src/ttsim/params-schema.json +``` diff --git a/docs/geps/gep-04.md b/docs/geps/gep-04.md index 4d985f066..6f50adf0f 100644 --- a/docs/geps/gep-04.md +++ b/docs/geps/gep-04.md @@ -15,6 +15,8 @@ * Standards Track - * Created * 2022-03-28 +- * Updated + * 2025-07-23 - * Resolution * [Accepted](https://gettsim.zulipchat.com/#narrow/stream/309998-GEPs/topic/GEP.2004) ``` @@ -49,6 +51,9 @@ motivated by two main reasons. input variables, it prevents unnecessary calculations, and it increases computation speed. +In addition to these requirements, we are using a hierarchical structure of functions to +allow for a clear separation of concerns. + ## Basic idea Based on the two requirements above we split the taxes and transfers system into a set @@ -63,11 +68,11 @@ GETTSIM; this is irrelevant for the DAG. 
Function arguments can be of three kinds: - User-provided input variables (e.g., - `einkommensteuer__einkünfte__aus_nichtselbstständiger_arbeit__bruttolohn_m`). + `(einkommensteuer, einkünfte, aus_nichtselbstständiger_arbeit, bruttolohn_m)`). - Outputs of other functions in the taxes and transfers system (e.g., - `einkommensteuer__betrag_y_sn`). -- Parameters of the taxes and transfers system, which are pre-defined and always end in - `_params` (e.g., `ges_rentenv_params`). + `(einkommensteuer, betrag_y_sn)`). +- Parameters of the taxes and transfers system (e.g., + `(einkommensteuer, abgeltungssteuer, satz)`). GETTSIM will calculate the variables a researcher is interested in by starting with the input variables and calling the required functions in a correct order. This is @@ -79,59 +84,44 @@ why we use functions when programming: readability, simplicity, lower maintenanc potential entry point for a researcher to change the taxes and transfers system if she is able to replace this function with her own version. -See the following example for capital income taxes (Abgeltungssteuer). +See the following example for capital income taxes (Abgeltungssteuer). Based on the +location in the file system, the full path is +`(einkommensteuer, abgeltungssteuer, betrag_y_sn)`. ```python -def einkommensteuer__abgeltungssteuer__betrag_y_sn( - einkommensteuer__einkünfte__aus_kapitalvermögen__betrag_y_sn: float, - abgelt_st_params: dict, -) -> float: - """Calculate Abgeltungssteuer on Steuernummer-level. - - Parameters - ---------- - einkommensteuer__einkünfte__aus_kapitalvermögen__betrag_y_sn - See :func:`einkommensteuer__einkünfte__aus_kapitalvermögen__betrag_y_sn`. - abgelt_st_params - See params documentation :ref:`abgelt_st_params `. 
- - Returns - ------- - - """ - return ( - abgelt_st_params["satz"] - * einkommensteuer__einkünfte__aus_kapitalvermögen__betrag_y_sn - ) +@policy_function(start_date="2009-01-01") +def betrag_y_sn(zu_versteuerndes_kapitaleinkommen_y_sn: float, satz: float) -> float: + """Abgeltungssteuer on Steuernummer level.""" + return satz * zu_versteuerndes_kapitaleinkommen_y_sn ``` -The function `einkommensteuer__abgeltungssteuer__betrag_y_sn` requires the variable -`einkommensteuer__einkünfte__aus_kapitalvermögen__betrag_y_sn`, which is the amount of -taxable capital income on the Steuernummer-level (the latter is implied by the `_sn` -suffix, see {ref}`gep-1`). -`einkommensteuer__einkünfte__aus_kapitalvermögen__betrag_y_sn` must be provided by the -user as a column of the input data or it has to be the name of another function. -`abgelt_st_params` is a dictionary of parameters related to the calculation of -`betrag_y_sn`. - -> Note: In the source code, the prefix `einkommensteuer__abgeltungssteuer__` is missing. -> This is because it is inferred from the path the function is defined in. For more -> details, see {ref}`gep-6`. +The function `(einkommensteuer, abgeltungssteuer, betrag_y_sn)` requires the variable +`zu_versteuerndes_kapitaleinkommen_y_sn`, which is the amount of taxable capital income +on the Steuernummer-level (the latter is implied by the `_sn` suffix, see {ref}`gep-1`). +`zu_versteuerndes_kapitaleinkommen_y_sn` must be provided by the user as a column of the +input data or it has to be the name of another function (in fact, in the GETTSIM code +base it will be calculated as income from capital minus expenses). `satz` is a parameter +coming out of a yaml file in the same directory. 
-Another function, say +Another function, say `(solidaritätszuschlag, betrag_y_sn)`, ```python -def solidaritätszuschlag__betrag_y_sn( +@policy_function( + start_date="2009-01-01", leaf_name="betrag_y_sn", vectorization_strategy="loop" +) +def betrag_y_sn_mit_abgelt_st( einkommensteuer__betrag_mit_kinderfreibetrag_y_sn: float, einkommensteuer__anzahl_personen_sn: int, einkommensteuer__abgeltungssteuer__betrag_y_sn: float, - soli_st_params: dict, -) -> float: ... + parameter_solidaritätszuschlag: PiecewisePolynomialParameters, +) -> float: ... ``` -may use `einkommensteuer__abgeltungssteuer__betrag_y_sn` as an input argument. The DAG -backend ensures that the function `einkommensteuer__abgeltungssteuer__betrag_y_sn` will -be executed first. +may use `(einkommensteuer, abgeltungssteuer, betrag_y_sn)` as an input argument. Note +that because of a different namespace, we need to specify the full path. In order to +make valid Python identifiers out of paths, we use double underscores. Important for +this GEP is that the DAG ensures that the function +`(einkommensteuer, abgeltungssteuer, betrag_y_sn)` will be executed first. Note that the type annotations (e.g. `float`) indicate the expected type of each input and the output of a function, see {ref}`gep-2`. @@ -139,9 +129,10 @@ and the output of a function, see {ref}`gep-2`. ## Directed Acyclic Graph The relationship between functions and their input variables is a graph where nodes -represent columns in the data. These columns must either be present in the data supplied -to GETTSIM or they are computed by functions. Edges are pointing from input columns to -variables, which require them to be computed. +represent columns in the data (or parameters of the taxes and transfers system, but +these will be partialled into the functions first). These columns must either be present +in the data supplied to GETTSIM or they are computed by functions. Edges are pointing
Edges are pointing +from input columns to variables, which require them to be computed. ```{note} GETTSIM allows to visualize the graph, see this [guide](../how_to_guides/visualizing_the_system.ipynb). @@ -169,51 +160,60 @@ inputs provided by the user: > these functions). These functions need to be written for scalars; they will be > vectorised during the set up of the DAG. > -> - A set of dictionaries specifying aggregation functions, calculating, for example, -> household-level averages. -> > - The target columns of interest. The DAG is then used to call all required functions in the right order and to calculate the requested targets. -### Level of the DAG and limitations +### Level of the DAG In principle, GETTSIM will import all functions defined in the modules describing the taxes and transfers system. In principle, these functions refer to all years in GETTSIM's scope. There has to be some discretion in order to allow for the interface of functions to change over time, new functions to appear, or old ones to disappear. +Because of this, all functions operating on data to be considered by GETTSIM need to be +decorated as `@policy_function`. For simple cases, the decorator does not require any +arguments, e.g., the high-level functions to calculate the total amount of income: -Some examples include: - -1. `arbeitsl_hilfe` being replaced by `arbeitsl_geld_2`. -1. `kinderbonus` being active only in a few years. -1. The introduction of `kinderzuschl`. -1. Capital income entering `sum_brutto_eink` or not. - -The goal is that the graph for any particular point in time is minimal in the sense that -`arbeitsl_geld_2` does not appear before it was conceived, it is apparent from the -interface of `sum_brutto_eink` whether it includes capital income or not, etc.. - -In the yaml-files corresponding to a particular tax / transfer, functions not present in -all years will need to be listed with along with the dates for when they are active. 
See -:gep-3-keys-referring-to-functions: for the precise syntax. That mechanism should be -used for: - -1. Functions that are newly introduced. +```python +@policy_function() +def gesamteinkommen_y( + einkünfte__gesamtbetrag_der_einkünfte_y_sn: float, + abzüge__betrag_y_sn: float, +) -> float: + """Gesamteinkommen without Kinderfreibetrag on tax unit level.""" +``` -1. Functions that cease to be relevant. +When functions change, different values can be specified for different time periods. The +`leaf_name` ensures that they can be used without changes elsewhere in the system, +despite different raw names. For example, the calculation of the Solidaritätszuschlag +changed with the introduction of the Abgeltungssteuer: -1. Functions whose interface changes over time. +```python +@policy_function(end_date="2008-12-31", leaf_name="betrag_y_sn") +def betrag_y_sn_ohne_abgelt_st( + einkommensteuer__betrag_mit_kinderfreibetrag_y_sn: float, + einkommensteuer__anzahl_personen_sn: int, + parameter_solidaritätszuschlag: PiecewisePolynomialParameters, +) -> float: + """Calculate the Solidarity Surcharge on Steuernummer level.""" -1. Functions whose body changes so much that - - it is useful to signal that things have changed and/or - - it would be awkward to program the different behaviors in one block with case - distinctions. +@policy_function(start_date="2009-01-01", leaf_name="betrag_y_sn") +def betrag_y_sn_mit_abgelt_st( + einkommensteuer__betrag_mit_kinderfreibetrag_y_sn: float, + einkommensteuer__anzahl_personen_sn: int, + einkommensteuer__abgeltungssteuer__betrag_y_sn: float, + parameter_solidaritätszuschlag: PiecewisePolynomialParameters, +) -> float: + """Calculate the Solidarity Surcharge on Steuernummer level.""" +``` -Needless to say, the different reasons may appear at different points in time for the -same function. +The above construct ensures that both versions can be accessed as +`solidaritätszuschlag__betrag_y_sn` in other parts of the code. 
If a policy environment +is created for a point in time before 2009, it will be the first version that is used. +If the policy environment is created for a point in time after 2008, the second version +will be used. ## Additional functionalities @@ -228,35 +228,21 @@ Many taxes or transfers require group-level variables. \ how reductions are handled in terms of the underlying data. This section describes how to specify them. -In order to inject aggregation functions at the group level into the graph, scripts with -functions of the taxes and transfer system should define a dictionary -`aggregation_specs` at the module level. This dictionary must specify the aggregated -columns as keys and the AggregateByGroupSpec data class as values. The data class -specifies the `source` (i.e. the column which is being aggregated) and the aggregation -method `agg`. +For example, we may need the number of adult household members. The following code in +`household_characteristics.py` does this: -For example, in `household_characteristics.py`, we could have: +```python +from ttsim import AggType, agg_by_group_function -``` -from ttsim.aggregation import AggregateByGroupSpec -aggregation_specs = { - "anzahl_kinder_hh": AggregateByGroupSpec(source="familie__kind", agg="sum"), - "anzahl_personen_hh": AggregateByGroupSpec(agg="count"), -} +@agg_by_group_function(agg_type=AggType.SUM) +def anzahl_erwachsene_hh(familie__erwachsen: bool, hh_id: int) -> int: + pass ``` -The group identifier (`hh_id`, `wohngeld__wthh_id`, `arbeitslosengeld_2__fg_id`, -`arbeitslosengeld_2__bg_id`, `arbeitslosengeld_2__eg_id`, `familie__ehe_id`, -`einkommensteuer__sn_id`) will be automatically included as an argument; for `count` -nothing else is necessary. - -The output type will be the same as the input type. Exceptions: - -- Input type `bool` and aggregation `sum` leads to output type `int`. 
-- Input type `int` and aggregation {math}`\in \{` `any`, `all` {math}`\}` leads to - output type `bool` -- Aggregation `count` will always result in an `int`. +That is, we need to specify the aggregation type (sum), the input column +(`familie__erwachsen`), and the group identifier (`hh_id`). GETTSIM will take care of +the rest. The most common operation are sums of individual measures. GETTSIM adds the following syntactic sugar: In case an individual-level column `my_col` exists, the graph will be @@ -284,18 +270,15 @@ def arbeitslosengeld_2__betrag_m_bg(kindergeld__betrag_m_bg, other_arguments): . a node `kindergeld__betrag_m_bg` containing the Bedarfsgemeinschaft-level sum of `kindergeld__betrag_m` will be automatically added to the graph. Its parents in the -graph will be `kindergeld__betrag_m` and `arbeitslosengeld_2__bg_id`. This is the same -as specifying: +graph will be `kindergeld__betrag_m` and `bg_id`. This is the same as specifying: -``` -from ttsim.aggregation import AggregateByGroupSpec - -aggregation_specs = { - "kindergeld__betrag_m_bg": AggregateByGroupSpec( - source="kindergeld__betrag_m", - agg="sum" - ) -} +```python +from ttsim import AggType, agg_by_group_function + + +@agg_by_group_function(agg_type=AggType.SUM) +def anzahl_erwachsene_hh(kindergeld__betrag_m: float, bg_id: int) -> float: + pass ``` (gep-4-aggregation-by-p-id-functions)= @@ -319,57 +302,52 @@ The key `source` specifies which column is the source of the aggregation operati key `p_id_to_aggregate_by` specifies the column that indicates to which `p_id` the values in `source` should be ascribed to. The key `agg` gives the aggregation method. 
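What such an aggregation computes can be sketched in plain Python. This is only an illustration of the semantics, not GETTSIM's implementation: the helper `sum_by_p_id`, the sample values, and the convention that a negative pointer means "points to nobody" are assumptions made for this example.

```python
def sum_by_p_id(p_id, p_id_target, source):
    """Sum `source` over all rows and ascribe each sum to the person pointed to.

    Assumption for this sketch: a negative pointer means the row points to nobody.
    """
    totals = dict.fromkeys(p_id, 0)
    for target, value in zip(p_id_target, source):
        if target >= 0:
            totals[target] += int(value)
    # Return the result in the original row order.
    return [totals[person] for person in p_id]


# Hypothetical data: persons 2 and 3 are children whose benefit recipient is person 1.
p_id = [0, 1, 2, 3]
p_id_recipient = [-1, -1, 1, 1]
is_eligible = [False, False, True, True]

number_of_claims = sum_by_p_id(p_id, p_id_recipient, is_eligible)
```

The result has one entry per person. Only the person who is pointed to receives a non-zero count, and summing Booleans yields integers.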
-For example, in `kindergeld.py`, we could have: +For example, in the `kindergeld` namespace, we could have: -``` -aggregation_specs = { - "kindergeld__anzahl_ansprüche": AggregateByPIDSpec( - p_id_to_aggregate_by="kindergeld__p_id_empfänger", - source="kindergeld__grundsätzlich_anspruchsberechtigt", - agg="sum", - ), -} -``` +```python +from ttsim import AggType, agg_by_p_id_function -This dict creates a target function `kindergeld__anzahl_ansprüche` which gives the -amount of claims that a person has on Kindergeld, based on the -`kindergeld__grundsätzlich_anspruchsberechtigt` function which returns Booleans, which -show whether a child is a reason for a Kindergeld claim. -The output type will be the same as the input type. Exceptions: +@agg_by_p_id_function(agg_type=AggType.SUM) +def anzahl_ansprüche( + grundsätzlich_anspruchsberechtigt: bool, p_id_empfänger: int, p_id: int +) -> int: + pass +``` -- Input type `bool` and aggregation `sum` leads to output type `int`. -- Input type `int` or `float` and aggregation {math}`\in \{` `any`, `all` {math}`\}` - leads to output type `bool` -- Aggregation `count` will always result in an `int`. +This places a target function `kindergeld__anzahl_ansprüche` which gives the amount of +claims that a person has on Kindergeld, based on the +`kindergeld__grundsätzlich_anspruchsberechtigt` function which returns Booleans, which +show whether a child is a reason for a Kindergeld claim. `p_id` and some `p_id_[target]` +are required arguments; they will be processed according to naming conventions. (gep-4-time-unit-conversion)= ### Conversion between reference periods Similarly to summations to the group level, GETTSIM will automatically convert values -referring to different reference periods defined in {ref}`gep-1` (years (default, no -suffix), months `_m`, weeks `_w`, and days `_d`). +referring to different reference periods defined in {ref}`gep-1` (years `_y`, quarters +`_q`, months `_m`, weeks `_w`, and days `_d`). 
-In case a column with annual values `[column]` exists, the graph will be augmented with -a node including monthly values like `[column]_m` should that be requested. Requests can -be either inputs in a downstream function or explicit targets of the calculation. In -case the column refers to a different level of aggregation, say `[column]_hh`, the same -applies to `[column]_m_hh`. +In case a column with annual values `[column]_y` exists, the graph will be augmented +with a node including monthly values like `[column]_m` should that be requested. +Requests can be either inputs in a downstream function or explicit targets of the +calculation. In case the column refers to a different level of aggregation, say +`[column]_hh`, the same applies to `[column]_m_hh`. -Automatic summation will only happen in case no column `[column]_m` is explicitly set. +Automatic conversion will only happen in case no column `[column]_m` is explicitly set. Using a different conversion function than the sum is as easy as explicitly specifying `[column]_m`. Conversion goes both ways and uses the following formulas: -```{eval-rst} -| time unit | suffix | factor | -| Year | | 1 | -| Month | ``_m`` | 12 | -| Week | ``_w`` | 365.25 / 7 | -| Day | ``_d`` | 365.25 | -``` +| time unit | suffix | factor relative to Year | +| --------- | ------ | ----------------------- | +| Year | `_y` | 1 | +| Quarter | `_q` | 4 | +| Month | `_m` | 12 | +| Week | `_w` | 365.25 / 7 | +| Day | `_d` | 365.25 | These values average over leap years. They ensure that conversion is always possible both ways without changing quantities. In case more complex conversions are needed (for @@ -383,8 +361,7 @@ functions for, say, `[column]_w` need to be set. splits and distributes computations. - Based on GETTSIM and many other projects, the [dags](https://gettsim.zulipchat.com/#narrow/stream/309998-GEPs/topic/GEP.2004) - project combines the core ideas in one spot. 
GETTSIM will likely use it to implement - functionality at some point. + project combines the core ideas in one spot and has become a dependency of GETTSIM. ## Alternatives @@ -395,6 +372,8 @@ computational advantages. - - +- GitHub PR for update (changes because of `GEP-6 `): + ## Copyright diff --git a/docs/geps/gep-05.md b/docs/geps/gep-05.md index 756f2405a..b69fc0489 100644 --- a/docs/geps/gep-05.md +++ b/docs/geps/gep-05.md @@ -11,6 +11,8 @@ * Standards Track - * Created * 2022-02-02 +- * Updated + * 2025-07-23 - * Resolution * [Accepted](https://gettsim.zulipchat.com/#narrow/stream/309998-GEPs/topic/GEP.2005/near/270427530) ``` @@ -54,14 +56,15 @@ from ttsim import policy_function, RoundingSpec, RoundingDirection @policy_function( rounding_spec=RoundingSpec( base=0.0001, - direction=RoundingDirection.NEAREST, + direction="nearest", reference="§76g SGB VI Abs. 4 Nr. 4", ), start_date="2021-01-01", ) def höchstbetrag_m( grundrentenzeiten_monate: int, - ges_rente_params: dict, + berücksichtigte_wartezeit_monate: dict[str, int], + höchstwert_der_entgeltpunkte: dict[str, float], ) -> float: ... ``` @@ -70,8 +73,7 @@ The specification of the rounding parameters is defined via the `RoundingSpec` c - The `base` determines the base to which the variables is rounded. It has to be a floating point number. -- The `direction` has to be one of `RoundingDirection.UP`, `RoundingDirection.DOWN`, - `RoundingDirection.NEAREST`. +- The `direction` has to be one of `up`, `down`, or `nearest`. - The `reference` provides the legal reference for the rounding rule. This is optional. 
- Additionally, via the `to_add_after_rounding` input, users can specify some amount that should be added after the rounding is done (this was relevant for the income tax @@ -87,13 +89,14 @@ This implementation was chosen over alternatives (e.g., specifying rounding rule parameter files) for the following reason: - Rounding rules are not a parameter, but a function property that we want to turn off - an one. Hence, it makes sense to define it at the function level. + and on. Hence, it makes sense to define it at the function level. - Rounding parameters might change over time. In this case, the rounding parameters for each period can be specified using the `start_date`, `end_date` keywords in the `policy_function` decorator. - Optional rounding can be easily specified for user-written functions. -- At the definition of a function, it is clearly visible whether it is optionally - rounded and where the rounding parameters are found. +- At the definition of a function, it is clearly visible whether and how it is + optionally rounded (initially we included the rounding parameters in the yaml files, + which led to an unclear structure there and one always had to look in two places). ## Discussion @@ -101,6 +104,10 @@ parameter files) for the following reason: - PR: - PR Implementation: +- GitHub PR for update (changes because of `GEP-6 `): + +- Github PR changing to a RoundingSpec class rather than parameters specified in the + yaml files: ## Copyright diff --git a/docs/geps/gep-06.md b/docs/geps/gep-06.md index a425364f8..072908b80 100644 --- a/docs/geps/gep-06.md +++ b/docs/geps/gep-06.md @@ -229,19 +229,18 @@ The proposed changes will affect all areas of GETTSIM is a standard way mapping dictionary contents in the yaml-files to corresponding data classes. Dates are selected by the `policy_environment` date. 
If there are changes in the structure of the parameters over time, a similar mechanism like the `start_date` - and `end_date` for the policy functions can be used. The data classes will all - inherit from a base class `PolicyParameter`. + and `end_date` for the policy functions will be used based on the `YYYY-MM-DD` keys + in the yaml-files. Functions will not have `[x]_params` arguments containing potentially large and - unstructured dicts any more. Instead, functions will only use the `PolicyParameters` - they require. These could be scalars or structured objects, e.g., the inputs for - `piecewise_polynomial`. + unstructured dicts any more. Instead, functions will only use the policy parameters + they require. These could be scalars, homogenous dictionaries, the inputs for + `piecewise_polynomial` parameters, or custom objects. The namespace makes clear we are talking about, say, the function `beitrag` in the namespace `arbeitslosenversicherung` will have an input `beitragssatz`. If we need parameters which are external to the current namespace, we will need the same verbose - syntax as in 1. - (`sozialversicherungsbeiträge__rentenversicherung__beitragsbemessungsgrenze_m`). + syntax as in 1. (`sozialversicherung__rente__beitrag__beitragsbemessungsgrenze_m`). 
## Backward compatibility diff --git a/docs/geps/gep-07.md b/docs/geps/gep-07.md new file mode 100644 index 000000000..144dd0929 --- /dev/null +++ b/docs/geps/gep-07.md @@ -0,0 +1,408 @@ +(gep-7)= + +# GEP 7 — GETTSIM's User Interface + +```{list-table} +- * Author + * [Hans-Martin von Gaudecker](https://github.com/hmgaudecker) +- * Status + * Draft +- * Type + * Standards Track +- * Created + * 2025-07-23 +- * Resolution + * [Accepted](https://gettsim.zulipchat.com/#narrow/channel/309998-GEPs/topic/GEP.2007/near/530389224) +``` + +## Abstract + +This GEP proposes a new user interface for GETTSIM that simplifies data input/output +handling, reduces the learning curve for new users, and provides more flexibility in +working with different datasets. The interface redesign aims to address the challenges +identified during the 2024 GETTSIM workshop and, more generally, experience with using +GETTSIM versions up to 0.7.0. At the same time, we maintain compatibility with the +namespace architecture introduced in GEP 6. + +## Motivation and Scope + +The current GETTSIM interface presents several challenges that affect both new and +experienced users: + +1. **High Entry Barrier**: Users need detailed knowledge of the Directed Acyclic Graph + (DAG) structure and precise input requirements, making it difficult for newcomers to + get started. + +1. **Data Mapping Complexity**: Matching existing datasets with GETTSIM's requirements + is challenging due to the fine-grained nature of the graph. + +1. **Limited Flexibility**: The current interface makes it difficult to work with + different datasets / areas of the taxes and transfers system. + +This GEP aims to address these issues by introducing a more intuitive and flexible +interface while maintaining GETTSIM's computational robustness. + +## Usage and Impact + +1. **Basic workflow** + + There is a single entry point for GETTSIM: The `main` function. It is powered by a + DAG in the background. 
+
+   This means that the user will have to start by telling it the desired target
+   ("`main_target`") or set of targets ("`main_targets`"). Ultimately, the main target
+   will typically be a dataset with values for taxes and transfers. However, the `main`
+   function can also be used to obtain intermediate objects, e.g., the taxes and
+   transfers system at a particular date (the "`policy_environment`"), which a user may
+   want to modify in order to model a reform.
+
+   The targets determine the required inputs. For example, in order to compute taxes and
+   transfers for a set of households, one will need as primitives
+
+   - the date for which the policy environment is set up ("`policy_date_str`")
+   - data on these households ("`input_data`")
+   - the set of taxes and transfers to be computed ("`tt_targets`"). This could be left
+     out, in which case all possible targets will be computed. However, that will often
+     be a daunting task in terms of requirements on the data and computer memory.
+
+   If only a policy environment is to be returned, just the date is required as an
+   input.
+
+1. **Worked example**
+
+   Here is an example (the variables `inputs_df`, `inputs_map`, and `targets_tree` will
+   be shown below).
+
+   ```python
+   from gettsim import InputData, MainTarget, TTTargets, main
+
+
+   outputs_df = main(
+       main_target=MainTarget.results.df_with_mapper,
+       policy_date_str="2025-01-01",
+       input_data=InputData.df_and_mapper(
+           df=inputs_df,
+           mapper=inputs_map,
+       ),
+       tt_targets=TTTargets(tree=targets_tree),
+   )
+   ```
+
+   All elements that are not atomic are specified as GETTSIM objects, which means that
+   users can benefit from autocompletion and type hints provided by their IDE (see
+   below).
+
+   The first argument, `main_target`, specifies the type of object to compute. In this
+   case, we want the "results" in the "DataFrame with mapper" format. That is, GETTSIM
+   will compute all desired targets and return a DataFrame with columns specified by the
+   user.
+ + Say we want to compute the contributions to long term care insurance + (Pflegeversicherung). The fourth argument, `tt_targets`, specifies the set of taxes + and transfers ("`tt`") to compute. Because we ask for the "results" in the "DataFrame + with mapper" format, this actually has to be a mapping from the targets to the + columns in the output DataFrame. In this case, the argument `tt_targets` needs to be + a *pytree*, which provides that mapping: + + ```python + targets_tree = { + "sozialversicherung": { + "pflege": { + "beitrag": { + "betrag_versicherter_m": "ltci_contrib", + } + } + } + } + ``` + + That is, the call to `main` above will return a DataFrame with one column + `ltci_contrib`, which will be of the same length as the input data. As the possible + target trees will depend on the policy environment, we will need to make the + documentation dynamic. + + The second argument, `policy_date_str`, specifies the date at which the policy + environment is set up and evaluated. + + Say we want to compute the long term care insurance contribution for three people, + one of whom has an underage child living in her household. Our data looks as follows: + + | | age | wage | id | hh_id | mother_id | has_kids | + | --: | --: | ---: | --: | ----: | --------: | :------- | + | 0 | 25 | 950 | 0 | 0 | -1 | False | + | 1 | 45 | 950 | 1 | 1 | -1 | True | + | 2 | 3 | 0 | 2 | 1 | 1 | False | + | 3 | 65 | 950 | 3 | 2 | -1 | True | + + We can use this DataFrame directly. All we need to do is to tell GETTSIM how to map + the columns of that DataFrame to the names of inputs it knows about. This is done by + a *mapper*, which again is a *pytree*. 
In our case, it looks as follows: + + ```python + inputs_map = { + "p_id": "id", + "hh_id": "hh_id", + "alter": "age", + "familie": { + "p_id_elternteil_1": "mother_id", + "p_id_elternteil_2": -1, + }, + "einkommensteuer": { + "einkünfte": { + "aus_nichtselbstständiger_arbeit": {"bruttolohn_m": "wage"}, + "ist_selbstständig": False, + "aus_selbstständiger_arbeit": {"betrag_m": 0.0}, + } + }, + "sozialversicherung": { + "pflege": { + "beitrag": { + "hat_kinder": "has_kids", + } + }, + "kranken": { + "beitrag": {"bemessungsgrundlage_rente_m": 0.0, "privat_versichert": False} + }, + }, + } + ``` + + All *leaves* of the tree are either column names in the data or scalars. E.g., we do + not consider self-employed people, pensioners, or people with (substitutive) private + health insurance. Instead of requiring some default value in the data, we can simply + use a scalar value in the mapper. + + *Note:* We picked an example with little, but not zero, complexity. The amount of + inputs is simply necessary because public long term care insurance contributions + depend on various kinds of income (from dependent employment, from self-employment, + pensions), the combination of the insured person's age and her children, and whether + the insured person is covered by private health insurance. + + Finally, here is the output of our example: + + | | ltci_contrib | + | --: | -----------: | + | 0 | 14.72 | + | 1 | 9.82 | + | 2 | 0 | + | 3 | 9.82 | + +1. **Underlying structure** + + The interface DAG looks as follows: + + ```{raw} html + --- + file: ./interface_dag.html + --- + ``` + + The **policy_date** is the date at which the policy environment is set up. It could + be passed as `policy_date_str`, which is an ISO-string `YYYY-MM-DD`. By default, it + is also used as the date for which the taxes and transfers function is evaluated (the + distinction matters for things like pensions etc., which depend on cohort, age, and + calendar time). 
If users need more control, `evaluation_date` (or + `evaluation_date_str`) can be specified separately. + + The **policy environment** consists of all functions relevant at some point in time. + E.g., when requesting a policy environment for some date in the 2020s, Erziehungsgeld + will not be part of it because it was replaced by Elterngeld long before. Users + wishing to implement reforms—whether they consist of changing parameter values or + replacing functions—will do so at the level of the policy environment. + + The **input data** are the data provided by the user. They need to be passed in one + of several forms. + + Users specify the **taxes and transfers targets** ("`tt_targets`"), which GETTSIM + will return. When left out, GETTSIM will return all functions it can compute. + + The elements of the **specialized environment** combine policy environment and data. + Even if users do not typically need to work with these elements, they are so central + to GETTSIM that it is useful to list them here. + + - **with derived functions and without tree logic** adds aggregations (e.g., adding + up individual incomes to income at the Steuernummer level) and time conversions + (e.g., from month to year). Doing so requires knowing the names of the columns in + the data. + - **with processed params and scalars**. The parameters of the taxes and transfers + system are stored in special objects. Some of them require further conversion + through functions that do not depend on household data ("`param_functions`"). + Similarly, it is possible to pass scalars instead of data columns for things that + are not observed in a dataset or that can be assumed constant in a particular + application (e.g., setting pension payments to zero when looking at the labor + supply of 30-year olds). In this step, these functions are run and all parameters + are converted to their final form (e.g., a `ScalarParam` becomes just a number). 
+ Where relevant, policy functions are replaced by scalars passed as input data. + - **with partialled params and scalars** partials all parameters and scalars to the + functions that make up the taxes and transfers DAG. That is, the resulting + functions only depend on column arguments (either passed as input data or computed + earlier in the DAG). + - **taxes and transfers DAG** is the DAG representation of the functions in the + previous step. + - **taxes and transfers function** is the function that takes the columns in the + processed data as arguments and returns the desired taxes and transfers targets. + Running this function leads to raw results (they still contain internals and should + not be used by non-GETTSIM functions) + + The **results** contain the output of the taxes and transfers function, purged of + internals and converted to the format requested by the user. + + The German taxes and transfers system is complex and specifying inputs can be a + daunting task. The **templates** aim to help with that. E.g., asking for + `MainTarget.templates.input_data` will return a nested dictionary that may include: + + ```python + { + "einkommensteuer": { + "einkünfte": {"aus_forst_und_landwirtschaft": {"betrag_y": "FloatColumn"}} + } + } + ``` + + These templates can be modified to become a mapper, as in the above example. That is, + "`FloatColumn`" could be replaced by the name of the column in the input data frame + or by 0.0 if only employees who don't have other types of income are in the sample. + + Additional user-facing elements are: + + - **rounding** is a Boolean that determines whether to round the results. Defaults to + `True`, which yields a more accurate depiction of the taxes and transfers system. + Turn off if you need numerical derivatives or the like. + + * The **backend** is the backend used to compute the taxes and transfers. Default is + `"numpy"`, the other option is `"jax"`. 
+   * **include_fail_nodes** is a Boolean that determines whether to raise errors for
+     invalid inputs. Defaults to `True`; only turn it off if you really know what you are
+     doing (and even then, please turn it on before filing an issue).
+   * **include_warn_nodes** is a Boolean that determines whether to display warnings for
+     some cases that might lead to surprising behavior. Defaults to `True`; only turn it
+     off if you really know what you are doing (and even then, please turn it on before
+     filing an issue).
+
+   Other elements of the interface DAG, which will typically be less relevant for users,
+   include:
+
+   - The **original policy objects** consist of all functions and parameters that
+     GETTSIM ships with, i.e., everything that has been relevant at some point in time.
+     A user won't typically need to work with them; a policy environment is constructed
+     from them and a date.
+   - The **labels** contain things like column names, names of root nodes, etc., that
+     is, anything where we only need the label of something and not the object itself.
+   - **num_segments** is the number of unique individuals in the data. It is required by
+     the JAX backend to aggregate by group / another individual.
+
+1. **Autocompletion features**
+
+   The internal structure of the building blocks described in the previous section can
+   be rather complex. In order to minimize errors arising from typos and misconceptions,
+   GETTSIM provides objects that allow users to take advantage of modern IDEs'/editors'
+   autocompletion and type hinting features.
+
+   For example, after:
+
+   ```python
+   from gettsim import main, MainTarget
+
+   main(main_target=MainTarget.)
+   ```
+
+   tools like VS Code will show the options:
+
+   ```python
+   results
+   templates
+   policy_environment
+   specialized_environment
+   orig_policy_objects
+   processed_data
+   raw_results
+   labels
+   policy_date_str
+   input_data
+   tt_targets
+   num_segments
+   backend
+   policy_date
+   evaluation_date_str
+   evaluation_date
+   xnp
+   dnp
+   rounding
+   warn_if
+   fail_if
+   ```
+
+   Such objects are provided for all arguments to `main` that need a hierarchical
+   structure. E.g., the `input_data` argument takes an instance of `InputData` like in
+   the above example. Again, one will be able to benefit from autocompletion features
+   from typing the first 'I' onwards.
+
+1. **Ecosystem**
+
+   More functionality will be added in external packages. Check out:
+
+   - [gettsim-personas](https://github.com/ttsim-dev/gettsim-personas): Pre-defined
+     example personas ("Musterhaushalte")
+   - [soep-preparation](https://github.com/ttsim-dev/soep-preparation): A pipeline
+     preparing the SOEP data for use with GETTSIM
+
+1. **Interactive Graph Interface**
+
+   We focus on the infrastructure for the moment. An interactive graph interface will be
+   easy to add later and will require a much more interactive and user-driven approach;
+   top-down planning does not seem useful at this point.
+
+## Backward Compatibility
+
+This interface represents a significant change. There is no way to ensure backward
+compatibility.
This said, the former:
+
+```python
+from gettsim import (
+    set_up_policy_environment,
+    compute_taxes_and_transfers,
+)
+
+policy_params, policy_functions = set_up_policy_environment(2025)
+result = compute_taxes_and_transfers(
+    data=data,
+    functions=policy_functions,
+    params=policy_params,
+    targets=targets,
+)
+```
+
+can be replaced by:
+
+```python
+from gettsim import main, InputData, MainTarget, TTTargets
+
+outputs = main(
+    main_targets=[
+        MainTarget.policy_environment,
+        MainTarget.results.df_with_mapper,
+    ],
+    policy_date_str="2025-01-01",
+    input_data=InputData.df_and_mapper(
+        df=data,
+        mapper=inputs_map,
+    ),
+    tt_targets=TTTargets(tree=tt_targets_tree),
+)
+policy_environment = outputs["policy_environment"]
+result = outputs["results"]["df_with_mapper"]
+```
+
+Beyond the interface change, users will need to change `targets` to `tt_targets_tree`
+and to create the `inputs_map`. Both adjustments are due to the changes in the internal
+structure of GETTSIM described in [GEP 6](gep-06).
+
+## Discussion
+
+- **ENH: Interface, 2024 edition · Issue #781 · iza-institute-of-labor-economics/gettsim
+  \- Part 1**.
+  [https://github.com](https://github.com/iza-institute-of-labor-economics/gettsim/issues/781)
+
+## Copyright
+
+This document has been placed in the public domain.
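As a note on the migration step described under Backward Compatibility, a flat collection of targets could be turned into a `tt_targets_tree` along the following lines. This is only a sketch under assumptions: it presumes the target names are already written as GEP 6 qualified names joined by `__`, and the helper `targets_to_tree` is hypothetical, not a GETTSIM function.

```python
from typing import Any

# Separator between namespace levels in qualified names.
SEP = "__"


def targets_to_tree(targets: dict[str, str]) -> dict[str, Any]:
    """Turn {qualified_name: output_column} into a nested targets tree.

    Hypothetical helper for illustration; not part of GETTSIM.
    """
    tree: dict[str, Any] = {}
    for qname, column in targets.items():
        # All parts but the last are namespaces; the last part is the leaf.
        *namespaces, leaf = qname.split(SEP)
        node = tree
        for part in namespaces:
            node = node.setdefault(part, {})
        node[leaf] = column
    return tree


flat_targets = {
    "sozialversicherung__pflege__beitrag__betrag_versicherter_m": "ltci_contrib",
}
tt_targets_tree = targets_to_tree(flat_targets)
```

The resulting dictionary has the same shape as the `targets_tree` in the worked example, so it could be passed as `TTTargets(tree=tt_targets_tree)`.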
diff --git a/docs/geps/gep_07_example.py b/docs/geps/gep_07_example.py new file mode 100644 index 000000000..0d8c55ad9 --- /dev/null +++ b/docs/geps/gep_07_example.py @@ -0,0 +1,70 @@ +import pandas as pd + +from gettsim import InputData, MainTarget, TTTargets, main + +inputs_df = pd.DataFrame( + { + "age": [25, 45, 3, 65], + "wage": [950, 950, 0, 950], + "id": [0, 1, 2, 3], + "hh_id": [0, 1, 1, 2], + "mother_id": [-1, -1, 1, -1], + "has_kids": [False, True, False, True], + } +) + +inputs_map = { + "p_id": "id", + "hh_id": "hh_id", + "alter": "age", + "familie": { + "p_id_elternteil_1": "mother_id", + "p_id_elternteil_2": -1, + }, + "einkommensteuer": { + "einkünfte": { + "aus_nichtselbstständiger_arbeit": {"bruttolohn_m": "wage"}, + "ist_hauptberuflich_selbstständig": False, + "aus_selbstständiger_arbeit": {"betrag_m": 0.0}, + } + }, + "sozialversicherung": { + "pflege": { + "beitrag": { + "hat_kinder": "has_kids", + } + }, + "kranken": { + "beitrag": {"bemessungsgrundlage_rente_m": 0.0, "privat_versichert": False} + }, + }, +} + +targets_tree = { + "sozialversicherung": { + "pflege": { + "beitrag": { + "betrag_versicherter_m": "ltci_contrib", + } + } + } +} + +outputs_df = main( + main_target=MainTarget.results.df_with_mapper, + policy_date_str="2025-01-01", + input_data=InputData.df_and_mapper( + df=inputs_df, + mapper=inputs_map, + ), + tt_targets=TTTargets(tree=targets_tree), +) + +print(outputs_df.round(2).to_markdown()) + +print(inputs_df.to_markdown()) + +pe = main( + main_target=MainTarget.policy_environment, + policy_date_str="2025-01-01", +) diff --git a/docs/geps/interface_dag.html b/docs/geps/interface_dag.html new file mode 100644 index 000000000..ddcf3530b --- /dev/null +++ b/docs/geps/interface_dag.html @@ -0,0 +1,3885 @@ + + + +
+
+ + diff --git a/pixi.lock b/pixi.lock index bada652ec..a4e9273c8 100644 --- a/pixi.lock +++ b/pixi.lock @@ -218,6 +218,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh0d859eb_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_hd72426e_102.conda @@ -476,6 +477,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh31c8845_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/osx-64/tk-8.6.13-hf689a15_2.conda @@ -718,6 +720,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: 
https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh31c8845_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/osx-arm64/tk-8.6.13-h892fb3f_2.conda @@ -952,6 +955,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/win-64/tbb-2021.13.0-h62715c5_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh5737063_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda @@ -1239,6 +1243,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh0d859eb_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_hd72426e_102.conda @@ -1510,6 +1515,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - 
conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh31c8845_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/osx-64/tk-8.6.13-hf689a15_2.conda @@ -1764,6 +1770,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh31c8845_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/osx-arm64/tk-8.6.13-h892fb3f_2.conda @@ -2010,6 +2017,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/win-64/tbb-2021.13.0-h62715c5_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh5737063_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda @@ -2286,6 +2294,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: 
https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh0d859eb_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_hd72426e_102.conda @@ -2549,6 +2558,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh31c8845_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/osx-64/tk-8.6.13-hf689a15_2.conda @@ -2796,6 +2806,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh31c8845_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/osx-arm64/tk-8.6.13-h892fb3f_2.conda @@ -3035,6 +3046,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: 
https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/win-64/tbb-2021.13.0-h62715c5_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh5737063_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda @@ -3316,6 +3328,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh0d859eb_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_hd72426e_102.conda @@ -3574,6 +3587,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh31c8845_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/osx-64/tk-8.6.13-hf689a15_2.conda @@ -3815,6 +3829,7 @@ environments: - conda: 
https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh31c8845_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/osx-arm64/tk-8.6.13-h892fb3f_2.conda @@ -4048,6 +4063,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/win-64/tbb-2021.13.0-h62715c5_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh5737063_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda @@ -4324,6 +4340,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh0d859eb_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: 
https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_hd72426e_102.conda @@ -4582,6 +4599,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh31c8845_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/osx-64/tk-8.6.13-hf689a15_2.conda @@ -4823,6 +4841,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh31c8845_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/osx-arm64/tk-8.6.13-h892fb3f_2.conda @@ -5056,6 +5075,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/win-64/tbb-2021.13.0-h62715c5_1.conda - conda: 
https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh5737063_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda @@ -5331,6 +5351,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh0d859eb_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_hd72426e_102.conda @@ -5590,6 +5611,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh31c8845_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/osx-64/tk-8.6.13-hf689a15_2.conda @@ -5832,6 +5854,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - 
conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh31c8845_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/osx-arm64/tk-8.6.13-h892fb3f_2.conda @@ -6066,6 +6089,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/win-64/tbb-2021.13.0-h62715c5_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh5737063_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda @@ -6341,6 +6365,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh0d859eb_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_hd72426e_102.conda @@ -6636,6 +6661,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda 
+ - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh0d859eb_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_hd72426e_102.conda @@ -6900,6 +6926,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh31c8845_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/osx-arm64/tk-8.6.13-h892fb3f_2.conda @@ -7139,6 +7166,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/win-64/tbb-2021.13.0-h62715c5_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh5737063_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda @@ -7408,6 +7436,7 @@ environments: - conda: https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-qthelp-2.0.0-pyhd8ed1ab_1.conda - conda: 
https://conda.anaconda.org/conda-forge/noarch/sphinxcontrib-serializinghtml-1.1.10-pyhd8ed1ab_1.conda - conda: https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.3-pyhd8ed1ab_1.conda + - conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda - conda: https://conda.anaconda.org/conda-forge/noarch/terminado-0.18.1-pyh31c8845_0.conda - conda: https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.4.0-pyhd8ed1ab_0.conda - conda: https://conda.anaconda.org/conda-forge/osx-arm64/tk-8.6.13-h892fb3f_2.conda @@ -9362,8 +9391,8 @@ packages: timestamp: 1750744808182 - pypi: ./ name: gettsim - version: 0.7.1.dev184+gd0fe6233c.d20250721 - sha256: e41722d0b5a33a03690f6ef1575e822225b42690f6110d3ea72ede8f8939d5a8 + version: 0.7.1.dev535+g18c403c1b.d20250723 + sha256: f1c50ee4113d5524d34abe12998995f1f0ce30bb377618f3b7badc667f812cb1 requires_dist: - dags>=0.4.1 - ipywidgets @@ -17358,6 +17387,17 @@ packages: requires_dist: - pyreadline3 ; sys_platform == 'win32' requires_python: '>=3.8' +- conda: https://conda.anaconda.org/conda-forge/noarch/tabulate-0.9.0-pyhd8ed1ab_2.conda + sha256: 090023bddd40d83468ef86573976af8c514f64119b2bd814ee63a838a542720a + md5: 959484a66b4b76befcddc4fa97c95567 + depends: + - python >=3.9 + license: MIT + license_family: MIT + purls: + - pkg:pypi/tabulate?source=hash-mapping + size: 37554 + timestamp: 1733589854804 - conda: https://conda.anaconda.org/conda-forge/win-64/tbb-2021.13.0-h62715c5_1.conda sha256: 03cc5442046485b03dd1120d0f49d35a7e522930a2ab82f275e938e17b07b302 md5: 9190dd0a23d925f7602f9628b3aed511 diff --git a/pyproject.toml b/pyproject.toml index 0f6d1b291..524954a5b 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -119,6 +119,7 @@ pytest-xdist = "*" python = ">=3.11,<3.14" pyyaml = "*" snakeviz = ">=2.2.2" +tabulate = "*" toml = "*" # Development Dependencies (pypi) @@ -192,6 +193,9 @@ tests-jax = "pytest --backend=jax" [tool.pixi.feature.mypy.tasks] mypy = "mypy --ignore-missing-imports" 
+[tool.pixi.feature.docs.tasks] +docs = "sphinx-build -T -b html docs docs/_build/html" + # Environments # -------------------------------------------------------------------------------------- diff --git a/interface-prototype.ipynb b/sandbox/interface-playground.ipynb similarity index 59% rename from interface-prototype.ipynb rename to sandbox/interface-playground.ipynb index 45699302e..d8214d2c9 100644 --- a/interface-prototype.ipynb +++ b/sandbox/interface-playground.ipynb @@ -6,14 +6,12 @@ "metadata": {}, "outputs": [], "source": [ - "from pathlib import Path\n", - "\n", "import pandas as pd\n", "\n", - "from ttsim import main\n", - "from ttsim.tt_dag_elements import ScalarParam\n", + "from gettsim import InputData, MainTarget, TTTargets, main\n", "\n", - "GETTSIM_ROOT = Path.cwd() / \"src\" / \"_gettsim\"" + "# Please ignore the import location for now; will be from gettsim in the future\n", + "from ttsim.tt_dag_elements import ScalarParam" ] }, { @@ -23,8 +21,7 @@ "# Prototypes of GETTSIM's new interface\n", "\n", "[GEP 7](https://gettsim--855.org.readthedocs.build/en/855/geps/gep-07.html) discusses\n", - "the principles of the new interface. This notebook demonstrates two candidates for\n", - "GETTIM's new interface. We would like to get your feedback on which one you prefer.\n", + "the principles of the new interface. This notebook allows you to play around with it.\n", "\n", "In this notebook, we compute income taxes and social security contributions for example\n", "data.\n", @@ -34,10 +31,11 @@ "This notebook requires to have GETTSIM installed in its current development version.\n", "\n", "To do this:\n", + "\n", "1. Clone the GETTSIM repository.\n", - "2. Install the [pixi package manager](https://pixi.sh/latest/) on your system.\n", - "3. `cd` into the GETTSIM repository and run `git checkout inputs-for-main`.\n", - "4. Run `pixi run jupyter-notebook` and select the `interface-prototype.ipynb` notebook.\n", + "2. 
[Install](https://pixi.sh/latest/#installation) the pixi package manager on your system.\n", + "3. In your shell, navigate (cd) to the GETTSIM repository and run `git checkout gep-07`.\n", + "4. Start the notebook with `pixi run jupyter-notebook` and open `interface-playground.ipynb`.\n", "\n", "If you have trouble with the setup, please reach out." ] @@ -46,10 +44,106 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Creating the Data\n", + "## Creating the Data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The first step in GETTSIM's new workflow is to define the targets you're interested in.\n", + "The key sequences of the nested dictionary below are the paths GETTSIM will use as\n", + "targets. For instance, via the path `einkommensteuer` and `betrag_m_sn`, we request the\n", + "amount of income tax to be paid monthly at the Steuernummer level. *Note: Of course, the\n", + "income tax is paid annually, but GETTSIM will do the conversion for you.*\n", "\n", - "First, we create some example data. Here, we use a pandas DataFrame with column names\n", - "that are different from the ones GETTSIM expects." + "The values on the lowest level of the dictionaries (called leaves) will be used as the\n", + "column names of the resulting DataFrame. Here, `income_tax_m` will be the name of the\n", + "column containing the income tax results.\n", + "\n", + "In this example, we are interested in the income tax and the social insurance\n", + "contributions paid when being in regular employment." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "TT_TARGETS = {\n", + " \"einkommensteuer\": {\"betrag_m_sn\": \"income_tax_m\"},\n", + " \"sozialversicherung\": {\n", + " \"pflege\": {\n", + " \"beitrag\": {\n", + " \"betrag_versicherter_m\": \"long_term_care_insurance_contribution_m\"\n", + " }\n", + " },\n", + " \"kranken\": {\n", + " \"beitrag\": {\"betrag_versicherter_m\": \"health_insurance_contribution_m\"}\n", + " },\n", + " \"rente\": {\n", + " \"beitrag\": {\"betrag_versicherter_m\": \"pension_insurance_contribution_m\"}\n", + " },\n", + " \"arbeitslosen\": {\n", + " \"beitrag\": {\n", + " \"betrag_versicherter_m\": \"unemployment_insurance_contribution_m\"\n", + " }\n", + " },\n", + " },\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we need to find out which input data we actually need to calculate the targets we\n", + "are interested in. We can do this by specifying a template as the `main_target` of\n", + "`gettsim.main`.\n", + "\n", + "Because we are interested social insurance contributions paid when being in regular\n", + "employment, we are not interested in retirees or households dependent on social\n", + "assistance. We can override these transfers when requesting the template. This removes\n", + "the input data needed to compute these transfers from the template." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "main(\n", + " main_target=MainTarget.templates.input_data_dtypes,\n", + " policy_date_str=\"2025-01-01\",\n", + " tt_targets=TTTargets(\n", + " tree=TT_TARGETS,\n", + " ),\n", + " input_data=InputData.tree(\n", + " {\n", + " \"p_id\": pd.Series([0]),\n", + " \"sozialversicherung\": {\n", + " \"rente\": {\n", + " \"altersrente\": {\"betrag_m\": pd.Series([0])},\n", + " },\n", + " \"arbeitslosen\": {\"betrag_m\": pd.Series([0])},\n", + " },\n", + " \"wohngeld\": {\"betrag_m_wthh\": pd.Series([0])},\n", + " \"kinderzuschlag\": {\"betrag_m_bg\": pd.Series([0])},\n", + " \"elterngeld\": {\"betrag_m\": pd.Series([0])},\n", + " \"arbeitslosengeld_2\": {\"betrag_m_bg\": pd.Series([0])},\n", + " }\n", + " ),\n", + " include_warn_nodes=False,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, we create some example data. Here, we use a pandas DataFrame with column names that are different from the ones GETTSIM expects." 
] }, { @@ -58,7 +152,6 @@ "metadata": {}, "outputs": [], "source": [ - "# Some example data as a standard pandas DataFrame\n", "DATA = pd.DataFrame(\n", " {\n", " \"age\": [30, 30, 10],\n", @@ -75,7 +168,6 @@ " \"income_from_forest_and_agriculture\": [0, 0, 0],\n", " \"income_from_capital\": [500, 0, 0],\n", " \"income_from_other_sources\": [0, 0, 0],\n", - " \"pension_income\": [0, 0, 0],\n", " \"contribution_to_private_pension_insurance\": [0, 0, 0],\n", " \"childcare_expenses\": [0, 0, 0],\n", " \"person_that_pays_childcare_expenses\": [-1, -1, 0],\n", @@ -90,12 +182,6 @@ " \"parent_id_2\": [-1, -1, 1],\n", " \"in_training\": [False, False, False],\n", " \"id_recipient_child_allowance\": [-1, -1, 0],\n", - " \"wohngeld\": [0, 0, 0],\n", - " \"kinderzuschlag\": [0, 0, 0],\n", - " \"elterngeld\": [0, 0, 0],\n", - " \"alg1\": [0, 0, 0],\n", - " \"old_age_pension_income\": [0, 0, 0],\n", - " \"bürgergeld\": [0, 0, 0],\n", " }\n", ")" ] @@ -104,14 +190,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The first step in GETTSIM's new workflow is to define the targets you're interested in.\n", - "The keys of the nested dictionary below are the paths GETTSIM will use as targets. For\n", - "instance, via the keys `einkommensteuer` and `betrag_m_sn`, we request the income tax as\n", - "a target.\n", + "Next, we define a mapping from GETTSIM's expected input structure to your data. Note\n", + "that the paths are the union of the input_data for `main` and the result from calling it\n", + "above. Just the leaves are different; we have replaced the dtype hints by the column\n", + "names in the data. \n", "\n", - "The values on the lowest level of the dictionaries will be used as the column names of\n", - "the resulting DataFrame. Here, `income_tax_y` will be the name of the column containing\n", - "the income tax results." + "In practice, you would probably want to save the template above to disk (e.g. as a yaml\n", + "file) and edit it there. 
Then you can read in the file and use its content as the\n", + "mapper." ] }, { @@ -120,45 +206,7 @@ "metadata": {}, "outputs": [], "source": [ - "TARGETS_TREE = {\n", - " \"einkommensteuer\": {\"betrag_y_sn\": \"income_tax_y\"},\n", - " \"sozialversicherung\": {\n", - " \"pflege\": {\n", - " \"beitrag\": {\n", - " \"betrag_versicherter_m\": \"long_term_care_insurance_contribution_m\"\n", - " }\n", - " },\n", - " \"kranken\": {\n", - " \"beitrag\": {\"betrag_versicherter_m\": \"health_insurance_contribution_m\"}\n", - " },\n", - " \"rente\": {\n", - " \"beitrag\": {\"betrag_versicherter_m\": \"pension_insurance_contribution_m\"}\n", - " },\n", - " \"arbeitslosen\": {\n", - " \"beitrag\": {\n", - " \"betrag_versicherter_m\": \"unemployment_insurance_contribution_m\"\n", - " }\n", - " },\n", - " },\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, we define a mapping from GETTSIM's expected input structure to your data. As\n", - "above, we map the paths GETTSIM uses to the columns of your data. 
(We will provide\n", - "templates for this, so you won't have to type the paths manually.)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "TREE_TO_DF_MAPPER = {\n", + "MAPPER = {\n", " \"alter\": \"age\",\n", " \"arbeitsstunden_w\": \"working_hours\",\n", " \"behinderungsgrad\": \"disability_grade\",\n", @@ -168,7 +216,7 @@ " \"wohnort_ost\": \"east_germany\",\n", " \"einkommensteuer\": {\n", " \"einkünfte\": {\n", - " \"ist_selbstständig\": \"self_employed\",\n", + " \"ist_hauptberuflich_selbstständig\": \"self_employed\",\n", " \"aus_gewerbebetrieb\": {\"betrag_m\": \"income_from_self_employment\"},\n", " \"aus_vermietung_und_verpachtung\": {\"betrag_m\": \"income_from_rent\"},\n", " \"aus_nichtselbstständiger_arbeit\": {\n", @@ -181,7 +229,11 @@ " \"aus_kapitalvermögen\": {\"kapitalerträge_m\": \"income_from_capital\"},\n", " \"sonstige\": {\n", " \"alle_weiteren_m\": \"income_from_other_sources\",\n", - " \"betrag_m\": \"pension_income\",\n", + " \"rente\": {\n", + " \"betriebliche_altersvorsorge_m\": 0.0,\n", + " \"geförderte_private_vorsorge_m\": 0.0,\n", + " \"sonstige_private_vorsorge_m\": 0.0,\n", + " },\n", " },\n", " },\n", " \"abzüge\": {\n", @@ -192,11 +244,14 @@ " \"gemeinsam_veranlagt\": \"joint_taxation\",\n", " },\n", " \"sozialversicherung\": {\n", - " \"arbeitslosen\": {\"betrag_m\": \"alg1\"},\n", + " \"arbeitslosen\": {\"betrag_m\": 0.0},\n", " \"rente\": {\n", - " \"private_rente_betrag_m\": \"amount_private_pension_income\",\n", + " \"jahr_renteneintritt\": 0,\n", " \"altersrente\": {\n", - " \"betrag_m\": \"old_age_pension_income\",\n", + " \"betrag_m\": 0.0,\n", + " },\n", + " \"erwerbsminderung\": {\n", + " \"betrag_m\": 0.0,\n", " },\n", " },\n", " \"kranken\": {\n", @@ -212,16 +267,16 @@ " \"p_id_elternteil_2\": \"parent_id_2\",\n", " },\n", " \"wohngeld\": {\n", - " \"betrag_m_wthh\": \"wohngeld\",\n", + " \"betrag_m_wthh\": 0.0,\n", " },\n", " \"kinderzuschlag\": {\n", 
- " \"betrag_m_bg\": \"kinderzuschlag\",\n", + " \"betrag_m_bg\": 0.0,\n", " },\n", " \"elterngeld\": {\n", - " \"betrag_m\": \"elterngeld\",\n", + " \"betrag_m\": 0.0,\n", " },\n", " \"arbeitslosengeld_2\": {\n", - " \"betrag_m_bg\": \"bürgergeld\",\n", + " \"betrag_m_bg\": 0.0,\n", " },\n", " \"kindergeld\": {\n", " \"in_ausbildung\": \"in_training\",\n", @@ -230,15 +285,37 @@ "}" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Note: When writing the template to disk and reading it back, don't forget to allow for\n", + "unicode characters. This is important because many transfers have Umlaute in their\n", + "names. An example could look like this:\n", + "\n", + "```python\n", + "import yaml\n", + "\n", + "# Write the template to disk as UTF-8, keeping Umlaute unescaped\n", + "with PATH_FOR_TEMPLATE.open(\"w\", encoding=\"utf-8\") as f:\n", + " yaml.dump(TEMPLATE, f, allow_unicode=True)\n", + "\n", + "# Edit the leaves in the template and then read it back in\n", + "with PATH_FOR_TEMPLATE.open(\"r\", encoding=\"utf-8\") as f:\n", + " MAPPER = yaml.safe_load(f)\n", + "```" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using GETTSIM's interface\n", "\n", - "Just as for taxes and transfers, GETTSIM's infrastructure is a DAG. GETTSIM's interface\n", - "is a function that allows you to interact with this DAG. This comes with the\n", - "advantages GETTSIM's users already know from the taxes and transfers part:\n", + "Just as for taxes and transfers, GETTSIM's `main` function is powered by a DAG. This\n", + "comes with the advantages that seasoned GETTSIM users already know from the taxes and\n", + "transfers part:\n", "- Users can select any part of the DAG as a target. This means that users can access\n", " any intermediate objects.\n", "- Users can feed any part of the DAG as input. This means that users can overwrite\n", @@ -251,15 +328,7 @@ "data. 
In a second example, we manipulate the policy environment to see why the interface\n", "DAG is useful.\n", "\n", - "### One-stop-shop: Computing taxes and transfers with GETTSIM\n", - "\n", - "Now we can compute taxes and transfers. For this, we need to call the `main` function.\n", - "`main` takes two input arguments:\n", - "- `inputs`: a nested dictionary of the inputs you're passing to GETTSIM.\n", - "- `output_names`: a list of the outputs you want to get from GETTSIM.\n", - "\n", - "`inputs` can be specified as a nested dictionary (see below) or as strings, separating\n", - "nesting levels with `__` (e.g. `\"input_data__df_with_mapper_df\"`).\n", + "### Simple computation of taxes and transfers\n", "\n", "Let's calculate taxes and transfers first:" ] @@ -271,53 +340,20 @@ "outputs": [], "source": [ "result = main(\n", - " inputs={\n", - " \"policy_date_str\": \"2025-01-01\",\n", - " \"input_data\": {\n", - " \"df_and_mapper\": {\n", - " \"df\": DATA,\n", - " \"mapper\": TREE_TO_DF_MAPPER,\n", - " },\n", - " },\n", - " \"targets\": {\n", - " \"tree\": TARGETS_TREE,\n", - " },\n", - " \"orig_policy_objects\": {\n", - " \"root\": GETTSIM_ROOT\n", - " }, # don't worry about this, will be gone in the future\n", - " },\n", - " output_names=[\"results__df_with_mapper\"],\n", - ")[\"results__df_with_mapper\"]\n", + " policy_date_str=\"2025-01-01\",\n", + " input_data=InputData.df_and_mapper(\n", + " df=DATA,\n", + " mapper=MAPPER,\n", + " ),\n", + " main_target=MainTarget.results.df_with_mapper,\n", + " tt_targets=TTTargets(\n", + " tree=TT_TARGETS,\n", + " ),\n", + " include_warn_nodes=False,\n", + ")\n", "result.T" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Input data can also be specified directly as a tree:\n", - "\n", - "```python\n", - "result = main(\n", - " inputs={\n", - " \"input_data\": {\n", - " \"tree\": INPUT_TREE,\n", - " },\n", - " ...\n", - "```\n", - "\n", - "Or as a DataFrame with MultiIndex columns:\n", - "\n", - 
"```python\n", - "result = main(\n", - " inputs={\n", - " \"input_data\": {\n", - " \"df_with_nested_columns\": DF_WITH_NESTED_COLUMNS,\n", - " },\n", - " ...\n", - "```" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -335,14 +371,9 @@ "outputs": [], "source": [ "policy_environment = main(\n", - " inputs={\n", - " \"policy_date_str\": \"2025-01-01\",\n", - " \"orig_policy_objects\": {\n", - " \"root\": GETTSIM_ROOT\n", - " }, # don't worry about this, will be gone in the future\n", - " },\n", - " output_names=[\"policy_environment\"],\n", - ")[\"policy_environment\"]" + " policy_date_str=\"2025-01-01\",\n", + " main_target=MainTarget.policy_environment,\n", + ")" ] }, { @@ -368,8 +399,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We replace the `ScalarParam` object with a new one. Then, we add this parameter to the\n", - "new policy environment." + "We get the current `value` of the `ScalarParam` out. We then inject a new `ScalarParam` object into the same place of `policy_environment`:" ] }, { @@ -381,28 +411,16 @@ "old_beitragssatz = policy_environment[\"sozialversicherung\"][\"rente\"][\"beitrag\"][\n", " \"beitragssatz\"\n", "]\n", - "new_beitragssatz = ScalarParam( # don't worry too much about this, will get easier\n", - " leaf_name=old_beitragssatz.leaf_name,\n", - " start_date=old_beitragssatz.start_date,\n", - " end_date=old_beitragssatz.end_date,\n", - " value=old_beitragssatz.value + 0.01,\n", - " unit=old_beitragssatz.unit,\n", - " description=old_beitragssatz.description,\n", - " name=old_beitragssatz.name,\n", - " reference_period=old_beitragssatz.reference_period,\n", - ")\n", - "\n", - "modified_policy_environment = policy_environment.copy()\n", - "modified_policy_environment[\"sozialversicherung\"][\"rente\"][\"beitrag\"][\n", - " \"beitragssatz\"\n", - "] = new_beitragssatz" + "policy_environment[\"sozialversicherung\"][\"rente\"][\"beitrag\"][\"beitragssatz\"] = (\n", + " ScalarParam(value=old_beitragssatz.value + 0.01)\n", 
+ ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Now we can compute taxes and transfers using the modified policy environment." + "Now we can compute taxes and transfers with the increased contribution rate:" ] }, { @@ -412,21 +430,18 @@ "outputs": [], "source": [ "result = main(\n", - " inputs={\n", - " \"policy_date_str\": \"2025-01-01\",\n", - " \"input_data\": {\n", - " \"df_and_mapper\": {\n", - " \"df\": DATA,\n", - " \"mapper\": TREE_TO_DF_MAPPER,\n", - " },\n", - " },\n", - " \"targets\": {\n", - " \"tree\": TARGETS_TREE,\n", - " },\n", - " \"policy_environment\": policy_environment,\n", - " },\n", - " output_names=[\"results__df_with_mapper\"],\n", - ")[\"results__df_with_mapper\"]\n", + " main_target=MainTarget.results.df_with_mapper,\n", + " policy_date_str=\"2025-01-01\",\n", + " policy_environment=policy_environment,\n", + " input_data=InputData.df_and_mapper(\n", + " df=DATA,\n", + " mapper=MAPPER,\n", + " ),\n", + " tt_targets=TTTargets(\n", + " tree=TT_TARGETS,\n", + " ),\n", + " include_warn_nodes=False,\n", + ")\n", "result.T" ] }, diff --git a/src/ttsim/interface_dag_elements/warn_if.py b/src/ttsim/interface_dag_elements/warn_if.py index 211d6f85b..1c07cfca7 100644 --- a/src/ttsim/interface_dag_elements/warn_if.py +++ b/src/ttsim/interface_dag_elements/warn_if.py @@ -50,7 +50,7 @@ def functions_and_data_columns_overlap( that appears in the list above. Turn off warnings by setting `include_warn_nodes=False` in `main`. - If you want to be selective about warnings, include these + If you want to be selective about warnings, include these among the `main_targets`. """, ) @@ -66,7 +66,7 @@ def functions_and_data_columns_overlap( each column that appears in the list above. Turn off warnings by setting `include_warn_nodes=False` in `main`. - If you want to be selective about warnings, include these + If you want to be selective about warnings, include these among the `main_targets`. """, )