Prepare for jitting / vectorization of GETTSIM #891

hmgaudecker · 2025-05-01T05:54:13Z

What problem do you want to solve?

A brief attempt at doing for GETTSIM what #879 did for METTSIM.

Good news: test_full_taxes_and_transfers runs nicely!
Bad news: Many tests fail because the logic in dividing up the taxes and transfers function is not elaborate enough. In the first pass, we are trying to build all ids. However, in many cases we are missing the required input data (made-up example: Einkommensteuer tests may require calculation of sn_id, but won't have all inputs required for bg_id).

Solving this should be doable (first set up the entire graph, then check which ids are needed), but the required functions are buried inside of compute_taxes_and_transfers so we should not waste time on that before implementing the new interface.
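
To make the "first set up the entire graph, then check which ids are needed" idea concrete, here is a minimal sketch in plain Python. It is not the code inside compute_taxes_and_transfers; the graph, the node names other than sn_id and bg_id, and the helper `required_nodes` are all made up for illustration.

```python
def required_nodes(dependencies, targets):
    """Return the targets plus everything they depend on, directly or indirectly."""
    needed = set()
    stack = list(targets)
    while stack:
        node = stack.pop()
        if node in needed:
            continue
        needed.add(node)
        # Root inputs have no entry in `dependencies`, hence the default.
        stack.extend(dependencies.get(node, ()))
    return needed


# Hypothetical toy graph: node -> nodes it depends on (assumed, not GETTSIM's DAG).
dependencies = {
    "sn_id": ["p_id", "familienstand"],
    "bg_id": ["p_id", "alter", "einkommen"],
    "einkommensteuer": ["sn_id", "einkommen"],
}
all_ids = {"sn_id", "bg_id"}

# Only build the ids that the requested targets actually need.
needed = required_nodes(dependencies, ["einkommensteuer"])
ids_to_build = sorted(all_ids & needed)
print(ids_to_build)
```

In this toy setup only sn_id ends up in `ids_to_build`, so a test that lacks the inputs for bg_id would never try to compute it.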

hmgaudecker · 2025-05-01T12:09:11Z

Update: I changed the default of vectorization_strategy to vectorize and added loop only where required. It should not be too hard to adjust most of these cases eventually; I probably added a few too many.

What I did not realise until recently is that jax.jit and vectorising the functions are not nested. I had always thought of the latter as a prerequisite of the former... @timmens @mj023, I guess it is mainly for compilation speed / efficiency that we want the vectorised functions to be jitted?
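
A small sketch of the point about jitting and vectorising being independent, assuming a made-up scalar rule `tax_scalar` and using jax.vmap as a stand-in for whatever vectorisation GETTSIM applies (the thread does not say which mechanism is used):

```python
import jax
import jax.numpy as jnp


def tax_scalar(income):
    # Hypothetical scalar rule; jnp.where replaces a Python `if` so the
    # function can be traced by both jax.vmap and jax.jit.
    return jnp.where(income > 10_000.0, 0.2 * (income - 10_000.0), 0.0)


incomes = jnp.array([5_000.0, 20_000.0, 40_000.0])

# Vectorised over the array, but not compiled.
vectorized_only = jax.vmap(tax_scalar)(incomes)

# Compiled, but still called on a single scalar.
jitted_only = jax.jit(tax_scalar)(20_000.0)

# Both: vectorise first, then compile the vectorised function.
both = jax.jit(jax.vmap(tax_scalar))(incomes)
```

Each of the three calls works on its own, so neither transformation is a prerequisite of the other; combining them is a choice made for performance.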

codecov · 2025-05-01T12:12:06Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.69%. Comparing base (7980a57) to head (04f4847).
Report is 1 commits behind head on collect-components-of-namespaces.

Additional details and impacted files

@@                          Coverage Diff                          @@
##           collect-components-of-namespaces     #891       +/-   ##
=====================================================================
- Coverage                             83.57%   69.69%   -13.88%     
=====================================================================
  Files                                   147      147               
  Lines                                  5704     5702        -2     
=====================================================================
- Hits                                   4767     3974      -793     
- Misses                                  937     1728      +791

MImmesberger

I think the logic of calculating IDs beforehand and calculating the rest of the DAG afterwards fails for stuff like bg_id and wthh_id. Their dependencies consist of almost the entire DAG. (That's probably what you meant, I just want to make sure we're on the same page.) So users would not get a speed benefit just because their data is large, but only if they want to call GETTSIM repeatedly on the same (stable with respect to BD and WTHH) household constellation.

MImmesberger · 2025-05-01T17:42:54Z

src/_gettsim_tests/utils.py

@@ -51,7 +56,7 @@ def __init__(
        date: datetime.date,
    ) -> None:
        self.info = info
-        self.input_tree = input_tree
+        self.input_tree = optree.tree_map(np.array, input_tree)


Wouldn't we want np.array([1, 2, 3]) instead of [np.array(1), np.array(2), np.array(3)] here?

That's what happens in my book 😉

(Pdb+) input_tree
{'alter': 0    67
1    34
2    37
3    48
dtype: int64, ...}
(Pdb+) self.input_tree
{'alter': array([67, 34, 37, 48]), ...}
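
A minimal sketch of why both outcomes are possible, depending on what the leaves are. The `tree_map` below is a simplified stand-in written for this example, not optree's implementation, and `array.array` merely stands in for a pandas Series as an opaque, non-container leaf:

```python
import array

import numpy as np


def tree_map(func, tree):
    # Simplified stand-in: dicts, lists and tuples are containers that get
    # traversed; anything else is a leaf that `func` is applied to.
    if isinstance(tree, dict):
        return {key: tree_map(func, value) for key, value in tree.items()}
    if isinstance(tree, (list, tuple)):
        return type(tree)(tree_map(func, value) for value in tree)
    return func(tree)


# Leaf is a single opaque object (like a Series): one array per column.
series_like = {"alter": array.array("q", [67, 34, 37, 48])}
converted_column = tree_map(np.array, series_like)

# Leaf values sit inside a list: the list is traversed, so every element
# becomes its own 0-dimensional array.
list_based = {"alter": [67, 34, 37, 48]}
converted_elements = tree_map(np.array, list_based)
```

So the worry applies if test inputs arrive as plain lists; with Series as leaves, as in the pdb output above, each column becomes one array.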

hmgaudecker · 2025-05-01T18:48:32Z

> I think the logic of calculating IDs beforehand and calculating the rest of the DAG afterwards fails for stuff like bg_id and wthh_id. Their dependencies consist of almost the entire DAG. (That's probably what you meant, I just want to make sure we're on the same page.) So users would not get a speed benefit just because their data is large, but only if they want to call GETTSIM repeatedly on the same (stable with respect to BD and WTHH) household constellation.

And yes, I'd think that will be a fairly common use case. And of course, one could do 3+ calls of GETTSIM; just that we can't / probably don't want to bake that into gettsim.oss.

hmgaudecker and others added 30 commits April 22, 2025 15:13

Use ast.unparse (available since Python 3.9) instead of astor.code_ge…

3a7cbe7

…n.to_source. In limited set of experiments, it produced exactly the same result.

Remove checks from grouped calculations via JAX and skip correspondin…

b84cd83

…g tests. Fix typing and some isinstance checks.

Attempt to jit three more tests in 'tests/ttsim/test_compute_taxes_an…

1929585

…d_transfers.py'. Fails.

A few more errors when trying to jit the tests in test_combine_functi…

7daffb7

…ons.

Vectorize policy functions used in mettsim testing

506e8e6

Merge branch 'collect-components-of-namespaces' into vectorize-mettsim

732a8a9

Merge branch 'collect-components-of-namespaces' into vectorize-mettsim

bd0bef4

Use typed aggregation_jax from #887 to avoid conflicts down the road

8e808e5

Allow for different signatures in aggregation functions

5bd38e1

Add num_segments argument to JAX aggregation functions

62a30b0

Create rough first draft of jitted tax_transfer_function

e91da97

Adjust a few tests

054bd21

Adjust a few tests in test_combine_functions.py

bc72274

Merge branch 'collect-components-of-namespaces' into vectorize-mettsim

5ec9ba6

Add num_segments to other test case.

e539bbb

Update test_aggregation_functions, call things by name.

62ddd05

Use correct num_segments in aggregation tests (needs to be 1 + max of…

01190b0

… group id).

Almost done; need to add [group_id]_num_segments to global namespace,…

8cc6c9f

… will do in the office where I had merged things but not pushed.

Merge branch 'vectorize-mettsim' of github.com:iza-institute-of-labor…

f495102

…-economics/gettsim into vectorize-mettsim

Add relevant num_segments to global namespace, fix tests.

66a7b5b

Fix vectorisation tests.

c294efc

Almost there.

bce2f59

Take care of case where we only have group_ids as output.

108720d

Remove test that is now obsolete.

f48235f

Changelog.

a4675cf

Remove commented-out code

74cb0b8

Remove marking partialled arguments as static_argnames (redundant).

757ba3c

Prepare for using Jax with GETTSIM; remove SUPPORTED_GROUPINGS everyw…

91c024a

…here.

Adjust group_creation_functions so they work with non-jitted Jax.

d64e8f3

Use correct format for 'jahr'. Will remove entirely, not used anymore.

feabbb5

hmgaudecker added 5 commits May 1, 2025 07:25

Replace float test with approximate equality.

ea36c90

Remove 'jahr' key from the test files.

55a2a49

Use Umlaute in file names.

75960ff

Change default of vectorization_strategy to 'vectorize', override wit…

4164f87

…h 'loop' where necessary.

Remove vectorization_strategy='vectorize' everywhere now that it is t…

d8461b8

…he default.

hmgaudecker requested a review from MImmesberger May 1, 2025 11:59

Merge branch 'collect-components-of-namespaces' into vectorize-gettsim

04f4847

MImmesberger approved these changes May 1, 2025

hmgaudecker merged commit d9248f9 into collect-components-of-namespaces May 1, 2025
8 of 9 checks passed

hmgaudecker deleted the vectorize-gettsim branch May 1, 2025 18:50

4 participants