@MImmesberger
Collaborator

What problem do you want to solve?

Closes #1006

@MImmesberger MImmesberger linked an issue Jul 17, 2025 that may be closed by this pull request
@MImmesberger MImmesberger requested a review from hmgaudecker July 17, 2025 12:34
@codecov

codecov bot commented Jul 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Collaborator

@hmgaudecker hmgaudecker left a comment

Very nice, thanks!


@fail_function(
    include_if_any_element_present=[
        "specialized_environment__tax_transfer_dag",
Collaborator

Suggested change
- "specialized_environment__tax_transfer_dag",
+ "specialized_environment__tax_transfer_function",

My bad in the issue description. The DAG only requires the labels.

Collaborator Author

@MImmesberger MImmesberger Jul 17, 2025

Maybe this PR is just done with this change:

@fail_function(
    include_if_any_element_present=[
->      "raw_results__columns",
    ]
)
def root_nodes_are_missing(

I think it's just that root_nodes_are_missing triggers too early. It shouldn't
trigger when only specialized_environment__tax_transfer_dag is a target, and maybe it
also shouldn't when we target specialized_environment__tax_transfer_function.

Do we need input data for anything other than num_segments when creating the DAG or the
TT function? Both should be fine without, no? We could use the default for
num_segments then.
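
To make the proposal concrete, here is a minimal sketch of how restricting the trigger list changes when the check runs. It assumes that `fail_function` simply tags a check with the elements that activate it; the decorator body and the helper `check_is_active` are hypothetical stand-ins, not the project's implementation.

```python
from collections.abc import Callable, Iterable


def fail_function(include_if_any_element_present: Iterable[str]) -> Callable:
    """Hypothetical stand-in: tag a check with the elements that activate it."""
    triggers = frozenset(include_if_any_element_present)

    def decorator(func: Callable) -> Callable:
        func.include_if_any_element_present = triggers
        return func

    return decorator


@fail_function(include_if_any_element_present=["raw_results__columns"])
def root_nodes_are_missing() -> None:
    """Placeholder body; the real check would inspect the input data."""


def check_is_active(check: Callable, requested_elements: Iterable[str]) -> bool:
    """Hypothetical helper: run the check only if one of its triggers is requested."""
    return bool(check.include_if_any_element_present & set(requested_elements))


# Requesting only the DAG no longer activates the check.
dag_only = check_is_active(
    root_nodes_are_missing,
    ["specialized_environment__tax_transfer_dag"],
)

# Requesting results still activates it.
with_results = check_is_active(
    root_nodes_are_missing,
    ["raw_results__columns", "specialized_environment__tax_transfer_dag"],
)
```

With this trigger list, asking for the DAG alone leaves the check inactive, while anything that needs raw_results__columns still runs it.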

Collaborator

Yeah, I think that's it. Let's not worry too much.

num_segments needs to be at least as large as the number of observations. But it is JAX-specific and we cannot check everything -- the probability that someone creates a function with some input data and then injects other data with JAX seems sufficiently small. Sort of similar to #966. If you feel like it, create an issue, but nothing to worry about now.
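
A rough illustration of the concern, written as a NumPy stand-in rather than actual JAX or project code. It assumes a segment sum that writes into an output of fixed length `num_segments` and silently drops segment IDs outside that range, which is my understanding of the default behaviour of `jax.ops.segment_sum`; the function below is hypothetical.

```python
import numpy as np


def segment_sum(values, segment_ids, num_segments):
    """Sum values per segment into a fixed-size output (NumPy stand-in)."""
    values = np.asarray(values, dtype=float)
    segment_ids = np.asarray(segment_ids)
    out = np.zeros(num_segments)
    # Assumption: IDs that do not fit into the output are dropped, not reported.
    in_range = segment_ids < num_segments
    np.add.at(out, segment_ids[in_range], values[in_range])
    return out


values = [1.0, 2.0, 3.0, 4.0]
segment_ids = [0, 1, 2, 3]  # four observations, each in its own segment

# num_segments matches the data: every observation is kept.
full = segment_sum(values, segment_ids, num_segments=4)

# num_segments fixed from smaller data: two observations vanish without an error.
truncated = segment_sum(values, segment_ids, num_segments=2)
```

The second call is the failure mode: a function built with one dataset and then fed a larger one would return results that look plausible but are incomplete.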

@MImmesberger MImmesberger requested a review from hmgaudecker July 17, 2025 20:29
Collaborator

@hmgaudecker hmgaudecker left a comment

Apologies, I did not immerse myself enough into this when replying above -- was thinking that the root nodes in root_nodes_are_missing were those of the interface DAG 🙈

  • Renamed to fail_if.tt_root_nodes_are_missing
  • Changed the behaviour so that we send a different message when processed_data is empty
  • Removed a test so that we continue to be a bit stricter (less code, fewer special cases)

Please double-check and merge as you see fit!
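
As a sketch of the changed behaviour (not the merged code): the signature, the message texts, and the helper `message_for` below are assumptions made for illustration only.

```python
def tt_root_nodes_are_missing(processed_data: dict, root_nodes: set) -> None:
    """Hypothetical check: fail with a dedicated message if no data was passed."""
    if not processed_data:
        # Separate message for the "no input data at all" case.
        raise ValueError("No input data provided.")
    missing = sorted(root_nodes - set(processed_data))
    if missing:
        raise ValueError(f"Missing root nodes: {missing}")


def message_for(processed_data: dict, root_nodes: set):
    """Return the error message the check raises, or None if it passes."""
    try:
        tt_root_nodes_are_missing(processed_data, root_nodes)
    except ValueError as error:
        return str(error)
    return None


empty_message = message_for({}, {"age", "income"})
partial_message = message_for({"age": [30]}, {"age", "income"})
complete_message = message_for({"age": [30], "income": [1000.0]}, {"age", "income"})
```

The point is only that an empty processed_data is reported separately from the case where some root nodes are absent.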

assert flat_result_template.keys() == flat_expected.keys()


def test_can_create_tt_function(backend: Literal["numpy", "jax"]):
Collaborator

Just a note that this is precisely what we don't want, as num_segments will potentially be wrong. Certainly we don't want a test for this.

@MImmesberger
Collaborator Author

Okay, thanks! RE creating the TT function without input data: The more intuitive approach for me would be to ask users to provide num_segments themselves if they want to be input-data agnostic. Most users won't care about num_segments because they use numpy, and those who do are advanced users. For me, this feels more natural than (what I think is) the current workflow:

  • create auxiliary input data that resembles the real data used later, and create the TT function from it
  • let a model create the real input data
  • call the function on the real input data

But you put far more thought into this than I did, so I might be missing something!

@hmgaudecker
Collaborator

We also need the data for checking scalar inputs -- these will be put into the with_processed_... environment directly. Anyhow, I think we are talking about something close to an empty set of use cases here. If it turns out we are not, we can always relax the behaviour.

@hmgaudecker hmgaudecker merged commit e0d9801 into collect-components-of-namespaces Jul 18, 2025
16 of 17 checks passed
@hmgaudecker hmgaudecker deleted the fail-if-input-data-missing branch July 18, 2025 11:52

Development

Successfully merging this pull request may close these issues.

ENH: ("fail_if", "input_data_are_missing")

3 participants