Skip to content

Refactor: QuantityFactory data dimensions operations#272

Merged
FlorianDeconinck merged 10 commits into
NOAA-GFDL:developfrom
FlorianDeconinck:refactor/update_ddim_qty_factory
Oct 23, 2025
Merged

Refactor: QuantityFactory data dimensions operations#272
FlorianDeconinck merged 10 commits into
NOAA-GFDL:developfrom
FlorianDeconinck:refactor/update_ddim_qty_factory

Conversation

@FlorianDeconinck
Copy link
Copy Markdown
Collaborator

@FlorianDeconinck FlorianDeconinck commented Oct 22, 2025

Description

The set_extra_dim_lengths function of QuantityFactory is badly named: it updates the dictionary of data dimensions, rather than set.

Usage of this function is also a bit odd, you mostly want to add dimensions. But because a QuantityFactory can be shared (should be shared) across many codes you want to make sure to not override previous dimensions set.

Introducing:

  • QuantityFactory.add_data_dimensions: check for existing names and error out.
  • QuantityFactory.update_data_dimensions: update existing or add new ones.

Deprecating set_extra_dim_lengths for later removal.

  • Unit tests and proper type hints

How has this been tested?

New utest.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (e.g. add new modules to docs/docstrings/)
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • New check tests, if applicable, are included

@FlorianDeconinck
Copy link
Copy Markdown
Collaborator Author

Ping @CharlesKrop @katrinafandrich

Copy link
Copy Markdown
Collaborator

@romanc romanc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the naming and I think we should propagate it down into the sizer.

Comment thread ndsl/initialization/allocator.py Outdated
Comment thread ndsl/initialization/allocator.py Outdated
Comment thread ndsl/initialization/allocator.py Outdated
@FlorianDeconinck
Copy link
Copy Markdown
Collaborator Author

Blocked by NOAA-GFDL/pace#154

Copy link
Copy Markdown
Collaborator

@romanc romanc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That turned out bigger then I expected. Thanks for the cleanup! 🧹 ✨

Comment thread ndsl/initialization/subtile_grid_sizer.py Outdated
extra_dim_lengths: dict[str, int],
layout: tuple[int, int],
*,
data_dimensions: dict[str, int] = {},
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a python anti-pattern to have an empty dict/list as default argument. The reason is that this will coupke function invocations because that empty dict might not be empty anymore once we get here in the next invocation. People usually use dict | None as type with None as default and the test for Non inside and assign a {} in that case (wich is then local to that call).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also do a follow-up, but basically, as far as I can see, the last commit, bcea590, wouldn't be necessary if we had

        data_dimensions: dict[str, int] | None = None,

in the argument list and then do some more fiddling inside the function. This is surprising, especially coming form C++ where default arguments behave differently. Here's a simple example of what goes wrong.

I think there's mypy or flake rules to catch those. I remember seeing them (probably in gt4py). I'll search tomorrow.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah it's in flake8-bugbear

B006: Do not use mutable data structures for argument defaults. They are created during function definition time. All calls to the function reuse this one instance of that data structure, persisting changes between them.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follow-up is here: #277

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to fix this is actually use a dataclasses.field(default_factory) on the base dataclass

@FlorianDeconinck FlorianDeconinck added this pull request to the merge queue Oct 23, 2025
auto-merge was automatically disabled October 23, 2025 20:56

Pull Request is not mergeable

Merged via the queue into NOAA-GFDL:develop with commit cc47585 Oct 23, 2025
6 checks passed
romanc added a commit to romanc/PyFV3 that referenced this pull request Oct 28, 2025
"extra dims" have been renamed to "data dimensions" in NDSL [1] and
after doing so we figured that we need to be careful with mutable
default values [2]. Togehter, these two PRs make specifying
`extra_dim_lengths` unnecessary if they are the default empty
dictionary.

[1]: NOAA-GFDL/NDSL#272
[2]: NOAA-GFDL/NDSL#277
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants