chore: replace zip with zip_strict #3003
|
@FBruzzesi I tried to replace |
Hey @raisadz, if I understand you correctly, you tried to replace every The issue in
safe_keys = [
    # multi-output expression cannot have duplicate names, hence it's safe to suffix
    key.name.map(_temporary_name)
    if (metadata := key._metadata) and metadata.expansion_kind.is_multi_output()
    # otherwise it's single named and we can use Expr.alias
    else key.alias(_temporary_name(new_name))
    for key, new_name in zip(keys, output_names)
]
The test that is failing evaluates to the first expression: However, this actually exposes a hidden issue in the case where multiple "mixed" keys are passed. Let's consider the test case we have:
def test_group_by_selector(constructor: Constructor) -> None:
    data = {"a": [1, 1, 1], "b": [4, 4, 6], "c": [7.5, 8.5, 9.0]}
    result = (
        nw.from_native(constructor(data))
        .group_by(nw.selectors.by_dtype(nw.Int64))
        .agg(nw.col("c").mean())
        .sort("a", "b")
    )
    expected = {"a": [1, 1], "b": [4, 6], "c": [8.0, 9.0]}
    assert_equal_data(result, expected)
If we were to replace the group by statement with
keys = [nw.selectors.by_dtype(nw.Int64), nw.col("c")]
output_names = ['a', 'b', 'c']
which is bad as we would end up aliasing "c" as "b". Let me try to address it in a dedicated PR. Unrelated to the group by issue, could we rename
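The mismatch described above can be reproduced without narwhals. A minimal sketch, assuming plain strings as stand-ins for the key expressions and a hypothetical `zip_strict_sketch` helper (not the helper from this PR): plain `zip` stops at the shorter input and silently pairs the second key with "b", while a length-checking zip raises.

```python
# Hypothetical stand-ins: strings instead of real narwhals expressions.
keys = ["by_dtype(Int64)", "col('c')"]  # 2 key expressions, the first is multi-output
output_names = ["a", "b", "c"]          # 3 resolved output names

# Plain zip truncates to the shorter input, so col('c') is paired with "b".
pairs = list(zip(keys, output_names))


def zip_strict_sketch(*iterables):
    """Illustrative only: materialise the inputs and raise if lengths differ."""
    materialised = [list(it) for it in iterables]
    lengths = [len(m) for m in materialised]
    if len(set(lengths)) > 1:
        raise ValueError(f"length mismatch: {lengths}")
    return zip(*materialised)


try:
    list(zip_strict_sketch(keys, output_names))
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(pairs)
print(mismatch_detected)
```

The point of the sketch is only that the length check turns a silent mis-aliasing into an immediate error; how the group-by keys should actually be resolved is a separate question.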
|
|
cool, thanks, definitely in favour of using strict zip everywhere by default, especially if it's helped uncover an issue |
I really disagree with this. The changes in (https://github.com/narwhals-dev/narwhals/pull/3003/files#diff-168a6da9cf3080d5790d562874fe82ff45411bc50b8a1bb90b6dc00ee19ea3ca) aren't safer; they just repeatedly check that the same inputs are still the same length. I'm happy with strict zip when we actually don't know the length, but if we can't guarantee anywhere that we have equal-length iterables, then there's a major design issue 🤔 |
|
what's the downside of using this? |
It adds overhead multiple times to the same operation. How many times is narwhals/tests/selectors_test.py Lines 197 to 225 in a4d8ebf |
|
true, but checking whether it's necessary is a manual task and thus error-prone. if we decide to use it everywhere, then it's a simple pre-commit check to enforce it. from what i can see, the overhead is minimal (and comparable to the rest of the Python operations we do), so I think I'd rather take this hit (if any; i'd be quite surprised if it actually showed up in a performance test, as we're typically zipping over very few elements at a time anyway) and have it be safer. When we're Python3.10, we can just use |
|
Again, I'm not concerned about using this where we actually need to validate it. But there are a very large number of changes here that are blindly doing it. If
@property
def schema(self) -> dict[str, DType]:
    schema = self.native.schema
    return {
        name: native_to_narwhals_dtype(dtype, self._version)
        for name, dtype in zip_strict(schema.names, schema.types)
    }
This occurs directly after a length check
return {
    name: to_native_dtype(dtype=dtype, dtype_backend=backend)
    for name, dtype, backend in zip_strict(self.keys(), self.values(), backends)
}
This creates things to
All I'm asking for is to do this in a measured way and 100% if it fixes an issue (#3003 (comment)) |
|
Adding my two cents to the conversation, since this PR was probably triggered by a request I made to validate the input in My understanding of the situation is the following:
|
Co-authored-by: Dan Redding <125183946+dangotbanned@users.noreply.github.com>
|
To get coverage, this might be a starting point
def test_strict_zip(monkeypatch: pytest.MonkeyPatch) -> None:
    with monkeypatch.context() as mp:
        mp.setattr(sys, "version_info", (3, 9)) |
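One way the idea above could be fleshed out, sketched with `unittest.mock` so it runs without pytest. It assumes a hypothetical helper, `zip_strict_compat`, that checks `sys.version_info` at call time; if the real helper picks its branch at import time, patching the version afterwards would have no effect.

```python
import sys
import types
from itertools import zip_longest
from unittest import mock

_MISSING = object()  # sentinel marking an exhausted iterable


def _fallback(*iterables):
    for row in zip_longest(*iterables, fillvalue=_MISSING):
        if any(item is _MISSING for item in row):
            raise ValueError("arguments have different lengths")
        yield row


def zip_strict_compat(*iterables):
    # Hypothetical helper: the version is checked on every call,
    # which is what makes the branch patchable from a test.
    if sys.version_info >= (3, 10):
        return zip(*iterables, strict=True)
    return _fallback(*iterables)


def run_fallback_checks():
    # Pretend to be on Python 3.9 only while the helper picks its branch.
    with mock.patch.object(sys, "version_info", (3, 9)):
        equal = zip_strict_compat([1, 2], ["x", "y"])
        unequal = zip_strict_compat([1, 2], ["x"])
    took_fallback = isinstance(equal, types.GeneratorType)
    pairs = list(equal)
    try:
        list(unequal)
        raised = False
    except ValueError:
        raised = True
    return took_fallback, pairs, raised


print(run_fallback_checks())
```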
|
thanks all! we're already running the test suite with Python 3.9, so i'd be ok with just pragma no covering the else branch |
zip with zip equal → zip with zip_strict
|
thanks all - let's ship this then! |

What type of PR is this? (check all applicable)
Related issues
DataFrame.top_k and LazyFrame.top_k #2977
Checklist
If you have comments or can explain your changes, please do so below