-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[pandas] Avoid flagging PD002 if pandas is not imported
#18963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pandas] Avoid flagging PD002 if pandas is not imported
#18963
Conversation
non pandas `DataFrame` should pass
| /// PD002 | ||
| pub(crate) fn inplace_argument(checker: &Checker, call: &ast::ExprCall) { | ||
| // If the function was imported from another module, and it's _not_ Pandas, abort. | ||
| if checker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is now outdated.
Can you tell me more why we remove this check entirely?
CC: @dhruvmanila
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is now outdated.
Can you tell me more why we remove this check entirely?
CC: @dhruvmanila
Fair call, I will remove the comment (7348b1c). I removed the check for the following reasons:
- To align better with the code styling and implementation from: Skip panda rules if panda module hasn't been seen #14671
- As noted in that PR yes this may give false negatives. But this is preferred to false positives (less end-user gripe)
- I believe the check is now unnecessary given the additional tests added
- On current
mainthese would fail, eg they would trigger thepandas linterissue. - With these changes, they now pass.
- All existing tests pass, thus no existing functionality (that I'm aware of) will change.
- On current
If there's something I'm missing here please let me know. Happy to add back in the check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EG with these changes we are using the cheker however in a different way (as introduced in #14671.
!checker.semantic().seen_module(Modules::PANDAS) we will now skip the check pandas has not been seen (used) within the offending code.
Thus; this will allow for less false positives with similar codes on in-place operations. eg pl.dataframe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MichaReiser Isn't this change in line with what you did in #14671? In other words, why wasn't that change applied to this rule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing check would only work when the method is used directly like pandas.DataFrame.sort_values and not when used as df = pandas.DataFrame(); df.sort_values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing check would only work when the method is used directly like pandas.DataFrame.sort_values and not when used as df = pandas.DataFrame(); df.sort_values.
Yeah, I think that's the main difference. This PR removes some false positives (when panda isn't enabled at all) but introduces some new false positives (when using panda but the method doesn't come from pandas). This is different from my PRs where I only added the check.
However, I think this is still fine because it is in line with all other pandas rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MichaReiser good to merge from your point of view here? What I guess I was going for here was standardization.
And also basically looking to commit to this repo for the first time. @dhruvmanila's suggestion is what I was implementing here for robustness, completeness and to close out #6432 (comment)
In my experience people who would use pandas.DataFrame would typically have these as variables and run my_df.sort_values(in_place=True) or so... Thus I figured it's better to pick up on these cases (when we know pandas has been seen)... And rather have all pd methods work the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it's fine. Thank you for working on this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nws, thanks for the approval and merge.
Anything further required from my side after merge?
I have read the contributing guide just trying to make sure there's nothing I've missed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, all good from your side. This will go out in the next release
|
| code | total | + violation | - violation | + fix | - fix |
|---|---|---|---|---|---|
| PD002 | 2 | 1 | 1 | 0 | 0 |
Linter (preview)
ℹ️ ecosystem check detected linter changes. (+1 -1 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)
apache/superset (+1 -1 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL
- tests/integration_tests/model_tests.py:491:29: PD002 `inplace=True` should be avoided; it has inconsistent behavior + tests/unit_tests/pandas_postprocessing/test_rename.py:65:9: PD002 `inplace=True` should be avoided; it has inconsistent behavior
Changes by rule (1 rules affected)
| code | total | + violation | - violation | + fix | - fix |
|---|---|---|---|---|---|
| PD002 | 2 | 1 | 1 | 0 | 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
Edit: I didn't see Micha's review, I'll defer to him and Dhruv.
Noted. Will wait until Micha approves or has concerns addressed. |
non pandas dataframe in-place usage (PD002)pandas] Avoid flagging PD002 if pandas is not imported
* main: [ty] Add builtins to completions derived from scope (#18982) [ty] Don't add incorrect subdiagnostic for unresolved reference (#18487) [ty] Simplify `KnownClass::check_call()` and `KnownFunction::check_call()` (#18981) [ty] Add micro-benchmark for #711 (#18979) [`flake8-annotations`] Make `ANN401` example error out-of-the-box (#18974) [`flake8-async`] Make `ASYNC110` example error out-of-the-box (#18975) [pandas]: Fix issue on `non pandas` dataframe `in-place` usage (PD002) (#18963) [`pylint`] Fix `PLC0415` example (#18970) [ty] Add environment variable to dump Salsa memory usage stats (#18928) [`pylint`] Fix `PLW0108` autofix introducing a syntax error when the lambda's body contains an assignment expression (#18678) Bump 0.12.1 (#18969) [`FastAPI`] Add fix safety section to `FAST002` (#18940) [ty] Add regression test for leading tab mis-alignment in diagnostic rendering (#18965) [ty] Resolve python environment in `Options::to_program_settings` (#18960) [`ruff`] Fix false positives and negatives in `RUF010` (#18690) [ty] Fix rendering of long lines that are indented with tabs [ty] Add regression test for diagnostic rendering panic [ty] Move venv and conda env discovery to `SearchPath::from_settings` (#18938)
Summary
pandasdataframes.PD002is still triggering.seen_modulethat was introduced there.Test Plan
non pandasimport throwing.Current
mainWith new changes
Appendix
Added negative case tests failing on `main`