Contributor

@jordyjwilliams jordyjwilliams commented Jun 26, 2025

Summary

Test Plan

  • Added new test cases that reproduce PD002 being raised for code that does not import pandas.

Current main

  • Fails added tests.

With new changes

  • Tests pass.

Appendix

Added negative case tests failing on `main`
failures:

---- rules::pandas_vet::tests::contents::r_import_polars_as_pl_x_pl_dataframe_x_drop_a_inplace_true_pd002_pass_polars_expects stdout ----
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Snapshot file: crates/ruff_linter/src/rules/pandas_vet/snapshots/ruff_linter__rules__pandas_vet__tests__PD002_pass_polars.snap
Snapshot: PD002_pass_polars
Source: crates/ruff_linter/src/rules/pandas_vet/mod.rs:373
────────────────────────────────────────────────────────────────────────────────
-old snapshot
+new results
────────────┬───────────────────────────────────────────────────────────────────
          0 │+<filename>:4:15: PD002 [*] `inplace=True` should be avoided; it has inconsistent behavior
          1 │+  |
          2 │+2 | import polars as pl
          3 │+3 | x = pl.DataFrame()
          4 │+4 | x.drop(['a'], inplace=True)
          5 │+  |               ^^^^^^^^^^^^ PD002
          6 │+  |
          7 │+  = help: Assign to variable; remove `inplace` arg
          8 │+
          9 │+ℹ Unsafe fix
         10 │+1 1 |
         11 │+2 2 | import polars as pl
         12 │+3 3 | x = pl.DataFrame()
         13 │+4   |-x.drop(['a'], inplace=True)
         14 │+  4 |+x = x.drop(['a'])
────────────┴───────────────────────────────────────────────────────────────────
To update snapshots run `cargo insta review`
Stopped on the first failure. Run `cargo insta test` to run all snapshots.

thread 'rules::pandas_vet::tests::contents::r_import_polars_as_pl_x_pl_dataframe_x_drop_a_inplace_true_pd002_pass_polars_expects' panicked at /Users/jordywilliams/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/insta-1.43.1/src/runtime.rs:679:13:
snapshot assertion for 'PD002_pass_polars' failed in line 373
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- rules::pandas_vet::tests::contents::r_x_dataframe_x_drop_a_inplace_true_pd002_pass_no_import_expects stdout ----
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Snapshot file: crates/ruff_linter/src/rules/pandas_vet/snapshots/ruff_linter__rules__pandas_vet__tests__PD002_pass_no_import.snap
Snapshot: PD002_pass_no_import
Source: crates/ruff_linter/src/rules/pandas_vet/mod.rs:373
────────────────────────────────────────────────────────────────────────────────
-old snapshot
+new results
────────────┬───────────────────────────────────────────────────────────────────
          0 │+<filename>:3:15: PD002 [*] `inplace=True` should be avoided; it has inconsistent behavior
          1 │+  |
          2 │+2 | x = DataFrame()
          3 │+3 | x.drop(['a'], inplace=True)
          4 │+  |               ^^^^^^^^^^^^ PD002
          5 │+  |
          6 │+  = help: Assign to variable; remove `inplace` arg
          7 │+
          8 │+ℹ Unsafe fix
          9 │+1 1 |
         10 │+2 2 | x = DataFrame()
         11 │+3   |-x.drop(['a'], inplace=True)
         12 │+  3 |+x = x.drop(['a'])
────────────┴───────────────────────────────────────────────────────────────────
To update snapshots run `cargo insta review`
Stopped on the first failure. Run `cargo insta test` to run all snapshots.

thread 'rules::pandas_vet::tests::contents::r_x_dataframe_x_drop_a_inplace_true_pd002_pass_no_import_expects' panicked at /Users/jordywilliams/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/insta-1.43.1/src/runtime.rs:679:13:
snapshot assertion for 'PD002_pass_no_import' failed in line 373


failures:
    rules::pandas_vet::tests::contents::r_import_polars_as_pl_x_pl_dataframe_x_drop_a_inplace_true_pd002_pass_polars_expects
    rules::pandas_vet::tests::contents::r_x_dataframe_x_drop_a_inplace_true_pd002_pass_no_import_expects

test result: FAILED. 2442 passed; 2 failed; 4 ignored; 0 measured; 0 filtered out; finished in 0.93s

error: test failed, to rerun pass `-p ruff_linter --lib`

/// PD002
pub(crate) fn inplace_argument(checker: &Checker, call: &ast::ExprCall) {
// If the function was imported from another module, and it's _not_ Pandas, abort.
if checker
Member

@MichaReiser MichaReiser Jun 26, 2025

The comment is now outdated.

Can you tell me more why we remove this check entirely?

CC: @dhruvmanila

Contributor Author

@jordyjwilliams jordyjwilliams Jun 26, 2025

> The comment is now outdated.
>
> Can you tell me more why we remove this check entirely?
>
> CC: @dhruvmanila

Fair call, I will remove the comment (7348b1c). I removed the check for the following reasons:

  • To align better with the code style and implementation from: Skip panda rules if panda module hasn't been seen #14671
  • As noted in that PR, yes, this may give false negatives, but these are preferable to false positives (less end-user gripe).
  • I believe the check is now unnecessary given the additional tests added:
    • On current main these would fail, e.g. they would trigger the pandas lint violation.
    • With these changes, they now pass.
    • All existing tests pass, thus no existing functionality (that I'm aware of) will change.

If there's something I'm missing here please let me know. Happy to add back in the check.

Contributor Author

E.g. with these changes we still use the checker, but in a different way (as introduced in #14671).

With `!checker.semantic().seen_module(Modules::PANDAS)` we now skip the check when pandas has not been seen (used) within the offending code.

Thus, this allows for fewer false positives on similar code that uses in-place operations, e.g. `pl.DataFrame`.
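To make the gating idea above concrete, here is a rough Python sketch using the standard `ast` module. It is not ruff's implementation: `pandas_seen` and `find_inplace_calls` are hypothetical names, and the whole-file import scan is only an assumed stand-in for what `seen_module(Modules::PANDAS)` tracks.

```python
import ast
from typing import List


def pandas_seen(tree: ast.AST) -> bool:
    """Return True if the parsed file imports pandas (hypothetical helper)."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import pandas`, `import pandas as pd`, `import pandas.testing`
            if any(alias.name.split(".")[0] == "pandas" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # `from pandas import DataFrame` (absolute imports only)
            if node.level == 0 and (node.module or "").split(".")[0] == "pandas":
                return True
    return False


def find_inplace_calls(source: str) -> List[int]:
    """Return line numbers of calls passing `inplace=True`.

    Assumption: like the gate discussed above, nothing is reported
    unless pandas is imported somewhere in the file.
    """
    tree = ast.parse(source)
    if not pandas_seen(tree):
        return []
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if (
                    kw.arg == "inplace"
                    and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True
                ):
                    lines.append(node.lineno)
    return sorted(lines)


polars_src = "import polars as pl\nx = pl.DataFrame()\nx.drop(['a'], inplace=True)\n"
pandas_src = "import pandas as pd\nx = pd.DataFrame()\nx.drop(['a'], inplace=True)\n"
print(find_inplace_calls(polars_src))  # nothing reported: pandas never imported
print(find_inplace_calls(pandas_src))  # the `x.drop(...)` call on line 3 is reported
```

Because the gate in this sketch is file-wide, any `inplace=True` call in a file that imports pandas is reported, regardless of which library the receiver actually comes from.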

Member

@MichaReiser Isn't this change in line with what you did in #14671? In other words, why wasn't that change applied to this rule?

Member

The existing check would only work when the method is used directly like pandas.DataFrame.sort_values and not when used as df = pandas.DataFrame(); df.sort_values.
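A toy illustration of why import-based name resolution only helps in the direct form: the callee's dotted name can be traced back to an import when it starts at an imported binding, but not when it starts at a local variable such as `df`. This is a hedged sketch in plain Python `ast`, not ruff's resolver; `import_bindings`, `resolve_callee`, and `resolved_callees` are hypothetical names.

```python
import ast
from typing import Dict, List, Optional


def import_bindings(tree: ast.Module) -> Dict[str, str]:
    """Map each locally bound import name to the qualified name it refers to."""
    bindings: Dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    bindings[alias.asname] = alias.name
                else:
                    # `import a.b` binds only the top-level name `a`
                    root = alias.name.split(".")[0]
                    bindings[root] = root
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            for alias in node.names:
                bindings[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    return bindings


def resolve_callee(func: ast.expr, bindings: Dict[str, str]) -> Optional[str]:
    """Resolve a dotted callee to a qualified name, or None if its root is not an import."""
    parts: List[str] = []
    while isinstance(func, ast.Attribute):
        parts.append(func.attr)
        func = func.value
    if not isinstance(func, ast.Name) or func.id not in bindings:
        return None  # e.g. a local variable: no type inference in this sketch
    parts.append(bindings[func.id])
    return ".".join(reversed(parts))


def resolved_callees(source: str) -> List[Optional[str]]:
    """Resolve every call in the source, in source order."""
    tree = ast.parse(source)
    bindings = import_bindings(tree)
    calls = [n for n in ast.walk(tree) if isinstance(n, ast.Call)]
    calls.sort(key=lambda n: (n.lineno, n.col_offset))
    return [resolve_callee(call.func, bindings) for call in calls]


example = (
    "import pandas\n"
    "pandas.DataFrame.sort_values(df, 'a', inplace=True)\n"
    "df = pandas.DataFrame()\n"
    "df.sort_values('a', inplace=True)\n"
)
for name in resolved_callees(example):
    print(name)
```

In this sketch the last call resolves to `None`, so a check of the form "abort if the callee resolves to a non-pandas module" cannot distinguish a pandas `df` from any other object.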

Member

> The existing check would only work when the method is used directly like pandas.DataFrame.sort_values and not when used as df = pandas.DataFrame(); df.sort_values.

Yeah, I think that's the main difference. This PR removes some false positives (when pandas isn't enabled at all) but introduces some new false positives (when using pandas but the method doesn't come from pandas). This is different from my PRs, where I only added the check.

However, I think this is still fine because it is in line with all other pandas rules.

Contributor Author

@MichaReiser good to merge from your point of view here? What I guess I was going for here was standardization.

And also basically looking to commit to this repo for the first time. @dhruvmanila's suggestion is what I was implementing here for robustness, completeness and to close out #6432 (comment)

In my experience, people who use pandas.DataFrame typically hold these as variables and run my_df.sort_values(inplace=True) or similar. Thus I figured it's better to pick up on these cases (when we know pandas has been seen), and to have all pd methods work the same.

Member

Yeah, I think it's fine. Thank you for working on this!

Contributor Author

No worries, thanks for the approval and merge.

Anything further required from my side after merge?

I have read the contributing guide; just trying to make sure there's nothing I've missed here.

Member

No, all good from your side. This will go out in the next release

Contributor

github-actions bot commented Jun 26, 2025

ruff-ecosystem results

Linter (stable)

ℹ️ ecosystem check detected linter changes. (+1 -1 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)

apache/superset (+1 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --no-preview --select ALL

- tests/integration_tests/model_tests.py:491:29: PD002 `inplace=True` should be avoided; it has inconsistent behavior
+ tests/unit_tests/pandas_postprocessing/test_rename.py:65:9: PD002 `inplace=True` should be avoided; it has inconsistent behavior

Changes by rule (1 rules affected)

| code | total | + violation | - violation | + fix | - fix |
| --- | --- | --- | --- | --- | --- |
| PD002 | 2 | 1 | 1 | 0 | 0 |

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+1 -1 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)

apache/superset (+1 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL

- tests/integration_tests/model_tests.py:491:29: PD002 `inplace=True` should be avoided; it has inconsistent behavior
+ tests/unit_tests/pandas_postprocessing/test_rename.py:65:9: PD002 `inplace=True` should be avoided; it has inconsistent behavior

Changes by rule (1 rules affected)

| code | total | + violation | - violation | + fix | - fix |
| --- | --- | --- | --- | --- | --- |
| PD002 | 2 | 1 | 1 | 0 | 0 |

Contributor

@ntBre ntBre left a comment

Looks good, thanks!

Edit: I didn't see Micha's review, I'll defer to him and Dhruv.

@jordyjwilliams
Contributor Author

> Looks good, thanks!
>
> Edit: I didn't see Micha's review, I'll defer to him and Dhruv.

Noted. Will wait until Micha approves or has concerns addressed.

@MichaReiser MichaReiser added the `rule` label (Implementing or modifying a lint rule) Jun 27, 2025
@MichaReiser MichaReiser enabled auto-merge (squash) June 27, 2025 06:53
@MichaReiser MichaReiser merged commit 1874d52 into astral-sh:main Jun 27, 2025
34 checks passed
@jordyjwilliams jordyjwilliams deleted the 6432_pandas_on_non_pandas_fix branch June 27, 2025 06:58
@dhruvmanila dhruvmanila changed the title from "[pandas]: Fix issue on non pandas dataframe in-place usage (PD002)" to "[pandas] Avoid flagging PD002 if pandas is not imported" Jun 27, 2025
dcreager added a commit that referenced this pull request Jun 27, 2025
* main:
  [ty] Add builtins to completions derived from scope (#18982)
  [ty] Don't add incorrect subdiagnostic for unresolved reference (#18487)
  [ty] Simplify `KnownClass::check_call()` and `KnownFunction::check_call()` (#18981)
  [ty] Add micro-benchmark for #711 (#18979)
  [`flake8-annotations`] Make `ANN401` example error out-of-the-box (#18974)
  [`flake8-async`] Make `ASYNC110` example error out-of-the-box (#18975)
  [pandas]: Fix issue on `non pandas` dataframe `in-place` usage (PD002) (#18963)
  [`pylint`] Fix `PLC0415` example (#18970)
  [ty] Add environment variable to dump Salsa memory usage stats (#18928)
  [`pylint`] Fix `PLW0108` autofix introducing a syntax error when the lambda's body contains an assignment expression (#18678)
  Bump 0.12.1 (#18969)
  [`FastAPI`] Add fix safety section to `FAST002` (#18940)
  [ty] Add regression test for leading tab mis-alignment in diagnostic rendering (#18965)
  [ty] Resolve python environment in `Options::to_program_settings` (#18960)
  [`ruff`] Fix false positives and negatives in `RUF010` (#18690)
  [ty] Fix rendering of long lines that are indented with tabs
  [ty] Add regression test for diagnostic rendering panic
  [ty] Move venv and conda env discovery to `SearchPath::from_settings` (#18938)