Implement chained comparison improvements and related checks#7611
Implement chained comparison improvements and related checks#7611areveny wants to merge 24 commits into
Conversation
Pull Request Test Coverage Report for Build 3231501930
💛 - Coveralls |
This comment has been minimized.
This comment has been minimized.
Pierre-Sassoulas
left a comment
There was a problem hiding this comment.
Thank you for the work @areveny. There's a lot to review but from what I can see after a surface review it's possible that making the gragh check's code independent from the RefactoringChecker might be a good idea, because 1) There's a lot of code and maintenance would benefit from it being separated 2) It's going to be easier if we want to extends usiing-constant-testsdirectly where it's at instead of refactoring how the checker work to be able to raise any message anywhere.
| "Emitted when items in a boolean condition are all <= or >=" | ||
| "This is equivalent to asking if they all equal.", | ||
| ), | ||
| "R1738": ( |
There was a problem hiding this comment.
Maybe this should raise the existing message for constant comparison using-constant-test (https://pylint.pycqa.org/en/latest/user_guide/messages/warning/using-constant-test.html#using-constant-test-w0125). We might need to relax some constraint on what checker can do, I'm not sure a checker can raise messages it does not define right now.
|
Hi @areveny, just wondering: are the false negatives in the primer result one of the items you were hoping to polish after getting an initial review? |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #7611 +/- ##
==========================================
+ Coverage 96.22% 96.26% +0.03%
==========================================
Files 178 178
Lines 19699 19857 +158
==========================================
+ Hits 18956 19115 +159
+ Misses 743 742 -1
🚀 New features to boost your workflow:
|
- Rename comparison-all-equal symbol to chained-comparison-all-equal. - Renumber the two new messages to R1738/R1739 (R1737 is now use-yield-from). - Drop the leftover comparison-all-equal emit and emit the renamed symbol. - Fix the evalutes typo in the impossible-comparison message. - Include the simplified equality form in the chained-comparison-all-equal message and rephrase descriptions. - Tighten the type hints on the new helpers and on _get_compare_operand_value (it can be a static method). - Make the graph helpers added for this check private and modernize their signatures. - Register the new symbols on visit_boolop's only_required_for_messages decorator. - Add documentation pages, bad.py / good.py examples and news fragments for the new checks.
Consolidate the two new_check fragments under pylint-dev#5814 into a single file (the script/check_newsfragments.py validator requires the filename stem to match the referenced issue) and reference the PR for the false_negative fragment instead. Apply pydocstringformatter and black tweaks on graph.py / refactoring_checker.py.
This comment has been minimized.
This comment has been minimized.
Before this change, _check_comparisons bailed out as soon as it saw a statement that wasn't a simple Compare (e.g. a function call or a Compare with an unsupported operator like !=). That meant common patterns like _is_int(v) and v >= 0 and v <= 999 never produced a suggestion. The check now skips those statements but keeps their source text, so the suggestion shows the full simplified condition (e.g. _is_int(v) and 999 >= v >= 0) instead of being silent. Partial Compare statements no longer leak half-committed edges into the graph either — edges for a Compare are only mutated once the whole statement has been verified to be processable.
``a > 1 and a < 10`` now suggests ``1 < a < 10`` instead of ``10 > a > 1``. The graph is still normalised largest-first internally (it makes the cycle and ordering math simpler), but ``_render_path`` now walks it in reverse and flips the operators when emitting the user-facing text, so the suggestion reads in the direction Python code idiomatically writes a range check (``0 <= v <= 999``). Equality-only paths keep their original direction so ``b == 2`` is not rewritten as ``2 == b``.
The four parallel dicts (edges / symbols / indegree / frequency) plus the ``unprocessed`` list were being passed around through return tuples and verbose function signatures. The return type of ``_get_graph_from_comparison_nodes`` in particular spanned eight lines and was unreadable. Wrap them in a ``_ComparisonGraph`` dataclass and introduce two TypeAlias shorthands (``_ComparisonOperand``, ``_ComparisonEdge``) so the helper signatures shrink to one line. Pull the constant-linking loop out into its own ``_link_constants`` helper, and rewrite ``_handle_cycles`` to build the edge list explicitly instead of doing two passes over the cycle. No behaviour change.
This comment has been minimized.
This comment has been minimized.
The check is purely syntactic — ``_get_compare_operand_value`` only pattern-matches ``nodes.Name`` and ``nodes.Const`` and no ``safe_infer`` call is made — so ``HIGH`` is the right confidence level (same as ``use-yield-from``, ``consider-using-in`` etc.) rather than the implicit ``UNDEFINED`` we had before. Updated the functional test expectations.
Replace the placeholder ``int(input())`` examples with little function-argument scenarios that show why the pattern matters: - impossible-comparison uses a botched triangle-inequality check (a > b and b > a) and contrasts it with the real inequality. - chained-comparison-all-equal uses three bodies in thermal equilibrium (a >= b and b >= c and c >= a) and simplifies it to a == b == c.
This comment has been minimized.
This comment has been minimized.
…parts Before, ``_is_int(v) and v >= 0 and v <= 999`` was being rendered as ``0 <= v <= 999 and _is_int(v)`` because the rendered parts were alphabetically sorted before being joined. That reordering is unsafe: ``_is_int(v)`` is a type guard whose role is to short-circuit before any numeric comparison runs, and pulling it after the chain could let a non-comparable ``v`` raise ``TypeError``. Split ``node.values`` into runs of consecutive processable Compares separated by unprocessable boundaries. Each run gets its own graph and can chain internally; the boundary statements keep their original source-order slots. The previous all-equal segment shortcut becomes a check across all graphs, and cycle detection now runs per-segment (a cycle that only exists across an unprocessable boundary is no longer impossible, since the boundary can short-circuit). Also reword the ``_ComparisonGraph`` docstring to drop the codespell-flagged words (``Multigraph``, ``normalisation``, ``indegree``, ``Consts``) and the em dashes.
Add a ``test_cycles_do_not_cross_unprocessable_boundaries`` function that nails down two facts: - ``a > b and b > a and is_int(v)`` (and the all-equal flavor) still fires because the cycle lives entirely within a single run of consecutive comparisons. - ``a > b and is_int(v) and b > a`` does *not* fire even though the cycle is a contradiction in isolation. The guard between the two halves can short-circuit, so reporting the cycle would be a false positive in the general case.
``value == 1 and value == 2`` was firing the new chained-comparison suggestion as ``value == 2``, but that's wrong: dropping ``value == 1`` is not safe when the kept constraint doesn't imply it. The pre-PR check silently ignored all-``==`` conjunctions, keep it that way. The bug came from ``_link_constants`` adding a synthetic ``>`` edge between the two constants, which let path-finding produce a misleading chain. The all-``==`` bailout has to run *before* the synthetic linking. Restructure ``_segment_comparisons`` to stage the per-segment graphs in a ``pending`` list, check all-``==`` across them, then link only if we're going to keep going. This restores the silence the ``tests/functional/c/consider/consider_using_in.py`` test was relying on. Also reword the docstring to drop the codespell-flagged ``Unprocessable``.
This comment has been minimized.
This comment has been minimized.
- Float Const operand: ``0.5 < a and a < 1.5`` exercises the ``isinstance(node.value, (int, float))`` branch of ``_get_compare_operand_value`` with a float. - Attribute/subscript as a Compare operand: ``items[0] < 5 and v > 1 and v < 10`` trips the ``_get_compare_operand_value`` ``return None`` branch via an operand (distinct from the existing ``is_int(v)`` test, where the whole statement isn't a Compare). - All-``==`` bailout in ``_segment_comparisons``: the cases ``a == 1 and a == 2`` and ``a == b and b == c`` lock in the silence restored by ddbe375; it was only covered indirectly via ``consider_using_in.py``. - Mixed-symbol cycle in ``_handle_cycles``: ``a > b and b == a`` and ``a >= b and b > a`` cover cycles that pair an inequality with ``==`` (existing tests only mixed ``>`` with ``>=``).
This comment has been minimized.
This comment has been minimized.
- Drop the ``if not operators: return False`` guard at the end of ``_add_compare_to_graph``: ``Compare`` AST nodes always have at least one operator, and the per-op loop has already returned ``False`` for every unprocessable case, so this branch was unreachable. - Add ``a < 5 and a > 5`` to ``test_impossible_comparison`` so the ``smaller >= largest`` continue in ``_link_constants`` (duplicate constants on both sides of a Compare) is exercised.
Hi.
This PR adds improvements to
chained-comparisonby processing the chain of comparisons into a graph. It's still a bit unpolished but I would like some higher-level feedback on the direction.It adds two new checks:
impossible-comparisonfor cases when a comparison would always be false, e.g.a > b and b > aora == 5 and a > b and b == 1.comparison-all-equalfor a special cycle case where every comparison allows equality.a >= b >= c and c >= asimplifies toa == b == c.It modifies
chained-comparisonto be more accurate, specifically the casea < b < c and a < cshould be simplified to justa < b < c. It currently is not.It also prints what simplified comparison is.
Unfortunately, this is at the cost of complexity. The algorithm finds the minimum number of comparisons by finding the longest path through the graph of comparisons repeatedly, recalculating all the path lengths after each path is emitted. (I could not get it working by emitting paths in arbitrary order until all nodes are accounted for because of the case above,
a < b < c and a < c.) The overhead of doing so should be limited by the reality that most comparisons tend to have just a few statements.Please let me know what you think. I'm open to big suggestions, but want to share my code for feedback and to keep this moving.
Type of Changes
Description
Closes #5814