[ty] loop control flow analysis using loop header definitions#22794
[ty] loop control flow analysis using loop header definitions#22794oconnor663 merged 1 commit intomainfrom
Conversation
Typing conformance resultsNo changes detected ✅ |
Merging this PR will degrade performance by 23.27%
Performance Changes
Comparing Footnotes
|
|
| Lint rule | Added | Removed | Changed |
|---|---|---|---|
possibly-unresolved-reference |
319 | 0 | 0 |
unresolved-attribute |
63 | 1 | 55 |
invalid-argument-type |
39 | 0 | 58 |
unresolved-reference |
5 | 91 | 0 |
invalid-key |
51 | 0 | 0 |
unsupported-operator |
19 | 0 | 19 |
unused-type-ignore-comment |
1 | 33 | 0 |
invalid-assignment |
6 | 4 | 6 |
division-by-zero |
0 | 14 | 0 |
not-subscriptable |
14 | 0 | 0 |
no-matching-overload |
8 | 1 | 0 |
invalid-declaration |
5 | 0 | 0 |
invalid-return-type |
1 | 0 | 4 |
not-iterable |
1 | 0 | 3 |
possibly-missing-attribute |
4 | 0 | 0 |
call-non-callable |
1 | 0 | 0 |
too-many-positional-arguments |
1 | 0 | 0 |
type-assertion-failure |
0 | 0 | 1 |
| Total | 538 | 144 | 146 |
962ec70 to
a02fdff
Compare
|
e663909 to
f0701e9
Compare
There was a problem hiding this comment.
The changes to Divergent handling in IntersectionType here are related to the test case that begins "We need to avoid oscillating cycles...".
There was a problem hiding this comment.
Update: I've replaced the changes I was referring to here (Divergent handling in intersections during cycle recovery) with changes to InnerIntersectionBuilder in crates/ty_python_semantic/src/types/builder.rs (Divergent handling in all intersections). Without this change, the new and pretty broad Divergent case in evaluate_expr_compare_op leads to a lot of Signature::todo("Type::Intersection.call") types showing up in output.
There was a problem hiding this comment.
I wonder if we can move the Divergent handling back here once I finally land #22469 -- at that point we should just get back an intersection with Divergent from the call, and then that would probably get handled here.
But if we aren't seeing negative effects from moving it into IntersectionBuilder directly, maybe that doesn't matter much.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f0701e9a85
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Ecosystem dispositions
|
I'm actually a little bit surprised that our place-based narrowing doesn't handle this? |
carljm
left a comment
There was a problem hiding this comment.
It's amazing to see this come together! I did a first-pass review.
| // A place shouldn't be definitely-unbound when there are visible bindings. The | ||
| // initial UNBOUND definition should either get shadowed by those bindings (if | ||
| // they're unconditional), or it should have constraints attached to it (if the | ||
| // other bindings are conditional). However, there's one exception to that | ||
| // intuitive rule: Loop header definitions don't shadow prior bindings, because | ||
| // prior bindings are always visible at the start of the first loop iteration. | ||
| Some(Truthiness::AlwaysTrue) => Definedness::PossiblyUndefined, |
There was a problem hiding this comment.
This is unfortunate, as this unreachable!() has definitely caught bugs before. Can we give a clearer explanation of the kind of code that causes us to hit this case? I don't think I quite understand it -- it seems like having unbound be definitely-visible would require that we have a place that is unbound before the loop, and never bound in the loop... in which case it shouldn't have a loop-header binding at all?
There was a problem hiding this comment.
093392c restores the assert in most cases, and suppresses it only when a loop header bindings are the only bindings visible. I've added a while_loop.md test case with the simplest example of something that panics without this special case:
while True:
x # error: [possibly-unresolved-reference]
x = 1The issue is that, because the loop condition is statically true, we don't encounter any narrowing conditions before we get to the first use of x. But because of the loop header, we do have a binding. So the general rule "UNBOUND shouldn't be definitely-visible if you have a binding" is violated. Of course this example doesn't run cleanly, it raises an exception at the use of x, but either way we can't panic when we see it.
There are several other examples in our existing tests that hit this panic (with my loop header changes but without this new special case), and some of them even run (if not terminate) without exceptions, but they're pretty esoteric. eq_without_hash.py has this funny loop:
class MaybeEqWhile:
while ...:
def __eq__(self, other): ...Part of the story there is that ... is truthy. Another part is that I think a function definition also acts like a use of that function, because of overload resolution.
| let narrowed_ty = if let Some(constraint) = loop_back_binding | ||
| .narrowing_predicates | ||
| .iter() | ||
| .filter_map(|predicate| infer_narrowing_constraint(db, *predicate, place)) | ||
| .reduce(|acc, constraint| constraint.merge_constraint_and(acc, db)) | ||
| { | ||
| NarrowingConstraint::intersection(binding_ty) | ||
| .merge_constraint_and(constraint, db) | ||
| .evaluate_constraint_type(db) |
There was a problem hiding this comment.
It seems like the constraint merging is something we should have already done in semantic indexing, so that here we just have a single NarrowingConstraint and can use NarrowingConstraint::narrow, like place_from_bindings_impl does.
There was a problem hiding this comment.
Cleaned up in 7094021. @dcreager do you have any advice for how to name this function that I'm calling constraints_iter? It seems like the items of a ConstraintsIterator are "predicates" rather than "constraints". Would it be better for the world if I called this a predicates_iter? Should we rename the whole type?
There was a problem hiding this comment.
Hmm, pyantic and colour_science show up in CodSpeed with this change, but not before. That feels fishy to me. Any chance those are flakey? (I'm going to rebase and see what happens either way.)
7364f8a to
e0648fa
Compare
|
Putting this into Draft while I address feedback, so I stop alerting folks |
25b1645 to
25a3596
Compare
|
I've been chasing the class Node:
def __init__(self, next: "Node | None" = None):
self.next: "Node | None" = next
node = Node(Node(Node()))
while node.next is not None:
node = node.nextThat currently gives a false positive "Attribute
I'm curious if this sounds plausible to other folks. Not yet sure where we want to intervene here... Update: This was fixed in e07ecd0. |
060a411 to
897195a
Compare
942e186 to
e07ecd0
Compare
fb7b575 to
8901fe9
Compare
Memory usage reportSummary
Significant changesClick to expand detailed breakdownprefect
sphinx
trio
flake8
|
7094021 to
dc23a4a
Compare
|
I'm bringing this PR out of draft for a second round of review. I've addressed all of @carljm's review comments above, and I've been through the ecosystem report and fixed anything that looked like an obvious bug. (See this comment above.) Some more performance regressions have crept in in the last few commits, which have now all been rebased together. The CodSpeed report on an experiment/throwaway PR (specifically this commit) makes it look like my changes to |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dc23a4a2be
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
08098f4 to
09b740a
Compare
crates/ty_python_semantic/src/semantic_index/builder/loop_bindings_visitor.rs
Show resolved
Hide resolved
| /// Infer the type for a loop header definition. | ||
| /// | ||
| /// The loop header sees all bindings that loop-back, either by reaching the end of the loop | ||
| /// body or a `continue` statement. This includes bindings from before the loop too, though |
There was a problem hiding this comment.
Nit: I think this might be potentially confusing. Let's say it can include bindings from before the loop (if they reach the end of the loop). Saying it unconditionally almost sounds like we redundantly include pre-loop bindings in the loop header always, which we don't.
There was a problem hiding this comment.
I wonder if we can move the Divergent handling back here once I finally land #22469 -- at that point we should just get back an intersection with Divergent from the call, and then that would probably get handled here.
But if we aren't seeing negative effects from moving it into IntersectionBuilder directly, maybe that doesn't matter much.
64acc21 to
29e3110
Compare
|
Only thing I've noticed on rebase is that a few of the small CodSpeed perf hits went away 🤷♂️ |
This is a draft PR to trigger CI and an ecosystem report. I still need to "own" some of Claude's code here, but all tests are plausibly passing. I expect the ecosystem report will have a lot of findings. I'm also not especially optimistic about the CodSpeed numbers. (This PR adds an extra preliminary AST pass to every loop body.)CodSpeed turned out not as bad as I was fearing. I
still need to write a proper description hereand disposition the ecosystem results, but @carljm this is ready for a review pass :)Major moving parts:
whileloops andforloops now synthesize "loop header definitions". SeeSemanticIndexBuilder::visit_stmt. This is a new type of definition that collects all the bindings that reach a loop-back edge (the end of the loop body, or acontinuestatement), making these visible to all uses in the loop (as opposed to just the uses that come after).LoopBindingsVisitor. This is an example of something that's pretty easy to get working with Claude (it's happy to write piles of code), but which (in the old days?) we might not be thrilled about maintaining long term. I'm curious whether anyone has alternative ideas for solving this problem.break, it could be that no bindings loop-back.) We solve this with a layer of indirection. The loop header definitions hold aLoopToken, which uniquely identifies the loop but otherwise holds no data. Then we assemble the bindings we collect during the walk into aLoopHeaderstruct. Finally, we use the Salsaspecifyfeature to connect theLoopTokento theLoopHeader.UNBOUND) follow control flow into the loop body normally and remain visible after the loop. Alternatively I could've "snapshotted" all the bindings at loop entry and relied entirely on "looking through" the loop header definitions, but "don't shadow" was the first approach Claude and I managed to get working :) I'm curious if anyone has strong opinions about this design choice.sympy. It turned out to be this loop, which 1) binds a variable cyclically in the loop body, 2) using a tuple-unpacking assignment, and 3) uses the same variable as anifcondition around that binding. A couple of the new mdtests are a minimized version of this. With a lot of help from Carl, it turned out that the fix for this cycle was to more aggressively eliminateDivergentfrom intersections. These changes make sense in the abstract (Divergentshould behave likeNeverin unions and intersections), but I still don't fully understand the evolution of the Salsa cycle that these changes fix. If you want to see the panic for yourself, undo this PR's changes tocrates/ty_python_semantic/src/types.rsand run thewhile_loop.mdorfor.mdmdtests.