[ty] Build constraint set sequent maps lazily #22577
Conversation
Typing conformance results: No changes detected ✅

Merging this PR will improve performance by 5.14%.
```rust
impl PartialEq for SequentMap<'_> {
    fn eq(&self, _other: &Self) -> bool {
        false
    }
}
```
I think this is the same as setting `#[no_eq]` on the query (Salsa will not do any backdating, meaning all queries reading the sequent_map of a particular interior node will re-run even if it creates the exact same SequentMap). Are there any other fields that we could base Eq on (e.g., the ones that don't change :))?
Looks very straightforward otherwise :)
I did confirm that this is equivalent to setting #[no_eq] on the query method. And if I do that, I can remove the PartialEq impl entirely.
But does that mean we would get a separate SequentMap each time we call the tracked query? My intent is that there will be one created for each interior node. (And the updated performance numbers suggest that's what's happening.) I'm okay with a different SequentMap being created for that interior node if it appears again in a later revision, since I think it's correct to invalidate that cache then.
Although maybe we do want to reuse the cache in later revisions? The interior node should entirely determine the contents of the BDD, and walking the BDD later on should yield the same results. Okay, I think you've convinced me (assuming I understand your suggestion correctly). To do this I can add the InteriorNode as a field of SequentMap, to record which node the sequent map belongs to, and then have that be the only field that PartialEq checks.
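A minimal sketch of that idea, using a simplified stand-in for the real `SequentMap<'db>` (the field names and types here are illustrative, not the actual ty definitions):

```rust
// Simplified stand-in for SequentMap<'db>: equality considers only the
// interior node the map was built for. The node fully determines the BDD
// walk, so it also determines the map's eventual contents, making the
// lazily built fields safe to ignore in Eq.
#[derive(Clone, Debug)]
struct SequentMap {
    node: u32,             // stands in for the InteriorNode<'db>
    sequents: Vec<String>, // lazily built contents, ignored by Eq
}

impl PartialEq for SequentMap {
    fn eq(&self, other: &Self) -> bool {
        self.node == other.node
    }
}
```

With an Eq like this, Salsa can backdate the query result whenever the same interior node reappears, even if the two maps have materialized different amounts of their lazy contents.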
But does that mean we would get a separate SequentMap each time we call the tracked query? My intent is that there will be one created for each interior node. (And the updated performance numbers suggest that's what's happening.) I'm okay with a different SequentMap being created for that interior node if it appears again in a later revision, since I think it's correct to invalidate that cache then.
No, you get the same instance within the same revision and the instance is cached for as long as no data read by the ::sequent_map() query changes.
Taking exported_names as an example here because it's easier to explain the concept. Salsa re-executes the exported_names query every time the file's AST changes. When Salsa's done, it compares the result from running exported_names the last time with the newly computed result. If the two results are equal, then the query didn't change (even though the AST changed). This allows Salsa to reuse the cached result for a query that only depends on exported_names (or only depends on queries that all haven't changed).
If you set no_eq, then you opt out of this optimization and Salsa will always assume that the result changed when any of the query's inputs changed. Which is probably fine in your case.
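The backdating behavior can be sketched with a toy memo table (this illustrates the concept only; it is not Salsa's actual implementation or API):

```rust
// Toy model of Salsa-style backdating: a memo remembers the revision in
// which its value last *changed*, not when it was last recomputed.
#[derive(Debug)]
struct Memo<T> {
    value: T,
    changed_at: u64,
}

fn recompute<T: PartialEq>(memo: &mut Memo<T>, new_value: T, revision: u64) {
    if memo.value != new_value {
        // The result actually changed: dependents must re-run.
        memo.value = new_value;
        memo.changed_at = revision;
    }
    // Otherwise, "backdate": keep the old changed_at, so dependents that
    // last ran after it can keep their cached results. With no_eq (or a
    // PartialEq that always returns false), this branch never applies.
}
```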
The one thing we need to be careful about is that the interior mutability code doesn't access db, because a query reading a cached result wouldn't see all its dependencies, breaking Salsa's cache invalidation.
The one thing we need to be careful about is that the interior mutability code doesn't access db, because a query reading a cached result wouldn't see all its dependencies, breaking Salsa's cache invalidation.
Is this part true in general? That might be a deal-breaker for this approach, because the interior mutability code will definitely need to access the db.
Is this part true in general? That might be a deal-breaker for this approach, because the interior mutability code will definitely need to access the db.
It depends on what you read, but how Salsa tracks dependencies is something I'd consider internal to Salsa (or at least something that requires a lot of documentation).
I don't think we add read dependencies for interned structs, but we used to (CC: @ibraheemdev). But calling any Salsa query, reading a tracked field of a tracked struct, or reading any input makes this approach unsound.
Creating any new interned values I think would be unsound.
I guess so is reading, because reading an interned value with low durability must propagate to the outer query. So it's not just about the dependencies; it's also about the query's metadata, which needs to be reflected accordingly.
Okay that tells me I need to rethink this...
Yeah, interior mutability will not play well with Salsa here. If the interior mutability code creates an interned value without the sequent_map query having a dependency on that interned value, the interned value may be garbage collected, and later calls to sequent_map will read stale data.
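The hazard can be illustrated with a toy cache (illustrative names only; this models the dependency-tracking idea, not Salsa's real machinery):

```rust
use std::cell::RefCell;

// Toy model of why an untracked db read inside interior mutability is
// unsound: the cache never learns that it depends on db.input, so it
// keeps serving a stale value after the input changes.
struct Db {
    input: i32,
    revision: u64,
}

struct Query {
    cache: RefCell<Option<(u64, i32)>>, // (revision at last compute, value)
    tracks_input: bool, // did we record the db.input dependency?
}

impl Query {
    fn get(&self, db: &Db) -> i32 {
        if let Some((rev, value)) = *self.cache.borrow() {
            // A query with a recorded dependency re-runs when the input's
            // revision moves on; an untracked one never notices.
            if !self.tracks_input || rev == db.revision {
                return value;
            }
        }
        let value = db.input * 2; // reads db.input
        *self.cache.borrow_mut() = Some((db.revision, value));
        value
    }
}
```

The untracked variant keeps returning the value computed against the old input, which is the stale-read scenario described above.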
I found a different way to do this that keeps the performance win but doesn't require interior mutability
Cool to see that the interior mutability is good for performance :)
```rust
    "create sequent map",
);
```

```rust
fn path_assignments(self, db: &'db dyn Db) -> PathAssignments<'db> {
```
Nit, maybe for a separate PR: Would it make sense to use SmallVec here? (What's a "typical" size of `constraints`?)
Good idea, done! (I don't have specific numbers, but it's definitely true that most constraint sets will have a smallish number of constraints. I chose 8 more or less at random.)
```rust
    cycle_initial=sequent_map_cycle_initial,
    heap_size=ruff_memory_usage::heap_size,
)]
fn sequent_map(self, db: &'db dyn Db) -> SequentMap<'db> {
```
The old sequent_map query had cycle handling, but not all queries calling path_assignments have. Was it only the SequentMap::add call that could result in cycles? If so, are there any queries where we need to add cycle handling, now that the cycle is no longer "contained" by the sequent_map query?
Was it only the SequentMap::add call that could result in cycles?
Yes, because of the subtype checks that we have to perform to compare the lower/upper bounds of each constraint. That could cause a cycle if we had to create a sequent map while we were in the middle of inferring the lower/upper bound type.
I was using the mdtests + ecosystem tests to verify that the cycle handler isn't needed with this change. My intuition for why is that (a) the lazy processing delays the add calls enough that we're no longer in the middle of inferring the types of the lower/upper bounds, and (b) we might not have to analyze certain constraints at all anymore, since we only look at a constraint once we actually encounter it when walking a BDD tree.
Nice. I don't have a lot of context on the BDD work, but the caching makes sense to me. Probably something that would also benefit from within-same-revision LRU caching, to cap the memory usage (see prefect).
Before, when building a SequentMap for a constraint set, we would immediately iterate through all of the constraints in the set, and compare each pair of them looking for intersection/implication relationships. It turns out that we often don't need to examine every pair when walking the BDD tree of a constraint set. Instead, we can visit each constraint as we encounter it for the first time in our BDD walk. We do still need to collect all of the constraints in the BDD to ensure that they remain ordered in a consistent way, but we can track that separately, without having to immediately build up the actual sequents.
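A toy sketch of the eager-vs-lazy difference described above (illustrative types and counts; not the actual ty data structures):

```rust
use std::collections::HashSet;

// Eager: compare every pair of constraints up front — O(n^2) comparisons
// even if the BDD walk only ever touches a few of them.
fn eager_comparisons(constraints: &[&str]) -> usize {
    constraints.len() * (constraints.len() - 1) / 2
}

// Lazy: keep the full constraint list only to fix a consistent order, and
// compare a constraint against previously visited ones the first time the
// BDD walk actually encounters it.
struct LazySequents<'a> {
    order: Vec<&'a str>,       // all constraints, in a consistent order
    visited: HashSet<&'a str>, // constraints the walk has encountered
    comparisons: usize,
}

impl<'a> LazySequents<'a> {
    fn new(order: Vec<&'a str>) -> Self {
        Self { order, visited: HashSet::new(), comparisons: 0 }
    }

    fn visit(&mut self, c: &'a str) {
        debug_assert!(self.order.contains(&c));
        if self.visited.insert(c) {
            // Compare only against constraints already materialized.
            self.comparisons += self.visited.len() - 1;
        }
    }
}
```

With four constraints, the eager strategy pays for all six pairs; a walk that only ever touches two of them pays for a single comparison under the lazy strategy.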