Skip to content

Conversation

MichaReiser
Copy link
Contributor

@MichaReiser MichaReiser commented Oct 6, 2025

This PR rewrites how Salsa runs fixpoint iterations when multiple cylce heads participate in a cycle (when there are multiple nested cycles).

Motivation

The main motivation for this is that we see a runaway in ty for code like this:

self.x1 = 0
self.x2 = 0

self.y1 = self.y2 + self.y3
self.y2 = self.y1
self.y3 = self.y1 + self.y2

self.z1 = self.y1 + self.z2
self.z2 = self.z1

Or as a graph

flowchart TD
    x1
    x2
    
    y1 --> y2
	y1 --> y3
    y2 --> y1
	y3 --> y1
	y3 --> y2

    z1 --> y1 
    z1 --> z2
    z2 --> z1
Loading

Today, we'll run separate fixpoint iterations for y1 <-> y2, y3 <-> y1, and z1 <-> z2.

The downside of this is that Salsa runs the nested fixpoint iterations y1<->y3 to convergence for every y1 <-> y2 iteration. If there are more deeply nested cycles, then each of those inner cycles is run to completion for every y1 <-> y2 and y1 <-> y3 iteration, making the number of iterations dependent on the nesting of queries rather than how quickly the problem converges.

To avoid this, the idea is to identify the strongly connected components in the dependency graph. These are:

  • x1 (no dependencies)
  • x2 (no dependencies)
  • y1, y2, y3
  • z1, z2 (only depends on y1 in one direction but not both)

Then, run a fixpoint per strongly connected component. That means, we only run 2 fixpoint iterations for the example above. This ensures that the runtime is bound by how quickly the problem converges and not by how deeply nested our queries are (which creates a tension between granular queries and performance).

Single-threaded implementation

Let's ignore multi-threading first as this only complicates things ;)

The change in itself isn't that complicated:

  • execute_maybe_iterate: If the query is part of an outer cycle, then don't remove the cycle head from cycle_heads, set a flag on the memo indicating whether this head has converged, and return the provisional value. The query will iterate as part of the outer cycle
  • The outer cycle is the query that, when claiming it, results in a cycle (this tells us it's higher up on the stack)
  • If the query is an outer cycle, then:
    • Test if itself and all its inner cycle heads (can be retrieved from memo.cycle_heads) have converged
    • If not, update the iteration count on each cycle head and iterate again
    • Otherwise, mark all the cycle heads as finalized.

That's roughly it :)

Multithreading

Where things get complicated is multithreading. I tried to keep the existing locking (where we wait on cycle heads in some places), but I wasn't able to make it work (there were spurious errors till the end). That's why I ended up rewriting cycle head synchronization from scratch.

What makes cycles tricky is that multiple threads can enter the same cycle simultaneously, each starting from a different cycle head that ultimately resolves to the same outer cycle. The new implementation makes this even trickier because inner cycles now complete immediately without iterating, which results in them releasing their locks. That allows other threads to claim those locks again, which can result in them competing for the same locks forever.

This PR extends Salsa's DependencyGraph with a mechanism to transfer the ownership of a query to another query. Let's say we have a -> b -> a and b -> c -> b. The inner cycle head can transfer its lock to the query a. b then remains locked until c completes (successfully or panicking). Only the thread owning a (because it's the thread holding that lock or it's another thread on which a is blocked on) is allowed to reclaim b.

This ensures that the locks of all cycle heads participating in a query ultimately converge to be owned by a single query/thread. We need to do this for every query that supports cycle handling, even if it isn't a cycle head, because the query could become a cycle head in later iterations (e.g., due to changes in the cycle-entry point).

I also found this approach much easier to reason about than the old approach of "just wait for all cycle heads".

However, there's a downside. It requires acquiring the global DependencyGraph lock and tracking dependencies whenever claiming or releasing any query with cycle handle support that participates in a cycle. We incur this cost even when all queries participating in the cycle run on the same thread; this is also the main reason why converge-diverge's performance regresses. I'm not sure if there's a solution to avoid this.

TODO:

  • Documentation
  • Investigate perf regressions
  • Review if there are any changes necessary to maybe_changed_after
  • Investigate why cycle_nested_deep_conditional_changed::the_test sometimes hangs or panics
  • Add a test for a panic in a nested cycle

Copy link

netlify bot commented Oct 6, 2025

Deploy Preview for salsa-rs canceled.

Name Link
🔨 Latest commit 9293c3d
🔍 Latest deploy log https://app.netlify.com/projects/salsa-rs/deploys/68f0b70ce3c8ec00080855b7

Copy link

codspeed-hq bot commented Oct 6, 2025

CodSpeed Performance Report

Merging #999 will degrade performances by 14.68%

Comparing MichaReiser:fixpoint-scc-sync-table (9293c3d) with master (8b0831f)

Summary

⚡ 3 improvements
❌ 3 (👁 3) regressions
✅ 7 untouched

Benchmarks breakdown

Benchmark BASE HEAD Change
amortized[Input] 2.1 µs 2 µs +4.51%
amortized[SupertypeInput] 3 µs 2.8 µs +7.91%
👁 new[Input] 9.6 µs 10.1 µs -4.64%
👁 new[InternedInput] 4.3 µs 4.5 µs -4.46%
👁 converge_diverge 125.4 µs 147 µs -14.68%
converge_diverge_nested 200.8 µs 120.1 µs +67.22%

@MichaReiser
Copy link
Contributor Author

MichaReiser commented Oct 6, 2025

Hmm, I'm surprised that this is even slower than #995, considering that most logic now lives directly in DependencyGraph. I suppose the downside is that we incur the cost of transferring the outer head in all cases, even if the entire calculation occurs on a single thread. That might be an issue for ty, ugh.

Either way. I think there are some optimization opportunities:

  1. Avoid calling unblock in more cases
  2. Add a method to DependencyGraph to determine if all queries form a cycle (rather than having many wait_for calls)

Comment on lines +538 to +535
// If the value is from the same revision but is still provisional, consider it changed
// because we're now in a new iteration.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always found the deep_verify_memo logs for provisional values confusing. Exiting early here ensures they don't show up

@MichaReiser
Copy link
Contributor Author

MichaReiser commented Oct 9, 2025

The amortized benchmark results make no sense to me... I didn't touch fetch_hot at all... #1001 also shows a 7% perf improvement which is complete nonsense.

@MichaReiser MichaReiser marked this pull request as ready for review October 10, 2025 14:09
@MichaReiser MichaReiser requested review from carljm and ibraheemdev and removed request for carljm October 10, 2025 14:10
@MichaReiser
Copy link
Contributor Author

I found one more bug. We hit fetch_cold_cycle should have inserted a provisional memo with Cycle::initial for infer_definition_types(Id(4177c))` after a cyclic query panicked.

I'm not entirely sure yet what's happening

@carljm
Copy link
Contributor

carljm commented Oct 10, 2025

@MichaReiser Should I review this now, or better wait for further exploration of this new bug?

@MichaReiser
Copy link
Contributor Author

MichaReiser commented Oct 11, 2025

@MichaReiser Should I review this now, or better wait for further exploration of this new bug?

It's ready. I very much hope that any bugs that I discover now won't require a significant rewrite. The most recent bug related to panicking cycles required a 3-line fix only.

@MichaReiser MichaReiser force-pushed the fixpoint-scc-sync-table branch from e5a0069 to c5cc3dd Compare October 11, 2025 14:55
let mut opt_last_provisional: Option<&Memo<'db, C>> = None;
// This is different from `opt_old_memo` which might be from a different revision.
let mut last_provisional_memo: Option<&Memo<'db, C>> = None;
// TODO: Can we seed those somehow?
Copy link
Contributor Author

@MichaReiser MichaReiser Oct 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is #980

Comment on lines -143 to -145
// This isn't strictly necessary, but if this is a provisional memo for an inner cycle,
// await all outer cycle heads to give the thread driving it a chance to complete
// (we don't want multiple threads competing for the queries participating in the same cycle).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required for fallback immediate in multi threaded context. I didn't spend any time understanding why

Copy link
Member

@ibraheemdev ibraheemdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review everything too closely, but I left a couple nits and questions.

.iter()
.find(|active_query| {
cycle_heads.contains(&active_query.database_key_index)
&& active_query.database_key_index != current_key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this enough to identify strongly connected components? Do we never need to prefer the outermost query for deeply nested cycles?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding the outer-most query is somewhat difficult when it stretches across multiple threads. At some point, I had an implementation that queried the DependencyGraph to find the outermost cycle. However, it turned out that finding the outermost cycle was fairly expensive. Just finding "an outer" query is cheaper, and we can leave it to "that outer" query to find the "next outer query". Ultimately, we'll reach the outermost query.

Things are much easier if all queries are on the stack, which is what we handle here. We start from the outermost active query and try to find the first query that is a cycle head. This gives us a "local maximum" for what the outermost query is.

@MichaReiser MichaReiser added the refactoring Code works but is messy label Oct 16, 2025
@MichaReiser MichaReiser enabled auto-merge October 16, 2025 08:51
@MichaReiser MichaReiser disabled auto-merge October 16, 2025 09:01
@MichaReiser MichaReiser added this pull request to the merge queue Oct 16, 2025
Merged via the queue into salsa-rs:master with commit ef9f932 Oct 16, 2025
12 checks passed
@MichaReiser MichaReiser deleted the fixpoint-scc-sync-table branch October 16, 2025 09:33
@Veykril
Copy link
Member

Veykril commented Oct 16, 2025

This is pretty cool stuff I have to say!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactoring Code works but is messy

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants