Skip to content

Conversation

@MichaReiser
Copy link
Contributor

@MichaReiser MichaReiser commented Feb 21, 2025

This PR fixes a bug specific to tracked structs that are recycled and the query doesn't read any tracked fields.

Salsa maintains a free-list with tracked structs that were created in a previous revision but are now no longer used. The storage and IDs of those tracked structs can be reused in the next revision. This helps reduce overall memory usage and allows the recycling of IDs.

However, the reuse of ids combined with coarse-grained dependencies as implemented today can lead to incorrect cache-reuse if a tracked struct ID gets reused and a query takes multiple arguments -- which requires interning. The problem is that the interned argument struct compares equal because the tracked struct has equal ids, making salsa believe that reusing the cached data is fine (when it isn't).

This PR fixes this by:

  • Adding a new created_at field to tracked structs that stores the revision when the tracked struct was first created. This field gets updated to the current revision when a tracked struct is recycled
  • Add a maybe_changed_after implementation to TrackedStruct that returns true if the tracked struct was created after the query was last run. This suggests that this is no longer the same tracked struct
  • Add a read dependency to the tracked struct (not the field) when accessing any untracked field. This guarantees that the query depends on the tracked struct and calls its maybe_changed_after when validating if the query has changed.

The new field is similar to the interned struct's reset_at field. It's also the same as we have for tracked fields on tracked structs where each field caries its own revision.

I added a new test and verified that it failed on master before implementing this fix.

Note:

Why not use updated_at?: updated_at gets bumped in every revision any field of the tracked struct was read. This is different from what we need here where it's mainly to identify when the id was reused.

@netlify
Copy link

netlify bot commented Feb 21, 2025

Deploy Preview for salsa-rs canceled.

Name Link
🔨 Latest commit b338772
🔍 Latest deploy log https://app.netlify.com/sites/salsa-rs/deploys/67bb4974faa5b50008d3fc6c

@codspeed-hq
Copy link

codspeed-hq bot commented Feb 21, 2025

CodSpeed Performance Report

Merging #708 will not alter performance

Comparing MichaReiser:fix-tracked-struct-dependency (b338772) with master (97ea558)

Summary

✅ 9 untouched benchmarks

@MichaReiser MichaReiser force-pushed the fix-tracked-struct-dependency branch from f3c6422 to b3e77e5 Compare February 21, 2025 08:20
@MichaReiser MichaReiser marked this pull request as ready for review February 21, 2025 08:22
@MichaReiser MichaReiser force-pushed the fix-tracked-struct-dependency branch 2 times, most recently from b9e446f to 181902f Compare February 21, 2025 08:27
@carljm
Copy link
Contributor

carljm commented Feb 21, 2025

At a high level, one thing I'm not understanding from the PR description is why this only affects queries with multiple arguments and implicitly interned structs. If a reused tracked struct ID is the sole input to a query, why doesn't that cause the same problem?

@MichaReiser
Copy link
Contributor Author

MichaReiser commented Feb 21, 2025

At a high level, one thing I'm not understanding from the PR description is why this only affects queries with multiple arguments and implicitly interned structs. If a reused tracked struct ID is the sole input to a query, why doesn't that cause the same problem?

Results for single-argument queries are directly stored on the tracked struct's memo table. The recycling code already unsets all associated query results.

https://github.com/MichaReiser/salsa/blob/027db454cbcab4edf6cc9ba538f6044a3835f998/src/tracked_struct.rs#L609-L629

Multi-argument queries are different because they use interning and the query then only depends on that interned value. The interned value's identity is whether the value compares and hashes equal, which is the case when ids are reused

@MichaReiser MichaReiser force-pushed the fix-tracked-struct-dependency branch from 5e6d524 to 3ac27e9 Compare February 21, 2025 18:36
@MichaReiser
Copy link
Contributor Author

MichaReiser commented Feb 21, 2025

I pulled the changes for the bad hash/hash collision down into this PR to make the next PR a memory improvement only. I hope that helps clarify when/how created_at gets re-initialized and that it can happen more than once (but doesn't require any synchronization)

@MichaReiser MichaReiser changed the title Tracked struct recycling and coarse grained dependencies Fix incorrect query caching for recycled tracked structs Feb 21, 2025
@MichaReiser MichaReiser force-pushed the fix-tracked-struct-dependency branch from d5285d3 to b338772 Compare February 23, 2025 16:14
@MichaReiser MichaReiser dismissed davidbarsky’s stale review February 23, 2025 16:16

I applied the suggestion and extended the comment to also account for other reasons where created_at needs updating.

@MichaReiser MichaReiser added this pull request to the merge queue Feb 23, 2025
Merged via the queue into salsa-rs:master with commit 234ad9a Feb 23, 2025
11 checks passed
@MichaReiser MichaReiser deleted the fix-tracked-struct-dependency branch February 23, 2025 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants