Tables #564

nikomatsakis · 2024-08-16T06:55:35Z

This PR modifies the way we store salsa structs to use a central table/page system. The results of tracked functions are now stored in memos/syncs attached to a particular slot.

This lays the foundation for serialization/deserialization as well as speculative execution by using copy-on-write pages, but it also avoids a ton of centralized hashing. Tracked functions that take a single salsa struct as argument now avoid hashtable lookups altogether.

In the future I'd like to improve the way that tracked functions with >1 argument work by extending the memo table to optionally store a per-struct hashmap, but that can be a separate PR.

netlify · 2024-08-16T06:55:52Z

✅ Deploy Preview for salsa-rs canceled.

Name	Link
🔨 Latest commit	`02036ff`
🔍 Latest deploy log	https://app.netlify.com/sites/salsa-rs/deploys/66c4a3bf75d33e00083d33fb

codspeed-hq · 2024-08-16T06:57:08Z

CodSpeed Performance Report

Merging #564 will improve performances by ×2.7

Comparing nikomatsakis:tables (02036ff) with master (08820ea)

Summary

⚡ 1 improvement
✅ 7 untouched benchmarks

Benchmarks breakdown

	Benchmark	`master`	`nikomatsakis:tables`	Change
⚡	`many_tracked_structs`	130.9 µs	48.8 µs	×2.7

MichaReiser · 2024-08-16T07:08:18Z

Wow, very impressive (I haven't read through the code yet). One question to check my understanding:

> This lays the foundation for serialization/deserialization as well as speculative execution by using copy-on-write pages, but it also avoids a ton of centralized hashing. Tracked functions that take a single salsa struct as argument now avoid hashtable lookups altogether.

By "salsa struct", do you mean both inputs and tracked structs, or only tracked structs?

MichaReiser · 2024-08-16T07:09:36Z

It's interesting that some of the other benchmarks regress.

MichaReiser · 2024-08-16T07:16:00Z

src/function/memo.rs


```diff
 #[allow(type_alias_bounds)]
-type ArcMemo<'lt, C: Configuration> = ArcSwap<Memo<<C as Configuration>::Output<'lt>>>;
+pub(super) type ArcMemo<'lt, C: Configuration> = Arc<Memo<<C as Configuration>::Output<'lt>>>;
```


What's the remaining use case for `ArcMemo`? Aren't we now allocating the data in the pages, or do the pages only store references to the `Arc`?

nikomatsakis · 2024-08-16

In this PR, the pages store a reference to the `Arc<Memo>`. I was debating whether we could replace it with an `Id` of its own.

MichaReiser · 2024-08-20

Hmm, my understanding was that the main change in this PR is that memos are no longer stored in `Arc`s; instead, they're stored directly in pages to get arena-like allocation. I recall that it is important to you that all page headers have the same layout. Is that the reason we're keeping an `Arc` (or `Id`), or are there other reasons for not storing the value in the pages? Or am I misunderstanding the change?

nikomatsakis · 2024-08-20

Not exactly. I need to write up some docs on the layout. The big advantage in this PR, though, is that on the fast path we do two array lookups instead of a hash. But the memoized data is still stored in "arcs".
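A minimal sketch of what "two array lookups instead of a hash" can look like, assuming an `Id` that packs a page index and a slot index into one integer. `PAGE_LEN`, `Table`, and `hashed_lookup` are hypothetical names chosen for illustration, not salsa's actual API, and the real pages hold more than a bare value per slot (per this thread, slots also carry memo arrays).

```rust
use std::collections::HashMap;

/// Hypothetical number of slots per page (illustrative, not salsa's constant).
const PAGE_LEN: usize = 1024;

/// Hypothetical packed id: `raw / PAGE_LEN` selects the page,
/// `raw % PAGE_LEN` selects the slot within it.
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
struct Id(u32);

impl Id {
    fn page(self) -> usize {
        self.0 as usize / PAGE_LEN
    }

    fn slot(self) -> usize {
        self.0 as usize % PAGE_LEN
    }
}

/// A table of fixed-capacity pages; each page's buffer never reallocates.
struct Table<T> {
    pages: Vec<Vec<T>>,
}

impl<T> Table<T> {
    fn new() -> Self {
        Table { pages: Vec::new() }
    }

    /// Appends a value, opening a new page when the current one is full.
    fn push(&mut self, value: T) -> Id {
        if self.pages.last().map_or(true, |page| page.len() == PAGE_LEN) {
            self.pages.push(Vec::with_capacity(PAGE_LEN));
        }
        let page = self.pages.len() - 1;
        let slot = self.pages[page].len();
        self.pages[page].push(value);
        Id((page * PAGE_LEN + slot) as u32)
    }

    /// Fast path: one index into `pages`, one index into the page. No hashing.
    fn get(&self, id: Id) -> &T {
        &self.pages[id.page()][id.slot()]
    }
}

/// The shape being replaced, for comparison: every lookup hashes the key.
fn hashed_lookup<T>(map: &HashMap<Id, T>, id: Id) -> Option<&T> {
    map.get(&id)
}
```

A lookup in `Table` costs a divide and a modulo (a shift and a mask when `PAGE_LEN` is a power of two) plus two bounds-checked indexes, whereas `hashed_lookup` has to hash the key and probe the map on every call.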

MichaReiser · 2024-08-16T10:37:22Z

You might already be aware of it: constant queries no longer compile:

```rust
/// Salsa query to get the builtins scope.
///
/// Can return None if a custom typeshed is used that is missing `builtins.pyi`.
#[salsa::tracked]
pub(crate) fn builtins_scope(db: &dyn Db) -> Option<ScopeId<'_>> {
    let builtins_name =
        ModuleName::new_static("builtins").expect("Expected 'builtins' to be a valid module name");
    let builtins_file = resolve_module(db, builtins_name)?.file();
    Some(global_scope(db, builtins_file))
}
```

```text
error[E0658]: cannot cast `dyn db::Db` to `dyn Database`, trait upcasting coercion is experimental
  --> crates/red_knot_python_semantic/src/builtins.rs:10:1
   |
10 | #[salsa::tracked]
   | ^^^^^^^^^^^^^^^^^
   |
   = note: see issue #65991 <https://github.com/rust-lang/rust/issues/65991> for more information
   = note: required when coercing `&'db (dyn db::Db + 'static)` into `&(dyn Database + 'static)`
   = note: this error originates in the macro `salsa::plumbing::setup_tracked_fn` which comes from the expansion of the attribute macro `salsa::tracked` (in Nightly builds, run with -Z macro-backtrace for more info)
```

nikomatsakis · 2024-08-16T16:56:18Z

> By "salsa struct", do you mean both inputs and tracked structs, or only tracked structs?

Salsa struct = interned | input | tracked

nikomatsakis · 2024-08-16T16:57:34Z

> It's interesting that some of the other benchmarks regress.

What regresses?

nikomatsakis · 2024-08-16T16:58:11Z

> You might already be aware of it: constant queries no longer compile:

Hmm. Seems like we are missing a test. I did change how constant queries are implemented (it's a bit less efficient now than it was, as it winds up with a silly hashmap; could be fixed, just didn't).

davidbarsky · 2024-08-16T17:13:59Z

> It's interesting that some of the other benchmarks regress.

> What regresses?

Here: https://codspeed.io/salsa-rs/salsa/branches/nikomatsakis:tables. For whatever reason, the comment doesn't show that change; you have to click into the report.

amortized[Input] (8%, from 3.9µs to 4.2µs)
amortized[InternedInput] (8%, from 3.2µs to 3.5µs)
mutating[20] (4%, from 19.1µs to 20µs)

carljm · 2024-08-16T17:37:41Z

It looks like CodSpeed considers those "regressions" to be within the margin of error/noise. I don't know how sophisticated the statistics are that CodSpeed uses to make those decisions, but I'm also not sure that we should read too much into results that CodSpeed has decided are not significant.

davidbarsky · 2024-08-16T17:44:38Z

Gotcha! The thing I'm unsure of is whether criterion considers these to be noise. Does CodSpeed delegate to criterion on that front?

carljm · 2024-08-16T17:51:32Z

Yeah, good q, I don't know...

MichaReiser · 2024-08-16T18:01:43Z

> Gotcha! The thing I'm unsure of is whether criterion considers these to be noise. Does CodSpeed delegate to criterion on that front?

There's a settings page, and it defaults to a 10% margin. It also supports commenting on improvements/regressions only.
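To make the threshold discussion concrete, here is a small sketch that computes the relative change between two timings and flags it only when its magnitude exceeds a fixed percentage. It assumes a plain relative-change rule with a 10% threshold; CodSpeed's actual statistics may be more involved. Recomputing from the rounded µs figures above gives slightly different percentages than CodSpeed reports (for example, 3.2 µs to 3.5 µs is about 9.4%, not 8%), presumably because CodSpeed works from unrounded measurements.

```rust
/// Relative change in percent from `before` to `after` (positive = slower).
fn relative_change_pct(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// Hypothetical rule: report a change only if its magnitude exceeds the threshold.
fn is_flagged(before: f64, after: f64, threshold_pct: f64) -> bool {
    relative_change_pct(before, after).abs() > threshold_pct
}
```

Under this rule, all three reported regressions (roughly 4.7% to 9.4%) stay under a 10% threshold, while `many_tracked_structs` (130.9 µs to 48.8 µs, about -62.7%) is well outside it.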

nikomatsakis · 2024-08-17T05:21:06Z

My hunch is that this branch is slightly worse on some microbenchmarks (among other things, it is using array lookups instead of pure pointers), but much better in more complex scenarios by avoiding centralized hashing.

davidbarsky · 2024-08-17T15:10:48Z

That makes sense. A single-digit regression on those benchmarks feels like it's fine and nothing to be concerned about.

MichaReiser · 2024-08-20T10:55:36Z

> Tracked functions that take a single salsa struct as argument now avoid hashtable lookups altogether.

Is my understanding correct that this is achieved by using the salsa struct ID as ID into the query cache and is based on the assumption that a query is "dense" (likely to be called for all arguments)?

nikomatsakis · 2024-08-20T14:02:58Z

> Is my understanding correct that this is achieved by using the salsa struct ID as ID into the query cache and is based on the assumption that a query is "dense" (likely to be called for all arguments)?

Not exactly. Each tracked function is assigned a `MemoIngredientIndex` (starting from 0). Each salsa struct has an "id" which corresponds to a slot in one of the tables. The slots then carry an array for memos. This array grows from length 0 up to the largest memo ingredient index it is used with, and its entries are `Arc`s. So I guess you could say that we assume that if you invoke any tracked functions on a given struct, you'll invoke many. I'm not sure how this will scale; we could do more sophisticated things.
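A rough sketch of the per-slot memo array described above, assuming one `Option<Arc<V>>` entry per tracked function, indexed by its memo ingredient index. `MemoTable` and this `MemoIngredientIndex` newtype are illustrative stand-ins rather than salsa's real types, and a single value type `V` is used for simplicity, whereas real memos differ in type from one tracked function to the next.

```rust
use std::sync::Arc;

/// Illustrative stand-in: each tracked function gets a small dense index.
#[derive(Copy, Clone, Debug)]
struct MemoIngredientIndex(usize);

/// Per-slot memo array: entry `i` holds the memo of the tracked function
/// whose `MemoIngredientIndex` is `i`, if it has been computed on this slot.
struct MemoTable<V> {
    memos: Vec<Option<Arc<V>>>,
}

impl<V> MemoTable<V> {
    /// Starts empty, so a struct never passed to a tracked function pays nothing.
    fn new() -> Self {
        MemoTable { memos: Vec::new() }
    }

    fn get(&self, index: MemoIngredientIndex) -> Option<Arc<V>> {
        // Out-of-range and empty entries both mean "no memo yet".
        self.memos.get(index.0).and_then(|memo| memo.clone())
    }

    /// Stores a memo, returning the previous one (if any).
    fn insert(&mut self, index: MemoIngredientIndex, memo: Arc<V>) -> Option<Arc<V>> {
        if self.memos.len() <= index.0 {
            // Grow lazily, only up to the largest index used on this slot.
            self.memos.resize_with(index.0 + 1, || None);
        }
        self.memos[index.0].replace(memo)
    }
}
```

A slot only pays for entries up to the highest memo ingredient index actually used on it, which is why the layout works best when a struct passed to one tracked function tends to be passed to many.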


nikomatsakis · 2024-08-20T14:10:26Z

Rebased at @davidbarsky's request. Once the CI stuff re-runs I'll merge this, presuming it looks good.

nikomatsakis · 2024-08-20T14:18:10Z

@davidbarsky

This is what I see on CodSpeed:

davidbarsky · 2024-08-20T14:22:11Z

Thanks for rebasing! Those numbers look good; I think this branch is fine to land for now? Things can be improved after.

MichaReiser · 2024-08-20T14:33:21Z

Hmm, I think there's still the issue with const queries not compiling.

davidbarsky · 2024-08-20T14:35:26Z

> Hmm, I think there's still the issue with const queries not compiling.

@MichaReiser made an issue: #565

MichaReiser reviewed Aug 16, 2024


nikomatsakis added 12 commits August 20, 2024 10:09

hex Id printouts

5cb5198

run tests with UPDATE_EXPECT=1

a27befe

introduce Table and use for interned values

01d4ef8

use table for tracked structs and their fields

188f759

This also retools a tiny bit how deletion works. We will reuse ids faster than before, actually.

create shared utility fn transmute_data_ptr

1fbca6d

introduce a trait for downcasting

94d58e6

introduce MemoIngredientIndex

c16c60c

introduce memo table (first draft)

33a99da

introduce Slot trait

db8d64f

add memo_table method to Slot trait

b5540f1

This returns the memos attached to a given slot. Not all slots have affiliated memos, so return an `Option`.

allow fetching memo table from a given id

9d93fdb

give each function ingredient a memo ingredient index

4037555

nikomatsakis added 15 commits August 20, 2024 10:09

add unused db argument to event method

2f8e78f

port input to use Table

703f312

extend justfile testing

d8ae590

store fn to create Arc<dyn Any> instead of drop

f12874f

This will allow us to invoke callbacks when deleting a memo with `Arc<dyn Any>` values.

remove the _mut accessors from Table (unused)

c251001

store ingredient index for each memo ingredient

31257ba

remove "constant" functions

8833a71

The goal here is that ALL `Id` values come from a `Table`

make memos take read-lock (and infallible return)

3dabb0d

We want to ensure that accessing the memos only occurs in revision R after the struct is created.

introduce Memo trait

8b058be

store memoized fn results attached to the salsa struct

6401563

simplify Id traits, make new crate-private

d7af1a8

`Id` values are used in a very tailored way now, no reason to let people construct arbitrary ones.

pacify the merciless cargo fmt

3e55594

apply cargo clippy --fix

8b8dd53

fix the remaining clippy items

8ad967e

pacify the merciless cargo fmt

02036ff
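Several commits above ("store fn to create Arc<dyn Any> instead of drop", "introduce a trait for downcasting") point at type-erased memo storage, since each tracked function memoizes a different output type. The following is a hedged sketch of that general technique using only the standard library's `Arc<dyn Any + Send + Sync>` and its `downcast` method; `ErasedMemos` is a hypothetical name, and unlike the commit it stores the erased `Arc` directly rather than a function that creates it. Salsa's own traits and layout may differ.

```rust
use std::any::Any;
use std::sync::Arc;

/// Type-erased memo storage: different tracked functions memoize values of
/// different types, so each entry is stored as `Arc<dyn Any + Send + Sync>`.
struct ErasedMemos {
    memos: Vec<Option<Arc<dyn Any + Send + Sync>>>,
}

impl ErasedMemos {
    fn new() -> Self {
        ErasedMemos { memos: Vec::new() }
    }

    fn insert<M: Any + Send + Sync>(&mut self, index: usize, memo: Arc<M>) {
        if self.memos.len() <= index {
            self.memos.resize_with(index + 1, || None);
        }
        // Unsizing coercion from Arc<M> to the erased trait object.
        let erased: Arc<dyn Any + Send + Sync> = memo;
        self.memos[index] = Some(erased);
    }

    /// Returns the memo only if it exists and has the expected concrete type.
    fn get<M: Any + Send + Sync>(&self, index: usize) -> Option<Arc<M>> {
        let erased = self.memos.get(index)?.clone()?;
        erased.downcast::<M>().ok()
    }
}
```

Each caller knows the concrete memo type for its own index, so the downcast is expected to succeed; a mismatched type simply yields `None` here instead of panicking.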

nikomatsakis force-pushed the tables branch from f5367d2 to 02036ff Compare August 20, 2024 14:10

nikomatsakis added this pull request to the merge queue Aug 20, 2024

Merged via the queue into salsa-rs:master with commit d5018d5 Aug 20, 2024
10 checks passed

nikomatsakis mentioned this pull request Aug 20, 2024

Fix verification of tracked struct from high-durability query #550

Closed

davidbarsky mentioned this pull request Aug 20, 2024

Const queries no longer compile #565

Closed

carljm mentioned this pull request Aug 28, 2024

add test for high-durability dependency validation #569

Merged
