Fix verification of tracked struct from high-durability query #550

carljm · 2024-08-02T06:29:14Z

This adds a test for, and proposes one possible fix for, a bug we ran into with durabilities where we were hitting the "access struct from previous revision" assert.

The test models a situation where we have two File inputs (0, 1), where File(0) has LOW durability and File(1) has HIGH durability. We can query an index for each file, and a definitions from that index (just a sub-part of the index), and we can infer each file. The index and definitions queries depend only on the File they operate on, but the infer query has some other dependencies: infer(0) depends on infer(1), and infer(1) also depends directly on File(0). The dependency graph is shown below: light color represents low durability, black represents high durability, dark gray are structs and struct fields; I didn't get around to adding their durability to the graph yet.

(The graph is automatically generated using some hacky printlns I added to Salsa that create a mermaid.live graph; if I end up debugging another issue where it's useful, I'll probably clean this code up and try to make it generally reusable.)

The panic occurs because definitions(1) is high durability, and depends on index(1) which is also high durability. index(1) creates the tracked struct Definition(1), and infer(1) (which is low durability) depends on Definition.file(1).

After a change to file 0 (low durability), we only shallowly verify definitions(1) -- it doesn't need deep-verify because it is high durability. So that means we never verify index(1) at all (deeply or shallowly), which means we never mark Definition(1) validated. So when we deep-verify infer(1), we try to access its dependency Definition.file(1), and hit the panic because we are accessing a tracked struct that has never been re-validated or re-recreated in R2.

The proposed fix is that in maybe_changed_after for a tracked struct field, rather than checking that the struct has been created/validated at the current revision, check that it has been created/validated at least as recently as its durability last changed, and if that is older than the current revision, validate it. The logic here is that this tracked struct can't have changed unless some input at or higher than its durability has changed.

The other possible fix I considered (but haven't tried implementing yet) is that when we short-circuit a shallow verify due to durability, we need to not only mark our own outputs validated, but recursively find all our inputs that also verify due to durability, and validate all of their outputs as well. That fix seems more complex and expensive than what I have in this PR.

netlify · 2024-08-02T06:29:31Z

✅ Deploy Preview for salsa-rs canceled.

Name	Link
🔨 Latest commit	`5c541dc`
🔍 Latest deploy log	https://app.netlify.com/sites/salsa-rs/deploys/66bcc55a4b54c600086ce361

codspeed-hq · 2024-08-02T06:32:39Z

CodSpeed Performance Report

Merging #550 will not alter performance

_{Comparing carljm:struct-panic-main (5c541dc) with master (08820ea)}

Summary

✅ 8 untouched benchmarks

MichaReiser

Nice! I don't think I understand salsa well enough at this point to say with high confidence that this is correct. We may have to wait for @nikomatsakis's opinion (but can merge the fix in our fork to unblock)

src/tracked_struct/struct_map.rs

MichaReiser · 2024-08-02T07:04:02Z

src/tracked_struct/struct_map.rs

@@ -201,6 +209,50 @@ where
        unsafe { C::struct_from_raw(data.as_raw()) }
    }

+    fn get_and_validate_last_changed<'db>(


I would find comment describing how it is different from get_from_map useful because the difference is very subtle

Added the comment.

One of my questions here for @nikomatsakis is whether instead of adding a new get_and_validate_last_changed, maybe the implementation of get_from_map should just always do this. It's possible that other code paths that just call .get are still susceptible to this bug?

But it's also a little weird if a method called get can mutate the stored data.

src/tracked_struct/struct_map.rs

MichaReiser · 2024-08-02T07:22:18Z

src/tracked_struct/tracked_field.rs

        let id = input.unwrap();
-        let data = self.struct_map.get(current_revision, id);
+        let data = self


I don't think I understand salsa enough to make a real assessment but changing the last changed in maybe_changed_after seems slightly off to me.

I also wonder if we'll run into problems with it if the setup is slightly different because, as far as I understand, maybe_changed_after is only called from deep verify memo. That means, it would get by passed if all queries only perform a shallow compare. I'm not sure if that matters for tracked fields or if some other mechanism would take over.

But maybe it is allright because it is kind of similar to what FunctionIngredient::maybe_changed_after does.

I also wonder if we'll run into problems with it if the setup is slightly different because, as far as I understand, maybe_changed_after is only called from deep verify memo. That means, it would get by passed if all queries only perform a shallow compare.

Yes, this relates to my question above about whether it's possible for other code paths besides maybe_changed_after which call get to run into this bug.

But I think it is safe as-is, because any query of low enough durability to possibly re-run in the current revision will also go through deep verify, which will check maybe_changed_after on all of its inputs. So a struct that never gets maybe_changed_after called on it in a given revision can't be accessed in that revision. I think?

But maybe it is allright because it is kind of similar to what FunctionIngredient::maybe_changed_after does.

Yes, it seems like it is already part of the contract of maybe_changed_after (and verification in general) that it can validate things to the current revision.

tests/tracked_struct_not_validated_due_to_durability.rs

nikomatsakis · 2024-08-05T16:35:38Z

Will attempt to read this tomorrow-ish.

MichaReiser · 2024-08-06T11:57:46Z

I don't have a good repro yet but I just ran into the following error.

2024-08-06 13:52:36.553 [info]    0: red_knot_server::server::Server::run::{{closure}}
             at /home/micha/astral/ruff/crates/red_knot_server/src/server.rs:140:29
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/alloc/src/boxed.rs:2077:9
   2: std::panicking::rust_panic_with_hook
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:799:13
   3: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:656:13
   4: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/sys_common/backtrace.rs:171:18
   5: rust_begin_unwind
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:652:5
   6: core::panicking::panic_fmt
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/panicking.rs:72:14
   7: salsa::tracked_struct::struct_map::StructMap<C>::get_from_map
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/tracked_struct/struct_map.rs:200:9
   8: salsa::tracked_struct::struct_map::StructMap<C>::get
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/tracked_struct/struct_map.rs:177:9
   9: salsa::tracked_struct::IngredientImpl<C>::lookup_struct
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/tracked_struct.rs:382:9
  10: red_knot_python_semantic::semantic_index::definition::_::<impl salsa::id::LookupId for red_knot_python_semantic::semantic_index::definition::Definition>::lookup_id
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/components/salsa-macro-rules/src/setup_tracked_struct.rs:144
2024-08-06 13:52:36.553 [info] :21
  11: <red_knot_python_semantic::types::infer::infer_definition_types::Configuration_ as salsa::function::Configuration>::id_to_input
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/components/salsa-macro-rules/src/setup_tracked_fn.rs:196:29
  12: salsa::function::execute::<impl salsa::function::IngredientImpl<C>>::execute::{{closure}}
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/function/execute.rs:46:58
  13: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/panic/unwind_safe.rs:272:9
  14: std::panicking::try::do_call
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:559:40
  15: __rust_try
  16: std::panicking::try
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panicking.rs:523:19
  17: std::panic::catch_unwind
             at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/panic.rs:149:14
  18: salsa::cycle::Cycle::catch
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/cycle.rs:42:15
  19: salsa::function::execute::<impl salsa::function::IngredientImpl<C>>::execute
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/function/execute.rs:46:27
  20: salsa::function::fetch::<impl salsa::function::IngredientImpl<C>>::fetch_cold
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/function/fetch.rs:104:14
  21: salsa::function::fetch::<impl salsa::function::IngredientImpl<C>>::compute_value::{{closure}}
             at /home/micha/.cargo/git/checkouts/salsa-fa0738f776139e4c/ece083e/src/function/fetch.rs:46:29
  22: core::option::Option<T>::or_else

Which may indicate that we need to validate all tracked structs?

carljm · 2024-08-06T16:29:24Z

Yeah, that traceback does suggest that there's still a path to accessing an un-validated struct :/

When you say you don't have a good repro, do you mean that you can't reproduce it consistently at all? That's going to make things difficult...

MichaReiser · 2024-08-06T16:32:35Z

@carljm I haven't spent time on finding a consistent repro. I ran into it when I switched between different files in the LSP

nikomatsakis

OK, I get the overall bug. Let me think on this.

nikomatsakis · 2024-08-08T11:12:44Z

My thought is that we should just update to the most recent revision during the get operation. I think the only way to have an issue is to cheat and leak the data through thread local storage. We do need to update to prevent later writes from occurring in that scenario, however

…

On Wed, Aug 7, 2024, at 9:02 AM, Niko Matsakis wrote: ***@***.**** commented on this pull request. OK, I get the overall bug. Let me think on this. — Reply to this email directly, view it on GitHub <#550 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABF4ZUZJEYIJAPF3RSKNITZQGZ6BAVCNFSM6AAAAABL35N3WOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDEMRSGY4TQNRZHE>. You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

carljm · 2024-08-08T16:29:54Z

Always updating on the get path makes sense to me and should simplify this PR. I'll update to that approach.

carljm · 2024-08-13T00:37:17Z

Updated to always validate on get. Some things I'm not sure about here:

This means there can be cases (as in the added test) where the same struct is validated on get, and then later re-validated via mark_validated_output in the same revision. This means validate can no longer assert that the struct isn't already validated. I'm not sure what the comment "Never update a struct twice in the same revision" was attempting to prevent, but it doesn't seem like setting data.created_at = current_revision when they are already equal is going to cause a problem?
I wondered whether validating on get might mean we no longer need mark_validated_output on tracked structs at all, but making it a no-op caused another test (deletion-cascade) to fail with the "access obsolete struct" panic.
This change means both get and update now need to take &Zalsa rather than just a current_revision, so that in get_from_map we can call zalsa.last_changed_revision with the durability, which we only have access to after we get the data out of the map.

carljm · 2024-08-13T00:37:53Z

Test failure looks like a compiler improvement in nightly catching an unmatchable pattern, not related to this PR.

davidbarsky · 2024-08-13T17:22:55Z

Test failure looks like a compiler improvement in nightly catching an unmatchable pattern, not related to this PR.

I've ignored it; it should be fixed soonish: rust-lang/rust#129031. If you rebase, CI will pass.

nikomatsakis · 2024-08-20T14:27:16Z

I THINK this is fixed by #564

MichaReiser · 2024-08-20T20:08:05Z

I can confirm this is fixed (https://github.com/astral-sh/ruff/pull/13016/files).

Did we add a test for it?

nikomatsakis · 2024-08-20T20:31:35Z

@MichaReiser I don't think I added a specific test, I'm going to close this PR, but a PR with a test would be great

carljm mentioned this pull request Aug 2, 2024

Fix verification of tracked struct from high-durability query MichaReiser/salsa#1

Merged

carljm force-pushed the struct-panic-main branch from 924b53d to b30e9a7 Compare August 2, 2024 06:32

carljm marked this pull request as ready for review August 2, 2024 06:47

MichaReiser approved these changes Aug 2, 2024

View reviewed changes

MichaReiser requested a review from nikomatsakis August 2, 2024 07:24

This was referenced Aug 2, 2024

Weaken assertion in validate MichaReiser/salsa#3

Merged

[red-knot] update benchmark to run on tomllib astral-sh/ruff#12635

Merged

carljm force-pushed the struct-panic-main branch from c8f6317 to b2ec4b9 Compare August 5, 2024 16:15

carljm force-pushed the struct-panic-main branch from b2ec4b9 to 1ac16e2 Compare August 5, 2024 16:39

carljm force-pushed the struct-panic-main branch from 1ac16e2 to 986fddf Compare August 6, 2024 16:30

nikomatsakis reviewed Aug 7, 2024

View reviewed changes

carljm force-pushed the struct-panic-main branch from 986fddf to 8d553da Compare August 13, 2024 00:30

carljm added 2 commits August 14, 2024 07:55

Fix verification of tracked struct from high-durability query

bb1c60b

always validate in get path

5c541dc

carljm force-pushed the struct-panic-main branch from 8d553da to 5c541dc Compare August 14, 2024 14:55

nikomatsakis closed this Aug 20, 2024

carljm mentioned this pull request Aug 28, 2024

add test for high-durability dependency validation #569

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix verification of tracked struct from high-durability query #550

Fix verification of tracked struct from high-durability query #550

carljm commented Aug 2, 2024 •

edited

Loading

netlify bot commented Aug 2, 2024 •

edited

Loading

codspeed-hq bot commented Aug 2, 2024 •

edited

Loading

MichaReiser left a comment •

edited

Loading

MichaReiser Aug 2, 2024

carljm Aug 2, 2024

MichaReiser Aug 2, 2024

carljm Aug 2, 2024 •

edited

Loading

nikomatsakis commented Aug 5, 2024

MichaReiser commented Aug 6, 2024

carljm commented Aug 6, 2024

MichaReiser commented Aug 6, 2024

nikomatsakis left a comment

nikomatsakis commented Aug 8, 2024 via email

carljm commented Aug 8, 2024

carljm commented Aug 13, 2024

carljm commented Aug 13, 2024

davidbarsky commented Aug 13, 2024

nikomatsakis commented Aug 20, 2024

MichaReiser commented Aug 20, 2024

nikomatsakis commented Aug 20, 2024

Fix verification of tracked struct from high-durability query #550

Fix verification of tracked struct from high-durability query #550

Conversation

carljm commented Aug 2, 2024 • edited Loading

netlify bot commented Aug 2, 2024 • edited Loading

✅ Deploy Preview for salsa-rs canceled.

codspeed-hq bot commented Aug 2, 2024 • edited Loading

CodSpeed Performance Report

Merging #550 will not alter performance

Summary

MichaReiser left a comment • edited Loading

Choose a reason for hiding this comment

MichaReiser Aug 2, 2024

Choose a reason for hiding this comment

carljm Aug 2, 2024

Choose a reason for hiding this comment

MichaReiser Aug 2, 2024

Choose a reason for hiding this comment

carljm Aug 2, 2024 • edited Loading

Choose a reason for hiding this comment

nikomatsakis commented Aug 5, 2024

MichaReiser commented Aug 6, 2024

carljm commented Aug 6, 2024

MichaReiser commented Aug 6, 2024

nikomatsakis left a comment

Choose a reason for hiding this comment

nikomatsakis commented Aug 8, 2024 via email

carljm commented Aug 8, 2024

carljm commented Aug 13, 2024

carljm commented Aug 13, 2024

davidbarsky commented Aug 13, 2024

nikomatsakis commented Aug 20, 2024

MichaReiser commented Aug 20, 2024

nikomatsakis commented Aug 20, 2024

carljm commented Aug 2, 2024 •

edited

Loading

netlify bot commented Aug 2, 2024 •

edited

Loading

codspeed-hq bot commented Aug 2, 2024 •

edited

Loading

MichaReiser left a comment •

edited

Loading

carljm Aug 2, 2024 •

edited

Loading