Merged
17 changes: 8 additions & 9 deletions crates/ruff_db/src/parsed.rs
@@ -21,7 +21,11 @@ use crate::source::source_text;
/// reflected in the changed AST offsets.
/// The other reason is that Ruff's AST doesn't implement `Eq` which Salsa requires
/// for determining if a query result is unchanged.
#[salsa::tracked(returns(ref), no_eq, heap_size=ruff_memory_usage::heap_size)]
///
/// The LRU capacity of 200 was picked without any empirical evidence that it's optimal;
/// instead, it's a guess that incremental changes are unlikely to involve
/// more than 200 modules. Parsed ASTs within the same revision are never evicted by Salsa.
#[salsa::tracked(returns(ref), no_eq, heap_size=ruff_memory_usage::heap_size, lru=200)]
pub fn parsed_module(db: &dyn Db, file: File) -> ParsedModule {
let _span = tracing::trace_span!("parsed_module", ?file).entered();

@@ -92,14 +96,9 @@ impl ParsedModule {
self.inner.store(None);
}

/// Returns the pointer address of this [`ParsedModule`].
///
/// The pointer uniquely identifies the module within the current Salsa revision,
/// regardless of whether particular [`ParsedModuleRef`] instances are garbage collected.
pub fn addr(&self) -> usize {
// Note that the outer `Arc` in `inner` is stable across garbage collection, while the inner
// `Arc` within the `ArcSwap` may change.
Arc::as_ptr(&self.inner).addr()
/// Returns the file to which this module belongs.
pub fn file(&self) -> File {
self.file
}
}

43 changes: 36 additions & 7 deletions crates/ruff_memory_usage/src/lib.rs
@@ -1,17 +1,46 @@
use std::sync::{LazyLock, Mutex};
use std::cell::RefCell;

use get_size2::{GetSize, StandardTracker};
use ordermap::{OrderMap, OrderSet};

thread_local! {
pub static TRACKER: RefCell<Option<StandardTracker>> = const { RefCell::new(None) };
}
Comment on lines +6 to +8

Contributor:
Would appreciate a comment explaining what on earth is going on here. Also, interesting that this no longer needs to be multi-threaded..?

Member Author:
This was never multi-threaded. It's just that it was a static before, so we had to use a Mutex.


struct TrackerGuard(Option<StandardTracker>);

impl Drop for TrackerGuard {
fn drop(&mut self) {
TRACKER.set(self.0.take());
}
}

pub fn attach_tracker<R>(tracker: StandardTracker, f: impl FnOnce() -> R) -> R {
let prev = TRACKER.replace(Some(tracker));
let _guard = TrackerGuard(prev);
f()
}

fn with_tracker<F, R>(f: F) -> R
where
F: FnOnce(Option<&mut StandardTracker>) -> R,
{
TRACKER.with(|tracker| {
let mut tracker = tracker.borrow_mut();
f(tracker.as_mut())
})
}

/// Returns the memory usage of the provided object, using a global tracker to avoid
/// double-counting shared objects.
pub fn heap_size<T: GetSize>(value: &T) -> usize {
static TRACKER: LazyLock<Mutex<StandardTracker>> =
LazyLock::new(|| Mutex::new(StandardTracker::new()));

value
.get_heap_size_with_tracker(&mut *TRACKER.lock().unwrap())
.0
with_tracker(|tracker| {
if let Some(tracker) = tracker {
value.get_heap_size_with_tracker(tracker).0
} else {
value.get_heap_size()
}
})
}

/// An implementation of [`GetSize::get_heap_size`] for [`OrderSet`].
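The thread-local tracker in the new ruff_memory_usage code relies on a guard-restores-previous-value pattern: attach_tracker swaps the tracker in, and TrackerGuard's Drop puts the old one back, so nested attachments unwind correctly. A minimal self-contained sketch of that pattern, substituting a plain u32 for get_size2's StandardTracker (an assumption made here purely to avoid the external dependency):

```rust
use std::cell::RefCell;

// Stand-in for get_size2::StandardTracker; any type would do for the sketch.
type Tracker = u32;

thread_local! {
    // One slot per thread, so no Mutex is needed (unlike the old global static).
    static TRACKER: RefCell<Option<Tracker>> = const { RefCell::new(None) };
}

// Restores the previously attached tracker when dropped, so nested
// `attach_tracker` calls (and panics) unwind back to the outer tracker.
struct TrackerGuard(Option<Tracker>);

impl Drop for TrackerGuard {
    fn drop(&mut self) {
        TRACKER.set(self.0.take());
    }
}

pub fn attach_tracker<R>(tracker: Tracker, f: impl FnOnce() -> R) -> R {
    // Swap the new tracker in, remembering whatever was there before.
    let prev = TRACKER.replace(Some(tracker));
    let _guard = TrackerGuard(prev);
    f()
}

// Reads the currently attached tracker, if any (illustrative helper,
// not part of the PR's API).
pub fn current() -> Option<Tracker> {
    TRACKER.with(|t| *t.borrow())
}
```

The key property is that leaving an `attach_tracker` scope always restores the outer state, which is what lets `salsa_memory_dump` attach a fresh tracker around `memory_usage` without leaking it into later queries on the same thread.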
10 changes: 10 additions & 0 deletions crates/ty_ide/src/symbols.rs
@@ -379,6 +379,16 @@ pub(crate) fn symbols_for_file_global_only(db: &dyn Db, file: File) -> FlatSymbols {
global_only: true,
};
visitor.visit_body(&module.syntax().body);

if file
.path(db)
.as_system_path()
.is_none_or(|path| !db.project().is_file_included(db, path))
{
// Eagerly clear ASTs of third party files.
parsed.clear();
}
Member:
Does this mean that each time completions are requested, every file will need to be re-parsed?

I checked out this PR and tried it out. I used a single Python source file containing just array. I opened it in VS Code and requested completions at the end of array. On main, I get:

2025-12-02 09:48:31.234246045 DEBUG request{id=13 method="textDocument/completion"}: Completions request returned 326 suggestions in 149.088978ms
2025-12-02 09:48:34.461071428 DEBUG request{id=15 method="textDocument/completion"}: Completions request returned 326 suggestions in 17.225306ms
2025-12-02 09:48:36.677139939 DEBUG request{id=17 method="textDocument/completion"}: Completions request returned 326 suggestions in 19.559503ms
2025-12-02 09:48:40.656242428 DEBUG request{id=19 method="textDocument/completion"}: Completions request returned 326 suggestions in 18.719669ms

And with your branch I get:

2025-12-02 09:50:26.152804396 DEBUG request{id=14 method="textDocument/completion"}: Completions request returned 326 suggestions in 166.780705ms
2025-12-02 09:50:29.762248593 DEBUG request{id=16 method="textDocument/completion"}: Completions request returned 326 suggestions in 19.985813ms
2025-12-02 09:50:33.345613225 DEBUG request{id=18 method="textDocument/completion"}: Completions request returned 326 suggestions in 19.429199ms
2025-12-02 09:50:37.048968976 DEBUG request{id=20 method="textDocument/completion"}: Completions request returned 326 suggestions in 18.763721ms

So there still seems to be significant caching happening?

Member Author (@MichaReiser, Dec 2, 2025):

> Does this mean that each time completions are requested, every file will need to be re-parsed?

We still cache symbols_for_file_global_only. But yes, we'll have to reparse the AST if a third-party file changes. That should be very unlikely, and it's not something we can really avoid, because we obviously have to recompute symbols_for_file_global_only in that case.

The main drawback of this is if we had any other query that iterates over all third-party modules, parses them, and does stuff. But I don't think there's any such query?

Member:

Ah I see now. Yeah this is great. And no, I'm not aware of any other queries that do this. I agree we'd have to be careful introducing one. Maybe add a comment here warning of that eventuality?


FlatSymbols {
symbols: visitor.symbols,
}
6 changes: 5 additions & 1 deletion crates/ty_project/src/db.rs
@@ -7,6 +7,7 @@ pub use self::changes::ChangeResult;
use crate::CollectReporter;
use crate::metadata::settings::file_settings;
use crate::{ProgressReporter, Project, ProjectMetadata};
use get_size2::StandardTracker;
use ruff_db::Db as SourceDb;
use ruff_db::diagnostic::Diagnostic;
use ruff_db::files::{File, Files};
@@ -129,7 +130,10 @@ impl ProjectDatabase {
/// Returns a [`SalsaMemoryDump`] that can be used to dump Salsa memory usage information
/// to the CLI after a typechecker run.
pub fn salsa_memory_dump(&self) -> SalsaMemoryDump {
let memory_usage = <dyn salsa::Database>::memory_usage(self);
let memory_usage = ruff_memory_usage::attach_tracker(StandardTracker::new(), || {
<dyn salsa::Database>::memory_usage(self)
});

let mut ingredients = memory_usage
.structs
.into_iter()
9 changes: 5 additions & 4 deletions crates/ty_python_semantic/src/ast_node_ref.rs
@@ -1,6 +1,8 @@
use std::fmt::Debug;
use std::marker::PhantomData;

#[cfg(debug_assertions)]
use ruff_db::files::File;
use ruff_db::parsed::ParsedModuleRef;
use ruff_python_ast::{AnyNodeRef, NodeIndex};
use ruff_python_ast::{AnyRootNodeRef, HasNodeIndex};
@@ -44,7 +46,7 @@ pub struct AstNodeRef<T> {
// cannot implement `Eq`, as indices are only unique within a given instance of the
// AST.
#[cfg(debug_assertions)]
module_addr: usize,
file: File,

_node: PhantomData<T>,
}
@@ -72,7 +74,7 @@
Self {
index,
#[cfg(debug_assertions)]
module_addr: module_ref.module().addr(),
file: module_ref.module().file(),
#[cfg(debug_assertions)]
kind: AnyNodeRef::from(node).kind(),
#[cfg(debug_assertions)]
@@ -88,8 +90,7 @@
#[track_caller]
pub fn node<'ast>(&self, module_ref: &'ast ParsedModuleRef) -> &'ast T {
#[cfg(debug_assertions)]
assert_eq!(module_ref.module().addr(), self.module_addr);

assert_eq!(module_ref.module().file(), self.file);
// The user guarantees that the module is from the same file and Salsa
// revision, so the file contents cannot have changed.
module_ref
2 changes: 1 addition & 1 deletion crates/ty_python_semantic/src/module_resolver/module.rs
@@ -120,7 +120,7 @@ impl std::fmt::Debug for Module<'_> {
}

#[allow(clippy::ref_option)]
#[salsa::tracked(returns(ref))]
#[salsa::tracked(returns(ref), heap_size=ruff_memory_usage::heap_size)]
fn all_submodule_names_for_package<'db>(
db: &'db dyn Db,
module: Module<'db>,