
Add a subcommand to generate dependency graphs #13402

Merged · 11 commits · Sep 20, 2024
Conversation

@charliermarsh (Member) commented Sep 18, 2024

Summary

This PR adds an experimental Ruff subcommand to generate dependency graphs based on module resolution.

A few highlights:

  • You can generate either dependency or dependent graphs via the --direction command-line argument.
  • Like Pants, we also provide an option to identify imports from string literals (--detect-string-imports).
  • Users can also provide additional dependency data via the include-dependencies key under [tool.ruff.import-map]. This map uses file paths as keys, and lists of strings as values. Those strings can be file paths or globs.
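As an illustrative sketch (the file paths and globs here are hypothetical, not taken from the PR), the `include-dependencies` map described above might look like:

```toml
[tool.ruff.import-map]
# Keys are file paths; values are lists of file paths or globs that the
# key file should be treated as depending on.
include-dependencies = { "src/app/tasks.py" = ["src/app/plugins/**/*.py", "scripts/runner.py"] }
```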

The dependency resolution uses the red-knot module resolver, which is intended to be fully spec-compliant, so it's also a chance to exercise the module resolver in a real-world setting.

The CLI is, e.g., ruff graph build ../autobot, which will output a JSON map from file to files it depends on for the autobot project.
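As a hedged illustration (file names are hypothetical, and the output shape is inferred from the description above), the `--direction` option amounts to choosing between a map and its inverse:

```python
# Hypothetical `ruff graph build` output: each file mapped to the files
# it depends on (the "dependencies" direction).
dependencies = {
    "app/main.py": ["app/db.py", "app/utils.py"],
    "app/db.py": ["app/utils.py"],
    "app/utils.py": [],
}

def invert(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn a dependencies map into a dependents map (the other direction)."""
    inverted: dict[str, list[str]] = {file: [] for file in graph}
    for file, deps in graph.items():
        for dep in deps:
            inverted.setdefault(dep, []).append(file)
    return inverted

dependents = invert(dependencies)
```

Here `app/utils.py`, which imports nothing, ends up with two dependents, while `app/main.py` has none.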

github-actions bot (Contributor) commented Sep 19, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@charliermarsh charliermarsh force-pushed the charlie/import-map branch 4 times, most recently from 9e912c8 to ccde874 Compare September 19, 2024 00:58
@AlexWaygood (Member) left a comment

Couple of comments from skimming!

crates/ruff_graph/src/collector.rs (review comment resolved)
crates/ruff_graph/src/resolver.rs (review comment resolved)
@charliermarsh (Member, Author) commented

This needs tests prior to merging, but I'd like feedback.

@MichaReiser (Member) left a comment

Overall looks good.

I think we're creating too many Db instances, which will hurt performance, and I would recommend using ModuleName and SystemPath over QualifiedName and Path to avoid conversions at the leaf.

crates/ruff/src/args.rs (review comment resolved)
crates/ruff/src/commands/graph.rs (review comment resolved)
Comment on lines 43 to 50:

```rust
// Create a database for each source root.
let databases = package_roots
    .values()
    .filter_map(|package| package.as_deref())
    .filter_map(|package| package.parent())
    .map(Path::to_path_buf)
    .map(|source_root| Ok((source_root.clone(), ModuleDb::from_src_root(source_root)?)))
    .collect::<Result<BTreeMap<_, _>>>()?;
```
Member commented

What's the motivation for creating a new db for each source root?

The caching is local to each database, and creating a new database for each package means that we trade reduced cache efficiency for less congestion. This might be fine if the resolved modules in each package root are distinct, but I suspect that the files contain cross-package imports and many stdlib imports that now need to be resolved multiple times.

The recommended way to achieve parallelism in Salsa is to implement a snapshot method for your Database that creates a shallow clone:

```rust
#[must_use]
pub fn snapshot(&self) -> Self {
    Self {
        workspace: self.workspace,
        storage: self.storage.clone(),
        files: self.files.snapshot(),
        system: Arc::clone(&self.system),
    }
}
```


And you can then create a unique database handle for each task:

```rust
let files = WorkspaceFiles::new(db, self);
let result = Arc::new(std::sync::Mutex::new(Vec::new()));
let inner_result = Arc::clone(&result);

let db = db.snapshot();
let workspace_span = workspace_span.clone();

rayon::scope(move |scope| {
    for file in &files {
        let result = inner_result.clone();
        let db = db.snapshot();
        let workspace_span = workspace_span.clone();

        scope.spawn(move |_| {
            let check_file_span = tracing::debug_span!(parent: &workspace_span, "check_file", file=%file.path(&db));
            let _entered = check_file_span.entered();

            let file_diagnostics = check_file(&db, file);
            result.lock().unwrap().extend(file_diagnostics);
        });
    }
});

Arc::into_inner(result).unwrap().into_inner().unwrap()
```


Edit: I see below that you already did this. I'm still curious why we need multiple Dbs. I think what we do here is too granular. At least if I remember correctly how package roots work, it doesn't just generate a db for the top-level packages but also for all sub-packages. That means we have two databases for a.b.c: one for package a and another for a.b, if c is a module and a and a.b are packages. The result is that we double-resolve modules for a.b, depending on whether they come from the a or the a.b database.

I also worry that relative imports don't work because of that, since I think we might bail out at the src_root.

Can't we generate a single database, or a database for each common ancestor of the packages (which ideally is the src directory)?

If not, can we add some package paths to SearchPathSettings::extra_paths?
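A minimal sketch of the common-ancestor idea (the paths are illustrative, and this is not what the PR implements):

```python
import os

# Independent package roots, as in a Django-style layout where test
# sub-trees are importable on their own.
package_roots = [
    "/repo/django",
    "/repo/django/tests",
    "/repo/django/tests/i18n/sampleproject",
]

# A single database rooted at the deepest directory containing every
# package root would let cross-package and stdlib resolutions share one cache.
src_root = os.path.commonpath(package_roots)
```

Here `src_root` is `/repo/django`, so one database rooted there would cover all three roots.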

Member commented

> If not, can we add some package paths to SearchPathSettings::extra_paths?

This isn't really "correct" since extra_paths take priority over even the stdlib search path in module resolution. But maybe it'll produce the right answer in most cases.

In the real world, if you have multiple independent first-party packages in a workspace (which may or may not depend on each other), you're likely to have them all editably installed into a venv, and red-knot would statically discover all the first-party packages by reading the pth files in the site-packages directory and add them as search paths that way. But maybe that approach doesn't work here.

@charliermarsh (Member, Author) commented Sep 19, 2024

The source roots here are actually just the root folders. So, like, in Django, it's just a small set of top-level folders:

```json
[
  "/Users/crmarsh/workspace/django",
  "/Users/crmarsh/workspace/django/tests",
  "/Users/crmarsh/workspace/django/tests/admin_scripts/custom_templates",
  "/Users/crmarsh/workspace/django/tests/admin_scripts/custom_templates/project_template",
  "/Users/crmarsh/workspace/django/tests/i18n/sampleproject",
  "/Users/crmarsh/workspace/django/tests/migrations/faulty_migrations/namespace",
  "/Users/crmarsh/workspace/django/tests/migrations/test_fake_initial_case_insensitive"
]
```

@charliermarsh (Member, Author) commented

We could use a single database with extra paths -- that's actually the way I did it initially, though Alex said it's not the intended use-case. The downside, I think, is that it would allow those source roots to import files across one another. Maybe that's correct though! I can change it.

@AlexWaygood (Member) commented Sep 19, 2024

> The source roots here are actually just the root folders.

Possibly I'm still misunderstanding, but if everything's importable from the workspace root, doesn't that imply you can resolve all modules and submodules relative to a single search path, src_root, which would point to the workspace root?

Member commented

Alex is obviously the authority when it comes to search paths and I'm probably wrong haha 😆

The intended setup is to use a single database for an application (except VS code where you can have multiple workspaces). That gives the best cache reuse. If that's possible, great. If we aren't hitting any perf issues, that's great too.

@carljm (Contributor) commented Sep 19, 2024

Looking at the roots for Django that were listed, I think those are all independent roots (as in, they aren't all importable from a single source root), but they can import each other. So I think it would be more efficient to use a single db; by using multiple, you are likely repeating work. It's true that the red-knot resolver (because it was built around the more restrictive assumptions of PEP 561 for Python type checkers, rather than a generic "ordered sys.path" abstraction) doesn't obviously accommodate this use case, but I think extra-paths is a totally fine way to handle it, and better than multiple databases. (In fact, I think this is pretty much the sort of use case extra-paths exists to handle.)

> This isn't really "correct" since extra_paths take priority over even the stdlib search path in module resolution.

I don't understand this comment: all first-party paths (both the main source root and the extra paths) should always take priority over the stdlib in module resolution. According to PEP 561 order, the first two items in the resolution order are extra paths and then the first-party root. So if you have multiple "first-party roots", given that they have to go in some resolution order, I don't see how any incorrectness is introduced by putting N-1 of them as extra paths, and whichever one you want last as the main first-party root.

Member commented

> > This isn't really "correct" since extra_paths take priority over even the stdlib search path in module resolution.
>
> I don't understand this comment: all first-party paths (both the main source root and the extra paths) should always take priority over the stdlib in module resolution.

Ugh, you're quite right. Blame it on the holiday haze... yeah, using extra_paths should be fine then, in that case. Sorry for causing unnecessary work @charliermarsh :(

@carljm (Contributor) commented Sep 19, 2024

> In the real world, if you have multiple independent first-party packages in a workspace (which may or may not depend on each other), you're likely to have them all editably installed into a venv, and red-knot would statically discover all the first-party packages by reading the pth files in the site-packages directory and add them as search paths that way. But maybe that approach doesn't work here.

In many real-world cases (including Django), you'll have little sub-trees of Python code that aren't packaged at all or installed via Python packaging tools, but still end up on the runtime sys.path in some more bespoke way. That's the sort of scenario extra-paths is supposed to be for.

crates/ruff_db/src/system/os.rs (review comment resolved)
crates/ruff_graph/src/collector.rs (review comment resolved)
crates/ruff_workspace/src/settings.rs (two review comments resolved)
```
@@ -3306,6 +3313,48 @@ pub struct FormatOptions {
    pub docstring_code_line_length: Option<DocstringCodeLineWidth>,
}

/// Configures Ruff's import map generation.
#[derive(
    Clone, Debug, PartialEq, Eq, Default, Deserialize, Serialize, OptionsMetadata, CombineOptions,
```
Member commented

Should we remove the OptionsMetadata trait for now to exclude the settings from the schema?

crates/ruff_graph/src/resolver.rs (two review comments resolved)
@charliermarsh charliermarsh force-pushed the charlie/import-map branch 3 times, most recently from b4626e1 to 2de42c5 Compare September 19, 2024 22:33
@carljm (Contributor) left a comment

Very cool!


```rust
// Resolve all package roots.
let package_roots = resolver
    .package_roots(
```
@carljm (Contributor) commented Sep 19, 2024

This logic (determining a package root for an arbitrary file by walking up the tree looking for __init__.py) is part of Ruff so it must be fairly battle-tested, but I'm a little surprised if it doesn't often fail on big internal codebases. In my experience, "accidental namespace packages" (because someone just forgot an __init__.py, and imports worked anyway) are pretty common. Maybe a lot of people use linters (like Ruff, perhaps!) that prevent this?

Because of that, I tend to be more on the side of, if you're going to be doing multi-file analysis, you should explicitly configure the package root(s), rather than trying to discover files and packages automatically, which is easily broken by namespace packages and therefore requires explicitly configuring namespace packages.
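The walk being described can be sketched roughly like this (an assumption-laden simplification; Ruff's actual logic handles more cases, such as configured namespace packages):

```python
import tempfile
from pathlib import Path

def package_root(module: Path) -> Path:
    """Walk up from a module to the topmost directory that still contains
    an __init__.py — a rough sketch of package-root discovery."""
    directory = module.parent
    if not (directory / "__init__.py").exists():
        return directory  # top-level module: its own directory is the root
    while (directory.parent / "__init__.py").exists():
        directory = directory.parent
    return directory

# Demo: a regular package `a.b` with a module inside it.
tmp = Path(tempfile.mkdtemp())
(tmp / "a" / "b").mkdir(parents=True)
(tmp / "a" / "__init__.py").touch()
(tmp / "a" / "b" / "__init__.py").touch()
(tmp / "a" / "b" / "mod.py").touch()
root = package_root(tmp / "a" / "b" / "mod.py")
```

The "accidental namespace package" failure mode is visible here: if `a/__init__.py` were missing, the walk would stop at `a/b` and report the wrong root, even though `import a.b.mod` may still work at runtime.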

crates/ruff/src/commands/analyze_graph.rs (review comment resolved)
crates/ruff_graph/src/collector.rs (two review comments resolved)

```rust
fn visit_expr(&mut self, expr: &'ast Expr) {
    if self.string_imports {
        if let Expr::StringLiteral(ast::ExprStringLiteral { value, range: _ }) = expr {
```
@carljm (Contributor) commented Sep 19, 2024

This is more aggressive than I realized -- we don't even look for calls to __import__ or importlib.import_module, we just look for any string that might be a module path.

It seems like this could easily be fooled by cases where an import-path constant is defined in one file (like a Django settings file) but the string is imported and then used in a dynamic import in a different module.

But if it's good enough to match existing tools on the codebases we care about, great!
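A hedged sketch of the kind of heuristic being discussed — treating any string literal that looks like a dotted module path as a potential import. The exact rules in Ruff may differ; requiring at least one dot is an assumption made here:

```python
import re

# Assumption: a dotted path of ASCII identifiers with at least one dot,
# e.g. "django.conf.settings"; bare names like "config" are not flagged.
MODULE_PATH = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)+$")

def looks_like_module_path(literal: str) -> bool:
    """True if a string literal plausibly names a dotted module path."""
    return MODULE_PATH.match(literal) is not None
```

As noted above, this matches any dotted string regardless of how it is used, so false positives (and misses, when the constant is defined far from the dynamic import) are inherent to the approach.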

@charliermarsh charliermarsh force-pushed the charlie/import-map branch 2 times, most recently from 6adc038 to a3914bb Compare September 20, 2024 00:22
@charliermarsh charliermarsh merged commit 4e935f7 into main Sep 20, 2024
19 of 20 checks passed
@charliermarsh charliermarsh deleted the charlie/import-map branch September 20, 2024 01:06
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Oct 7, 2024
## 0.6.9

### Preview features

- Fix codeblock dynamic line length calculation for indented docstring examples ([#13523](astral-sh/ruff#13523))
- \[`refurb`\] Mark `FURB118` fix as unsafe ([#13613](astral-sh/ruff#13613))

### Rule changes

- \[`pydocstyle`\] Don't raise `D208` when last line is non-empty ([#13372](astral-sh/ruff#13372))
- \[`pylint`\] Preserve trivia (i.e. comments) in `PLR5501` autofix ([#13573](astral-sh/ruff#13573))

### Configuration

- \[`pyflakes`\] Add `allow-unused-imports` setting for `unused-import` rule (`F401`) ([#13601](astral-sh/ruff#13601))

### Bug fixes

- Support ruff discovery in pip build environments ([#13591](astral-sh/ruff#13591))
- \[`flake8-bugbear`\] Avoid short circuiting `B017` for multiple context managers ([#13609](astral-sh/ruff#13609))
- \[`pylint`\] Do not offer an invalid fix for `PLR1716` when the comparisons contain parenthesis ([#13527](astral-sh/ruff#13527))
- \[`pyupgrade`\] Fix `UP043` to apply to `collections.abc.Generator` and `collections.abc.AsyncGenerator` ([#13611](astral-sh/ruff#13611))
- \[`refurb`\] Fix handling of slices in tuples for `FURB118`, e.g., `x[:, 1]` ([#13518](astral-sh/ruff#13518))

### Documentation

- Update GitHub Action link to `astral-sh/ruff-action` ([#13551](astral-sh/ruff#13551))

## 0.6.8

### Preview features

- Remove unnecessary parentheses around `match case` clauses ([#13510](astral-sh/ruff#13510))
- Parenthesize overlong `if` guards in `match..case` clauses ([#13513](astral-sh/ruff#13513))
- Detect basic wildcard imports in `ruff analyze graph` ([#13486](astral-sh/ruff#13486))
- \[`pylint`\] Implement `boolean-chained-comparison` (`R1716`) ([#13435](astral-sh/ruff#13435))

### Rule changes

- \[`flake8-simplify`\] Detect `SIM910` when using variadic keyword arguments, i.e., `**kwargs` ([#13503](astral-sh/ruff#13503))
- \[`pyupgrade`\] Avoid false negatives with non-reference shadowed bindings of loop variables (`UP028`) ([#13504](astral-sh/ruff#13504))

### Bug fixes

- Detect tuples bound to variadic positional arguments i.e. `*args` ([#13512](astral-sh/ruff#13512))
- Exit gracefully on broken pipe errors ([#13485](astral-sh/ruff#13485))
- Avoid panic when analyze graph hits broken pipe ([#13484](astral-sh/ruff#13484))

### Performance

- Reuse `BTreeSets` in module resolver ([#13440](astral-sh/ruff#13440))
- Skip traversal for non-compound statements ([#13441](astral-sh/ruff#13441))

## 0.6.7

### Preview features

- Add Python version support to ruff analyze CLI ([#13426](astral-sh/ruff#13426))
- Add `exclude` support to `ruff analyze` ([#13425](astral-sh/ruff#13425))
- Fix parentheses around return type annotations ([#13381](astral-sh/ruff#13381))

### Rule changes

- \[`pydocstyle`\] Fix: Don't autofix if the first line ends in a question mark? (D400) ([#13399](astral-sh/ruff#13399))

### Bug fixes

- Respect `lint.exclude` in ruff check `--add-noqa` ([#13427](astral-sh/ruff#13427))

### Performance

- Avoid tracking module resolver files in Salsa ([#13437](astral-sh/ruff#13437))
- Use `forget` for module resolver database ([#13438](astral-sh/ruff#13438))

## 0.6.6

### Preview features

- \[`refurb`\] Skip `slice-to-remove-prefix-or-suffix` (`FURB188`) when non-trivial slice steps are present ([#13405](astral-sh/ruff#13405))
- Add a subcommand to generate dependency graphs ([#13402](astral-sh/ruff#13402))

### Formatter

- Fix placement of inline parameter comments ([#13379](astral-sh/ruff#13379))

### Server

- Fix off-by one error in the `LineIndex::offset` calculation ([#13407](astral-sh/ruff#13407))

### Bug fixes

- \[`fastapi`\] Respect FastAPI aliases in route definitions ([#13394](astral-sh/ruff#13394))
- \[`pydocstyle`\] Respect word boundaries when detecting function signature in docs ([#13388](astral-sh/ruff#13388))

### Documentation

- Add backlinks to rule overview linter ([#13368](astral-sh/ruff#13368))
- Fix documentation for editor vim plugin ALE ([#13348](astral-sh/ruff#13348))
- Fix rendering of `FURB188` docs ([#13406](astral-sh/ruff#13406))

4 participants