-
Notifications
You must be signed in to change notification settings - Fork 1.8k
[ty] Implement global handling and load-before-global-declaration syntax error
#17637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
Thanks for working on this! My preference for red-knot would be to avoid the double AST visit if we can -- and I think we fairly easily can. I admit I'm a little surprised to see that it has 0% perf impact in the benchmark. I do wonder if this result would hold up on much larger projects? In particular, semantic indexing is a cost we pay on every file, even third-party files we aren't checking, and this could make a difference when checking a project with lots of big third-party dependencies. I think the way I would do this in red-knot, without the double visit, is for semantic indexing to do nothing more than record which names are marked global in each scope, and handle the rest of it in type checking. When checking a scope we'd again record which symbols we've seen marked as global when we hit the global statement, and on every |
Thanks for taking a look!
I was definitely surprised by this too. I haven't tried a larger project, but it does sound like that would cause problems.
That approach definitely makes sense to me for red-knot. However, if I'm following correctly, I think this might conflict a bit with how the semantic syntax errors are currently detected in the semantic indexing phase. I may need to move the embedded |
I was assuming the latter. Is there any need for all semantic syntax errors to be emitted from |
Not strictly, but the visitor assumes that at least, the entire function body is visited in one go. I think that would also prevent us from moving this to type inference because we aren't visiting the body eagerly (and always in chunks without preserving any state in between) |
|
Could we instead record all global usages in Alternatively. I would also be okay if that one check would be implemented separately between Red Knot and Ruff. It isn't a complicated check either way. |
This sounds promising, I'll give it a try!
Yeah I don't really mind implementing it separately either. I think we might run into something similar for a couple of other errors (nonlocal-and-global (PLE0115) comes to mind, but I think there's at least one more |
Summary -- This is a very rough draft of `global` handling that literally copies and pastes the `ruff_python_semantic::globals` module into `red_knot_python_semantic` and wires it into the `SemanticIndexBuilder`. If we actually want to keep this implementation, I think we should pull the `globals` module out into some other crate that can be used by both `ruff_python_semantic` and `red_knot_python-semantic`. I think the biggest downside of this approach, from talking with @MichaReiser, is that we visit classes, functions, and modules twice: once in `Globals::from_body` and again in the normal visitor. I don't think there's any easy way around this because we need to know if there are any relevant `global` statements in the current scope when visiting a given `Expr::Name`. For correct usage, I think the `global` statement will always come first, but to detect the semantic error, we specifically want the instances where this is not the case. Part of the motivation for opening such a rough draft is to see the benchmark results from codspeed. Test Plan -- Updated all of the related mdtests to remove the TODOs (and add quotes I forgot on the messages). This PR resolves all of the semantic syntax error TODOs and none of the typing-related TODOs.
Co-authored-by: Alex Waygood <[email protected]>
ffcd7cd to
6e6422d
Compare
This comment was marked as resolved.
This comment was marked as resolved.
|
The primer report here looks fantastic, and makes me think we should try to get this into the alpha if we can! |
global handlingglobal handling and load-before-global-declaration syntax error
MichaReiser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
| /// Map from the file-local [`FileScopeId`] to the [`Globals`] it contains. | ||
| globals_by_scope: FxHashMap<FileScopeId, Globals>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea here for using a hash map that only very few scopes have globals (it's not a densly populated array), which is why it doesn't make sense to use an IndexVec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
|
I think you could do something like this to implement the changes @MichaReiser's suggesting (which I agree make sense -- a Suggested patchdiff --git a/crates/ty_python_semantic/src/semantic_index.rs b/crates/ty_python_semantic/src/semantic_index.rs
index 584064564c..7978ee189d 100644
--- a/crates/ty_python_semantic/src/semantic_index.rs
+++ b/crates/ty_python_semantic/src/semantic_index.rs
@@ -43,8 +43,6 @@ pub(crate) use self::use_def::{
type SymbolMap = hashbrown::HashMap<ScopedSymbolId, (), FxBuildHasher>;
-type Globals = FxHashSet<ruff_python_ast::name::Name>;
-
/// Returns the semantic index for `file`.
///
/// Prefer using [`symbol_table`] when working with symbols from a single scope.
@@ -179,7 +177,7 @@ pub(crate) struct SemanticIndex<'db> {
scope_ids_by_scope: IndexVec<FileScopeId, ScopeId<'db>>,
/// Map from the file-local [`FileScopeId`] to the [`Globals`] it contains.
- globals_by_scope: FxHashMap<FileScopeId, Globals>,
+ globals_by_scope: FxHashMap<FileScopeId, FxHashSet<ScopedSymbolId>>,
/// Use-def map for each scope in this file.
use_def_maps: IndexVec<FileScopeId, Arc<UseDefMap<'db>>>,
@@ -260,8 +258,14 @@ impl<'db> SemanticIndex<'db> {
self.scope_ids_by_scope.iter().copied()
}
- pub(crate) fn globals_by_scope(&self, scope_id: FileScopeId) -> Option<&Globals> {
- self.globals_by_scope.get(&scope_id)
+ pub(crate) fn symbol_is_global_in_scope(
+ &self,
+ symbol: ScopedSymbolId,
+ scope: FileScopeId,
+ ) -> bool {
+ self.globals_by_scope
+ .get(&scope)
+ .is_some_and(|globals| globals.contains(&symbol))
}
/// Returns the id of the parent scope.
diff --git a/crates/ty_python_semantic/src/semantic_index/builder.rs b/crates/ty_python_semantic/src/semantic_index/builder.rs
index aa0dee089c..39e94fc98a 100644
--- a/crates/ty_python_semantic/src/semantic_index/builder.rs
+++ b/crates/ty_python_semantic/src/semantic_index/builder.rs
@@ -51,8 +51,6 @@ use crate::semantic_index::SemanticIndex;
use crate::unpack::{Unpack, UnpackKind, UnpackPosition, UnpackValue};
use crate::{Db, Program};
-use super::Globals;
-
mod except_handlers;
#[derive(Clone, Debug, Default)]
@@ -108,7 +106,7 @@ pub(super) struct SemanticIndexBuilder<'db> {
use_def_maps: IndexVec<FileScopeId, UseDefMapBuilder<'db>>,
scopes_by_node: FxHashMap<NodeWithScopeKey, FileScopeId>,
scopes_by_expression: FxHashMap<ExpressionNodeKey, FileScopeId>,
- globals_by_scope: FxHashMap<FileScopeId, Globals>,
+ globals_by_scope: FxHashMap<FileScopeId, FxHashSet<ScopedSymbolId>>,
definitions_by_node: FxHashMap<DefinitionNodeKey, Definitions<'db>>,
expressions_by_node: FxHashMap<ExpressionNodeKey, Expression<'db>>,
imported_modules: FxHashSet<ModuleName>,
@@ -1089,6 +1087,7 @@ impl<'db> SemanticIndexBuilder<'db> {
self.scopes_by_node.shrink_to_fit();
self.generator_functions.shrink_to_fit();
self.eager_snapshots.shrink_to_fit();
+ self.globals_by_scope.shrink_to_fit();
SemanticIndex {
symbol_tables,
@@ -1904,12 +1903,11 @@ where
self.mark_unreachable();
}
ast::Stmt::Global(ast::StmtGlobal { range: _, names }) => {
- let enclosing_scope_info = self.current_scope_info();
- let enclosing_scope_id = enclosing_scope_info.file_scope_id;
- let enclosing_symbol_table = &self.symbol_tables[enclosing_scope_id];
for name in names {
- let bound = enclosing_symbol_table.symbol_id_by_name(name).is_some();
- if bound {
+ let symbol_id = self.add_symbol(name.id.clone());
+ let symbol_table = self.current_symbol_table();
+ let symbol = symbol_table.symbol(symbol_id);
+ if symbol.is_bound() || symbol.is_declared() || symbol.is_used() {
self.report_semantic_error(SemanticSyntaxError {
kind: SemanticSyntaxErrorKind::LoadBeforeGlobalDeclaration {
name: name.to_string(),
@@ -1923,7 +1921,16 @@ where
self.globals_by_scope
.entry(scope_id)
.or_default()
- .insert(name.id.clone());
+ .insert(symbol_id);
+ }
+ walk_stmt(self, stmt);
+ }
+ ast::Stmt::Delete(ast::StmtDelete { targets, range: _ }) => {
+ for target in targets {
+ if let ast::Expr::Name(ast::ExprName { id, .. }) = target {
+ let symbol_id = self.add_symbol(id.clone());
+ self.current_symbol_table().mark_symbol_used(symbol_id);
+ }
}
walk_stmt(self, stmt);
}
diff --git a/crates/ty_python_semantic/src/types/infer.rs b/crates/ty_python_semantic/src/types/infer.rs
index 21b7699691..490ade86e6 100644
--- a/crates/ty_python_semantic/src/types/infer.rs
+++ b/crates/ty_python_semantic/src/types/infer.rs
@@ -1373,17 +1373,18 @@ impl<'db> TypeInferenceBuilder<'db> {
let file_scope_id = binding.file_scope(self.db());
let symbol_table = self.index.symbol_table(file_scope_id);
- let symbol_name = symbol_table.symbol(binding.symbol(self.db())).name();
let use_def = self.index.use_def_map(file_scope_id);
let mut bound_ty = ty;
+ let symbol_id = binding.symbol(self.db());
- let is_global = self
- .index
- .globals_by_scope(file_scope_id)
- .is_some_and(|globals| globals.contains(symbol_name));
+ let skip_non_global_scopes = !file_scope_id.is_global()
+ && self
+ .index
+ .symbol_is_global_in_scope(symbol_id, file_scope_id);
let global_use_def_map = self.index.use_def_map(FileScopeId::global());
- let declarations = if is_global && !file_scope_id.is_global() {
+ let declarations = if skip_non_global_scopes {
+ let symbol_name = symbol_table.symbol(symbol_id).name();
match self
.index
.symbol_table(FileScopeId::global())
@@ -5211,12 +5212,15 @@ impl<'db> TypeInferenceBuilder<'db> {
let current_file = self.file();
- let is_global = self
- .index
- .globals_by_scope(file_scope_id)
- .is_some_and(|globals| globals.contains(symbol_name));
+ let skip_non_global_scopes = !file_scope_id.is_global()
+ && symbol_table
+ .symbol_id_by_name(symbol_name)
+ .is_some_and(|symbol_id| {
+ self.index
+ .symbol_is_global_in_scope(symbol_id, file_scope_id)
+ });
- if is_global && !file_scope_id.is_global() {
+ if skip_non_global_scopes {
return symbol(
db,
FileScopeId::global().to_scope_id(db, current_file), |
Co-authored-by: Alex Waygood <[email protected]>
AlexWaygood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, thank you! I'd love for @carljm to take a quick look too though
Co-authored-by: Alex Waygood <[email protected]>
carljm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! A few minor comments if you feel inspired, but this is good to go. Thank you!!
| reveal_type(x) # revealed: Literal["56"] | ||
| ``` | ||
|
|
||
| ```py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably best to give this a separate heading. Two unnamed code blocks under a single heading are merged into a single file for checking. Seems to be ok here at the moment, but this can lead to future confusion when we're dealing with global scope. E.g. at the moment the first test is only showing Literal[42] because the second test also assigns x = 42 -- if the second test assigned something different to x, the first test would see that instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, thanks! This was a good excuse to come up with better headings anyway.
| print(x) # error: [unresolved-reference] "Name `x` used when not defined" | ||
| x = "56" | ||
| print(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any particular reason to use print rather than reveal_type here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, just avoiding two comments on the first one haha I've updated it!
| # error: [invalid-assignment] "Object of type `Literal[""]` is not assignable to `int`" | ||
| z = "" | ||
|
|
||
| z: int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a set of cases that we aren't testing, which is when we mark a name global in a nested scope, and then declare that name with a new type in that scope, e.g.
x: int = 1
def f():
global x
x: str = "foo"We should definitely error on that assignment, since it's an assignment to the global with the wrong type. I think we should probably error any time a global name is declared with a type in the local scope, even if the type is the same -- that kind of local re-declaration of a global gives the wrong impression and doesn't make sense.
It's OK if this is a TODO in this PR, but we should at least add a TODO for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's actually a (currently unimplemented?) syntax error to provide a local-scope annotation for a name declared global. So I don't think we should emit an error about it in type inference — that would lead to unnecessary double diagnostics.
We should implement the syntax error (either here or in a followup PR) though! And we should add tests now.
>>> x: int = 1
>>> def f():
... global x
... x: str = "foo"
File "<string>", line 3
SyntaxError: annotated name 'x' can't be globalThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh cool, didn't realize that was a syntax error -- even if the declaration comes after the global statement. Agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this to the global mdtest as a TODO and added the syntax error to #17412!
| .index | ||
| .symbol_is_global_in_scope(symbol_id, file_scope_id); | ||
|
|
||
| let global_use_def_map = self.index.use_def_map(FileScopeId::global()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can push this down into the only branch where it's needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into some "borrowed value does not live long enough" errors when trying to move this into the if or match. Happy to try again if I'm missing something here, though. I think the declarations binding just below this is storing a reference to global_use_def_map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be able to make it a little lazier, e.g.
diff --git a/crates/ty_python_semantic/src/types/infer.rs b/crates/ty_python_semantic/src/types/infer.rs
index 490ade86e6..0c3a9efc5c 100644
--- a/crates/ty_python_semantic/src/types/infer.rs
+++ b/crates/ty_python_semantic/src/types/infer.rs
@@ -1382,15 +1382,17 @@ impl<'db> TypeInferenceBuilder<'db> {
.index
.symbol_is_global_in_scope(symbol_id, file_scope_id);
- let global_use_def_map = self.index.use_def_map(FileScopeId::global());
- let declarations = if skip_non_global_scopes {
+ let global_use_def_map =
+ skip_non_global_scopes.then(|| self.index.use_def_map(FileScopeId::global()));
+
+ let declarations = if let Some(global_use_def) = &global_use_def_map {
let symbol_name = symbol_table.symbol(symbol_id).name();
match self
.index
.symbol_table(FileScopeId::global())
.symbol_id_by_name(symbol_name)
{
- Some(id) => global_use_def_map.public_declarations(id),
+ Some(id) => global_use_def.public_declarations(id),
// This case is a syntax error (load before global declaration) but ignore that here
None => use_def.declarations_at_binding(binding),There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, it's just one hashmap lookup
| let skip_non_global_scopes = !file_scope_id.is_global() | ||
| && symbol_table | ||
| .symbol_id_by_name(symbol_name) | ||
| .is_some_and(|symbol_id| { | ||
| self.index | ||
| .symbol_is_global_in_scope(symbol_id, file_scope_id) | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks similar to something we do above in add_binding; we could extract a method?
|
Thank you all for the very helpful reviews (and the help implementing this)! I'll merge this later today in case anyone wants one more look at the recent small changes. |
carljm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
| .index | ||
| .symbol_is_global_in_scope(symbol_id, file_scope_id); | ||
|
|
||
| let global_use_def_map = self.index.use_def_map(FileScopeId::global()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, it's just one hashmap lookup
* main: [ty] Respect the gradual guarantee when reporting errors in resolving MROs (#17962) Convert `Message::SyntaxError` to use `Diagnostic` internally (#17784) [ty] Support extending `__all__` with a literal tuple or set as well as a literal list (#17948) [ty] Make `unused-ignore-comment` disabled by default for now (#17955) [ty] Change default severity for `unbound-reference` to `error` (#17936) [ty] Ignore `possibly-unresolved-reference` by default (#17934) [ty] Default to latest supported python version (#17938) [ty] Generate and add rules table (#17953) Update the schemastore script to match changes in ty (#17952) [ty] Implement `global` handling and `load-before-global-declaration` syntax error (#17637)
* dcreager/default-typevars: clean up the diff remove trait track in type again clippy Better expansion of default typevars specialize_partial enum for TypeMapping [ty] Respect the gradual guarantee when reporting errors in resolving MROs (#17962) Specialize trait Convert `Message::SyntaxError` to use `Diagnostic` internally (#17784) [ty] Support extending `__all__` with a literal tuple or set as well as a literal list (#17948) [ty] Make `unused-ignore-comment` disabled by default for now (#17955) [ty] Change default severity for `unbound-reference` to `error` (#17936) [ty] Ignore `possibly-unresolved-reference` by default (#17934) [ty] Default to latest supported python version (#17938) [ty] Generate and add rules table (#17953) Update the schemastore script to match changes in ty (#17952) [ty] Implement `global` handling and `load-before-global-declaration` syntax error (#17637)
Summary
This PR resolves both the typing-related and syntax error TODOs added in #17563 by tracking a set of
globalbindings for each scope. As discussed below, we avoid the additional AST traversal from ruff by collectingNames fromglobalstatements while building the semantic index and emit a syntax error if theNameis already bound in the current scope at the point of theglobalstatement. This has the downside of separating the error from theSemanticSyntaxChecker, but I plan to explore using this approach in theSemanticSyntaxCheckeritself as a follow-up. It seems like this may be a better approach for ruff as well.Test Plan
Updated all of the related mdtests to remove the TODOs (and add quotes I forgot on the messages).
There is one remaining TODO, but it requires
nonlocalsupport, which isn't even incorporated into theSemanticSyntaxCheckeryet.