[ty] Override __file__ to str when applicable on imported modules#22333
[ty] Override __file__ to str when applicable on imported modules#22333AlexWaygood merged 14 commits intoastral-sh:mainfrom
__file__ to str when applicable on imported modules#22333Conversation
Typing conformance resultsNo changes detected ✅ |
|
|
Thank you for working on this. There are a few failing tests. Would you mind taking a look at them. |
|
| Lint rule | Added | Removed | Changed |
|---|---|---|---|
no-matching-overload |
0 | 56 | 0 |
invalid-argument-type |
0 | 40 | 0 |
possibly-missing-attribute |
0 | 5 | 0 |
invalid-return-type |
0 | 1 | 3 |
invalid-assignment |
2 | 0 | 0 |
| Total | 2 | 102 | 3 |
d8133e1 to
a0afc62
Compare
|
Finished reviewing the eco-system impact (see PR description) I think everything looks good, so this should be ready for further review. |
__file__ to str on imported modules__file__ to str when applicable on imported modules
AlexWaygood
left a comment
There was a problem hiding this comment.
Thank you, and sorry for the delayed review!!
This looks pretty good, but the comment needs to be updated and I think we should add a test for namespace packages -- see my comment inline
| // We special-case `__file__` here because we know that for a successfully imported | ||
| // Python module, that hasn't been explicitly overridden it is always a string, | ||
| // even though typeshed says `str | None`. | ||
| // This is not assumed to be the case for stubs though, where we will stick with typeshed |
There was a problem hiding this comment.
This comment is nearly accurate!
The way it works at runtime is that a pure-Python non-namespace-package module will always have __file__ set to a string; a namespace-package module will always have __file__ set to None, and a C extension might not have it set at all:
% uv pip install google-cloud-ndb
Using Python 3.13.8 environment at: test-env
Resolved 22 packages in 647ms
Prepared 9 packages in 1.07s
Installed 21 packages in 92ms
+ certifi==2026.1.4
+ charset-normalizer==3.4.4
+ google-api-core==2.29.0
+ google-auth==2.47.0
+ google-cloud-core==2.5.0
+ google-cloud-datastore==2.23.0
+ google-cloud-ndb==2.4.0
+ googleapis-common-protos==1.72.0
+ grpcio==1.76.0
+ grpcio-status==1.76.0
+ idna==3.11
+ proto-plus==1.27.0
+ protobuf==6.33.4
+ pyasn1==0.6.1
+ pyasn1-modules==0.4.2
+ pymemcache==4.0.0
+ pytz==2025.2
+ redis==6.4.0
+ requests==2.32.5
+ rsa==4.9.1
+ urllib3==2.6.3
(test-env) ~/dev % py
Python 3.13.8 (main, Oct 10 2025, 12:46:18) [Clang 20.1.4 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import google
>>> google.__file__ is None # `google` is a namespace package
True
>>>
>>> import itertools
>>> itertools.__file__ # `itertools` is a C extension
Traceback (most recent call last):
File "<python-input-3>", line 1, in <module>
itertools.__file__
AttributeError: module 'itertools' has no attribute '__file__'. Did you mean: '__name__'?It looks to me like your code is correct for the namespace package case, because you only apply the special case here if module.file(db) is Some(), and module.file(db) will only be None if it's a namespace package. But the comment here needs to be revised, and you could also consider adding an explicit test for the namespace-package case. (Namespace packages are pure-Python packages that have no __init__.py(i) file in them -- they are just directories that contain other Python modules/packages.)
We could also possibly get rid of the exemption for stubs that you have here? The rationale for treating stubs separately is that they might represent C extensions, and C extensions won't necessarily set __file__. But that doesn't affect the type of the __file__ member, it just affects whether the module has the __file__ member to begin with. We could consider emitting a possibly-unresolved-attribute warning about that, but I think it would probably be too noisy: just because a module is represented by a stub file doesn't mean that it definitely is a C extension, it just could be a C extension.
There was a problem hiding this comment.
We could also possibly get rid of the exemption for stubs that you have here?
Thanks for the added context, I mainly took this behaviour steer from an existing tests comment:
ruff/crates/ty_python_semantic/resources/mdtest/scopes/moduletype_attrs.md
Lines 102 to 104 in 853bb00
str (which matches runtime)
There was a problem hiding this comment.
thanks, yeah, that makes sense!
de6dd23 to
dbc2588
Compare
| reveal_type(a.__file__) # revealed: str | ||
| ``` | ||
|
|
||
| ## Namespace package does not have `__file__` |
There was a problem hiding this comment.
Ah -- no, namespace packages do have __file__, but __file__ is set to None for namespace packages. So I guess your logic in crates/ty_python_semantic/src/place.rs isn't quite right. It's C extensions that don't necessarily have __file__ set at all -- but we don't have a great way of knowing whether a module is a C extension or not right now.
There was a problem hiding this comment.
Looking at this more deeply, it seems we don't understand any types.ModuleType attributes as being available on namespace packages right now! Which is something we should fix, but not in this PR. So I'll adjust the comments in this PR and land it.
Thank you!!
There was a problem hiding this comment.
Thanks for landing this @AlexWaygood 😁
Yeah, as I started looking into this there is a lot of places similar to:
let Some(module) = resolve_module(self.db, self.file, &module_name) else {
continue;
};
so only Module::File is being considered and Module::Namespace is skipped.
Seems like a known_namespace_symbol or similar is needed in place.rs to handle a scaled back version of the logic being done on modules 🤔
I might continue poking around in this area and if I make some progress will create an Issue to track the TODO from the mdtest
There was a problem hiding this comment.
Oh, sorry, I'm about to make a PR to fix this up 😶 I should have said!!
There was a problem hiding this comment.
Nice 🚀 no worries still a good opportunity to checkout your PR and learn some more of the dataflows
Summary
Resolves: astral-sh/ty#860
Typeshed defines
__file__asstr | Nonebut whentyhas successfully imported a module we can assumed the modules__file__is str.This is already the case for the
__file__of the local/current file: #18071This expands to handle cases like
from mod.a import __file__ as path_of_aNotes:
This feels like this logic should ideally be centralized and have the current file and imported files operate in the same way (i.e they all pre-populate a set of known symbol overrides) as currently there is overrides for current file and imported files spread across a few parts of
place.rsbut felt like that refactoring is beyond my current experience with the codebase so simple fix only 😆Test Plan
Added new mdtest, not sure if there is an existing test file which this can be merged into instead
Tasks
Failing mdtest which indicates the__file__==strmight not hold up on stubs, need to look into this further and see if this needs to expand the handling.Another failing test for the case where a module overrides__file__(setting toNonein the tests case) which the current is ignoringsysand other similar statically linked std library modules that won't have__file__? pyrefly and pyright both revealstrin this case even though it will be an error at runtime, so probably fineEcosystem review
A bigger impact than I was expecting though most look good
thoughprefectstands out as a bit different to the rest.no-matching-overload- look good these are cases like the original bug where a__file__was getting passed intodirnameand other similar overloadded methods which did not expect to ever receive aNone.invalid-argument-type- similarly these look generally correct, mainly cases where a method was expecting something pathlike and didn't want the| Nonepossibly-missing-attributeon string method calls (split,strip,rstrip, etc.) - generally looks good, these are cases wherestringmethods were unhappy with being called withstr | Nonebut are now working.jaxhas a few that don't appear to match the above cases but looks more like a non-determinism caseprefectthe cases that stand out as potentially problematic are the newinvalid-awaitinvalid-awaitcases have went after now respecting the override definitionmesonbuildcase is interesting as theResolved by respecting the explicit definitionexcept ImportErrorisn't experienced bytybut only whenmesonbuildis run on windows. Not sure it's a case to handle intythough 🤔 as the cases where libraries are writing code in expectation that the import succeeds is much more common.__file__🤔 Ah so I think this is because of a [no-matching-overload] going away at https://github.com/scikit-learn/scikit-learn/blob/fafe37e414cf22448e861371c537fa2c476639b7/sklearn/utils/_testing.py#L926 which meanskwargsis getting a narrower type definition.