-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[ty] Add option for ty to find modules in paths defined in PYTHONPATH #20016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add the option for looking up modules from paths in the PYTHONPATH environment variable.
|
Thanks for your work on this and also creating a documentation PR. I'm not yet convinced that we want to read PYTHONPATH but I'm curious to hear what other thinks. But I also must admit that I don't fully understand the use case. I'd prefer if we could take a step back and start with an issue where we discuss the problem, what other type checkers do, and then decide what we want for ty. E.g. an alternative is that ty supports |
This feature was previously requested at astral-sh/ty#547, and the person who filed that issue was kind enough to do a writeup of what all other existing major type checkers do here |
|
I posted some questions in the issue: astral-sh/ty#547 (comment) |
|
Thanks for your work on this! A couple quick notes specifically regarding this implementation, based only on the PR description (haven't had a chance to dig into the code changes):
|
|
That you for you attention on this. In retrospect, I should have written a bit more about our use case as a motivation, but I was rushing a bit to get this filled opened before going to bed and should have taken a bit more time. I also apologize, there are comments here and over in the ty repo as well. I will try to answer each thread in the respective repo, but I imagine there will be some overlap. Our software stack is divided into roughly 100 packages of mixed c++ and python code. When we make a stack version, each package has a snapshot taken, and it put into our packaging and distribution framework. On install. each package version is installed along side other versions of the package. To then make use of the stack, there is software which sets the appropriate LD_LIBRARY_PATH, PATH, PYTHONPATH, etc environment variables to make the version you selected active in your current shell session. We have various requirements that we be able to reproduce data processings with various old software version for scientific reproducibility reasons (and not all our users update at the same frequency). These software stacks can be used on a personal laptop, but are also deployed in shared environments in our high performance compute centers, with different users being on different version at the same time. PYTHONPATH is perfect for managing this dynamic per shell setup, exactly like dynamic linkers for our c++ code. The development workflow for building features or fixing bugs is then to clone the packages which need changes, and set them up. This then replaces the standard version installs of the packages with the git versions by changing the respective environment variables such that we can make use of the already compiled shared packages at a certain version for most packages and only need to have git versions for what we are currently developing. This also extends to scenarios where you are working on a ticket branch in packages A and B, sitting on top of others branches in C and D, and using the versioned stack install for the rest. All this work eventually comes back in line when we merge to the upstream package, but during development it is nice for the type checker to be able to see the signatures of the other packages that are also in development, but are not in the tree of the current package. There are of course workarounds by writing out files to manage these paths, but this means that the files will be frequently modified to reflect the modules that python will actually be reading when it executes, and it raises the likelihood that a file might not get updated when it should be and wrong signatures are used. As for the other path change, the motivation is similar, though I think this too will affect others in various ways. When we develop our packages, each one is put under a common python module scope i.e. Under the current behavior of ty, as it searches through the list of search paths (say packageA path first) it will encounter I outlined this using our I don't have a strong feeling that it must be attached to the site-package implementation, it just seemed like a convenient fit as it is paths that python itself discovers and reads. As you say though resolution order is important, and I see your point that python will read PYTHONPATH before it reads site-packages, and so getting the same resolution behavior in the type checker is probably important as well. I really apologize about the wall of text. I tried to be as informative as I could be to motivate my thinking on this PR, but not overwhelm with unimportant details. I hope I struck the right balance. I'm happy discuss more or make any changes you might see fit. |
|
The scheme you are describing with But we definitely already support this behavior of namespace packages, so I'm still not sure why any change was needed in this PR. Probably this requires more investigation. If a problem can be reproduced outside this PR (e.g. with our existing (Edited to add: here is a playground example showing that we already support this "namespace package" layout in extra-paths.) |
|
@carljm yes, you are right that indeed that is akin to namespace packaging. I should have been a bit more explicit in mentioning we do in fact use I can completely respect not wanting to support this legacy style of cross package linking in your codebase, though the overhead is a single if statement, every bit of code is more behavior. I would of course appreciate support for that behavior, don't get me wrong. This discussion might help me in pushing for more modernization of the import system. |
|
@natelust my concern here isn't primarily code complexity, it's correctness; we shouldn't consider packages to be importable that aren't actually importable at runtime. As far as I'm aware, our current behavior is correct (matches runtime). What confuses me here is that you say you do have |
|
Ok, I understand where you are coming from now. You are correct that there is an additional hook in the As found in the python docs This was the common work around in init files to get this behavior pre-namespaces, though it is still used in various cases. As to your question, is this something you should recognize. I kind of see a few options:
|
|
I think we will not go for the third option; that opens up too many possibilities of giving incorrect results in (the vast majority of) cases which do not use You're welcome to open a separate issue for |
|
Though that makes this PR less useful for my immediate use, it is quite reasonable. My next actions will be:
Is there anything else I should consider when re-working this? |
|
I think the spec for It seems like @AlexWaygood and I are agreed that we should do this |
|
Thanks for the feedback I will work on that then, it should be a quick switch. As far as the extend_path functionality its clear to only run that additional check in the one mach arm it will pertain to, which should limit any io overhead. I can think of a few options off the top of my head for detection
If you have opinions I am happy to follow what you think here. Otherwise I will go mock up some of those and see what seems to be the best fit. |
|
It should be the third option, and use the existing |
|
👍 thanks for the input again |
|
The main challenge that I see with supporting astral-sh/ty#838 outlines an alternative: Allow users to manually specify some packages that should be treated as namespace packages even if there's an |
|
Does airflow use
I don't think this is true. |
|
Yes, airflow uses This sounds worth exploring if we can support it lazily. But only if it comes at a minimal overhead (this is legacy after all). I suggest that we first do a simple text search for |
|
I'm pretty sure if we are looking for I don't think supporting this is high priority, but if we do support it I think a solution that "just works" when it works at runtime is much, much better for users than a solution that requires users to manually repeat their codebase structure in config (and to first even understand that's what they need to do, which is likely the biggest hurdle.) So I don't think we should even consider a settings-based approach except as a last resort if we can't make anything else work. |
I've found this PR trying to find out how to use |
|
I don't see any use cases for ignoring |
|
I think we are agreed on respecting |
|
Hey all, sorry I had an emergency come up recently and was away from things for a little while. I should have all the above points addressed fairly soon and update the pr. Again sorry for the delay. |
|
I've done the requested modifications and opened up #20441, based off of this code/PR. |
|
Thank you for working on this. I'll close this PR because |
Fixes astral-sh/ty#547. Add the option for looking up modules from paths in the PYTHONPATH environment variable.
Summary
I work on the rubin observatory data management team. Due to a long legacy of python use and our nature of development and deployment of code we have a setup where much of our code is in different packages which are dynamically setup and loaded through use of the PYTHONPATH environment variable. Though many projects make use of conda envs or venvs, we are likely far from the only user of this python feature. However, at the moment this makes ty (and several other type checkers) unable to check a large majority of our cross package code.
I have implemented a patch that adds a configuration option which when set will cause ty to read the paths defined in PYTHONPATH. These paths get added to the internal site-packages Vec, as that is what they logically are from the perspective of python. The only accompanying change is when looking up modules on all the search paths, I deferred failure until after all paths were checked. This is to account for a situation like:
So instead of the checker finding the stub
top_modulein path1 when looking uppath2_module.pyand failing to find the module, it will wait until all search paths are checked.I am also opening an accompanying pr on the ty repo to update the documentation on the new configuration option.
Test Plan
I built ty with these changes and tested it operationally, and it is really nice how fast it is able to check everything. If more extensive tests are needed at this point I may need a small pointer on where best this test should go.