Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: major LOAD_PATH machinery simplification #27633

Merged
merged 2 commits into from
Jun 19, 2018
Merged

Conversation

StefanKarpinski
Copy link
Member

@StefanKarpinski StefanKarpinski commented Jun 18, 2018

JuliaLang/Pkg.jl#360 inspired me to look at direnv and think that we should consider not doing anything dynamic with
the load path based on the current environment and instead let that be done with external tools like direnv which specialize in that sort of thing.

Among other changes, in order to facilitate that, this PR expands empty JULIA_LOAD_PATH entries by replacing the first one with DEFAULT_LOAD_PATH and ignoring the subsequent ones. This means that in a shell script one can do this:

export JULIA_LOAD_PATH="$(pwd):$JULIA_LOAD_PATH"

The effect of this will to prepend the current working directory to the LOAD_PATH regardless of whether JULIA_LOAD_PATH is set or not.

This PR also makes @stdlib a syntax for the standard library path, which is handy since otherwise specifying it is annoying. As part of that change, it introduces Sys.STDLIB — remembering the incanation to find the stdlib directory was kind of a drag.

Finally, this PR defers all dynamic expansion of LOAD_PATH entries until lookup time. These dynamic behavior include:

  1. Expanding @stdlib to Sys.STDLIB.
  2. Expanding @v#.# to @v0.7 for example; more generally this is replacing the first three # in a named env with components of the Julia version number (i.e. VERSION.major, etc.).
  3. Finding named environments in DEPOT_PATH.
  4. Appending Project.toml to project directories.

@staticfloat
Copy link
Member

If I'm understanding this correctly, this means that $(pwd) is no longer on the default load path. This means that I now have to run export JULIA_LOAD_PATH="$(pwd):$JULIA_LOAD_PATH" every time I want to e.g. test the package I'm developing on, or I have to use something like direnv to do it for me?

@StefanKarpinski
Copy link
Member Author

StefanKarpinski commented Jun 18, 2018

The current directory isn't in the default LOAD_PATH in 0.6 (or earlier) either, so that's not exactly a regression in terms of functionality.

@staticfloat
Copy link
Member

The current directory isn't in the default LOAD_PATH in 0.6 or earlier either,

That's because the workflow of 0.6 is to put everything into a globally-accessible environment (~/.julia/v0.6). With 0.7, it's been my experience that I am usually switching between 2-3 environments at a time while working on a project. For example:

  • ~/.julia/dev/BinaryProvider while hacking on BinaryProvider itself, where I want to run e.g. pkg> test to run the BinaryProvider tests with respect to its own Project and Manifest files
  • ~/.julia/dev/BinaryBuilder, where I'm updating BinaryBuilder's code that depends on BinaryProvider and doing the same kind of repo-local testing
  • ~/.julia/v0.7 where I use packages in the default globally-accessible environment

Right now this is easy because julia will automatically detect that there's a Project.toml in those directories when I start Julia up. I realize that I could automate the same kind of thing with bash scripts, but that seems a bit like telling someone that projects shouldn't include a make install step because if users want to, they can just copy files into /usr/local or other locations if they so wish. That's true, but in my experience, the "use current directory by default" ergonomics feel pretty good, and it's what I want more often than it isn't. Perhaps others disagree, I'd like to hear how your package development workflows feel.

I think most of the changes in this PR are good; I like the shorthand you've introduced into LOAD_PATH. I think we should default to the "clever" behavior and allow for easy opting out into simpler behaviors. To that end, I submit that we should:

  • Add $(pwd) to DEFAULT_LOAD_PATH, so that when you start up Julia it looks in that initial directory, and environments do not auto-switch when you cd around. This keeps things simple, and automates what most people would do with direnv anyway. I'm assuming that directories that don't have a Project.toml file in them get silently ignored.
  • Add Pkg.activate() as an officially-blessed programmatic way of running push!(Base.LOAD_PATH, pwd()) and doing whatever Pkg3 magic is necessary to make Julia feel like this directory was an environment all along. This allows a user who wants to cd ~/.julia/dev/Foo and pkg> activate to get around the fact that they didn't start julia up in the "correct" directory.

@garrison
Copy link
Member

the "use current directory by default" ergonomics feel pretty good

would you put . in your PATH? The ergonomics of this indeed feel pretty good, but there is an important reason it is not recommended.

@staticfloat
Copy link
Member

would you put . in your PATH? The ergonomics of this indeed feel pretty good, but there is an important reason it is not recommended.

I don't think these are quite comparable; this behavior is gated upon the existence of a Project.toml. This is more similar in spirit to having a .direnv file that appends to PATH, in which case I would respond "yes, I do put $(pwd) into my PATH, when there is a .direnv file that tells me to". What I'm arguing is that we should do that within Julia rather than relying on an external tool to do it instead.

I am against having . be in LOAD_PATH; that does seem unnecessarily slippery and can lead to a lot of strange behavior, which is why I suggested we introduce Pkg.activate() to make switching environments explicitly easier. $(pwd) is importantly different than ..

@KristofferC
Copy link
Member

KristofferC commented Jun 18, 2018

I share the concerns with @staticfloat that this might be a bit annoying. Also, it seems a bit odd to swap one file Project.toml to determine if that project should be loaded to another file .envrc which requires some external program to use. What is the gain in that?

Some workflow examples:

  • I want to update the Manifest in docs. Currently, I would cd there and do e.g. ] up. What would the workflow be now?

  • I have a project with a few packages. To one of these packages, I add a dependency. Now I want this to reflect in the Manifest. Currently, I would shell> cd Package, pkg> add Dependency, shell> cd .., pkg> resolve. Say I am on windows and I am not too tech savy with these envdirs. What would the workflow now be?

If we have to tell people to use some external tool to do this quite basic package management functionalities then we have in my opinion sort of failed. The suggestion of using the pwd when starting julia and having a command that updates it seems like a good compromise.

@garrison
Copy link
Member

Any program is going to have security problems if it can execute arbitrary code given the existence of a suitably-named file in the current working directory upon startup. This is the entire reason direnv has a mechanism for making sure such environment settings are desired and approved before any code is executed.

@garrison
Copy link
Member

Let me stick to a more concrete example: my ~/Downloads is essentially completely out of my control; it is not difficult for a web site to have a file downloaded there without me noticing. Yet, I'd still like to be able to safely run julia from a shell that happens to its cwd set to my Downloads directory.

@KristofferC
Copy link
Member

KristofferC commented Jun 18, 2018

What if there needs to be an explicit command to activate the pwd environment?

@garrison
Copy link
Member

What if there needs to be an explicit command to activate the pwd environment?

This should eliminate my concern (though it would be good to think carefully through the details of how this would work with a security mind-set).

@StefanKarpinski
Copy link
Member Author

StefanKarpinski commented Jun 18, 2018

Edit: Due to an outdated page, I wrote this before seeing most of the above responses but there's still an interesting observation at the end.

The CurrentEnv (and direnv) behavior is more clever than that: it looks for a parent directory with a .git in it or a Project.toml file and uses that as the current directory. We could add an easy syntax for that in JULIA_LOAD_PATH like @ with no name following it and expand it for the user. Then setting export JULIA_LOAD_PATH="@:" would prepend the current env to the default. The problem with that (which direnv addresses) is that it's a potential security risk: if I start julia from inside of some repo that I just cloned or updated, it's an easy attack vector since putting a JSON.jl file in that directory can cause using JSON to execute any code at all. JuliaLang/Pkg.jl#360 went further and proposed pro-actively pre-loading the current environment if there is one, at which point, merely starting julia in a directory would be a huge attack vector. direnv deals with this by requiring to explicitly run direnv allow . for the location and contents of a given .envrc file (i.e. you have to approve a .envrc in a given location but if its contents change you have to reapprove it is my understanding). Of course, direnv has a much worse problem because otherwise simply cding into a directory with a .envrc file can cause arbitrary code to execute. We have to at least run julia and maybe load a package, but still, it’s a concern.

One consideration we have is that if you could trivially load something anyway, then being fussy about automatically adding it to the load path is silly. So, for example, if ~/.julia/dev/Foo is in one of the environments that’s already reachable from our load path, then there’s no harm in just adding it without any further ado.

@StefanKarpinski
Copy link
Member Author

What if I put the CurrentEnv functionality back to make this less contentious? Is the rest of this fine with people? I can get this in the state where the only difference between looking in the current environment and not looking in the current environment is the default load path value.

@staticfloat
Copy link
Member

One consideration we have is that if you could trivially load something anyway, then being fussy about automatically adding it to the load path is silly.

I had also thought of this; "whitelisting" certain directories. In the end, I don't like it, because I want the rules to be dead-simple and easily manipulated by other pieces of code.

We could add an easy syntax for that in JULIA_LOAD_PATH like @ with no name following it and expand it for the user.

Whatever we do, I think having intelligent shorthands for all pieces of functionality here is important, so 👍 from me.

I can get this in the state where the only difference between looking in the current environment and not looking in the current environment is the default load path value.

I think the big missing piece is a programmatic way of toggling that difference at runtime. E.g. my concerns about workflow could be all but eliminated with the existence of Pkg.activate(dir=pwd()) and a ~/.julia.rc that just has a Pkg.activate() within it. I agree that we should not be opening ourselves up to drive-by-code execution in any way, but if we can provide the basic building blocks to make things convenient for users, we can all live happily and securely. (Just not both at the same time)

@StefanKarpinski
Copy link
Member Author

I think the big missing piece is a programmatic way of toggling that difference at runtime.

Does this qualify?

bash$ JULIA_LOAD_PATH="@:" julia

If I add back the @ syntax for current environment, then that's all that will be required. We can have a command-line option for that as well but that's a whole other bikeshed to paint.

E.g. my concerns about workflow could be all but eliminated with the existence of Pkg.activate(dir=pwd()) and a ~/.julia.rc that just has a Pkg.activate() within it.

I'd prefer to avoid getting into the Pkg.activate business: this is now a centralized database of state we need to keep around somewhere and maintained. It's doable but probably somewhat annoying. Moreover, that's exactly the kind of problem that direnv has already solved.

With the @ syntax, you can accomplish this by putting the following in your ~/.bashrc or equivalent:

export JULIA_LOAD_PATH="@:$JULIA_LOAD_PATH"

You can put the exact same thing in your project's .envrc if you're using direnv to make it so that it only happens in certain approved projects. [direnv hooks into your shell to make sure that this is in your environment when you cd into a subdirectory of your project; direnv doesn't need the @ syntax since it knows where the root of the project is.]

@staticfloat
Copy link
Member

I'd prefer to avoid getting into the Pkg.activate business: this is now a centralized database of state we need to keep around somewhere and maintained.

I don't think we're on the same page as to what Pkg.activate() would do; all I would want Pkg.activate() to do would be to modify the LOAD_PATH and whatever other things need to happen for Pkg3 to switch my current environment for the currently running session. Like what we were talking about here. Perhaps all that needs to happen is insert!(LOAD_PATH, 1, pwd()), but I'm just talking about something to do the following:

$ cd ~/.julia/dev/Foo
$ julia
....
(v0.7) pkg> activate
(Foo) pkg> 

So there would be no centralized database. I'm totally in agreement with you that we don't want that. I suspect, however, that what we want is slightly more involved than an insert!(LOAD_PATH, 1, pwd()); we actually want to do something more like run the CurrentEnv() search rules once for the current directory, and add those locations to the current LOAD_PATH immediately.

While the JULIA_LOAD_PATH="@:" syntax is good (as I said above, I like the shorthand in envvars), It doesn't address the key concern from #27411: That we want a way to, at runtime, do an activation/environment switch/load path manipulation. What I mean by that is, if I am in a Julia shell, and I'm already halfway through debugging and suddenly realize that I want to move over into a new environment, I should be able to do that easier than manually mucking around with Base.LOAD_PATH; previously I would do this through cd. If we can make this an explicit Pkg action, I think everyone will be happy. This explicit Pkg action can be used to mimic whatever kind of environment switching the user wants to perform.

I also think we need to clarify really quick; when you say If I add back the @ syntax for current environment, does that mean that we have a CurrentEnv() object that will dynamically change the environment when you cd? So if I set JULIA_LOAD_PATH="@:", start julia, then cd to a remote location, my environment will change? I do not think that's what we want, and I think that's not what was being discussed in #27411. If @ gets immediately canonicalized to the current directory and remains that way forever, then it gives half of what I think we need; we also need a way to do "that" at runtime, as well as at startup.

@StefanKarpinski
Copy link
Member Author

StefanKarpinski commented Jun 19, 2018

I think I get what you're saying now:

  1. LOAD_PATH would be as it is in this PR: ["@v#.#", "@stdlib"]
  2. Have @ syntax for "current env" but not in the default LOAD_PATH.
  3. Have a pkg> activate command as you describe above, details tbd.

Does that seem right? I think that's a good arrangement. I will do 2 and 3 in separate PRs because we need to figure out the exact semantics of @ and how pkg> activate needs to work.

@staticfloat
Copy link
Member

Yes, that sounds perfect to me.

@staticfloat
Copy link
Member

I admit I'm not certain we want @ to change along with cd; but we can work that out in another issue. I can see pros and cons for both sides there.

@StefanKarpinski
Copy link
Member Author

I admit I'm not certain we want @ to change along with cd;

I'm working on a version where it doesn't, so we can choose.

@StefanKarpinski
Copy link
Member Author

I've got the @ thing working, so next problem...

How should pkg> activate work? There are a few different things that we may want it to control:

  1. The contents of LOAD_PATH: the environment in question could be added to the front or the back of the LOAD_PATH; in the front, it takes precedence over everything else, in the back it has the lowest precedence. However, anything that's already loaded is fixed, so putting a new environment at the front of the load path can lead to an inconsistent state that could not be gotten into by any particular order of the load path. If we push it onto the back of the load path, then the state will be consistent.

  2. What environment pkg> operations operate on. This is currently defined by LOAD_PATH[1] but it doesn't have to be that way. pkg> activate could leave LOAD_PATH untouched but change which environment package operations modify.

  3. Arguments. We've talked about doing shell> cd dir; pkg> activate but why not just support pkg> activate dir and skip the cd part?

Thoughts, @staticfloat? (Or anyone else)

@staticfloat
Copy link
Member

  1. I think the interaction between LOAD_PATH and which "environment" you're operating within should be as simple as possible. I think LOAD_PATH[1] makes a lot of sense, but I think in actuality it should be "first element of LOAD_PATH for which certain requirements are true". (Things like it must contain a Project.toml, etc... it needs to be an actual environment).

  2. I think activate should modify both LOAD_PATH and the "current environment" (as stated above, I believe those two should be inextricably linked). Otherwise we could get weird behaviors like pkg> add Foo followed by using Foo doesn't work, and I don't see any reason why we would want that to be possible.

  3. Agreed; we should just do all of the possibilities. Pkg.activate(dir=pwd())/pkg> activate/pkg> activate <dir>

As for where Pkg.activate() should place things, I think top of the stack is the only thing that makes sense. I wouldn't want to activate, then add Foo and have that get installed into some global directory despite the fact that I just activated. Similarly, if we somehow tracked that we activated and so installed into an environment that is not at the front of the LOAD_PATH, then we could get into the situation where I add Foo, it installs one version into LOAD_PATH[end], then I try to using Foo and I load an older version that happened to be installed within LOAD_PATH[1].

So I think the simplest thing makes sense; the current environment is LOAD_PATH[1], all Pkg operations use that LOAD_PATH, activate pushes things onto the front of LOAD_PATH. Further questions:

  1. Do we allow multiple copies of the same path to be pushed onto LOAD_PATH? I vote yes, no need to be overly clever.

  2. Do we allow for a deactivate? If so, what happens to the modules that have been loaded from that location?

  3. What happens if someone gets clever and says activate @stdlib, then tries to add Foo?

@StefanKarpinski StefanKarpinski force-pushed the sk/loadpath branch 2 times, most recently from 65f0b47 to 8df73eb Compare June 19, 2018 07:38
@KristofferC
Copy link
Member

So I think the simplest thing makes sense; the current environment is LOAD_PATH[1], all Pkg operations use that LOAD_PATH, activate pushes things onto the front of LOAD_PATH.

FWIW, this is pretty much exactly what I had in mind as well.

@StefanKarpinski
Copy link
Member Author

Such sweet sorrow: CI straight flush with merge conflicts.

This makes LOAD_PATH just a vector of strings again. Some
special syntaxes in JULIA_LOAD_PATH are handled specially:

- split on `:` or `;` (on Windows)
- replace the first empty entry with DEFAULT_LOAD_PATH
- ignore the remaining empty entries
- replace `@` with `current_env()`

Other special syntaxes are left alone and expanded during
load path lookup:

- occurrences of `#` in `@...` entries to version digits
- `@name` is looked up in the depot path
- `@` is replaced with `current_env()`

The last functionality is not accessible via JULIA_LOAD_PATH
in this version since `@` in that is expanded early. This does
allow putting a literal `@` in LOAD_PATH to get late expansion
to `current_env()` instead of early expansion.

Fixes #27411
@StefanKarpinski
Copy link
Member Author

macOS failure is #26725

@StefanKarpinski StefanKarpinski merged commit 307f59b into master Jun 19, 2018
@StefanKarpinski StefanKarpinski deleted the sk/loadpath branch June 19, 2018 16:05
@StefanKarpinski
Copy link
Member Author

Good enough CI results given the previous straight flush.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs tests Unit tests are required for this change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants