Skip to content
This repository has been archived by the owner on Oct 25, 2023. It is now read-only.

fix: do not call git checkout unless necessary #64

Merged
merged 2 commits into from
May 15, 2022

Conversation

gebner
Copy link
Member

@gebner gebner commented May 14, 2022

If the dependency is already checked out at the right revision, then we only call git rev-parse HEAD. In particular, we don't call git checkout --detach $rev, which must not be called from multiple processes in parallel.

See #63.

@tydeu
Copy link
Member

tydeu commented May 14, 2022

The core fix looks fine. However, we do still need to suppress straight piping of git output. As these functions may be run inside a build process that is not suppose to be writing to stderr directly. To properly preserve the output, the log monad would need to trickle down to these utilities.

@gebner
Copy link
Member Author

gebner commented May 14, 2022

As these functions may be run inside a build process that is not suppose to be writing to stderr directly.

I'm not sure I understand what you're saying. We're already printing all kinds of informational messages to stderr (such as the current "trying to update" message). The git commands are also executed sequentially, so we don't need to buffer the output to prevent mixing up messages either. Does writing to stderr break anything? Or are you suggesting to use logError simply for consistency?

The big issue with buffering the git output is that this breaks the progress information. Some packages (like Unicode.lean) take a long time to download, and it's nice to see what's going on.

@tydeu
Copy link
Member

tydeu commented May 14, 2022

I'm not sure I understand what you're saying. We're already printing all kinds of informational messages to stderr (such as the current "trying to update" message).

It shouldn't be. If it is, that is a bug. The "trying to update" message is logged. Whether that log is actually printed to stderr is dependent upon the CLI configuration (and, more generally, the Log monad provided).

The big issue with buffering the git output is that this breaks the progress information. Some packages (like Unicode.lean) take a long time to download, and it's nice to see what's going on.

I think this could be better resolved by having Lake provide is own form of progress information (or maybe parsing Git's?). Furthermore, this will still break in many of Lake's current applications (e.g., print-paths) where the out pipe is not a normal terminal (which is good example of why I think transparent piping is generally bad).

@gebner
Copy link
Member Author

gebner commented May 14, 2022

Whether that log is actually printed to stderr is dependent. upon the CLI configuration

I'm not sure I follow, the resolveDeps function is always called with the eio loggers. There is nothing to configure there, logError always goes to stderr in the git functions.

Furthermore, this will still break in many of Lake's current applications (e.g., print-paths) where the out pipe is not a normal terminal (which is good example of why I think transparent piping is generally bad).

lake print-paths already prints lots of stuff to stderr. How would printing additional messages break anything?

@tydeu
Copy link
Member

tydeu commented May 14, 2022

I'm not sure I follow, the resolveDeps function is always called with the eio loggers. There is nothing to configure there, logError always goes to stderr in the git functions.

At the moment, yes, this how it is practically used. But I do have plans to have more variation in log leveling (e.g., -qoption to quiet output and a-v` option for more verbose logging)

Furthermore, this will still break in many of Lake's current applications (e.g., print-paths) where the out pipe is not a normal terminal (which is good example of why I think transparent piping is generally bad).

lake print-paths already prints lots of stuff to stderr. How would printing additional messages break anything?

This was about the progress information, which as you mentioned is broken when buffered. The stderr for print-paths is similarly buffered and displayed by the server, so it also breaks the formatting.

More generally, at a conceptual level, it is (in my understanding) good practice in functional programming to relegate side-effects to a monad as it allows the code to be easily adapted to different environments. Lake does this by relegating all logging to the log monad. I am not alone in this, Shake uses a similar design.

In my view, any logging, including that of the git output, should go through Lake's logging mechanisms. I am not against expanding this mechanisms to able to preserve / pass through more information if necessary, but these functions should not just directly be writing to the terminal.

@gebner
Copy link
Member Author

gebner commented May 15, 2022

This was about the progress information, which as you mentioned is broken when buffered. The stderr for print-paths is similarly buffered and displayed by the server, so it also breaks the formatting.

To clarify: by broken I simply meant that you don't see how far the download is along, if you display the output only after git is finished. The only broken part is the delay in progress reporting. There is never any broken formatting.

Like most programs, git detects if it is connected to a tty and disables any fancy terminal features otherwise. This is what the server sees:

$ ~/lake/build/bin/lake print-paths 2>&1 | cat
info: CMark: cloning https://github.com/xubaiw/CMark.lean to ./lean_packages/CMark
Cloning into './lean_packages/CMark'...
HEAD is now at 0c59e4f refactor: use default c compiler
...

While this is what you see on the command line:

$ ~/lake/build/bin/lake print-paths
info: CMark: cloning https://github.com/xubaiw/CMark.lean to ./lean_packages/CMark
Cloning into './lean_packages/CMark'...
remote: Enumerating objects: 66, done.
remote: Counting objects: 100% (66/66), done.
remote: Compressing objects: 100% (56/56), done.
remote: Total 66 (delta 13), reused 59 (delta 7), pack-reused 0
Receiving objects: 100% (66/66), 105.14 KiB | 2.70 MiB/s, done.
Resolving deltas: 100% (13/13), done.
...

More generally, at a conceptual level, it is (in my understanding) good practice in functional programming to relegate side-effects to a monad as it allows the code to be easily adapted to different environments. Lake does this by relegating all logging to the log monad.

Good practices are typically a tradeoff, and should only matter insofar as they result in good outcomes. Disabling progress output from long-running commands so that it fits into a logging API seems like a questionable tradeoff to me, in particular when this adds no value to the user.

One option could be to add a allowOutputToStderr : Bool flag to MonadLog, and either inherit stderr or buffer it depending on the value of the flag. Would that be an acceptable compromise?

I am not alone in this, Shake uses a similar design.

AFAICT shake only does that in actions (to prevent interleaving), putStrLn is heavily used otherwise. And even then, shake explicitly supports inheriting stderr in commands (see the WithStderr flag).

@tydeu
Copy link
Member

tydeu commented May 15, 2022

To clarify: by broken I simply meant that you don't see how far the download is along, if you display the output only after git is finished. The only broken part is the delay in progress reporting. There is never any broken formatting.

Ah, sorry, this is just miscommunication on my part. The absence of the progress information is what I meant by "broken formatting".

Good practices are typically a tradeoff, and should only matter insofar as they result in good outcomes. Disabling progress output from long-running commands so that it fits into a logging API seems like a questionable tradeoff to me, in particular when this adds no value to the user.

I disagree that it adds no potential value to the user. It makes it easier for users of the Lake library to redirect output of running processes should they desire. For instance, the server could eventually use this split up the log lines into VSCode info/warning/error messages. The same is true for a potentially GUI-based user of the library.

It also makes Lake's logs follow a consistent format, which is easier to parse / filter by end users. I am not against showing progress information, I would just like to be able to lift that information to Lake's level and present it in a consistent form, rather than just letting whatever git decides to output get piped through (and thus ending any chance of consistent formatting).

AFAICT shake only does that in actions (to prevent interleaving).

That is what I meant. Note the Git utilities are available to user builds as well and I have already seen instances where libraries are using them. The same interleaving problem can thus arise in Lake. Furthermore, it is already possible for Lake, to be checking out multiple dependencies simultaneously, thus interleaving can already occur there.

In summary, I think I understand where you are coming from and I share your desire to provide more information about the Git processes. I just don't think this is the way to do it at the moment.

@gebner
Copy link
Member Author

gebner commented May 15, 2022

Okay, I've split off the stderr changes to #67. I'm not sure we'll reach a consensus on that point any time soon.

Furthermore, it is already possible for Lake, to be checking out multiple dependencies simultaneously, thus interleaving can already occur there.

This is a reasonable point, but I've never seen lake check out dependencies in parallel. Is there anything I have to do to enable it?

It also makes Lake's logs follow a consistent format, which is easier to parse / filter by end users. I am not against showing progress information, I would just like to be able to lift that information to Lake's level and present it in a consistent form, rather than just letting whatever git decides to output get piped through (and thus ending any chance of consistent formatting).

It is just as consistent as the output of the build commands FWIW. There we also have a header > command args followed by the literal command output.

@tydeu
Copy link
Member

tydeu commented May 15, 2022

I've never seen lake check out dependencies in parallel. Is there anything I have to do to enable it?

I checked over the code and appears I was mistaken. I guess I confused what I wanted to do with what I have done so far. So I guess it is not a good reason at the moment, but it will hopefully be eventually.

It is just as consistent as the output of the build commands FWIW. There we also have a header > command args followed by the literal command output.

Note that command output for builds does go through the logging interface, though. See the code for proc.

Regardless, thanks for splitting it up! The remaining changes look good so I will merge,

@tydeu tydeu merged commit 42ea5f2 into leanprover:master May 15, 2022
@tydeu tydeu added bug Something isn't working performance Timing and resource usage labels May 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working performance Timing and resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants