fix: do not call git checkout unless necessary #64

gebner · 2022-05-14T13:48:21Z

If the dependency is already checked out at the right revision, then we only call git rev-parse HEAD. In particular, we don't call git checkout --detach $rev, which must not be called from multiple processes in parallel.

See #63.

tydeu · 2022-05-14T18:32:23Z

The core fix looks fine. However, we do still need to suppress straight piping of git output. As these functions may be run inside a build process that is not suppose to be writing to stderr directly. To properly preserve the output, the log monad would need to trickle down to these utilities.

gebner · 2022-05-14T19:21:15Z

As these functions may be run inside a build process that is not suppose to be writing to stderr directly.

I'm not sure I understand what you're saying. We're already printing all kinds of informational messages to stderr (such as the current "trying to update" message). The git commands are also executed sequentially, so we don't need to buffer the output to prevent mixing up messages either. Does writing to stderr break anything? Or are you suggesting to use logError simply for consistency?

The big issue with buffering the git output is that this breaks the progress information. Some packages (like Unicode.lean) take a long time to download, and it's nice to see what's going on.

tydeu · 2022-05-14T19:28:45Z

I'm not sure I understand what you're saying. We're already printing all kinds of informational messages to stderr (such as the current "trying to update" message).

It shouldn't be. If it is, that is a bug. The "trying to update" message is logged. Whether that log is actually printed to stderr is dependent upon the CLI configuration (and, more generally, the Log monad provided).

The big issue with buffering the git output is that this breaks the progress information. Some packages (like Unicode.lean) take a long time to download, and it's nice to see what's going on.

I think this could be better resolved by having Lake provide is own form of progress information (or maybe parsing Git's?). Furthermore, this will still break in many of Lake's current applications (e.g., print-paths) where the out pipe is not a normal terminal (which is good example of why I think transparent piping is generally bad).

gebner · 2022-05-14T19:42:31Z

Whether that log is actually printed to stderr is dependent. upon the CLI configuration

I'm not sure I follow, the resolveDeps function is always called with the eio loggers. There is nothing to configure there, logError always goes to stderr in the git functions.

Furthermore, this will still break in many of Lake's current applications (e.g., print-paths) where the out pipe is not a normal terminal (which is good example of why I think transparent piping is generally bad).

lake print-paths already prints lots of stuff to stderr. How would printing additional messages break anything?

tydeu · 2022-05-14T21:18:32Z

I'm not sure I follow, the resolveDeps function is always called with the eio loggers. There is nothing to configure there, logError always goes to stderr in the git functions.

At the moment, yes, this how it is practically used. But I do have plans to have more variation in log leveling (e.g., -qoption to quiet output and a-v` option for more verbose logging)

Furthermore, this will still break in many of Lake's current applications (e.g., print-paths) where the out pipe is not a normal terminal (which is good example of why I think transparent piping is generally bad).

lake print-paths already prints lots of stuff to stderr. How would printing additional messages break anything?

This was about the progress information, which as you mentioned is broken when buffered. The stderr for print-paths is similarly buffered and displayed by the server, so it also breaks the formatting.

More generally, at a conceptual level, it is (in my understanding) good practice in functional programming to relegate side-effects to a monad as it allows the code to be easily adapted to different environments. Lake does this by relegating all logging to the log monad. I am not alone in this, Shake uses a similar design.

In my view, any logging, including that of the git output, should go through Lake's logging mechanisms. I am not against expanding this mechanisms to able to preserve / pass through more information if necessary, but these functions should not just directly be writing to the terminal.

gebner · 2022-05-15T14:40:18Z

This was about the progress information, which as you mentioned is broken when buffered. The stderr for print-paths is similarly buffered and displayed by the server, so it also breaks the formatting.

To clarify: by broken I simply meant that you don't see how far the download is along, if you display the output only after git is finished. The only broken part is the delay in progress reporting. There is never any broken formatting.

Like most programs, git detects if it is connected to a tty and disables any fancy terminal features otherwise. This is what the server sees:

$ ~/lake/build/bin/lake print-paths 2>&1 | cat
info: CMark: cloning https://github.com/xubaiw/CMark.lean to ./lean_packages/CMark
Cloning into './lean_packages/CMark'...
HEAD is now at 0c59e4f refactor: use default c compiler
...

While this is what you see on the command line:

$ ~/lake/build/bin/lake print-paths
info: CMark: cloning https://github.com/xubaiw/CMark.lean to ./lean_packages/CMark
Cloning into './lean_packages/CMark'...
remote: Enumerating objects: 66, done.
remote: Counting objects: 100% (66/66), done.
remote: Compressing objects: 100% (56/56), done.
remote: Total 66 (delta 13), reused 59 (delta 7), pack-reused 0
Receiving objects: 100% (66/66), 105.14 KiB | 2.70 MiB/s, done.
Resolving deltas: 100% (13/13), done.
...

More generally, at a conceptual level, it is (in my understanding) good practice in functional programming to relegate side-effects to a monad as it allows the code to be easily adapted to different environments. Lake does this by relegating all logging to the log monad.

Good practices are typically a tradeoff, and should only matter insofar as they result in good outcomes. Disabling progress output from long-running commands so that it fits into a logging API seems like a questionable tradeoff to me, in particular when this adds no value to the user.

One option could be to add a allowOutputToStderr : Bool flag to MonadLog, and either inherit stderr or buffer it depending on the value of the flag. Would that be an acceptable compromise?

I am not alone in this, Shake uses a similar design.

AFAICT shake only does that in actions (to prevent interleaving), putStrLn is heavily used otherwise. And even then, shake explicitly supports inheriting stderr in commands (see the WithStderr flag).

tydeu · 2022-05-15T19:08:22Z

To clarify: by broken I simply meant that you don't see how far the download is along, if you display the output only after git is finished. The only broken part is the delay in progress reporting. There is never any broken formatting.

Ah, sorry, this is just miscommunication on my part. The absence of the progress information is what I meant by "broken formatting".

Good practices are typically a tradeoff, and should only matter insofar as they result in good outcomes. Disabling progress output from long-running commands so that it fits into a logging API seems like a questionable tradeoff to me, in particular when this adds no value to the user.

I disagree that it adds no potential value to the user. It makes it easier for users of the Lake library to redirect output of running processes should they desire. For instance, the server could eventually use this split up the log lines into VSCode info/warning/error messages. The same is true for a potentially GUI-based user of the library.

It also makes Lake's logs follow a consistent format, which is easier to parse / filter by end users. I am not against showing progress information, I would just like to be able to lift that information to Lake's level and present it in a consistent form, rather than just letting whatever git decides to output get piped through (and thus ending any chance of consistent formatting).

AFAICT shake only does that in actions (to prevent interleaving).

That is what I meant. Note the Git utilities are available to user builds as well and I have already seen instances where libraries are using them. The same interleaving problem can thus arise in Lake. Furthermore, it is already possible for Lake, to be checking out multiple dependencies simultaneously, thus interleaving can already occur there.

In summary, I think I understand where you are coming from and I share your desire to provide more information about the Git processes. I just don't think this is the way to do it at the moment.

See leanprover#63.

gebner · 2022-05-15T19:40:13Z

Okay, I've split off the stderr changes to #67. I'm not sure we'll reach a consensus on that point any time soon.

Furthermore, it is already possible for Lake, to be checking out multiple dependencies simultaneously, thus interleaving can already occur there.

This is a reasonable point, but I've never seen lake check out dependencies in parallel. Is there anything I have to do to enable it?

It also makes Lake's logs follow a consistent format, which is easier to parse / filter by end users. I am not against showing progress information, I would just like to be able to lift that information to Lake's level and present it in a consistent form, rather than just letting whatever git decides to output get piped through (and thus ending any chance of consistent formatting).

It is just as consistent as the output of the build commands FWIW. There we also have a header > command args followed by the literal command output.

tydeu · 2022-05-15T20:13:09Z

I've never seen lake check out dependencies in parallel. Is there anything I have to do to enable it?

I checked over the code and appears I was mistaken. I guess I confused what I wanted to do with what I have done so far. So I guess it is not a good reason at the moment, but it will hopefully be eventually.

It is just as consistent as the output of the build commands FWIW. There we also have a header > command args followed by the literal command output.

Note that command output for builds does go through the logging interface, though. See the code for proc.

Regardless, thanks for splitting it up! The remaining changes look good so I will merge,

gebner mentioned this pull request May 14, 2022

Cannot run lake env in parallel #63

Closed

gebner added 2 commits May 15, 2022 21:29

perf: do not resolve full git object names

56211ed

fix: do not call git checkout unless necessary

36eeaaa

See leanprover#63.

gebner force-pushed the git-fixes branch from a62c64b to 36eeaaa Compare May 15, 2022 19:29

gebner mentioned this pull request May 15, 2022

fix: do not suppress git output #67

Closed

tydeu merged commit 42ea5f2 into leanprover:master May 15, 2022

tydeu added bug Something isn't working performance Timing and resource usage labels May 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: do not call git checkout unless necessary #64

fix: do not call git checkout unless necessary #64

gebner commented May 14, 2022 •

edited

Loading

tydeu commented May 14, 2022

gebner commented May 14, 2022

tydeu commented May 14, 2022 •

edited

Loading

gebner commented May 14, 2022

tydeu commented May 14, 2022

gebner commented May 15, 2022

tydeu commented May 15, 2022

gebner commented May 15, 2022

tydeu commented May 15, 2022

fix: do not call git checkout unless necessary #64

fix: do not call git checkout unless necessary #64

Conversation

gebner commented May 14, 2022 • edited Loading

tydeu commented May 14, 2022

gebner commented May 14, 2022

tydeu commented May 14, 2022 • edited Loading

gebner commented May 14, 2022

tydeu commented May 14, 2022

gebner commented May 15, 2022

tydeu commented May 15, 2022

gebner commented May 15, 2022

tydeu commented May 15, 2022

gebner commented May 14, 2022 •

edited

Loading

tydeu commented May 14, 2022 •

edited

Loading