Prompt times don't match the real times in a screen recording, and zsh4humans seems to be getting quite a good advantage #5
To my knowledge, it does not. The procedure you've used for benchmarking is underspecified, so I cannot replicate your results. It also doesn't model real-life use of zsh: starting a login shell and immediately closing it is quite unusual, and so is overriding HOME.

It's vital to perform all measurements in a container so that everyone can create the same environment. I've created a script that spins up a docker container and sets up zsh in it. You can use it like so:

curl -fsSLO https://raw.githubusercontent.com/romkatv/zsh-bench/master/playground
chmod +x ./playground
./playground zim

Instead of zim you can pass the name of another config. After a few minutes you should have an interactive zsh inside the container. You can install additional software in it if you need to.

If you want to record a video of starting a login zsh, use this command:

clear; print start; zsh -li

Once "start" appears on the screen, start the timer. Once "repo" (the current directory) appears, stop it. That's first prompt lag. If you decide to base your benchmark results on a video recording, please specify the exact command that everyone can use to record video the same way. Any non-reproducible step in your methodology will void the results, so be precise.

Instead of video you can record the raw output of the TTY:

clear; script -fqet$HOME/timing ~/typescript

Run a few commands and exit zsh. You can then inspect the recorded output:

cat -v ~/typescript | less

And the timing data:

less ~/timing

The format of the timing data is documented in the script(1) man page.

You can replay a recorded session like this:

clear; scriptreplay -t ~/timing ~/typescript

Don't resize your TTY or it won't replay properly. I would very much prefer if you share the recorded typescript and timing files.

You can also try using zsh-bench itself. You can run it like this:

~/zsh-bench/zsh-bench --iters 16

All benchmark data that I've published was acquired this way. You can edit rc files if you need to test something. If you find anything unexpected, or if your measurements match those of zsh-bench, please post here.
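As an aside, here's one way to compute the lag from such a recording instead of eyeballing it. This is only a rough sketch: it assumes the classic script(1) timing format of one "<delay> <bytes>" pair per line, and the marker name and file paths are placeholders.

() {
  emulate -L zsh
  local marker=$1 timing=$2 typescript=$3
  local -i offset
  # byte offset of the first occurrence of the marker in the typescript
  offset=$(grep -abo -- $marker $typescript | head -n1 | cut -d: -f1)
  # sum the recorded delays until that many bytes have been written
  awk -v limit=$offset '
    { t += $1; n += $2 }
    n >= limit { printf "%.0f ms to marker\n", t * 1000; exit }
  ' $timing
} start ~/timing ~/typescript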
Correct.
If you send a clean PR, I'll merge it.
If you take a minimum of several runs, these fluctuations won't matter.
Minimum is better in benchmarks of this kind, especially when run on commodity hardware with CPU scaling, background services and whatnot. Minimum fails only when workloads are different on different iterations, but in these benchmarks they are the same.
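For illustration, a minimal sketch of taking the minimum over several runs in zsh; the timed command and iteration count are arbitrary:

zmodload zsh/datetime
() {
  emulate -L zsh
  local -F best=1e9 t0 dt
  repeat 10; do
    t0=$EPOCHREALTIME
    "$@" >/dev/null 2>&1 </dev/null
    dt=$(( EPOCHREALTIME - t0 ))
    (( dt < best )) && (( best = dt ))   # keep the fastest run
  done
  printf 'min over 10 runs: %.1f ms\n' $(( best * 1000 ))
} zsh -lic 'exit'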
Here are sample results from my dev machine. I've abbreviated all prompts. Machine info:
Setup:
Zim:
zsh4humans:
Script logs:

Note that zsh4humans starts tmux, as indicated by the logs. You can disable it with:

sed -i -E '/start-tmux/ s/integrated/no/' ~/.zshrc
The only thing I haven't specified is what I've used to do the screen recording. I've used Kap, but any screen recording software will do.
Starting a login shell and immediately closing it is enough to measure the time for the first prompt to appear. The prompt should not take a different time to appear because the first command after it is exit.
It does not look like you'd love to see that happening, as you didn't even bother to understand how to replicate my results, and already tried to minimize what I did in every sentence of your paragraph above.
That's exactly what I did. I just printed the framework name instead of "start", which I hope you agree should not make any difference.
This is just because you say so. You find it very easy to say anything just to keep your current position.

EDIT: Actually it's recommended to compare all data points instead of just the average. See Don't Compare Averages. And before you say the article title means that it's better to use minimum instead of averages, it does not. The article starts with "In business meetings, it's common to compare groups of numbers by comparing their averages", and then goes on from there.
You can start whatever you want in your framework, and whatever runs as part of the shell startup until the first prompt appears should be counted as the time from the shell start to the first prompt appearance.
In order to replicate your results I need the full list of commands that I can run on a freshly installed OS.
If a benchmark doesn't measure real-life performance, it's a problem with the benchmark. You've added buffered input (exit) that real usage doesn't involve.
The closer you get to real-life usage, the less likely you are to be measuring the wrong things. In real life people don't override HOME.
For the sake of efficiency, let's stay on the subject matter and avoid speculations about each other's motivation, character, attitude, etc.

I'm willing to replicate your results if you give me instructions as specific as the ones I gave you. It should be literally "start with a freshly installed OS and run this series of commands". All commands must have their source available. I would greatly appreciate if the OS is also free and open source -- Linux, FreeBSD, etc. If I get different results when I run the same commands as you, we'll narrow it down to things we've failed to control (machine, cosmic rays, etc.).

Another way to do this is to start with the specific commands that we already have -- the ones I posted in my previous comment. I can run them and you can run them. You don't have to accept the numbers printed by zsh-bench as authoritative. Go ahead and record the TTY on video, or via script(1).
Let's postpone this discussion. It's significantly less important but it can derail us.
This is obviously true. I'm probably missing a point you are trying to make here.
After reading the issue description a few more times, here's my theory why you see the discrepancy in first prompt lag as reported by zsh-bench and as measured by yourself. Firstly, when you run

curl -fsSLO https://raw.githubusercontent.com/romkatv/zsh-bench/master/playground
chmod +x ./playground
./playground zim

...the benchmarks run inside the container, with the container's /etc startup files. Your screen recording was made on your own machine, whose /etc startup files may do a lot more work. (Are you on macOS?)

There is also the issue with initialization of zsh in the procedure you've used. I already mentioned it in zimfw/zsh-framework-benchmark#8 (comment) (the second bullet point). In order to let zsh4humans initialize, you need to start it at least once and let zle enter its idle state. In practice this happens naturally because people aren't sending buffered input to a freshly started zsh. The most reliable way to warm up zsh after installing a config is to start it on a TTY, let zle enter its idle state, and then exit.
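Something along these lines (a rough sketch, not the exact sequence; the pty name is arbitrary):

zmodload zsh/zpty
zpty warmup zsh -li    # start an interactive login zsh on a pseudo-terminal
sleep 2                # crude stand-in for waiting until zle is idle
zpty -w warmup exit    # send exit once zle has had time to go idle
zpty -d warmup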
zsh-bench does this automatically. After this you can run your benchmark as before, although I once again should mention that buffering "exit" as input isn't achieving anything useful. Edit: When you are running zsh with buffered exit, zle may never reach its idle state, so the warmup doesn't complete.
Cool, let me work on that. I purposely want to keep the steps where the screen is recorded and the recording frames are verified as manual steps, as I want to make sure no automation is interfering with the results.
Yes, I'm using macOS, and yes there seems to be more going on in the /etc startup files in macOS than in Ubuntu, which might explain why zsh-bench reports a quicker time than the recording I did in my macOS. What do you think about running the benchmarks with the macOS /etc startup files taken into account?
Sure. Let me use your playground container, or another container as close to it as possible, so the recordings can be equivalent to the expected results from zsh-bench.
Good to know. I'll make sure to do that as the first step before actually doing the frame counting in the recordings.
I know. I just wanted to show how the results fluctuate between subsequent runs.
I thought you were measuring the first command lag like this: Type the command before the prompt appears, and measure how long it takes for the command output to actually appear after the prompt appears. (If we type it after the prompt appears, there's already a lag between the time the prompt appeared and the time the command was typed, at least when typing manually.) If not, then how are you measuring it? And which command are you specifically using in zsh-bench?
I'd rather have both of these automated and then we can also manually perform the same steps to double-check that the automation produces the same results. (Frankly, I don't think recording a video is a wise approach. Recording the raw writes to the TTY is both easier and more precise. But if you want to go with video, that's fine with me. It's more work but the numbers should match eventually after we iron out potential quirks with vsync, adaptive frame rate, codecs, buffering, etc.)
You can also run zsh-bench yourself with the commands I posted above.
The config called "zim" is for the default experience. It measures the performance of zim that a potential user would see if they installed zim and made no modifications. Similarly with "zsh4humans".
It looks like the first run was slow and the second two runs had identical first prompt lag. Perhaps something was running on your machine at the same time? Try using more iterations (e.g., --iters 16).
Yes, that's how zsh-bench measures first command lag. But you aren't measuring first command lag, you are measuring first prompt lag. If you want to measure first command lag, you need to buffer a command whose output you can detect, not an exit.

I forgot to mention that you can replay the screen of the TTY that zsh-bench is acting on. Let's say you are in the playground container. Run:

rm -rf -- '' /tmp/zsh-bench-*(D) && ~/zsh-bench/zsh-bench --iters 1 --keep

Edit: I've added the crucial --keep to the command above.

Once this command finishes, you can see what zsh-bench was doing:

clear; scriptreplay -t /tmp/zsh-bench-*/timing /tmp/zsh-bench-*/out

This works only if your TTY has the same dimensions as when the recording was made.

This method is very useful when debugging zsh configs. It allows you to see errors if there are any and to generally verify that things work as they are supposed to. You'll see that at first zsh-bench issues a short command that sources something. That's the first command and it gets sent to the TTY right away without waiting for the first prompt. The sourced file prints a marker, and the appearance of that marker signifies execution of the first command. Then zsh-bench starts measuring input latency. It prints a fairly long command composed of a bunch of escaped characters. Then zsh-bench measures command lag by spamming ENTER and seeing how many prompts it would get. Once you play around with the replay, it should become clear what zsh-bench measures.
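To make the distinction concrete, here's a rough sketch of measuring first command lag rather than first prompt lag; the marker is arbitrary:

clear; print start; print 'print first-command-done; exit' | zsh -li

Once "start" appears, start the timer; once "first-command-done" appears, stop it. The buffered command runs as soon as zsh can execute commands, which may happen before or after the first prompt appears, depending on the config.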
Sorry, just catching up on all this and discussion in the other benchmarking repo.

@ericbn My memory is hazy and I don't remember if there is any interaction with login shells when using zim. Awesome that you were able to generate so many stats. Reading all of this was really the best kind of issue report :)

@romkatv this is a really cool tool, although I'll admit I haven't looked deeply into your zsh4humans project. If the claim to fame is that it is fast, awesome! Fixing the horrendously slow prompt lag on slower devices (first generation raspberry pis, old old intel chips, etc.) that came with other frameworks was one of my goals when I started zim.

Part of the original design of zim was also to allow the "heavier" tasks to run at the login shell, with the idea that it usually only happens once per interactive session. That is to say, when you sit down and log in or connect, you're only going to do it once per session/terminal. In that same line referenced above, you can see an explicit and seemingly superfluous call that was part of this design.

Design obviously wasn't perfect, but was a good compromise vs trying to make things asynchronous until the cows come home. By happenstance, working on my original zsh framework benchmarking tool is where I learned precisely how painful that is ;)

My only question: do you run login shells frequently? If you open a new terminal, is it always a login shell? I ask only because it doesn't match my personal experience.

I'd be interested to see these benchmarks run on a variety of devices. There is a huge difference in speed based on IO devices. This is why the original benchmarks for zim ran on a variety of devices. Machines with an NVMe will perform better with a Celeron chip than a machine with an HDD and a Ryzen Threadripper in some circumstances. (Zim was knowingly designed with file IO in mind.)

cheers! 🍻
Not particularly. Why do you ask?
Yes. Note that this is the default in Apple Terminal and iTerm2. You can change it, but then you're no longer testing the default setup.
Cool. If you run the benchmarks on other devices, please share the results.
It's NVMe M.2. I added this information to the doc.
Out of curiosity I've benchmarked zim and zsh4humans on a Raspberry Pi 4 4G Model B with a micro SD card for storage.
Edit: See more accurate results below.
Edit: the contents below were edited by @romkatv. I, @Eriner, did not author any of the text below.
What do you mean by "pronounced"? Which conclusions did you draw from the benchmark results I posted above? I reran the benchmarks today with proper isolation.

Raw:
Normalized:
For comparison, here are the results from my desktop:
On the pi all latencies are between 2.3 and 5.9 times higher. That's a fairly narrow range considering the differences between these machines on different dimensions: CPU architecture, single core performance, number of cores, I/O, etc. It's worth noting that the pi performs remarkably well.
Hey @ericbn, have you made any progress on that?
@romkatv I meant that when things are slow, it is more obvious. Rather than just using benchmarks comparing against vanilla/no-rcs, running in various real-world conditions is more demonstrative of the problem. That is to say, seeing that it takes 300ms to load a prompt on an Rpi means something to me as a reviewer, whereas just knowing that it takes XXX% longer to load vs vanilla doesn't convey as much practical information (to me, personally).

I've mentioned this to @ericbn in the past (I think, maybe I kept it to myself) but I've always wondered what load times would be like if all sourced files were written into a single file and sourced. Obviously there are practical reasons that make this tricky both in the short-term for the test and long-term for utilization (variable scoping, change management, etc.). However, I'd bet that there is a massive speedup purely from concatenating all of the files - much less random read disk IO.
In terms of conclusions, not having experimented myself and going purely by intuition, I'd guess that disk IO is one of the symptoms that zim is manifesting. Put another way (again, knowing nothing about zsh4humans), I'd assume that zsh4humans reads fewer files on disk and has less read IO. And maybe not, please correct me if I'm wrong.
I guess it tells you exactly what it tells you and nothing more. If the benchmark told you that it takes 100ms on a MacBook (or Acer, Dell, or whatever machine you are actually using), would you find this less useful than the data point from Rpi?
I agree that presenting benchmark numbers as "XXX% slower than vanilla" is not useful (in zsh-bench "vanilla" is no-rcs). I'm not aware of any benchmark that actually does this.
Now we know.
Apparently not.
Which symptoms do you mean?
Under the benchmark conditions zsh4humans reads more files than zim.
Put yet another way, I don't see a prompt taking a reasonable amount of time to load as a problem. Sub 1s? Acceptable. Sub 500ms? Good. Sub 200ms? Great. Sub 100ms? Amazing. I had/have a problem with any prompt taking 1s+ to load, as it begins to actually impact productivity. This problem tends to rear its ugly head when using low-specc'ed devices, but not with modern machines.
My thesis is that everyone prefers software where user actions have instantaneous -- or rather indistinguishable from instantaneous -- effect over software with detectable lag, everything else being equal. The nice thing about this thesis is that it's very likely true and it doesn't require introducing subjective numbers and grades (500ms -- good, 200ms -- great, etc.).
Absolutely. "don't make me wait" is something I think everyone agrees with :)
Yes.
What do we know? I was just theorizing out loud - actually building a system like that is completely out of scope for this discussion. Zim doesn't do this.
Latency, however it is measured in this benchmark.
Really? Out of curiosity, what are the numbers? (I still haven't cloned this repo or run any benchmarks yet, again just catching up in this thread.) I haven't had time to examine methodology or anything (yet).
Thanks for correcting my mistake. I thought zim did that but now I realize that I confused it with zpm and I don't have a config for it in zsh-bench.
Zsh with all configs has latency. I don't think I understand your point.
You are commenting on an issue in zsh-bench. Might as well read the homepage.
Ha, I've read the homepage and this issue (and skimmed the scripts, but only skimmed). I just didn't see any info about differences in the number of file accesses during benchmarking of each set of scripts. The number of file accesses will also (probably?) be different than just the number of files in the repo, and will also change between login vs non-login shells, zcompile vs no zcompile, autoloads, etc. This is primarily to satisfy my own personal curiosity; I'm not positive there is a performance correlation. Just curious is all.
Let me rephrase: If random disk IO is the problem, having a large number of file accesses will exacerbate this problem. Measuring performance on a raspberry pi with a common microSD will give a good measurement for a "worst case scenario". If there is indeed a correlation between file accesses and performance, then it stands to reason frameworks with a larger number of file accesses may be much less performant than those with fewer file accesses. This may reveal optimization strategies that can be used to mitigate a common underlying problem: slow random disk read IO.
zsh-bench measures only things users care about. That is, something that can be observed. It's a benchmark, not a profiler.
Shouldn't be to hard to find out. |
I posted the benchmark results from rpi4 in this thread together with my interpretation. Basically, all latencies are higher by a similar factor and this factor is small. So it stands to reason that we don't gain anything from benchmarking on the pi, so we might as well benchmark on hardware that is actually popular so that we can optimize for the larger number of users. Did you interpret the benchmark results differently?
eh, I think it's going to be tougher than one may think. It may be easier to sniff syscalls than it will be to use zsh.
Yes. It is a reproducible environment. It is also a "worst case scenario"; any benefits measured on an RPi will carry over to any other machine. Again, I suspect disk IO is the problem, so swapping various microSDs (all else equal) may tell more of a story. I'll get around to running the benchmarks here, but I'll have to review the code first. Pretty sure I have microSDs of varying quality laying around...
Run on a random dedicated KVM slice I lease:
As I have stated earlier, I suspect random reads from a slow-performing disk are a major contributing factor to the slowness, thus here are some tests that drop disk caches at varying intervals. Note that these aren't a great simulation of slow IO, just the easiest method I could come up with (without swapping physical disks).

while true; do echo 1 > /proc/sys/vm/drop_caches && sleep $_TIME; done
And running again, stopping the cache dropping just to validate the original results:
I understand that this is a poor excuse for a reproducible test; it's just what I could do quickly. Based on these results, I stand by my supposition that random disk read speed (and number of files read) play a significant role in measured speeds. I'll investigate more, but if this holds true then reducing the number of sourced files may offer a significant speed increase.
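A simpler spot check along the same lines (as root) is to drop the caches once and time a cold start against a hot one:

sync; echo 3 > /proc/sys/vm/drop_caches   # cold: page cache, dentries, inodes dropped
time zsh -lic 'exit'
time zsh -lic 'exit'                      # hot: caches are warm again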
And more to the ticket at hand, your results do seem inconsistent. This is from your raw benchmark data:
Where the differentials are from a random KVM slice I ran for the test above.
That doesn't seem to be the case. It seems like there can be quite a lot of variation, and using a multiplier based on a subjective scale really takes the context out of these measurements.
Just to clarify, by inconsistent you mean that benchmark results from your VM look different from the results I've got on my desktop, right? If so, it's because of 223b536. Basically, if you create a bunch of files and immediately add them to the git index, zim's prompt will be slow. But if you wait a bit between creating the files and adding them, zim's prompt will be fast. I added that sleep so that zim is benchmarked under more favorable conditions (even though in practice files are indeed often created and immediately added to the index; e.g., this happens when you clone a repo). If you want to reproduce this on your machine, run this command and notice that the prompt lags:

() {
emulate -L zsh -o err_return -o pipe_fail
local tmp
tmp=$(mktemp -d)
cd -- $tmp
print -rC1 -- {1..10}/{1..10}/{1..10} | xargs mkdir -p --
git init
print -rC1 -- {1..10}/{1..10}/{1..10}/{1..10} | xargs touch --
(( sleep_before_add )) && sleep 1
git add .
git commit --quiet --allow-empty --allow-empty-message --no-gpg-sign -m ''
}

It doesn't always lag because it's inherently racy. Run the command a few times and you'll catch lag at some point. If you now set sleep_before_add=1 and rerun it, the prompt will be fast.

I haven't updated the benchmark results after adding that sleep. Edit: I've published new benchmark results.
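For example, to rerun the reproduction above with the sleep enabled (illustrative):

typeset -i sleep_before_add=1
# now rerun the anonymous function above; the prompt should not lag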
Can you share the reasoning behind this statement? It's possible but I wonder why you think it's true.
I added the following cache-dropping step:

sync
>/proc/sys/vm/drop_caches <<<N

Here are the results from my machine:
The effect of dropping caches is visible in the results above. Based on these results I predict that reducing the number of files read by zim on startup will decrease cold startup latency by less than 50% and will have negligible effect on hot startup.
Opened zimfw/zimfw#441.
50% is huge. Also, you have an NVMe! It's going to be fast no matter what! ;D It'll rear its ugly head more with lower-classed IO devices (HDD, microSD, etc).
I am once again stating that the device configuration plays a heavy role in the outcome, and that "the ballpark should be the same" isn't an accurate statement, as demonstrated by all of the results above. It is clear that the performance of each varies based on hardware performance to a significant degree. This is, again, why I think benchmarking on an RPi is good - if the shell scripts run with adequate speed on an RPi, they'll run with adequate speed just about anywhere. Based on the few comments above, I wonder if it is (in large part) simply the synchronous calls to git.
The quote "the ballpark should be the same" is from https://github.com/romkatv/zsh-bench#how-fast-is-fast. It's talking about human perception of latency, which is independent of hardware. E.g., I can distinguish between 0ms and 10ms command lag but cannot distinguish between 0ms and <10ms. So for me the perception threshold of command lag is 10ms. For you this value might be different but "the ballpark should be the same".
Obviously.
Obviously. Do I understand correctly that in your opinion it follows that it's a good use of one's time to benchmark on Rpi if one's goal is to make zim fast on popular hardware? I think this does not follow. If one wanted to make zim fast on popular hardware, it would be more efficient to benchmark zim on popular hardware. This way optimizations would transfer to the goal better. It's not my time that's at stake though (I've no interest in optimizing zim), and I believe neither is it yours.
It really isn't. Diffing git index against workdir is faster if index is newer than files.
If by "it" you mean zim then it's a matter of what you call fast. As you can see in the benchmark results, zim has two latencies above 100% on my machine in a "hot" benchmark -- first prompt lag and command lag. This means I can perceive lag when using zim. You can still call it fast but not in an objective way. "Fast enough for me" would be a more precise characterization. It's different with input lag. Since zim is below 100% on this metric, I cannot differentiate it from zero input lag. Here zim is objectively fast, no asterisks required. |
The percentages (scale) are your subjective timing, not mine. "Fast" is your own subjective opinion, as my subjective opinion is mine.
Once again, yes. If you run these benchmarks and they're adequately performant on a slow, slow rpi, they will be adequately performant anywhere. There is zero reason to benchmark on fast machines if you're benchmarking on slow machines - fast machines will always be faster than slow machines. If there is a problem with speed, it will manifest on an rpi or slower hardware whereas it may not on high-spec machines manufactured in the year 2020.
by "it" I meant disk IO. Specifically, dropping disk caches won't mean as much when testing with an NVMe; the disk IO is so fast that the caches (almost) don't matter. Comparing these results while running on a system with an microSD or HDD will yield vastly different results, I am nearly certain. |
I misunderstood this - I thought this was in reference to the results of the benchmark, not the subjective threshold values you have selected as the scale. This comment makes more sense in this context.
OK, let's use "indistinguishable from instant" instead of "objectively fast". Even though different people have different perception, these values can still be measured objectively for each person without resorting to judgement. It's easy to check -- using a formal study protocol -- whether you can distinguish between the current prompt lag of zim and zero prompt lag. I'm sure you can by the way. As I mentioned earlier, I really like "indistinguishable from instant" as the goal of optimization because it doesn't require judgement and because all users prefer software that's indistinguishable from instant (all else being equal). Zim may be fast enough for you but if you made it indistinguishable from instant users would choose the new version over old.
I wonder why you repeated this statement despite my replying "obviously" to it the first time. This statement is tautologically true. Did you think I might think it false?
Not true. Let's say you want to make zim indistinguishable from instant on a representative dev machine -- say, a laptop or a desktop less than a decade old. A rather useful goal considering the fraction of zim users with such machines. You run a benchmark on Rpi and optimize something to reduce latency by 50%. Are you done? No. You still don't know whether zim is indistinguishable from instant on a representative dev machine. So you have to run a benchmark on the machine that you actually care about. You might find that your optimizations had little effect because Rpi is very different from a laptop or a desktop. One might wonder why you benchmarked on Rpi in the first place.
When I see so many obviously true statements, I start to suspect that our communication isn't working.
Maybe. Maybe not. If I cared about the performance of zsh on a cacheless Rpi, I would run this benchmark.
LOL
What do you mean by "fast"? Faster than microSD? Of course. Fast enough to make zim indistinguishable from instant? No. Hot or cold, my machine is not fast enough for that. Proposal: Let's stop this conversation. |
I think you're misunderstanding my underlying statement here. I know that you know that slow machines are slower than fast machines, lol. I use quite a few rpis in my homelab and all are using zim. I notice some latency, but it isn't what I would consider to be "slow" or "problematic". My point in benchmarking these devices is that I use them, and making zim (perceptibly) fast on these devices was one of my original drivers for the project.
I can't think of a single modification that could be made that would improve speed on an rpi that wouldn't improve performance on more performant hardware, thus I believe measuring the "lowest common denominator" of at least my personal usage is reasonable and correct. I'm not suggesting that this be benchmarked on ESP8266 microcontrollers or anything crazy here. It seems that you're concerned about performance that is representative of a "dev machine", which happens to have an NVMe drive. I am concerned about performance on less modern hardware. Rpis are also much more 'reproducible' hardware than whatever hardware is in your personal dev machine. If you still don't see the rationale in benchmarking on an rpi, feel free to ignore this comment.
We are in agreement. You should benchmark on the machines on which you want to improve performance. Since your goal is to make your code fast on Rpi, it's logical to benchmark on Rpi.
I can. Rpi has different relative costs of sequential I/O, random access I/O and execution of instructions, so an optimization that reduces one and increases the other can reduce latency on Rpi while increasing it on a desktop machine. Only optimizations that strictly reduce something without increasing anything else have the property you've described, but then it goes both ways.
I want my code to be fast on the machines my users run. I invest my time optimizing code proportionally to the impact it'll have. If 1% of my users are on Rpi, I'll invest 1% of my time optimizing for it. zsh4humans on Rpi has latencies slightly above my perception threshold -- the highest is 148%. It's pretty good, so I don't feel inclined to spend my time optimizing for Rpi. For comparison, the highest latency of zim on my desktop machine is 330%.
Why does it matter? My goal is to make my software fast on real hardware that my users have. If their hardware isn't readily available for me to benchmark on, it doesn't imply that I should benchmark on Rpi.

Proposal: Let's drop this discussion. I'm not gaining anything out of it and I suspect neither do you.
Good luck, and neat project. Cheers
Thanks! Have a great day 👍
I'm closing the issue as resolved. Summary: @ericbn reported that by using a different methodology to measure first prompt lag he obtained results that disagree with zsh-bench.
See #5 (comment) for details. I've added a new section to the documentation to aid in debugging and validation of zsh-bench results. If in the future you find that latencies reported by zsh-bench are incorrect, please open a new issue. Make sure to provide instructions that would allow me to reproduce your results.
FYI: I ported zsh-bench to BSD and ran a couple of benchmarks on MacBook Air (M1, 2020). Methodology:
() {
builtin emulate -L zsh -o err_return
command sudo dscl . -create /Users/$1
command sudo dscl . -create /Users/$1 UserShell /bin/zsh
command sudo dscl . -create /Users/$1 RealName "Zsh Bench User"
command sudo dscl . -create /Users/$1 UniqueID 1042
command sudo dscl . -create /Users/$1 PrimaryGroupID 20
command sudo dscl . -create /Users/$1 NFSHomeDirectory /Users/$1
command sudo mkdir /Users/$1
command sudo chown $1:staff /Users/$1
} zsh-bench-user
sudo -u zsh-bench-user -i zsh -f
git clone https://github.com/romkatv/zsh-bench.git ~/zsh-bench
() {
emulate -L zsh -o extended_glob -o err_return
rm -rf -- '' ~/^zsh-bench(ND)
~/zsh-bench/configs/$1/setup
} zim
~/zsh-bench/zsh-bench --iters 2
Raw results:
Normalized:
These results are surprisingly similar to what I've obtained on my dev machine running Linux. M1 is really fast! The only significant difference is first prompt lag in zsh4humans. It's 54% on the laptop compared to 20% on the desktop. I haven't looked into why that might be but confirmed that zsh-bench reports the right number.
@ericbn Hey, Eric. I saw that you've unarchived zimfw/zsh-framework-benchmark and made significant changes in zimfw/zsh-framework-benchmark#9. This looks great! I also saw that you've implemented several optimizations in zim and I can confirm that they have user-visible effect. This might be too early but I couldn't resist cross-validating benchmark results from your PR against zsh-bench. Hopefully you'll find this data useful.

I ran zimfw/zsh-framework-benchmark and zsh-bench on the same zsh configs with this command:

docker run -w /root -e TERM=xterm-256color -i --rm alpine sh -ues &>validation.log <<-\END
apk add git curl zsh perl ncurses expect util-linux
sed -i.bak -E '/^root:/ s|[^:]*$|/bin/zsh|' /etc/passwd
rm /etc/passwd.bak
export SHELL=/bin/zsh
git clone https://github.com/romkatv/zsh-bench.git
git clone -b time_to_prompt https://github.com/zimfw/zsh-framework-benchmark.git
cd zsh-framework-benchmark
rm frameworks/zinit-turbo.zsh frameworks/zplug.zsh
git apply <<-\PATCH
diff --git a/frameworks/zsh4humans.zsh b/frameworks/zsh4humans.zsh
index 6b61795..c0fcf6b 100644
--- a/frameworks/zsh4humans.zsh
+++ b/frameworks/zsh4humans.zsh
@@ -13,6 +13,9 @@ cp zsh4humans-${v}/{.zshenv,.zshrc} ./
# install and load zimfw/git
sed -i.bak -E 's|^(z4h install [^|]*)|\1 zimfw/git|' .zshrc
sed -i.bak -E 's|^(z4h load [^|]*)|\1 $Z4H/zimfw/git|' .zshrc
+sed -i.bak -E '1i\
+zstyle :z4h: start-tmux no
+' .zshrc
rm -r v${v}.tar.gz zsh4humans-${v} .zshrc.bak
# initialize zsh4humans
HOME=${PWD} zsh -ic 'exit' </dev/null
PATCH
zsh -fc 'SHLVL=1 source ./run.zsh -k'
zsh -fueco extended_glob '
for cfg in vanilla ./results/^vanilla(/:t); do
print -r -- "==> using $PWD/results/$cfg as home directory"
HOME=$PWD/results/$cfg ../zsh-bench/zsh-bench --standalone --git no --iters 100
done'
END

(Note that this command has tabs in it; they cannot be replaced with spaces.)

zsh-framework-benchmark was failing on the installation of zplug and on benchmarking of zinit-turbo, so I removed them. I didn't try to debug them.
zsh-bench is executed with --standalone --git no --iters 100 (see the command above). Here is the full output: validation.log

I've processed the raw output with this command:

() {
emulate -L zsh -o err_return -o no_unset
local line
while true; do
IFS= read -r line
[[ $line == framework,mean,stddev,min,max ]] && break
done
local -a frameworks zfb_results
while true; do
IFS=, read -rA line
(( $#line == 5 )) || break
frameworks+=($line[1])
zfb_results+=($line[4])
done
local -a zb_results
while true; do
IFS= read -r line
[[ $line == first_prompt_lag_ms=* ]] || continue
zb_results+=(${line#*=})
(( $#zb_results == $#frameworks )) && break
done
local -i i
print -r -- '| framework | zsh-framework-benchmark (ms) | zsh-bench (ms) | diff (ms) |'
print -r -- '|-|-:|-:|-:|'
for i in {1..$#frameworks}; do
local fw=$frameworks[i]
local -F x='1e3 * zfb_results[i]'
local -F y='zb_results[i]'
local -F d='y - x'
printf '| %s | %.3f | %.3f | %+.3f |\n' $fw $x $y $d
done
} <validation.log >validation.md

Here's the resulting table:
Latency reported by zsh-framework-benchmark matches first_prompt_lag from zsh-bench within 1ms. There are two exceptions where zsh-bench reports higher latency: zinit-light with +5.5ms (+3.5%) and ohmyzsh with +19ms (+32%). These deviations are large enough to require an explanation. I haven't looked at them yet.

P.S. I would've opened an issue at zimfw/zsh-framework-benchmark but issues are disabled there.
Hi @romkatv. Thanks for cross-checking zsh-framework-benchmark with zsh-bench! As part of the patch for frameworks/zsh4humans.zsh, can we also install zsh-users/zsh-history-substring-search with it, so the framework configuration is closer to the others used in zsh-framework-benchmark?
zsh-users/zsh-history-substring-search is enabled by default in zsh4humans.
@ericbn Both exceptions have disappeared with your fix: zimfw/zsh-framework-benchmark#10
Now all zsh-framework-benchmark results match first_prompt_lag from zsh-bench.
@ericbn I spoke too soon. first_prompt_lag reported by zsh-bench appears to be about 0.5ms lower. To figure out whether this effect is real I created a barebones config:

>>! $1/.zshenv <<\END
setopt no_rcs
PS1="%~%# "
END

With this config the difference in benchmark results is clear. The effect is real.

To decide which results are correct I created the same config manually and recorded a TTY session (with script, as described earlier in this thread) in which I ran:

clear && print st""art && exec /bin/zsh -l

I then looked at the typescript to see how long it takes between "start" and "~#" appearing on the screen. The numbers match what zsh-bench reports. In other words, zsh-framework-benchmark reports slightly higher startup latency than it should. Perhaps there is unaccounted overhead somewhere?
To summarize what happened since this issue was opened: Thanks to @romkatv's input and to his thoughts in zsh-bench, this is what we've improved in Zim:
2023-02-26 EDIT: We were still insisting on delaying the completion dumpfile check, but we're finally doing the same improved dumpfile check that zsh4humans uses during startup.
Thanks for the update!
It's hard to be sure, by looking at its code, how exactly zsh-bench is measuring what it claims to be measuring.
I'm going to trust a screen recording as the source of truth for the prompt times in my machine. I've recorded zim and zsh4humans, 3 times each. Each time the steps are:

1. Type exit and ENTER, so it's used as input in the next steps.
2. Start a login zsh with HOME as the specific framework installation dir. The counting starts when the framework name is printed, and I consider the counting ends when the prompt appears.

This translates to:

for f in zim zsh4humans; do repeat 3 do; sleep 1; print ${f}; HOME=${PWD:h}/${f} zsh -li; done; done
The steps above are executed in a dir with a git repo with 1,000 directories and 10,000 files, set up using the code here and here.
Each framework was installed with the setup script here and here, respectively for zim and zsh4humans.
This should be enough to guarantee that the recordings are using the same scenario used by zsh-bench.
This is the screen recording:
These are the times extracted from checking the recording frames:
The recording has 30fps, so each frame represents 33.333ms. We can consider the error of each measurement above to be +/- 33.333ms.
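For reference, the frames-to-milliseconds conversion used here is just (frame count is hypothetical):

typeset -i frames=9
printf '%.3f ms\n' $(( frames * 1000.0 / 30 ))   # -> 300.000 ms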
This is what I get when running zsh-bench on the same machine. I ran it 3 times for each of the mentioned frameworks:
What is odd:
zim first_prompt_lag_ms times in zsh-bench fluctuate quite a lot. It's an average of 169.598ms with a stdev of 46.601! In the recordings the average was 288.889ms with a stdev of 19.245, which is actually a stdev of 0.577 in terms of frames (it's just one frame more in one of the recordings). zsh-bench should be more precise than a 30fps recording.
Also, from zsh-bench's code it looks like this output is the min value. It would be good to know the stdev of values inside a zsh-bench run, to make sure the values are not fluctuating.
I could compare the minimum recording time with the minimum zsh-bench time for each framework, but let's use the averages since the stdev of zsh-bench for zim was so high. So for completeness of information, the first_prompt_lag_ms average for zsh4humans is 30.213ms, with a stdev of 0.344. And in the recordings the average was 244.444ms (stdev was 50.918, but again it was actually just one frame more in one of the recordings).
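For reference, the averages and stdevs quoted here can be computed like this (input values are placeholders):

print -l 10 11 12 | awk '
  { s += $1; q += $1 * $1; n++ }
  END { m = s / n; printf "avg=%.3f stdev=%.3f\n", m, sqrt((q - n*m*m) / (n - 1)) }'
# -> avg=11.000 stdev=1.000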
Then, it's odd that zim is 1.703x faster in zsh-bench than in the recordings -- I was expecting much closer values -- and zsh4humans is 8.091x faster in zsh-bench than in the recordings!
If anything, zsh-bench is favoring zsh4humans by a lot, both in terms of more stable measurements and in giving it times way faster than the real ones.