Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use hyperfine and jq to improve evaluate.sh #182

Merged
merged 12 commits into from
Jan 9, 2024

Conversation

hundredwatt
Copy link
Contributor

@hundredwatt hundredwatt commented Jan 6, 2024

From #105

For lack of a better name, I called this script evaluate2.sh

$ ./evaluate2.sh
Usage: evaluate2.sh <fork name> (<fork name 2> ...)
image

evaluate2.sh Outdated

echo ""
TEMP_FILE=$(mktemp)
OPTS="--warmup 1 --runs 5 --export-json $TEMP_FILE"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I configured 1 warmup run... this differs from evaluate.sh so we could always change this to 0

@hundredwatt
Copy link
Contributor Author

hundredwatt commented Jan 6, 2024

I added output of the raw times in addition to the trimmed means, eg:

fork,trimmed_mean
spullara,0.52814592528
royvanrijn,0.49758428661333337

fork,raw_times
spullara,0.51465599528,0.53491328628,0.52979903628,0.51972545328,0.53921453728
royvanrijn,0.49466057828000004,0.49655945328,0.49698236928,0.4992110372800001,0.50892974428

@hundredwatt hundredwatt force-pushed the evaluate-hyperfine branch 3 times, most recently from de445e7 to 363803b Compare January 7, 2024 02:51
@gunnarmorling
Copy link
Owner

Very cool! One problem I see is that this moves the location of time measurement from just the java call to the launch script, which penalizes any contenders which call sdk for setting up a specific distro (as an example, I see the same time for merykitty as before, but +300ms for royvanrijn). We'd somehow have to extract this step.

@hundredwatt
Copy link
Contributor Author

New look, based on: #105 (comment)

image

Copy link
Contributor Author

@hundredwatt hundredwatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gunnarmorling I updated the script, please see my review comments below with some callouts and questions

evaluate2.sh Outdated Show resolved Hide resolved
evaluate2.sh Outdated Show resolved Hide resolved
evaluate2.sh Outdated Show resolved Hide resolved
evaluate2.sh Show resolved Hide resolved
evaluate2.sh Outdated Show resolved Hide resolved
evaluate2.sh Outdated Show resolved Hide resolved
Copy link
Owner

@gunnarmorling gunnarmorling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hundredwatt, thanks, I think this is getting into great shape. A few comments inline. Apart from those, is there a way we can capture the output of the contenders? We'd need that to compare that to the expected output (see process_output.java, which then of course wouldn't have to handle the output of time any more).

@hundredwatt
Copy link
Contributor Author

@gunnarmorling Yes, we can do hyperfine --output <FILE>, I'll take a look at process_output.java shortly

Copy link
Contributor Author

@hundredwatt hundredwatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed all comments from previous reviews

evaluate2.sh Outdated Show resolved Hide resolved
@gunnarmorling
Copy link
Owner

I'll take a look at process_output.java shortly

Excellent, thx! It's invoked via ./process.sh (and sorry for the code, I know it's a hot mess... ;)

@hundredwatt
Copy link
Contributor Author

Remaining TODO:

  • Save output to file
  • Incorporate functionality from process_output.java

evaluate2.sh Outdated Show resolved Hide resolved
@AlexanderYastrebov
Copy link
Contributor

AlexanderYastrebov commented Jan 9, 2024

I'll take a look at process_output.java shortly

BTW bash uses builtin time, the GNU /usr/bin/time supports TIME environment variable, see https://man7.org/linux/man-pages/man1/time.1.html

bash:

$ time sleep 1

real    0m1,009s
user    0m0,003s
sys     0m0,006s

/usr/bin/time has quite verbose default format which could be reduced just to real to avoid any kind of parsing.

$ /usr/bin/time sleep 1
0.00user 0.00system 0:01.00elapsed 0%CPU (0avgtext+0avgdata 2240maxresident)k
0inputs+0outputs (0major+78minor)pagefaults 0swaps

$ export TIME="%e"
$ /usr/bin/time sleep 1
1.00

Note also that default sh in Ubuntu is dash

~$ ll /bin/sh
lrwxrwxrwx 1 root root 4 Sep 27  2018 /bin/sh -> dash*

that does not have time builtin so all scripts that use #!/bin/sh and time (not /usr/bin/time) in Ubuntu print

0.00user 0.00system 0:01.00elapsed 0%CPU (0avgtext+0avgdata 2240maxresident)k
0inputs+0outputs (0major+78minor)pagefaults 0swaps

source sdk also does not work with dash.

I.e. for the next challenge baseline should stick to #!/bin/bash and use /usr/bin/time - then output could be configured via TIME :)

@hundredwatt
Copy link
Contributor Author

process_output.java is used as follows:

  1. Expected output is created via ./eval.sh baseline; cp baseline.out out_expected.txt
  2. Fork output is created via ./eval.sh $fork
  3. Invoke process.sh ./process.sh $fork
  4. process_output.java then:
    a. Verifies the printed aggregation matches baseline
    b. Collects the times
    c. Computes the trimmed mean
    d. Prints the leaderboard line in Markdown table format

evaluate2.sh already handles b. and c., so we just need to add a. and d.

Let me know if I missed anything 😄

@gunnarmorling
Copy link
Owner

Yepp, that sounds exactly right 👍 .

@hundredwatt
Copy link
Contributor Author

Ugh, hyperfine --output <FILE> overwrites FILE on each run... it doesn't append.

For now we'll only check the output of 1 run unless we can find a workaround

@gunnarmorling
Copy link
Owner

gunnarmorling commented Jan 9, 2024

For now we'll only check the output of 1 run unless we can find a workaround

So if we run this script multiple times for multiple contenders, will we check the output of the last run for all contenders (ok)? Or just the last run of the last contender (bad)?

@gunnarmorling
Copy link
Owner

I've created #266 as a follow-up to this one, re-organizing the existing launch scripts to adhere to the structure established here.

@hundredwatt
Copy link
Contributor Author

So if we run this script multiple times for multiple contenders, will we check the output of the last run for all contenders (ok)? Or just the last run of the last contender (bad)?

The former, we're ok 👍

@hundredwatt
Copy link
Contributor Author

I pushed all the changes from process_output.java!

New look:

Happy Path

image

Verification failed

image

Going to do another round of testing on a Fedora box too

@hundredwatt
Copy link
Contributor Author

For fun, the Leaderboard text now scans ./prepare_$fork.sh and extracts the Java version from sdk use if present 😄

@hundredwatt
Copy link
Contributor Author

Oops, forgot the SMT / turbo stuff

Copy link
Contributor Author

@hundredwatt hundredwatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gunnarmorling I think it's ready 😅

New look:

Happy Path

image

Verification failed

image

check_command_installed hyperfine
check_command_installed jq

# Check if SMT is enabled (we want it disabled)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What the warnings look like:

image

@gunnarmorling
Copy link
Owner

That's awesome, really great stuff, @hundredwatt! I'm gonna squash everything into one commit and merge it. We can do any necessary fine-tuning in follow-up PRs. Thanks a lot for pulling through with this one!

@gunnarmorling
Copy link
Owner

For fun, the Leaderboard text now scans ./prepare_$fork.sh and extracts the Java version from sdk use if present 😄

Wanted to suggest exactly that, but you beat me to it. Seems we can (re-)generate the leaderboard fully automatically with this change. Thanks again!

@gunnarmorling gunnarmorling merged commit 42e5ca1 into gunnarmorling:main Jan 9, 2024
@hundredwatt
Copy link
Contributor Author

hundredwatt commented Jan 9, 2024 via email

@hundredwatt hundredwatt deleted the evaluate-hyperfine branch January 9, 2024 21:12
then
echo "Usage: evaluate2.sh <fork name> (<fork name 2> ...)"
echo " for each fork, there must be a 'prepare_<fork name>.sh' script and a 'calculate_average_<fork name>.sh' script"
echo " there may be an 'additional_build_steps_<fork name>.sh' script too"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need additional_build_steps_*.sh? I think the stuff could be in prepare_*.sh

They run one after another and there is a single use only:

I don't see why we can not mv additional_build_steps_thomaswue.sh prepare_thomaswue.sh and drop support for additional_build_steps_*.sh.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants