
evaluate2.sh: Check output of warmup run and abort early if failed #333

Merged: 14 commits, merged into gunnarmorling:main on Jan 13, 2024

Conversation

@hundredwatt (Contributor) commented Jan 11, 2024

Closes #313

  • Change instructions that create out_expected.txt to create measurements_1B.out
  • Run ./test.sh before hyperfine
  • Replace hyperfine warmup with ./test.sh <fork> measurements_1B.txt and compare its result to measurements_1B.out
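The early-abort idea in these bullets could be sketched like this; `check_warmup` and the file handling are illustrative, not the PR's exact code:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: compare the warmup run's output file against the
# precomputed expected file; a non-zero return signals that hyperfine
# should be skipped for this fork.
check_warmup() {
    local actual="$1" expected="$2"
    if diff -q "$actual" "$expected" >/dev/null; then
        echo "warmup OK"
    else
        echo "warmup FAILED, skipping benchmark" >&2
        return 1
    fi
}
```

In the real script the "actual" file would come from running `./test.sh <fork> measurements_1B.txt` and the "expected" file would be measurements_1B.out.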

New Output

In screenshot below:

  • the hundredwatt fork passes the tests
  • all_bad fails all tests
  • bad_1b passes the default test.sh suite, but fails on the 1B test file

NOTE: for development purposes, I used a 100m row file and only 3 runs. This is reflected in the screenshot below, but not in this PR's code.

[screenshot]

function print_and_execute() {
    echo "+ $@" >&2
    "$@"
}
@hundredwatt (Contributor, Author):

Some of the changes in this PR are refactoring to use this method instead of xtrace or echo statements
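For illustration, the helper in use (restated so the snippet runs standalone; the echoed command is arbitrary):

```shell
#!/usr/bin/env bash
# Same helper as above: print the command line to stderr (prefixed with
# "+", like xtrace would), then execute it.
function print_and_execute() {
    echo "+ $@" >&2
    "$@"
}

# stderr gets "+ echo hello"; stdout gets "hello".
print_and_execute echo hello
```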

Resolved review threads: evaluate2.sh (3), test.sh (1)
@hundredwatt (Contributor, Author) commented Jan 12, 2024

I pushed changes based on @AlexanderYastrebov's review. I had some questions earlier, but I closed them after (I believe) resolving the issues, so that you two can re-review and merge this before I wake up if it looks good to you 😄

My resolution was: instead of silencing ./test.sh with redirects, I added a --quiet flag to that script to be used only by evaluate2.sh.
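A minimal sketch of what such a --quiet flag could look like; this is hypothetical, not the actual test.sh code:

```shell
#!/usr/bin/env bash
# Hypothetical --quiet handling: strip the flag from the arguments and
# gate per-test progress logging on it. Failure diffs would still print
# unconditionally elsewhere in the script.
QUIET=false
if [ "${1:-}" = "--quiet" ]; then
    QUIET=true
    shift
fi

log() {
    if [ "$QUIET" = false ]; then
        echo "$@"
    fi
}

log "Validating calculate_average.sh..."
```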

This way, it's no longer noisy in the happy path but we still see the diffs printed on test failures :)

I replaced the git diff diff printer with diff plus tocsv.sh, see discussion: #333 (comment)
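The diff-plus-tocsv.sh approach can be approximated as follows; the `to_csv` function here is an assumption about what tocsv.sh does (one station per line), not the actual script:

```shell
#!/usr/bin/env bash
# Stand-in for tocsv.sh (an assumption about its behavior): flatten the
# single-line "{A=1/2/3, B=4/5/6}" result into one station per line so
# diff pinpoints the offending station.
to_csv() {
    tr -d '{}' < "$1" | tr ',' '\n' | sed 's/^ *//'
}

printf '{Aba=1.0/2.0/3.0, Bee=4.0/5.0/6.0}\n' > expected.txt
printf '{Aba=1.0/2.0/3.0, Bee=4.0/5.0/6.1}\n' > actual.txt

# Process substitution feeds both flattened forms to diff; only the
# mismatching station line appears in the output.
diff <(to_csv expected.txt) <(to_csv actual.txt) || true
```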

Happy Path

[screenshot]

(sending the output from test.sh's invocation of prepare_$fork.sh to /dev/null still didn't silence sdkman's output, even with 2>&1 added... not sure why)

On Error

[screenshot]

@hundredwatt hundredwatt force-pushed the evaluate2-tests branch 6 times, most recently from 7c5c3ed to cdd2117 Compare January 12, 2024 05:55
Resolved review threads: test.sh (2)
@hundredwatt (Contributor, Author) commented Jan 12, 2024

@AlexanderYastrebov @gunnarmorling I removed the ./test.sh --quiet flag and instead am capturing the test output to a file, then printing it upon failure. Much better now 👍
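The capture-then-replay approach could look roughly like this; `run_and_report` is a hypothetical name, and this simplified variant uses plain redirection where the actual commit uses tee:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with combined output captured to a
# temp file; stay quiet on success, replay the log on stderr on failure,
# and preserve the command's exit status.
run_and_report() {
    local log rc=0
    log=$(mktemp)
    "$@" >"$log" 2>&1 || rc=$?
    if [ "$rc" -ne 0 ]; then
        cat "$log" >&2
    fi
    rm -f "$log"
    return "$rc"
}
```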

[screenshot]

@hundredwatt (Contributor, Author):

Added colors 😄
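Color output in a bash script typically boils down to ANSI escape codes; these helpers are illustrative, not the script's actual code:

```shell
#!/usr/bin/env bash
# Hypothetical pass/fail color helpers using ANSI escape sequences:
# \033[0;32m = green, \033[0;31m = red, \033[0m = reset.
green() { printf '\033[0;32m%s\033[0m\n' "$*"; }
red()   { printf '\033[0;31m%s\033[0m\n' "$*"; }

green "PASS hundredwatt"
red   "FAIL all_bad"
```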

[screenshot]

@gunnarmorling (Owner):

> Added colors 😄

Nice. On that diff, any chance we could get a word diff? That's why I used git diff --no-index --word-diff at some point, as it easily highlights the diff even for the 10K file.
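The mode mentioned here can be tried standalone (file names and contents are illustrative):

```shell
#!/usr/bin/env bash
# --no-index lets git diff two plain files outside any repository;
# --word-diff marks changed tokens inline as [-old-]{+new+}, which is
# much easier to scan on long lines such as the 10K-station output.
printf 'Hamburg=12.0/18.2/23.1 Bulawayo=8.9/22.1/35.2\n' > expected_sample.txt
printf 'Hamburg=12.0/18.3/23.1 Bulawayo=8.9/22.1/35.2\n' > actual_sample.txt

git diff --no-index --word-diff expected_sample.txt actual_sample.txt || true
```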

@AlexanderYastrebov (Contributor):

@gunnarmorling git word diff has some issues working with process substitution, see #333 (comment)

IMO the diff from #333 (comment) is quite clear (and shows the problem #49)

@gunnarmorling (Owner):

> IMO the diff from #333 (comment) is quite clear

I'm not so sure :) How would you find that difference in the 10K test case? Same for the expected output of the challenge. That's where word diff helps a lot.

@gunnarmorling (Owner):

Ah, hold on. This is now diffing one station per line due to tocsv.sh, right? Yeah, in that case I'm on board.

@hundredwatt (Contributor, Author):

> Ah, hold on. This is diffing now one station per line due to to-csv, right? Yeah, in that case I'm on board.

Yes... so this should be ready to merge 👍

@gunnarmorling (Owner):

Tested this some more, all looking good. Merging now. Thanks a lot, @hundredwatt! It's a very nice improvement, all in one go now, sweet!

@gunnarmorling gunnarmorling merged commit eff73db into gunnarmorling:main Jan 13, 2024
1 check passed
@gunnarmorling (Owner):

I'm also gonna delete the old evaluate script, it's not needed any more.

@gunnarmorling (Owner):

@hundredwatt, so after some more consideration: can we bring back the logging of the test output? I.e. the "Validating..." lines, and also the output of GraalVM native binary builds. Right now one is flying a bit blind with regard to progress, in particular when the latter takes a few seconds longer.

@hundredwatt (Contributor, Author):

@gunnarmorling Sure: #377

@hundredwatt hundredwatt deleted the evaluate2-tests branch January 13, 2024 16:29
dmitry-midokura pushed a commit to dmitry-midokura/1brc that referenced this pull request Jan 13, 2024
…unnarmorling#333)

* refactor: replace xtrace with "print_and_execute" function

* nit: stylize error messages

* replace out_expected.txt with measurements_1B.out

* print

* prevent errors on cleanup

* run tests and check warmup run output before running benchmark

* move "git diff" pretty diff output to test.sh

* Ensure "set -e" is re-enabled if we followed a "continue" branch

* add timeouts to test.sh invocations

* use diff with tocsv.sh to show differences on failed test

* add --quiet mode to test.sh

* move prepare_$fork.sh invocation to right below hyperfine since test.sh also invokes it

* Revert "add --quiet mode to test.sh"

This reverts commit 13e9fb7.

* use tee to capture test output to a temp file and print contents on failure

---------

Co-authored-by: Jason Nochlin <[email protected]>
Closed issue: Eagerly abort hyperfine run when output differs