Compare results with tolerance #390

AlexanderYastrebov · 2024-01-14T05:36:46Z

As an exercise, I've re-implemented the idea from #375 to compare result numbers with a tolerance:

$ ./test.sh baseline 0 src/test/resources/samples/measurements-rounding-precise.txt
Validating calculate_average_baseline.sh -- src/test/resources/samples/measurements-rounding-precise.txt
Rounding=14.6/25.5/33.6 != Rounding=14.6/25.4/33.6 (avg)

The hassle of parsing highlights the importance of using a machine-readable format, as suggested in #14.
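To illustrate what comparing with a tolerance involves, here is a minimal Python sketch. It is not the code from this PR: the helper names `parse_entry` and `within_tolerance` are hypothetical, and it assumes a single `name=min/mean/max` entry with values printed to one decimal place, as in the output above.

```python
from decimal import Decimal


def parse_entry(entry):
    """Split a 'name=min/mean/max' entry into (name, [min, mean, max]) in integer tenths."""
    # rpartition keeps any '=' inside the station name on the name side.
    name, sep, values = entry.strip().rpartition("=")
    parts = values.split("/")
    if not sep or len(parts) != 3:
        raise ValueError(f"unexpected entry format: {entry!r}")
    # Decimal avoids binary floating-point noise when scaling to tenths.
    return name, [int(Decimal(p) * 10) for p in parts]


def within_tolerance(expected, actual, tolerance_tenths=1):
    """Return True if names match and every value differs by at most the tolerance."""
    exp_name, exp_values = parse_entry(expected)
    act_name, act_values = parse_entry(actual)
    if exp_name != act_name:
        return False
    return all(abs(e - a) <= tolerance_tenths for e, a in zip(exp_values, act_values))


if __name__ == "__main__":
    # The mismatch reported above passes with a tolerance of 0.1 and fails with 0.
    print(within_tolerance("Rounding=14.6/25.5/33.6", "Rounding=14.6/25.4/33.6"))
    print(within_tolerance("Rounding=14.6/25.5/33.6", "Rounding=14.6/25.4/33.6", 0))
```

Even this small version needs explicit parsing and error handling for each entry, which is the overhead a machine-readable output format would avoid.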

But I think this approach is wrong: the rules should define the rounding mode, and the baseline should be fixed to produce the correct result.
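A small sketch of why the rounding mode needs to be pinned down: the same exact mean can print as either value depending on the mode. The input `25.45` is a hypothetical value chosen for illustration, not taken from the sample file, and the modes shown are Python's `decimal` modes, not necessarily the one the rules should adopt.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_tenths(value, mode):
    """Round a decimal string to one decimal place using the given rounding mode."""
    return Decimal(value).quantize(Decimal("0.1"), rounding=mode)


if __name__ == "__main__":
    mean = "25.45"  # hypothetical exact mean, sitting on a rounding tie
    print(round_tenths(mean, ROUND_HALF_UP))    # 25.5
    print(round_tenths(mean, ROUND_HALF_EVEN))  # 25.4
```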

Submissions should be fixed accordingly or qualified as non-passing.

Could have been done earlier but better late than never.

Updates #49

gunnarmorling · 2024-01-14T08:32:00Z

Hey @AlexanderYastrebov, yeah, agreed that the "compare with tolerance" approach isn't the best one. I have opened #392, which fixes the baseline and adds a test case to ensure the correct behavior. Note that I am not going to re-evaluate existing entries; compliance at the time of the original evaluation is what decides (it's a common strategy, for instance also used in the JCP when certifying spec implementations). New or updated entries must be compliant with the fixed behavior by passing the additional test.

gunnarmorling · 2024-01-14T11:57:25Z

So do we still need this in the light of the above, @AlexanderYastrebov?

AlexanderYastrebov · 2024-01-14T12:15:51Z

> So do we still need this in the light of the above

No, this was just an exercise.

AlexanderYastrebov closed this Jan 14, 2024
