Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CSV output format #14

Closed
AlexanderYastrebov opened this issue Jan 2, 2024 · 4 comments
Closed

Use CSV output format #14

AlexanderYastrebov opened this issue Jan 2, 2024 · 4 comments

Comments

@AlexanderYastrebov
Copy link
Contributor

Use proper CSV (or semicolon-separated for that matter) output format

Addis Ababa;33.0;33.0;33.0
Aden;31.1;31.1;31.1,
...

instead of weird not-a-json oneliner output.

This would simplify result comparison (diff) between multiple implementations.
This would also enable parsing of results - think of import into database or e.g. database-based implementation.

@gunnarmorling
Copy link
Owner

I agree that this would have been the better way. I'm hesitating though to change the format at this point, to keep different implementations comparable.

@lluismf
Copy link

lluismf commented Jan 2, 2024

The weird not-a-json oneliner is just the map serialized. Writing a CSV output would be extra cost.

@AlexanderYastrebov
Copy link
Contributor Author

@lluismf You are right, serializing result in a portable format is a game-changer and will severely affect processing performance of 10^9 input rows 👍

@gunnarmorling
Copy link
Owner

Hey, let's keep it friendly :)

I agree that a different output format wouldn't make any difference perf-wise in the grand scheme of things and as said, it would have been the better choice. But I don't think it's that much of an issue to justify changing this while the challenge is running.

So I'd keep this as-is for the time being, and consider it a lesson learned for whenever another challenge of this kind is happening. Thanks all!

@gunnarmorling gunnarmorling closed this as not planned Won't fix, can't repro, duplicate, stale Jan 3, 2024
AlexanderYastrebov added a commit to AlexanderYastrebov/1brc that referenced this issue Jan 3, 2024
It is useful to debug differences between implemenattions, e.g.:
```sh
$ while ./create_measurements.sh 1000 && diff <(./calculate_average_royvanrijn.sh 2>/dev/null | ./tocsv.sh) <(./calculate_average.sh 2>/dev/null | ./tocsv.sh) ; do echo OK; done
Created file with 1,000 measurements in 50 ms
60c60
< Bucharest;-2.9;2.9;6.1
---
> Bucharest;-2.9;2.8;6.1
265c265
< Petropavlovsk-Kamchatsky;0.9;9.3;17.7
---
> Petropavlovsk-Kamchatsky;0.9;9.2;17.7
```

For gunnarmorling#14
AlexanderYastrebov added a commit to AlexanderYastrebov/1brc that referenced this issue Jan 3, 2024
It is useful to debug differences between implementations, e.g.:
```sh
$ while ./create_measurements.sh 1000 && diff <(./calculate_average_royvanrijn.sh 2>/dev/null | ./tocsv.sh) <(./calculate_average.sh 2>/dev/null | ./tocsv.sh) ; do echo OK; done
Created file with 1,000 measurements in 50 ms
60c60
< Bucharest;-2.9;2.9;6.1
---
> Bucharest;-2.9;2.8;6.1
265c265
< Petropavlovsk-Kamchatsky;0.9;9.3;17.7
---
> Petropavlovsk-Kamchatsky;0.9;9.2;17.7
```

For gunnarmorling#14
gunnarmorling pushed a commit that referenced this issue Jan 3, 2024
It is useful to debug differences between implementations, e.g.:
```sh
$ while ./create_measurements.sh 1000 && diff <(./calculate_average_royvanrijn.sh 2>/dev/null | ./tocsv.sh) <(./calculate_average.sh 2>/dev/null | ./tocsv.sh) ; do echo OK; done
Created file with 1,000 measurements in 50 ms
60c60
< Bucharest;-2.9;2.9;6.1
---
> Bucharest;-2.9;2.8;6.1
265c265
< Petropavlovsk-Kamchatsky;0.9;9.3;17.7
---
> Petropavlovsk-Kamchatsky;0.9;9.2;17.7
```

For #14
AlexanderYastrebov added a commit to AlexanderYastrebov/1brc that referenced this issue Jan 14, 2024
As an exercise I've re-implemented gunnarmorling#375 idea to compare result numbers
with a tolerance:
```
$ ./test.sh baseline 0 src/test/resources/samples/measurements-rounding-precise.txt
Validating calculate_average_baseline.sh -- src/test/resources/samples/measurements-rounding-precise.txt
Rounding=14.6/25.5/33.6 != Rounding=14.6/25.4/33.6 (avg)
```
The hassle of parsing highlights the importance of using
machine-readable format as suggested in gunnarmorling#14

**But** I think this approach is wrong, the rules should define rounding
mode and baseline should be fixed to produce the correct result.

Submissions should be fixed accordingly or qualified as non-passing.

Could have been done earlier but better late than never.

Updates gunnarmorling#49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants