Rounding fix #392
…there are some without prepare script
@@ -79,7 +80,7 @@ public static void main(String[] args) throws IOException {
                     return res;
                 },
                 agg -> {
-                    return new ResultRow(agg.min, agg.sum / agg.count, agg.max);
+                    return new ResultRow(agg.min, (Math.round(agg.sum * 10.0) / 10.0) / agg.count, agg.max);
This is not correct: it just adds another rounding on top of the two roundings already done by toString and println, and thus hides the problem even more.
As I mentioned in #49, the problem cannot be fixed while double is used for the calculation, because not all numbers can be represented exactly as doubles (e.g. 0.1 or 99.9, see https://math.stackexchange.com/questions/2710986/exact-representation-of-floating-point-numbers), and therefore Double.parseDouble and the summation are already imprecise. Adding any kind of rounding while calculating the average or printing won't fix that.
Consider:
package sum;

import java.math.BigDecimal;

class Sum {
    public static void main(String[] args) {
        var sum = 0.0;
        var sumD = BigDecimal.ZERO;
        var rowD = new BigDecimal("99.9");
        var count = 1_000_000_000;
        for (int i = 0; i < count; i++) {
            sum += 99.9;
            sumD = sumD.add(rowD);
        }
        System.out.println(sum);
        System.out.println(sumD);
    }
}
prints
$ java Sum.java
9.989999883589902E10
99900000000.0
As you can see, the sum is not precise even before we do any division.
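The representation error behind this result can be inspected directly. The following is a minimal sketch, not part of the PR: the class name `DoubleRepresentation` and the helper `exactValueOf` are made up for illustration. It relies on the fact that `new BigDecimal(double)` converts the stored binary value exactly, without rounding it back to a short decimal.

```java
import java.math.BigDecimal;

public class DoubleRepresentation {
    // Hypothetical helper: exact decimal expansion of the double nearest to d.
    static BigDecimal exactValueOf(double d) {
        return new BigDecimal(d);
    }

    public static void main(String[] args) {
        // Prints the value actually stored for the literal, with all its digits.
        System.out.println(exactValueOf(0.1));
        System.out.println(exactValueOf(99.9));

        // The stored value differs from the decimal the text file contains.
        System.out.println(exactValueOf(99.9).compareTo(new BigDecimal("99.9")) != 0); // true

        // A value such as 0.5 is a power of two and is stored exactly.
        System.out.println(exactValueOf(0.5).compareTo(new BigDecimal("0.5")) == 0); // true
    }
}
```

Every `sum += 99.9` in the loop above adds the stored approximation and then rounds the running total again, so the error can grow with the number of rows.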
The proper way is either (slow) to use BigDecimal for the row values, calculate the sum, and apply rounding after the average calculation, or (fast) to sum row*10 as integers, which is possible because the input uses a fixed format, and again apply rounding only at the end.
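Both options can be sketched side by side on a few rows. This is an illustration under assumptions, not code from the PR: the class `ExactSum` and the helpers `parseTenths`, `sumTenths` and `sumDecimal` are hypothetical names, and the parser assumes every value has exactly one fractional digit.

```java
import java.math.BigDecimal;

public class ExactSum {
    // Hypothetical helper: parses a value such as "-12.3" into tenths (-123).
    // Assumes exactly one fractional digit and no other characters.
    static long parseTenths(String s) {
        boolean negative = s.charAt(0) == '-';
        long value = 0;
        for (int i = negative ? 1 : 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '.') {
                value = value * 10 + (c - '0');
            }
        }
        return negative ? -value : value;
    }

    // Fast option: exact integer sum of the values scaled by 10.
    static long sumTenths(String[] rows) {
        long sum = 0;
        for (String row : rows) {
            sum += parseTenths(row);
        }
        return sum;
    }

    // Slow option: exact decimal sum using BigDecimal.
    static BigDecimal sumDecimal(String[] rows) {
        BigDecimal sum = BigDecimal.ZERO;
        for (String row : rows) {
            sum = sum.add(new BigDecimal(row));
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] rows = { "99.9", "-12.3", "0.1" };
        System.out.println(sumTenths(rows));  // 877
        System.out.println(sumDecimal(rows)); // 87.7
    }
}
```

Neither variant rounds anything while accumulating, so the only rounding left is the single one applied to the final average.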
I think it should be something like this:
/*
 *  Copyright 2023 The original authors
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 */
package dev.morling.onebrc;

import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.joining;
import static java.util.stream.Collectors.reducing;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Stream;

public class CalculateAverage_AlexanderYastrebov {

    private static class Measurement {
        final String name;
        private long count;
        // min, max and sum hold actual value scaled by 10
        private long min;
        private long max;
        private long sum;

        static Measurement parse(String line) {
            var parts = line.split(";", 2);
            return new Measurement(parts[0], parseMetric(parts[1]));
        }

        private static long parseMetric(String s) {
            return Long.parseLong(s.replaceFirst("[.]", ""));
        }

        Measurement(String name, long value) {
            this.name = name;
            this.count = 1;
            this.min = this.max = this.sum = value;
        }

        Measurement add(Measurement m) {
            this.min = Math.min(min, m.min);
            this.max = Math.max(max, m.max);
            this.sum += m.sum;
            this.count += m.count;
            return this;
        }

        String getName() {
            return name;
        }

        String format() {
            var smin = BigDecimal.valueOf(min)
                    .divide(BigDecimal.TEN, 1, RoundingMode.UNNECESSARY)
                    .toPlainString();
            var smax = BigDecimal.valueOf(max)
                    .divide(BigDecimal.TEN, 1, RoundingMode.UNNECESSARY)
                    .toPlainString();
            var savg = BigDecimal.valueOf(sum)
                    .divide(BigDecimal.valueOf(count * 10), 1, RoundingMode.CEILING)
                    .toPlainString();
            return String.format("%s=%s/%s/%s", name, smin, savg, smax);
        }
    }

    public static void main(String[] args) throws Exception {
        var input = "./measurements.txt";
        if (args.length == 1) {
            input = args[0];
        }
        try (Stream<String> lines = Files.lines(Paths.get(input))) {
            var result = lines.map(Measurement::parse)
                    .collect(groupingBy(Measurement::getName, TreeMap::new,
                            collectingAndThen(reducing(Measurement::add), Optional::get)));
            var output = result.values().stream()
                    .map(Measurement::format)
                    .collect(joining(", ", "{", "}"));
            System.out.println(output);
        }
    }
}
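The choice of `RoundingMode` in `format()` decides the final digit, so it is worth seeing what the modes do on small inputs. The sketch below is an illustration only: the class `MeanRounding` and the helper `mean` are hypothetical, and it mirrors the `divide(..., 1, mode)` call from the listing above. Which mode the challenge actually expects is not settled in this thread.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class MeanRounding {
    // Hypothetical helper: sumTenths is the sum scaled by 10, as in the listing above.
    // Returns the mean with one fractional digit, rounded with the given mode.
    static String mean(long sumTenths, long count, RoundingMode mode) {
        return BigDecimal.valueOf(sumTenths)
                .divide(BigDecimal.valueOf(count * 10), 1, mode)
                .toPlainString();
    }

    public static void main(String[] args) {
        // Exact mean 0.21: CEILING always moves toward positive infinity.
        System.out.println(mean(21, 10, RoundingMode.CEILING)); // 0.3
        System.out.println(mean(21, 10, RoundingMode.HALF_UP)); // 0.2

        // Exact mean -0.25: the two modes disagree for negative ties as well.
        System.out.println(mean(-5, 2, RoundingMode.CEILING)); // -0.2
        System.out.println(mean(-5, 2, RoundingMode.HALF_UP)); // -0.3
    }
}
```

Because the division operates on exact integers, the rounding mode is the only place where implementations built this way can legitimately differ.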
You are right, the calculation isn't correct, and it's certainly not what I would recommend doing in any real-world application.
But does it matter in any practical sense for the challenge at hand? Specifically, can there be any 1B-row dataset with values of one fractional digit where the accumulated error would be so significant that the result with one fractional digit would differ from the result of a correct implementation?
No description provided.