
[verifier] Add option to validate array(float) via error margin.#22143

Merged
spershin merged 1 commit into prestodb:master from spershin:ArrayFloatingPointErrorMargin
Mar 25, 2024

Conversation

@spershin
Contributor

@spershin spershin commented Mar 8, 2024

Description

Allow the verifier to validate array(float) and array(double) columns in the same way as it validates float and double columns. Later we can extend this to maps as well.
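To make "validate via error margin" concrete, here is a minimal sketch of comparing two aggregated sums by their means within a relative and an absolute margin. The class `ErrorMarginSketch`, the method `withinMargin`, and the exact formula are assumptions for illustration only; the verifier's real comparison may differ.

```java
// Hypothetical helper, not the verifier's actual code: compares two sums by
// their means instead of requiring exact equality.
final class ErrorMarginSketch
{
    private ErrorMarginSketch() {}

    static boolean withinMargin(
            double controlSum,
            double testSum,
            long count,
            double relativeErrorMargin,
            double absoluteErrorMargin)
    {
        if (count <= 0) {
            // Nothing to compare when no values were aggregated.
            return true;
        }
        double controlMean = controlSum / count;
        double testMean = testSum / count;

        // Means that are both very close to zero are treated as equal.
        if (Math.abs(controlMean) < absoluteErrorMargin && Math.abs(testMean) < absoluteErrorMargin) {
            return true;
        }

        // Symmetric relative difference of the two means (assumed formula).
        double difference = Math.abs(controlMean - testMean);
        double scale = Math.abs(controlMean + testMean) / 2;
        return difference / scale < relativeErrorMargin;
    }
}
```

For an array column, `count` would presumably be the total number of elements rather than the number of rows, but that is also an assumption.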

Motivation and Context

We are planning to use this when comparing Presto Native with Presto Java to reduce the noise from array(double) columns. This will also strengthen Presto verification, since a non-deterministic array(double) column could hide a valid correctness issue in another column by causing the verifier to mark the query SKIPPED.

Test Plan

Added new unit test and updated existing one.
Ran verifier on a query producing non-deterministic array(double) (artificial query).
Result is successful: the query passes correctness in the new verifier, while getting skipped in the old one.
Successfully ran a real use-case verifier suite on the new verifier.

== NO RELEASE NOTE ==

@spershin spershin requested a review from a team as a code owner March 8, 2024 23:26
@spershin spershin requested a review from presto-oss March 8, 2024 23:26
@spershin
Contributor Author

spershin commented Mar 8, 2024

Submitted the PR a bit too early: the validation part has not been implemented yet. Still, the PR is useful for seeing where this is all going.

@spershin spershin force-pushed the ArrayFloatingPointErrorMargin branch from 9e3445e to 352c585 Compare March 9, 2024 00:49
@spershin spershin changed the title [verifier] Add option yo validate array(float) via error margin. [verifier] Add option to validate array(float) via error margin. Mar 9, 2024
@spershin spershin force-pushed the ArrayFloatingPointErrorMargin branch from 352c585 to efb5bdb Compare March 9, 2024 03:21
@elharo elharo requested a review from rschlussel March 10, 2024 00:57
public ArrayColumnChecksum(@Nullable Object checksum, @Nullable Object cardinalityChecksum, long cardinalitySum)
public ArrayColumnChecksum(
@Nullable Object checksum,
@Nullable Object sum,
Contributor

Why is this an Object instead of a more specific type? For that matter why is checksum an Object? I assume there's a reason, but this is surprising me.

Contributor Author

@elharo
I'm not sure, I was following the existing code from FloatingPointColumnValidator.
As far as I can see, the classes use long for counts and Object for everything else. Why, I'm not sure. :)

Contributor

Let's watch out for Chesterton's Fence. Before proceeding we should figure out why checksum is an Object. Presumably there's a reason but maybe that reason applies to the new cardinality checksum and maybe it doesn't. It's also possible checksum shouldn't have been an Object in the first place, in which case we don't have to fix this now, but let's not repeat the mistake. Any idea who could shed some light on this?

Contributor

I can't see a good reason why sum is an Object in FloatingPointColumnChecksum. In the only places it's used, it's blindly cast to a double: https://github.com/prestodb/presto/blob/master/presto-verifier/src/main/java/com/facebook/presto/verifier/checksum/FloatingPointColumnValidator.java#L145-L146. For arrays, the sum is not necessarily a double, and I think it would be better not to assume it's a double here. (I mean, you've only implemented it for the double type, but that's more incidental. You should be able to do sum validation on arrays of other numeric types.)

Regarding why Objects are used for checksums throughout, I'm not sure. I assume it was most convenient, but I also suspect the code could be improved to use more specific types. In case it helps, the Object usage seems to have been introduced here 04d45ea and here: 7c48103.

Contributor

Some additional thoughts - since we could imagine wanting to add special validation for other types. It might be better to make this argument a ColumnChecksum object and pass in a FloatingPointColumnChecksum instead of adding all the fields from FloatingPointColumnChecksum directly here. That way later we can add validation for other kinds of arrays, and just pass in the appropriate columnchecksum object and do the corresponding validation.

It might even be interesting to flip the logic a bit so that ColumnChecksum has a validate method that implementations override to call the validation appropriate for the type (something like columnChecksum.validate(Checksum other, Column column)). That way you don't even need to worry about what type it might be here. You would just call validate on whatever ColumnChecksum you end up with.
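A rough sketch of that "flipped" design, where each checksum type decides how it compares itself to its counterpart. All class names here are hypothetical stand-ins, and the signature is simplified from the suggested validate(Checksum other, Column column) to keep the example self-contained.

```java
import java.util.Objects;

// Hypothetical base type: every checksum knows how to validate itself.
abstract class ColumnChecksumSketch
{
    abstract boolean validate(ColumnChecksumSketch other);
}

// Plain checksums still compare by equality.
final class SimpleChecksumSketch
        extends ColumnChecksumSketch
{
    private final Object checksum;

    SimpleChecksumSketch(Object checksum)
    {
        this.checksum = checksum;
    }

    @Override
    boolean validate(ColumnChecksumSketch other)
    {
        return other instanceof SimpleChecksumSketch
                && Objects.equals(checksum, ((SimpleChecksumSketch) other).checksum);
    }
}

// Floating point checksums compare within a relative margin (assumed formula).
final class FloatingPointChecksumSketch
        extends ColumnChecksumSketch
{
    private final double sum;
    private final double relativeErrorMargin;

    FloatingPointChecksumSketch(double sum, double relativeErrorMargin)
    {
        this.sum = sum;
        this.relativeErrorMargin = relativeErrorMargin;
    }

    @Override
    boolean validate(ColumnChecksumSketch other)
    {
        if (!(other instanceof FloatingPointChecksumSketch)) {
            return false;
        }
        double otherSum = ((FloatingPointChecksumSketch) other).sum;
        double scale = Math.max(Math.abs(sum), Math.abs(otherSum));
        return Math.abs(sum - otherSum) <= relativeErrorMargin * scale;
    }
}
```

With this shape the caller never inspects the concrete type; it only calls validate on whichever checksum it holds.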

It's actually weird that currently validation is done by object equality, especially considering that for float checksums that's definitely not what we want. (From a quick read through, I'm pretty sure that FloatingPointColumnChecksum at least is not used in object equality checks, but it's not at all clear and definitely not enforced) I would be very in favor of moving away from this essentially untyped everything is just an Object land.

Finally, I think we have the same problem you're solving here for map types, where it does checksums on the keys and values.

Contributor Author

@elharo

> Let's watch out for Chesterton's Fence.

If we follow that rule, we should not deviate from what is already there. I suggest keeping the 'as is' approach and following up if we decide to change this. I would like this PR to stay focused on its task, if possible.

@rschlussel

> Some additional thoughts - since we could imagine wanting to add special validation for other types.

Yes, we would likely go that way. Maps of doubles are the next candidate, especially the values part. I was thinking about it, but wanted to stay focused on just arrays for now to unblock the NGA use case quickly. We can adjust a few things to fit maps better as we go and introduce more unconventional support.

> Finally, I think we have the same problem you're solving here for map types, where it does checksums on the keys and values.

Not only that, but also other structural types like rows, though I'm not sure we really want to go there. Let's go one step at a time? It's harder to break existing stuff that way.

Contributor Author

@elharo @rschlussel
I believe Objects are being used because the aggregation can return NULL when there are no rows.
I'm currently debugging exactly such a case.
To handle these nulls we use Object rather than a direct type.

Contributor

Interesting. But then why not use a Double type (which can be null since it's an object) rather than a double primitive type?
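One way to read this suggestion, as a minimal sketch: convert the raw, possibly null aggregation result into a typed value once, at the boundary, so the rest of the code never casts an Object. `NullableSumSketch` and `toSum` are hypothetical names, and the assumption that the raw value arrives as a `Number` is mine, not something the thread confirms.

```java
import java.util.OptionalDouble;

// Hypothetical helper: turns an untyped, nullable aggregation result into a
// typed value, making the "no rows" case explicit.
final class NullableSumSketch
{
    private NullableSumSketch() {}

    static OptionalDouble toSum(Object rawValue)
    {
        if (rawValue == null) {
            // The aggregation returned NULL, e.g. because there were no rows.
            return OptionalDouble.empty();
        }
        if (rawValue instanceof Number) {
            return OptionalDouble.of(((Number) rawValue).doubleValue());
        }
        throw new IllegalArgumentException("Unexpected sum type: " + rawValue.getClass().getName());
    }
}
```

A boxed `Double` field would work similarly; `OptionalDouble` just forces callers to handle the empty case.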

Contributor Author

Good question.
We didn't think about it in the beginning?

@spershin spershin force-pushed the ArrayFloatingPointErrorMargin branch from efb5bdb to 1484bca Compare March 11, 2024 22:17
@spershin
Contributor Author

The latest update includes:

  1. Fixed failing tests.
  2. Refactored validate() from FloatingPointColumnValidator class to FloatingPointColumnChecksum.
  3. Fixed query generation for array(double) validator (infinity issue).
  4. Added TestChecksumValidator.testFloatingPointArray() unit test.
  5. Added support of array(double) and array(real) to TestChecksumValidator.testChecksumQuery() unit test.
  6. Improved TestChecksumValidator.testChecksumQuery() to tell which line does not match.

@spershin spershin force-pushed the ArrayFloatingPointErrorMargin branch from 1484bca to acb8c44 Compare March 12, 2024 02:47
@spershin
Contributor Author

Fixed Checkstyle errors.

@spershin spershin force-pushed the ArrayFloatingPointErrorMargin branch from acb8c44 to 5892fa1 Compare March 12, 2024 03:01
@spershin
Contributor Author

More Checkstyle errors...

@spershin spershin force-pushed the ArrayFloatingPointErrorMargin branch from 5892fa1 to b6b090a Compare March 12, 2024 03:10
@spershin
Contributor Author

Aaaand Checkstyle again...

@spershin spershin force-pushed the ArrayFloatingPointErrorMargin branch from b6b090a to b91473d Compare March 12, 2024 06:50
@spershin
Contributor Author

Fixed the checksum query.

@spershin spershin force-pushed the ArrayFloatingPointErrorMargin branch 3 times, most recently from 5b90982 to bf4a7b1 Compare March 13, 2024 23:40
Collaborator

@kewang1024 kewang1024 left a comment

Thanks @spershin for helping improve the verification of array(double)

Comment on lines +149 to +155
boolean useFloatingPointPath = useFloatingPointPath(column);

ArrayColumnChecksum controlChecksum = toColumnChecksum(column, controlResult, useFloatingPointPath);
ArrayColumnChecksum testChecksum = toColumnChecksum(column, testResult, useFloatingPointPath);

// Not floating point case (we have '$checksum' column).
if (!useFloatingPointPath) {
Collaborator

We don't need useFloatingPointPath here, we can infer this information from ArrayColumnChecksum structure

Contributor Author

How so?

Collaborator

If ArrayColumnChecksum's checksum is not null, then it's a regular ArrayColumnChecksum.
If it is null, then it's the floating point check.

Contributor Author

> If ArrayColumnChecksum's checksum is not null, then it's a regular ArrayColumnChecksum. If it is null, then it's the floating point check.

@kewang1024

That was my first choice, but it all breaks in the corner case where we get checksums on empty tables (no rows) and a bunch of checksum columns are null.

I realized afterwards that if we trigger the floating point path by checking the array element type, we should be consistent and always check using that approach. :)
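The point about deciding by element type rather than by a null checksum can be sketched as follows. The class and the string-based type check are hypothetical simplifications (the real code works with the verifier's Column and type objects); the flag name mirrors the option name adopted later in this review.

```java
import java.util.Locale;

// Hypothetical sketch: choose the validation path from the declared element
// type, never from whether a checksum value happens to be null.
final class FloatingPointPathSketch
{
    private FloatingPointPathSketch() {}

    static boolean useFloatingPointPath(String arrayElementType, boolean useErrorMarginForFloatingPointArrays)
    {
        if (!useErrorMarginForFloatingPointArrays) {
            return false;
        }
        String normalized = arrayElementType.trim().toLowerCase(Locale.ENGLISH);
        return normalized.equals("double") || normalized.equals("real");
    }
}
```

Because the decision depends only on the column's type and the configuration, an empty table whose aggregations are all NULL still takes the same path as a populated one.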

}
}

public static ValidateResult validate(
Collaborator

@kewang1024 kewang1024 Mar 13, 2024

Although changed to static function, I think it might make more sense to still put this function in FloatingPointColumnValidator.java

FloatingPointColumnChecksum is only the data structure, FloatingPointColumnValidator will handle the validation logic

Contributor Author

@kewang1024

Initially I kept it in the FloatingPointColumnValidator, but then I thought this logic is better encapsulated in the structure that keeps the data. It already implements equals() and toString().
It makes sense to put validate() here too, imho.

Let me know if you really prefer it back to the validator class.

Collaborator

I totally get why we move it to the data structure, but might still prefer it in the validator class

Contributor

I think either all validation should be in the respective checksum classes or none of it (also, it might fit better in the validate class if it were a non-static method like equals, e.g. floatingPointChecksum.validate(FloatingPointChecksum other)). As things stand it seems a bit out of place, and I agree with @kewang1024 that it would be better in the validate class.

@spershin spershin force-pushed the ArrayFloatingPointErrorMargin branch from bf4a7b1 to 620920d Compare March 14, 2024 21:30
@spershin
Contributor Author

The last update:

  1. Move ArrayColumnValidator.validate to be above private methods as per suggestion.
  2. Remove unused ChecksumResult.hasChecksum()
  3. Refactor ArrayColumnChecksum to contain FloatingPointColumnChecksum. Later we can use the same trick for Map keys and values.

@spershin
Contributor Author

@kewang1024 , @rschlussel
This change is close to the final change.
Code review is appreciated.
The previous change works successfully on the 10 NGA queries batch.
Will test on the real workload the final change (after comments are addressed) before merging.

Collaborator

@kewang1024 kewang1024 left a comment

Thanks again! Overall looks good, some nits.

}

// Class being used to return a result from the validate() method.
public static class ValidateResult
Collaborator

Can we move this class to the validator as well?

return new ArrayColumnChecksum();
}

long rowCount = checksumResult.getRowCount();
Collaborator

we can remove this line

private final FloatingPointColumnChecksum floatingPointChecksum;

public ArrayColumnChecksum(@Nullable Object checksum, @Nullable Object cardinalityChecksum, long cardinalitySum)
// Constructor for array(non- floating point) types.
Collaborator

NIT: non-floating

private final Object cardinalityChecksum;
private final long cardinalitySum;
// For array(floating point) we have extra aggregations collected.
private final FloatingPointColumnChecksum floatingPointChecksum;
Collaborator

Let's make it Optional
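A minimal sketch of what an Optional floating point member could look like. Both classes are hypothetical stand-ins for ArrayColumnChecksum and FloatingPointColumnChecksum, with most fields omitted; the static factories are my own choice to keep the two cases unambiguous, not something the PR is known to use.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for the floating point aggregations.
final class FloatingPointPartSketch
{
    private final double sum;

    FloatingPointPartSketch(double sum)
    {
        this.sum = sum;
    }

    double getSum()
    {
        return sum;
    }
}

// Hypothetical stand-in for the array checksum: the floating point part is
// present only for array(floating point) columns.
final class ArrayChecksumSketch
{
    private final Object checksum; // null on the floating point path
    private final long cardinalitySum;
    private final Optional<FloatingPointPartSketch> floatingPointChecksum;

    private ArrayChecksumSketch(Object checksum, long cardinalitySum, Optional<FloatingPointPartSketch> floatingPointChecksum)
    {
        this.checksum = checksum;
        this.cardinalitySum = cardinalitySum;
        this.floatingPointChecksum = floatingPointChecksum;
    }

    // For array(non-floating point) types.
    static ArrayChecksumSketch forRegularArray(Object checksum, long cardinalitySum)
    {
        return new ArrayChecksumSketch(checksum, cardinalitySum, Optional.empty());
    }

    // For array(floating point) types.
    static ArrayChecksumSketch forFloatingPointArray(FloatingPointPartSketch floatingPointChecksum, long cardinalitySum)
    {
        Objects.requireNonNull(floatingPointChecksum, "floatingPointChecksum is null");
        return new ArrayChecksumSketch(null, cardinalitySum, Optional.of(floatingPointChecksum));
    }

    Object getChecksum()
    {
        return checksum;
    }

    long getCardinalitySum()
    {
        return cardinalitySum;
    }

    Optional<FloatingPointPartSketch> getFloatingPointChecksum()
    {
        return floatingPointChecksum;
    }
}
```

Compared with storing an empty placeholder object, the Optional makes "no floating point data" impossible to confuse with "floating point data that happens to be empty".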

this.checksum = checksum;
this.cardinalityChecksum = cardinalityChecksum;
this.cardinalitySum = cardinalitySum;
this.floatingPointChecksum = new FloatingPointColumnChecksum();
Collaborator

this.floatingPointChecksum = Optional.empty()

this.checksum = null;
this.cardinalityChecksum = null;
this.cardinalitySum = 0;
this.floatingPointChecksum = new FloatingPointColumnChecksum();
Collaborator

this.floatingPointChecksum = Optional.empty()

this.rowCount = rowCount;
}

// Constructor to handle cases when the result table is empty (sum aggregations returns null).
Collaborator

We can remove this one

Contributor Author

Actually, I would prefer to leave it.
We use it, check the updated changes.

ArrayColumnChecksum testChecksum = toColumnChecksum(column, testResult, useFloatingPointPath);

return ImmutableList.of(new ColumnMatchResult<>(Objects.equals(controlChecksum, testChecksum), column, controlChecksum, testChecksum));
// Not floating point case (we have '$checksum' column).
Collaborator

Non-floating point to be consistent?

rschlussel
rschlussel previously approved these changes Mar 15, 2024
}
}

public static FloatingPointColumnChecksum.ValidateResult validateFloatingPoint(
Contributor

it's weird to me that this logic is over here, but the rest of the validation logic is in ArrayColumnValidator. I would do one or the other.

Contributor Author

Moved the validate() logic to the Validator classes.

// Check the non floating point members first.
if (!Objects.equals(controlChecksum.getCardinalityChecksum(), testChecksum.getCardinalityChecksum()) ||
!Objects.equals(controlChecksum.getCardinalitySum(), testChecksum.getCardinalitySum())) {
return new FloatingPointColumnChecksum.ValidateResult(false, Optional.of("cardinality mismatch"));
Contributor

Do we need this new class? Can we use ColumnMatchResult instead?

Contributor Author

ColumnMatchResult is a templated class and we must return different instantiations of it from ArrayColumnChecksum and FloatingPointColumnChecksum.

I decided to have that small class carry the actual validation result, without the column info, which will be provided by the concrete validators.

Collaborator

ValidateResult still feels unnecessary; I don't see any particular downside to using ColumnMatchResult.

Contributor Author

Ok, I guess, we can get rid of it.

{
private final double relativeErrorMargin;
private final double absoluteErrorMargin;
private final boolean arrayFloatingPointErrorMargin;
Contributor

I find this name unclear. Maybe something like "useErrorMarginForFloatArrays"?

Contributor Author

Replaced by useErrorMarginForFloatingPointArrays and "use-error-margin-for-floating-point-arrays".

@Inject
public ArrayColumnValidator(VerifierConfig config)
{
this.relativeErrorMargin = config.getRelativeErrorMargin();
Contributor

I wonder if we should inject FloatingPointColumnValidator instead, and then just call floatingPointColumnValidator.validate() and leave these configs as implementation details of the FloatingPointColumnValidator.
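A sketch of that composition, assuming plain constructor injection. Both classes are hypothetical stand-ins with simplified signatures; the real validators take ChecksumResult and Column arguments and return ColumnMatchResult objects.

```java
import java.util.Objects;

// Hypothetical stand-in for FloatingPointColumnValidator: the margin is its
// own implementation detail.
final class FloatingPointValidatorSketch
{
    private final double relativeErrorMargin;

    FloatingPointValidatorSketch(double relativeErrorMargin)
    {
        this.relativeErrorMargin = relativeErrorMargin;
    }

    boolean validate(double controlSum, double testSum)
    {
        double scale = Math.max(Math.abs(controlSum), Math.abs(testSum));
        return Math.abs(controlSum - testSum) <= relativeErrorMargin * scale;
    }
}

// Hypothetical stand-in for ArrayColumnValidator: it delegates instead of
// reading the error margin configuration itself.
final class ArrayValidatorSketch
{
    private final FloatingPointValidatorSketch floatingPointValidator;

    ArrayValidatorSketch(FloatingPointValidatorSketch floatingPointValidator)
    {
        this.floatingPointValidator = Objects.requireNonNull(floatingPointValidator, "floatingPointValidator is null");
    }

    boolean validate(long controlCardinalitySum, long testCardinalitySum, double controlSum, double testSum)
    {
        // Check the non-floating point members first.
        if (controlCardinalitySum != testCardinalitySum) {
            return false;
        }
        return floatingPointValidator.validate(controlSum, testSum);
    }
}
```

The array validator then needs no knowledge of relative or absolute margins; changing how floats are compared touches one class only.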

Contributor Author

@rschlussel
Do you mean make FloatingPointColumnValidator a member of ArrayColumnValidator and pass the former to the constructor of the latter?
We can do this.

@rschlussel rschlussel dismissed their stale review March 15, 2024 20:43

didn't mean to approve. clicked the wrong button

@spershin spershin force-pushed the ArrayFloatingPointErrorMargin branch from 620920d to 351c344 Compare March 15, 2024 22:13
@spershin
Copy link
Contributor Author

@kewang1024 @rschlussel

Thanks for the review.
I have addressed the comments so far.

@spershin spershin force-pushed the ArrayFloatingPointErrorMargin branch 2 times, most recently from d732c5f to 039b69e Compare March 15, 2024 23:30
kewang1024
kewang1024 previously approved these changes Mar 20, 2024
return column.getName() + "$checksum";
}

private static String getSumColumnAlias(Column column)
Collaborator

we can remove this


return new ArrayColumnChecksum(
checksumResult.getChecksum(getChecksumColumnAlias(column)),
checksumResult.getChecksum(getSumColumnAlias(column)),
Collaborator

we can directly do FloatingPointColumnValidator.getSumColumnAlias(column)

@spershin spershin force-pushed the ArrayFloatingPointErrorMargin branch 2 times, most recently from 44d1c91 to b8c6c4b Compare March 21, 2024 19:18
@spershin
Contributor Author

Addressed ArrayColumnChecksum ctor comments.

Contributor

@rschlussel rschlussel left a comment

Thanks for the update. A few comments, but generally looks good!

{
ArrayColumnChecksum controlChecksum = toColumnChecksum(column, controlResult);
ArrayColumnChecksum testChecksum = toColumnChecksum(column, testResult);
checkArgument(
Contributor

Just want to confirm there's some other part of verifier that checks that row counts match before getting here, right? and so we have this argument check to sanity check the code, but we shouldn't hit it.

Contributor Author

I'm not sure about it, honestly - repeating what FloatingPointColumnValidator is doing.

Contributor Author

@spershin spershin Mar 22, 2024

Yes, it looks like a sanity check to me.
I think we can remove it from ArrayColumnValidator and, in FloatingPointColumnValidator, move it from the overridden validate() to the public validate(), which is called from the ArrayColumnValidator.

Actually, we cannot do that. It needs ChecksumResult, and in the public validate() we only have ColumnChecksum.
We have to leave it as is.

@@ -42,10 +44,10 @@ public TestStructuredColumnMismatchResolver()
@Test
public void testResolveArray()
Contributor

Can you update the StructuredColumnMismatchResolver and then add a test that we don't resolve column mismatches for float arrays if we're using error margin validation?

Contributor Author

@rschlussel

Can I get a bit more context on that?
What does StructuredColumnMismatchResolver do, and why should it not resolve column mismatches for float arrays if we're using error margin validation?
Thanks!

}

@Test
public void testFloatingPointArray()
Contributor

can you add a test for the empty table case too?

Contributor Author

Yes, I was going to, but hit that snag with nulls in the Map.
I can see if I can use that HashMap to generate the nulls.

public String toString()
{
return format("checksum: %s, cardinality_checksum: %s, cardinality_sum: %s", checksum, cardinalityChecksum, cardinalitySum);
if (!floatingPointChecksum.isPresent()) {
Contributor

@rschlussel rschlussel Mar 22, 2024

nit: instead of repeating everything for constructing the string, you can use MoreObjects.toStringHelper to construct a nicely printed string with all the fields. Something like:

MoreObjects.ToStringHelper toStringHelper = MoreObjects.toStringHelper(this)
    .add("cardinality_checksum", cardinalityChecksum)
    .add("cardinality_sum", cardinalitySum);

if (floatingPointChecksum.isPresent()) {
    toStringHelper.add("floating_point_checksum", floatingPointChecksum.get());
}
else {
    toStringHelper.add("checksum", checksum);
}

return toStringHelper.toString();

Contributor

not sure if it's important that this toString matches the toStrings of the other checksums, so I'll add the caveat that the generated string will look slightly different than what you have, see the docs for example output https://guava.dev/releases/19.0/api/docs/com/google/common/base/MoreObjects.html#toStringHelper(java.lang.Object)

Contributor Author

@rschlussel
I don't know. I don't feel that this change is worth the trouble at the moment.
You are saying "to construct a nicely printed string with all the fields" - are they going to be printed more nicely than with the current code? How?

Maybe we can consider a small follow-up refactor to move all Checksums to such a format?

Contributor

Not necessarily nicer than the current code. It just helps you avoid duplication between the floating point and non-floating point cases. It would look something like ArrayColumnChecksum{cardinality_checksum=123, cardinality_sum=345, ...}. Anyway, fine to leave it as is.

@spershin spershin force-pushed the ArrayFloatingPointErrorMargin branch from b8c6c4b to f7e5e3d Compare March 22, 2024 20:19
rschlussel
rschlussel previously approved these changes Mar 22, 2024
@spershin spershin force-pushed the ArrayFloatingPointErrorMargin branch from f7e5e3d to 0c896e3 Compare March 22, 2024 22:52
@pranjalssh pranjalssh self-requested a review March 22, 2024 23:07

5 participants