Skip to content
This repository was archived by the owner on May 9, 2024. It is now read-only.

Remove exceptions from decimal validator message #613

Merged
merged 2 commits into from
Aug 8, 2023

Conversation

alexbaden
Copy link
Contributor

Using exceptions results in substantial performance loss in tight loops, where this code is often used.

decimal_overflow_validator_.validate(data[i]);
min = std::min(min, data[i]);
max = std::max(max, data[i]);
if (decimal_overflow_validator_.validate(data[i])) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we just ignore out-of-range decimal values now or do we assume it's checked on the arrow decimal conversion? Shouldn't this validation always succeed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I am skeptical we ever hit this branch but can't guarantee it. So, we ignore out of range values. I suppose this could pose a problem later, and maybe we should just remove this code altogether and make sure we validate the decimal range on import. Alternatively we could try aborting with an error message (or perhaps an exception) and not specify whether the decimal is above or below the range. It is silly to pay the cost of a function call for every single value in every single integer column, even if that function just returns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On one side, making a range check combined with stats computation is very reasonable and reduces the number of scans. On the other side, our only external data source is Arrow, which uses another decimal format and always requires conversion. So, we can probably make the converter responsible for the check and remove the decimal validator from encoders.

@alexbaden alexbaden force-pushed the alex/remove_decimal_validator branch from 30ff9a1 to e40373e Compare August 7, 2023 21:58
@alexbaden
Copy link
Contributor Author

As expected this validation also happens during arrow parse:

C++ exception with description "Invalid: In CSV column #0: Error converting '12345678910.12345' to decimal128(10, 4): precision not supported by type." thrown in the test body.

so as long as the arrow and HDK types match, decimal validation is actually completely unnecessary. I have updated the PR to remove it entirely.
I also added my commits to fixup table stats so I could make sure tests pass in lazy mode.

@alexbaden alexbaden force-pushed the alex/remove_decimal_validator branch from e40373e to ef013f2 Compare August 7, 2023 22:32
Decimal validation occurs on parse, when the input value (string, etc) is converted to an arrow::DecimalXX value.
@alexbaden alexbaden merged commit 5e72166 into main Aug 8, 2023
@alexbaden alexbaden deleted the alex/remove_decimal_validator branch August 8, 2023 02:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants