Conversation

@bersprockets
Contributor

@bersprockets bersprockets commented Dec 5, 2022

### What changes were proposed in this pull request?

Change `InterpretedMutableProjection` to use `setDecimal` rather than `setNullAt` to set null values for decimals in unsafe rows.

### Why are the changes needed?

The following returns the wrong answer:

```
set spark.sql.codegen.wholeStage=false;
set spark.sql.codegen.factoryMode=NO_CODEGEN;

select max(col1), max(col2) from values
(cast(null  as decimal(27,2)), cast(null   as decimal(27,2))),
(cast(77.77 as decimal(27,2)), cast(245.00 as decimal(27,2)))
as data(col1, col2);

+---------+---------+
|max(col1)|max(col2)|
+---------+---------+
|null     |239.88   |
+---------+---------+
```

This is because `InterpretedMutableProjection` inappropriately uses `InternalRow#setNullAt` on unsafe rows to set null for decimal types with precision > `Decimal.MAX_LONG_DIGITS`.

When `setNullAt` is used, the pointer to the decimal's storage area in the variable-length region gets zeroed out. Later, when `InterpretedMutableProjection` calls `setDecimal` on that field, `UnsafeRow#setDecimal` picks up the zero pointer and stores decimal data on top of the null-tracking bit set. Later updates to the null-tracking bit set (e.g., calls to `setNotNullAt`) further corrupt the decimal data (turning 245.00 into 239.88, for example). The stomping of the null-tracking bit set can also make non-null fields appear null (turning 77.77 into null, for example).
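
To make the mechanism concrete, here is a hypothetical repro against `UnsafeRowWriter` and `UnsafeRow` directly (illustration only, not code from this PR; it assumes JVM assertions are disabled, since `UnsafeRow#setDecimal` asserts a positive cursor):

```scala
import org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter
import org.apache.spark.sql.types.Decimal

// Build an unsafe row with two high-precision decimals (precision 27 >
// Decimal.MAX_LONG_DIGITS, so each value lives in the variable-length
// region behind an offset-and-size word).
val writer = new UnsafeRowWriter(2)
writer.resetRowWriter()
writer.write(0, Decimal("77.77"), 27, 2)
writer.write(1, Decimal("245.00"), 27, 2)
val row = writer.getRow

// setNullAt sets the null bit AND zeroes field 0's offset-and-size word.
row.setNullAt(0)

// A later setDecimal reads back the zeroed offset, so its 16-byte write
// lands at the start of the row, on top of the null-tracking bit set,
// corrupting neighboring fields.
row.setDecimal(0, Decimal("1.00"), 27)
```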

This bug can manifest for end-users after codegen fallback (say, if an expression's generated code fails to compile).

[Codegen for mutable projection](https://github.com/apache/spark/blob/89b2ee27d258dec8fe265fa862846e800a374d8e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L1729) uses `mutableRow.setDecimal` for null decimal values regardless of precision or the type of `mutableRow`, so this PR does the same.
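
A minimal sketch of the fixed writer selection, assuming (as the PR notes) that the decimal writer returned by `InternalRow.getWriter` passes a null value through to `setDecimal`; the scaffolding here is illustrative, and the actual one-line change is quoted in the review discussion below:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.types.DecimalType

// Sketch: choose the per-field writer the way InterpretedMutableProjection
// does after this fix. `mutableRow` and the (expression, ordinal) pair come
// from the projection's state.
def makeFieldWriter(mutableRow: InternalRow, e: Expression, i: Int): Any => Unit = {
  val writer = InternalRow.getWriter(i, e.dataType)
  if (!e.nullable || e.dataType.isInstanceOf[DecimalType]) {
    // Route nulls for decimals through the typed writer so that
    // UnsafeRow#setDecimal can keep the field's pointer into the
    // variable-length region intact.
    v => writer(mutableRow, v)
  } else {
    v => if (v == null) mutableRow.setNullAt(i) else writer(mutableRow, v)
  }
}
```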

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit tests.
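
For flavor, one hypothetical shape such a regression test could take (names and assertions are illustrative, not the PR's actual test code):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{InterpretedMutableProjection, Literal, UnsafeProjection}
import org.apache.spark.sql.types.{DataType, Decimal, DecimalType}

val dt = DecimalType(27, 2)

// Simulate an aggregation buffer: an unsafe row holding two high-precision decimals.
val buffer = UnsafeProjection.create(Array[DataType](dt, dt))
  .apply(InternalRow(Decimal("0.00"), Decimal("0.00")))

// Project nulls into the buffer; with the fix this must not zero out the
// decimals' offset-and-size words.
val writeNulls = new InterpretedMutableProjection(
  Seq(Literal.create(null, dt), Literal.create(null, dt)))
writeNulls.target(buffer)(InternalRow.empty)

// Project real values; both fields must round-trip intact.
val writeValues = new InterpretedMutableProjection(
  Seq(Literal.create(Decimal("77.77"), dt), Literal.create(Decimal("245.00"), dt)))
writeValues.target(buffer)(InternalRow.empty)

assert(buffer.getDecimal(0, 27, 2).toString == "77.77")
assert(buffer.getDecimal(1, 27, 2).toString == "245.00")
```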

@github-actions github-actions bot added the SQL label Dec 5, 2022
Contributor Author

@bersprockets bersprockets Dec 5, 2022


Although this won't handle the case where the decimal type is nested in a struct or array, that shouldn't be an issue, since in that situation the target row won't be an unsafe row (InterpretedMutableProjection#target would reject such a row, AFAICT). When the row is not an unsafe row, the plain writer should be fine.

@bersprockets bersprockets changed the title InterpretedMutableProjection should use setDecimal to set null values for high-precision decimals in an unsafe row [SPARK-41395][SQL] InterpretedMutableProjection should use setDecimal to set null values for high-precision decimals in an unsafe row Dec 5, 2022
@HyukjinKwon
Member

cc @wangyum FYI

@bersprockets
Contributor Author

bersprockets commented Dec 6, 2022

By the way, there's a similar-looking problem with type CalendarInterval:

```
set spark.sql.codegen.wholeStage=false;
set spark.sql.codegen.factoryMode=NO_CODEGEN;

select first(col1), last(col2) from values
(make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0))
as data(col1, col2);

+---------------+---------------+
|first(col1)    |last(col2)     |
+---------------+---------------+
|16 years 2 days|16 years 2 days|
+---------------+---------------+
```

In this case, however, the bug doesn't appear to be in InterpretedMutableProjection, but in the way the unsafe buffer is initialized, so I will address it separately.

Edit: Further research shows that CalendarInterval has the same null value special case as Decimal (i.e., you can't use UnsafeRow#setNullAt). However, it also has another issue related to how the unsafe buffer is initialized, so I will still address CalendarInterval separately.
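
For context, the interval analogue of the decimal special case looks roughly like this (illustrative, not the follow-up fix itself; it assumes `UnsafeRow#setInterval`, like `setDecimal`, accepts a null value while preserving the field's offset word):

```scala
import org.apache.spark.sql.catalyst.expressions.UnsafeRow

// Hypothetical helper: null out an interval field without losing its pointer
// into the variable-length region. Calling setNullAt here would zero the
// offset word, setting up the same corruption seen with decimals.
def setNullInterval(row: UnsafeRow, ordinal: Int): Unit =
  row.setInterval(ordinal, null)
```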

@bersprockets bersprockets changed the title [SPARK-41395][SQL] InterpretedMutableProjection should use setDecimal to set null values for high-precision decimals in an unsafe row [SPARK-41395][SQL] InterpretedMutableProjection should use setDecimal to set null values for decimals in an unsafe row Dec 6, 2022
@HyukjinKwon
Member

HyukjinKwon commented Dec 9, 2022

Merged to master, branch-3.3 and branch-3.2

cc @rednaxelafx and @cloud-fan for a post-hoc review when you find some time.

HyukjinKwon pushed a commit that referenced this pull request Dec 9, 2022
…mal` to set null values for decimals in an unsafe row

Closes #38923 from bersprockets/unsafe_decimal_issue.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit fec210b)
Signed-off-by: Hyukjin Kwon <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Dec 9, 2022
…mal` to set null values for decimals in an unsafe row

Closes #38923 from bersprockets/unsafe_decimal_issue.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit fec210b)
Signed-off-by: Hyukjin Kwon <[email protected]>
@rednaxelafx
Contributor

Post-hoc review: LGTM, this is a good catch.

@bersprockets
Contributor Author

Thanks @HyukjinKwon @rednaxelafx

```
  private[this] val fieldWriters: Array[Any => Unit] = validExprs.map { case (e, i) =>
    val writer = InternalRow.getWriter(i, e.dataType)
-   if (!e.nullable) {
+   if (!e.nullable || e.dataType.isInstanceOf[DecimalType]) {
```
Contributor

This is a good catch! It's better to add some code comments to explain it, or refactor the code so that the codegen and interpreted code paths share utility functions for updating an internal row.

Contributor Author

I can follow up, since calendar interval has the same problem (in the case of calendar interval, the issue exists in both InterpretedMutableProjection and InterpretedUnsafeProjection).

beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
…mal` to set null values for decimals in an unsafe row

Closes apache#38923 from bersprockets/unsafe_decimal_issue.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
@bersprockets bersprockets deleted the unsafe_decimal_issue branch December 23, 2022 00:09
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
…mal` to set null values for decimals in an unsafe row

Closes apache#38923 from bersprockets/unsafe_decimal_issue.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit fec210b)
Signed-off-by: Hyukjin Kwon <[email protected]>