Skip to content

Fix Iceberg read failing for Decimal type#23305

Merged
tdcmeehan merged 1 commit intoprestodb:masterfrom
wypb:fix_iceberg_vector_decimal_reader
Jul 30, 2024
Merged

Fix Iceberg read failing for Decimal type#23305
tdcmeehan merged 1 commit intoprestodb:masterfrom
wypb:fix_iceberg_vector_decimal_reader

Conversation

@wypb
Copy link
Contributor

@wypb wypb commented Jul 26, 2024

Description

Fix #23274

After the introduction of the decimal batch reader(#22636) in Presto, when trying to read iceberg tables with decimal columns with parquet_batch_read_optimization_enabled=true, we are getting the below error

java.lang.ClassCastException: com.facebook.presto.parquet.batchreader.decoders.rle.Int32RLEDictionaryValuesDecoder cannot be cast to com.facebook.presto.parquet.batchreader.decoders.ValuesDecoder$ShortDecimalValuesDecoder
	at com.facebook.presto.parquet.batchreader.ShortDecimalFlatBatchReader.readNextPage(ShortDecimalFlatBatchReader.java:139)
	at com.facebook.presto.parquet.batchreader.ShortDecimalFlatBatchReader.readWithNull(ShortDecimalFlatBatchReader.java:156)
	at com.facebook.presto.parquet.batchreader.ShortDecimalFlatBatchReader.readNext(ShortDecimalFlatBatchReader.java:104)
	at com.facebook.presto.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:389)
	at com.facebook.presto.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:550)
	at com.facebook.presto.parquet.reader.ParquetReader.readBlock(ParquetReader.java:533)
	at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:230)
	at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:208)
	at com.facebook.presto.common.block.LazyBlock.assureLoaded(LazyBlock.java:313)
	at com.facebook.presto.common.block.LazyBlock.getLoadedBlock(LazyBlock.java:304)
	at com.facebook.presto.common.Page.getLoadedPage(Page.java:335)
	at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:269)
	at com.facebook.presto.operator.Driver.processInternal(Driver.java:441)
	at com.facebook.presto.operator.Driver.lambda$processFor$10(Driver.java:324)
	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:750)
	at com.facebook.presto.operator.Driver.processFor(Driver.java:317)
	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1079)
	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:165)
	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:621)
	at com.facebook.presto.$gen.Presto_null__testversion____20240722_195546_1.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

The issue happens when trying to read dictionary-encoded data.

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.


Iceberg Connector Changes
* Fix Iceberg read failing for Decimal type. :pr:`23305`

CC: @tdcmeehan @yingsu00 @imjalpreet

@wypb wypb requested review from a team, ZacBlanco, hantangwangd and shangxinli as code owners July 26, 2024 08:56
@wypb wypb requested a review from presto-oss July 26, 2024 08:56
Copy link
Member

@imjalpreet imjalpreet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wypb Thank you for the fix.

Can you please force-push once to retrigger the entire suite again? Looks like there were some intermittent issues.

private static final String ORC_TINY_STRIPE_THRESHOLD = "orc_tiny_stripe_threshold";
private static final String ORC_ZSTD_JNI_DECOMPRESSION_ENABLED = "orc_zstd_jni_decompression_enabled";
private static final String PARQUET_BATCH_READ_OPTIMIZATION_ENABLED = "parquet_batch_read_optimization_enabled";
public static final String PARQUET_BATCH_READ_OPTIMIZATION_ENABLED = "parquet_batch_read_optimization_enabled";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this after line 43 and please add @VisibleForTesting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also add a generic test with this feature enabled in the iceberg connector which also tries reading other datatypes in addition to decimal.

@tdcmeehan tdcmeehan self-assigned this Jul 26, 2024
@wypb wypb force-pushed the fix_iceberg_vector_decimal_reader branch 7 times, most recently from 0cd0b38 to 4c98d81 Compare July 29, 2024 08:59
hantangwangd
hantangwangd previously approved these changes Jul 30, 2024
Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. LGTM!

@wypb wypb force-pushed the fix_iceberg_vector_decimal_reader branch from 12714ef to 8c61cc2 Compare July 30, 2024 11:13
Copy link
Member

@imjalpreet imjalpreet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for adding the tests.

@tdcmeehan tdcmeehan merged commit 9d38c40 into prestodb:master Jul 30, 2024
@wypb wypb deleted the fix_iceberg_vector_decimal_reader branch July 30, 2024 14:56
@tdcmeehan tdcmeehan mentioned this pull request Aug 23, 2024
34 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Iceberg read failing for Decimal type

4 participants