Fix id based mapping for non-lowercase fields #7180

Merged
phd3 merged 3 commits into trinodb:master from
phd3:case-fix
Mar 6, 2021

phd3 (Member) commented Mar 5, 2021

Fixes the issue discussed in a Slack thread. The change in #6520 causes nested column reads on ORC files to fail when the files have non-lowercase field names.

io.trino.spi.TrinoException: Error opening Iceberg split s3a://REDACTED/data/timestamp_day=2021-01-21/00002-17-5a427ab9-81ab-442b-ad46-512af01a5019-00001.orc (offset=0, length=1479520): null
	at io.trino.plugin.iceberg.IcebergPageSourceProvider.createOrcPageSource(IcebergPageSourceProvider.java:354)
	at io.trino.plugin.iceberg.IcebergPageSourceProvider.createDataPageSource(IcebergPageSourceProvider.java:205)
	at io.trino.plugin.iceberg.IcebergPageSourceProvider.createPageSource(IcebergPageSourceProvider.java:167)
	at io.trino.spi.connector.ConnectorPageSourceProvider.createPageSource(ConnectorPageSourceProvider.java:68)
	at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:66)
	at io.trino.split.PageSourceManager.createPageSource(PageSourceManager.java:64)
	at io.trino.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:254)
	at io.trino.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:182)
	at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
	at io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
	at io.trino.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
	at io.trino.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:149)
	at io.trino.operator.Driver.processInternal(Driver.java:387)
	at io.trino.operator.Driver.lambda$processFor$9(Driver.java:291)
	at io.trino.operator.Driver.tryWithLock(Driver.java:683)
	at io.trino.operator.Driver.processFor(Driver.java:284)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
	at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
	at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
	at io.trino.$gen.Trino_352_214_gef3079c____20210304_212845_2.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.lang.NullPointerException: undefined
	at io.trino.plugin.iceberg.IcebergPageSourceProvider$IdBasedFieldMapper.get(IcebergPageSourceProvider.java:418)
	at io.trino.orc.reader.StructColumnReader.<init>(StructColumnReader.java:104)
	at io.trino.orc.reader.ColumnReaders.createColumnReader(ColumnReaders.java:75)
	at io.trino.orc.OrcRecordReader.createColumnReaders(OrcRecordReader.java:579)
	at io.trino.orc.OrcRecordReader.<init>(OrcRecordReader.java:244)
	at io.trino.orc.OrcReader.createRecordReader(OrcReader.java:308)
	at io.trino.plugin.iceberg.IcebergPageSourceProvider.createOrcPageSource(IcebergPageSourceProvider.java:315)
	... 31 more

Without this change, the test added here fails with a similar stack trace.
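The failure mode described above can be illustrated with a minimal sketch. This is not the actual `IdBasedFieldMapper` code: the class `FieldIdLookupSketch`, its methods, and the field names `userId` and `Address` are hypothetical, and the sketch assumes that the reader asks for nested fields by lower-cased name while the mapper indexes them by the name as written in the file. Under that assumption a mixed-case field is never found, and dereferencing the missing entry would raise a `NullPointerException` like the one in the trace.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class FieldIdLookupSketch {
    // Hypothetical index keyed by the field name exactly as written in the file.
    static Map<String, Integer> indexByOriginalName(Map<String, Integer> fileFields) {
        return new HashMap<>(fileFields);
    }

    // Hypothetical index keyed by the lower-cased field name.
    static Map<String, Integer> indexByLowerCaseName(Map<String, Integer> fileFields) {
        Map<String, Integer> index = new HashMap<>();
        for (Map.Entry<String, Integer> entry : fileFields.entrySet()) {
            index.put(entry.getKey().toLowerCase(Locale.ENGLISH), entry.getValue());
        }
        return index;
    }

    // Assumption: the reader always looks fields up by lower-cased name.
    // Returns null when the index has no entry under that key.
    static Integer lookup(Map<String, Integer> index, String readerFieldName) {
        return index.get(readerFieldName.toLowerCase(Locale.ENGLISH));
    }

    public static void main(String[] args) {
        // Nested field names mapped to illustrative field ids.
        Map<String, Integer> fileFields = Map.of("userId", 2, "Address", 3);

        // Keys keep their original case, so the lower-cased lookup misses.
        System.out.println(lookup(indexByOriginalName(fileFields), "userId"));  // prints "null"

        // Keys are normalized the same way as the lookup, so the field is found.
        System.out.println(lookup(indexByLowerCaseName(fileFields), "userId")); // prints "2"
    }
}
```

Normalizing both the index keys and the lookup key with the same locale-independent rule is one way to avoid the mismatch; whether the actual fix normalizes at index time or at lookup time is not stated here.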

cla-bot added the cla-signed label Mar 5, 2021
phd3 added the bug label Mar 5, 2021
phd3 requested review from djsagain, electrum and lxynov March 5, 2021 17:38
phd3 mentioned this pull request Mar 6, 2021
phd3 merged commit 36a6a59 into trinodb:master Mar 6, 2021
phd3 added this to the 353 milestone Mar 6, 2021
Comment on lines +3 to +4
# TODO: Remove this config to test default read behavior once Spark writer version is fixed. See https://github.com/trinodb/trino/issues/6369 for details
iceberg.use-file-size-from-metadata=false
Member

Can this be removed now?
If Spark Iceberg still writes incorrect file sizes, should we reconsider making iceberg.use-file-size-from-metadata false by default?

2 participants