@chenjian2664 chenjian2664 commented Jul 8, 2025

Description

Currently, the Delta Lake connector writes a malformed checkpoint file when deletion vectors are enabled.

Writing the checkpoint fails with the following error:

java.lang.IllegalArgumentException: Expected sizeInBytes for field 4 but got Optional[cardinality] for type row(storageType varchar, pathOrInlineDv varchar, offset integer, sizeInBytes integer, cardinality bigint)
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:483)
	at io.trino.plugin.deltalake.transactionlog.checkpoint.CheckpointWriter.validateAndGetField(CheckpointWriter.java:640)
	at io.trino.plugin.deltalake.transactionlog.checkpoint.CheckpointWriter.writeLong(CheckpointWriter.java:571)
	at io.trino.plugin.deltalake.transactionlog.checkpoint.CheckpointWriter.lambda$writeDeletionVector$0(CheckpointWriter.java:451)
	at io.trino.spi.block.RowBlockBuilder.buildEntry(RowBlockBuilder.java:111)
	at io.trino.plugin.deltalake.transactionlog.checkpoint.CheckpointWriter.writeDeletionVector(CheckpointWriter.java:447)
	at io.trino.plugin.deltalake.transactionlog.checkpoint.CheckpointWriter.lambda$writeAddFileEntry$0(CheckpointWriter.java:334)
	at io.trino.spi.block.RowBlockBuilder.buildEntry(RowBlockBuilder.java:111)
	at io.trino.plugin.deltalake.transactionlog.checkpoint.CheckpointWriter.writeAddFileEntry(CheckpointWriter.java:299)
	at io.trino.plugin.deltalake.transactionlog.checkpoint.CheckpointWriter.lambda$write$4(CheckpointWriter.java:169)
	at io.trino.plugin.deltalake.transactionlog.checkpoint.CheckpointWriter$CheckpointPageWriter.addEntry(CheckpointWriter.java:199)
	at io.trino.plugin.deltalake.transactionlog.checkpoint.CheckpointWriter.write(CheckpointWriter.java:169)
	at io.trino.plugin.deltalake.transactionlog.checkpoint.CheckpointWriterManager.writeCheckpoint(CheckpointWriterManager.java:170)
	at io.trino.plugin.deltalake.DeltaLakeMetadata.writeCheckpointIfNeeded(DeltaLakeMetadata.java:3178)
	at io.trino.plugin.deltalake.DeltaLakeMetadata.finishMerge(DeltaLakeMetadata.java:2693)
	at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.finishMerge(ClassLoaderSafeConnectorMetadata.java:1222)
	at io.trino.tracing.TracingConnectorMetadata.finishMerge(TracingConnectorMetadata.java:770)
	at io.trino.metadata.MetadataManager.finishMerge(MetadataManager.java:1336)
	at io.trino.tracing.TracingMetadata.finishMerge(TracingMetadata.java:830)
	at io.trino.sql.planner.LocalExecutionPlanner.lambda$createTableFinisher$0(LocalExecutionPlanner.java:4226)
	at io.trino.operator.TableFinishOperator.getOutput(TableFinishOperator.java:316)
	at io.trino.operator.Driver.processInternal(Driver.java:403)
	at io.trino.operator.Driver.lambda$process$0(Driver.java:306)
	at io.trino.operator.Driver.tryWithLock(Driver.java:709)
	at io.trino.operator.Driver.process(Driver.java:298)
	at io.trino.operator.Driver.processForDuration(Driver.java:269)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:889)
	at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:77)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:201)
	at io.trino.$gen.Trino_testversion____20250708_034624_1.run(Unknown Source)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:202)
	at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:177)
	at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:164)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:545)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:128)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:80)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
	at java.base/java.lang.Thread.run(Thread.java:1447)
	Suppressed: java.lang.IllegalStateException: Declared positions (3) does not match block 0's number of entries (2)
		at io.trino.spi.PageBuilder.build(PageBuilder.java:161)
		at io.trino.plugin.deltalake.transactionlog.checkpoint.CheckpointWriter$CheckpointPageWriter.flush(CheckpointWriter.java:209)
		at io.trino.plugin.deltalake.transactionlog.checkpoint.CheckpointWriter$CheckpointPageWriter.close(CheckpointWriter.java:218)
		at io.trino.plugin.deltalake.transactionlog.checkpoint.CheckpointWriter.write(CheckpointWriter.java:158)
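
The message suggests that the writer addressed `sizeInBytes` at index 4, while the declared row type has `cardinality` at that index (indices are zero-based, so `sizeInBytes` sits at index 3). Below is a minimal Java sketch of this kind of index-versus-name check. It is not the actual `CheckpointWriter` code: the class `FieldOrderCheck` and the helper `validateField` are hypothetical names used only for illustration, and the field list is copied from the row type in the error message.

```java
import java.util.List;

public class FieldOrderCheck {
    // Field names of the deletion vector row type, in the order given in the error message.
    static final List<String> DELETION_VECTOR_FIELDS =
            List.of("storageType", "pathOrInlineDv", "offset", "sizeInBytes", "cardinality");

    // Hypothetical helper: confirms that the field at `index` has the expected name.
    // Throws IllegalArgumentException when the caller's index and the type disagree.
    static String validateField(List<String> fields, int index, String expectedName) {
        String actual = fields.get(index);
        if (!actual.equals(expectedName)) {
            throw new IllegalArgumentException(
                    String.format("Expected %s for field %d but got %s", expectedName, index, actual));
        }
        return actual;
    }

    public static void main(String[] args) {
        // Index 3 matches the declared position of sizeInBytes, so the check passes.
        System.out.println(validateField(DELETION_VECTOR_FIELDS, 3, "sizeInBytes"));

        // Index 4 is cardinality in the declared type, so the check rejects it.
        try {
            validateField(DELETION_VECTOR_FIELDS, 4, "sizeInBytes");
        }
        catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check of this shape fails fast when a writer's field indices drift from the declared type, which is how the mismatch surfaces here as an exception.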

Steps to reproduce:

CREATE TABLE t (x int) WITH (deletion_vectors_enabled = true, checkpoint_interval = 2);
INSERT INTO t VALUES 1, 2;

-- This writes the checkpoint. You will see the error in the connector, but it doesn't fail the query.

DELETE FROM t WHERE x = 2; 

Additional context and related issues

Release notes

## Delta Lake
* Fix writing malformed checkpoint files when deletion vectors are enabled. ({issue}`26145`)

@cla-bot cla-bot bot added the cla-signed label Jul 8, 2025
@github-actions github-actions bot added the delta-lake Delta Lake connector label Jul 8, 2025
@chenjian2664 chenjian2664 requested a review from ebyhr July 8, 2025 03:53
@ebyhr ebyhr left a comment

Looks good except for comments. Could you fix CI failures?

Error:  src/test/java/io/trino/plugin/deltalake/TestDeltaLakeBasic.java:[89,8] (imports) UnusedImports: Unused import - java.util.function.Predicate.

@chenjian2664 chenjian2664 force-pushed the fix_checkpoint_deletion_vector branch from 4e2cfb8 to bf3b4ae Compare July 8, 2025 04:19
@ebyhr ebyhr merged commit 4bd071f into trinodb:master Jul 8, 2025
23 checks passed
@github-actions github-actions bot added this to the 477 milestone Jul 8, 2025