Skip to content

Conversation

@szehon-ho
Copy link
Member

@szehon-ho szehon-ho commented Mar 9, 2022

The presence of some V2 delete files lead to following error for a subsequent delete that tries to use metadata.

Failed to cleanly delete data files matching: ref(name="id") == 1
java.lang.IllegalArgumentException: Failed to cleanly delete data files matching: ref(name="id") == 1
	at org.apache.iceberg.spark.source.SparkTable.deleteWhere(SparkTable.java:282)
	at org.apache.spark.sql.execution.datasources.v2.DeleteFromTableExec.run(DeleteFromTableExec.scala:31)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
	at org.apache.iceberg.spark.SparkTestBase.sql(SparkTestBase.java:104)
	at org.apache.iceberg.spark.extensions.TestDelete.testDeleteFileThenMetadataDelete(TestDelete.java:93)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:27)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
	at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
	at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.java:176)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60)
	at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:133)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:71)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
Caused by: org.apache.iceberg.exceptions.ValidationException: Cannot delete file where some, but not all, rows match filter ref(name="id") == 1: file:/var/folders/wy/5b87_qx57n974szn9_wrn6lw0000gn/T/hive3482518248062248112/table/data/00003-162-f1d8c967-189a-40f7-9507-1e82267b1e22-00001.orc
	at app//org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:46)
	at app//org.apache.iceberg.ManifestFilterManager.manifestHasDeletedFiles(ManifestFilterManager.java:369)
	at app//org.apache.iceberg.ManifestFilterManager.filterManifest(ManifestFilterManager.java:307)
	at app//org.apache.iceberg.ManifestFilterManager.lambda$filterManifests$0(ManifestFilterManager.java:191)
	at app//org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
	at app//org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:70)
	at app//org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:310)
	at [email protected]/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at [email protected]/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at [email protected]/java.lang.Thread.run(Thread.java:834)

Specifically, these are delete files that cannot be removed by the file-metrics alone, such as if all values of the column of the delete statement's expression are null (as the test demonstrates).

This because there is a mismatch. The optimizer chooses metadata-only delete if all DataFiles can be deleted using metadata, and does not check if all DeleteFiles can be removed only by metadata: https://github.com/apache/iceberg/blob/apache-iceberg-0.13.1/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L241

But the actual metadata-delete logic in ManifestFIlterManager tries to delete even DeleteFiles, and it throws ValidationException when it finds a DeleteFile that cannot be cleanlly deleted because the metrics-bound do not match or are not available.

This change attempts to reconcile these two, by having the ManifestFilterManager skip DeleteFiles that do not have tight enough metrics to be safely removed. These will get cleaned up eventually by sequenceNumber expiration, or later if we build another action to more aggressively drop dangling DeleteFiles (there are already instances today where they are left behind, beyond this).

The other options I considered:

  1. Make the optimization decision check also delete files before deciding to try metadata-only delete
  2. Make the metadata-only deletion implementation skip removal of all DeleteFiles

These may be cleaner, but the first one will be a huge performance hit as it means we skip the metadata-only delete unnecessarily if such a delete file exists.

The second is also too conservative, as 90% of the time the delete files can be removed opportunistically by metadata-only deletes (certain tests actually asserting that they are removed, so it would be an unexpected change).

Hence this more specific check.

…only delete with presence of certain delete files
@szehon-ho szehon-ho force-pushed the delete_metadata_bug branch from b401647 to b213236 Compare March 9, 2022 23:35
@szehon-ho szehon-ho force-pushed the delete_metadata_bug branch from c07770e to c82bae4 Compare March 14, 2022 19:29
@szehon-ho
Copy link
Member Author

@RussellSpitzer thanks for review, unnested the new case (though it seems still breaks the "cyclomatic complexity" sadly). Also added the new test asserts for metadata delete.

@RussellSpitzer
Copy link
Member

@RussellSpitzer thanks for review, unnested the new case (though it seems still breaks the "cyclomatic complexity" sadly). Also added the new test asserts for metadata delete.

Yeah to fix that the code needs less "exits" I just thought it reads easier

@szehon-ho szehon-ho changed the title ValidationException on metadata-only delete with presence of certain delete files Core: ValidationException on metadata-only delete with certain delete files Mar 14, 2022
@szehon-ho szehon-ho changed the title Core: ValidationException on metadata-only delete with certain delete files Core: ValidationException on metadata-only delete of certain delete files Mar 14, 2022
@szehon-ho szehon-ho merged commit b64aa0d into apache:master Mar 15, 2022
@szehon-ho
Copy link
Member Author

Thanks @RussellSpitzer for review, might look at reducing the method's cyclomatic complexity in a follow up if its possible

@szehon-ho
Copy link
Member Author

Note: this fix does not fix all the issues without #4370

sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants