Skip to content

Add support for reading v2 row level deletes in Iceberg connector#21189

Merged
tdcmeehan merged 1 commit intoprestodb:masterfrom
wypb:iceberg_deletefile
Dec 5, 2023
Merged

Add support for reading v2 row level deletes in Iceberg connector#21189
tdcmeehan merged 1 commit intoprestodb:masterfrom
wypb:iceberg_deletefile

Conversation

@wypb
Copy link
Contributor

@wypb wypb commented Oct 19, 2023

Description

Add support for reading v2 row level deletes in Iceberg connector

Cherry-pick of trinodb/trino#11642
Cherry-pick of trinodb/trino#13219
Cherry-pick of trinodb/trino#13395

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Iceberg Changes
* Add support for reading v2 row level deletes in Iceberg connector

@wypb wypb requested a review from a team as a code owner October 19, 2023 11:13
@wypb wypb requested a review from presto-oss October 19, 2023 11:13
@wypb wypb force-pushed the iceberg_deletefile branch from e340af4 to 4b3bed6 Compare October 19, 2023 11:18
@yingsu00
Copy link
Contributor

@wypb Thank you for submitting the PR! THere're some test failures in Iceberg. Can you please fix them?

@wypb wypb force-pushed the iceberg_deletefile branch 2 times, most recently from 966d45a to 2fc3cf7 Compare October 20, 2023 01:19
@github-actions
Copy link

github-actions bot commented Oct 20, 2023

Codenotify: Notifying subscribers in CODENOTIFY files for diff c361c45...f6b2b68.

Notify File(s)
@steveburnett presto-docs/src/main/sphinx/connector/iceberg.rst

@wypb wypb force-pushed the iceberg_deletefile branch from 595a2e1 to 588d403 Compare October 20, 2023 04:06
@wypb
Copy link
Contributor Author

wypb commented Oct 20, 2023

Hi @yingsu00 Thank you for your reply. I've fixed these failed test sets.

@wypb wypb force-pushed the iceberg_deletefile branch 3 times, most recently from 5d605fb to 1bc4744 Compare October 20, 2023 07:58
@tdcmeehan tdcmeehan self-assigned this Oct 20, 2023
@tdcmeehan tdcmeehan requested a review from ChunxuTang October 20, 2023 20:15
@yingsu00
Copy link
Contributor

@wypb I helped resolving the conflict but it results in a separate commit "Merge branch 'master' into iceberg_deletefile" Can you please squash it?

@wypb wypb force-pushed the iceberg_deletefile branch 4 times, most recently from f7508d2 to c039bf2 Compare October 23, 2023 01:10
@wypb
Copy link
Contributor Author

wypb commented Oct 23, 2023

HI @yingsu00 I have squashed the this commit and my branch has been synchronized with the latest code of the master branch.

@tdcmeehan tdcmeehan linked an issue Oct 23, 2023 that may be closed by this pull request
Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments, will finish review after I return on Wednesday.

@wypb wypb force-pushed the iceberg_deletefile branch from 3360bf8 to 719024c Compare October 24, 2023 01:35
@wypb
Copy link
Contributor Author

wypb commented Oct 24, 2023

Hi @tdcmeehan I modified the corresponding code according to the review comments, Thank you for your review

@wypb wypb force-pushed the iceberg_deletefile branch 4 times, most recently from a4ab09b to 45d83d4 Compare October 25, 2023 11:15
@steveburnett
Copy link
Contributor

@wypb I think we should set the default value iceberg.enable-merge-on-read-mode to true in this PR. Thoughts?

If this happens, please update the doc in line 213 of iceberg.rst appropriately.

@wypb
Copy link
Contributor Author

wypb commented Nov 2, 2023

I think we should set the default value iceberg.enable-merge-on-read-mode to true in this PR. Thoughts?

@tdcmeehan I have read the Iceberg document, what you say is right, I will set the default value of iceberg.enable-merge-on-read-mode to true. thank you.

@wypb wypb force-pushed the iceberg_deletefile branch from 1c22141 to 48471fa Compare November 2, 2023 06:19
@wypb
Copy link
Contributor Author

wypb commented Nov 2, 2023

Hi @steveburnett I have updated the corresponding documents, please help me take a look again, thank you.

@wypb wypb force-pushed the iceberg_deletefile branch from 79c8fe6 to 81b6f21 Compare November 2, 2023 08:24
@wypb
Copy link
Contributor Author

wypb commented Nov 2, 2023

HI @tdcmeehan I synchronized the latest code of the master branch, and then several test cases related to delete file failed.

java.lang.NoSuchMethodError: org.apache.parquet.hadoop.ParquetFileWriter.<init>(Lorg/apache/parquet/io/OutputFile;Lorg/apache/parquet/schema/MessageType;Lorg/apache/parquet/hadoop/ParquetFileWriter$Mode;JIIIZLorg/apache/parquet/crypto/InternalFileEncryptor;)V

	at org.apache.iceberg.parquet.ParquetWriter.ensureWriterInitialized(ParquetWriter.java:111)
	at org.apache.iceberg.parquet.ParquetWriter.flushRowGroup(ParquetWriter.java:210)
	at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:254)
	at org.apache.iceberg.deletes.EqualityDeleteWriter.close(EqualityDeleteWriter.java:78)
	at com.facebook.presto.iceberg.IcebergDistributedTestBase.writeEqualityDeleteToNationTable(IcebergDistributedTestBase.java:583)
	at com.facebook.presto.iceberg.IcebergDistributedTestBase.testTableWithEqualityDelete(IcebergDistributedTestBase.java:466)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:135)
	at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:673)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethod(TestInvoker.java:220)
	at org.testng.internal.invokers.MethodRunner.runInSequence(MethodRunner.java:50)
	at org.testng.internal.invokers.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:945)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethods(TestInvoker.java:193)
	at org.testng.internal.invokers.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
	at org.testng.internal.invokers.TestMethodWorker.run(TestMethodWorker.java:128)
	at java.util.ArrayList.forEach(ArrayList.java:1259)
	at org.testng.TestRunner.privateRun(TestRunner.java:808)
	at org.testng.TestRunner.run(TestRunner.java:603)
	at org.testng.SuiteRunner.runTest(SuiteRunner.java:429)
	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:423)
	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:383)
	at org.testng.SuiteRunner.run(SuiteRunner.java:326)
	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:95)
	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1249)
	at org.testng.TestNG.runSuitesLocally(TestNG.java:1169)
	at org.testng.TestNG.runSuites(TestNG.java:1092)
	at org.testng.TestNG.run(TestNG.java:1060)
	at com.intellij.rt.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:66)
	at com.intellij.rt.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:109)

It was because Iceberg was recently upgraded to 1.4.1 (#21230), and the EqualityDeleteWriter and PositionDeleteWriter rely on the new ParquetFileWriter method added in parquet 1.13.1, but the current parquet dependency version of Presto is 1.12.0, So I need to upgrade the Parquet version of presto and presto-hive-apache projects. Any Thoughts?

@tdcmeehan
Copy link
Contributor

@wypb let's update the Parquet dependency in Presto and presto-hive-apache. The latter release will take some time, but I think we'll be able to do it in the next few days.
Let me know if you'd like to raise the PR for those. @imjalpreet is following up on enabling a presto-hive-apache release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update IcebergDistributedSmokeTestBase::testMergeOnReadDisabled to set the session property.

Copy link
Contributor Author

@wypb wypb Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will fix this. thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thank you!

@imjalpreet
Copy link
Member

@tdcmeehan I just saw for this we just have a dependency on presto-hive-apache which is not blocked and can be released after we update the parquet dependency.

Release of presto-hadoop-apache2 has been blocked which I will be unblocking soon. But this PR is not dependent on that.

@tdcmeehan
Copy link
Contributor

@imjalpreet you're right.

To clear things up: let's try to directly make the updates to the Parquet dependency. Myself and @wanglinsong can cut the release for presto-hive-apache. @wypb , please let me know if you would like to raise the PRs bumping the dependency.

@wypb
Copy link
Contributor Author

wypb commented Nov 2, 2023

@tdcmeehan I will submit a new PR to upgrade the version of parquet to 1.13.1.

BTW, This PR should rely on presto-hive-apache to upgrade Parquet to 1.13.x, because the org.apache.parquet.hadoop.ParquetFileWriter class is in parquet-hadoop module, and Presto uses this class through presto-hive-apache, which itself does not directly depend on parquet-hadoop. Therefore, we need to upgrade the Parquet dependencies of Presto and presto-hive-apache at the same time before merging this PR. @imjalpreet @tdcmeehan

@steveburnett
Copy link
Contributor

Hi @steveburnett I have updated the corresponding documents, please help me take a look again, thank you.

Hi @wypb, I pulled the updated branch, made a local build of the docs, and everything in the docs looks great. Thank you!

Copy link
Member

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through the code again and it looks good to me. @wypb Thanks for your updates!
We probably could merge this PR after the version issue is resolved.

@wypb wypb mentioned this pull request Nov 3, 2023
6 tasks
@wypb wypb force-pushed the iceberg_deletefile branch from 878a729 to 0852c49 Compare November 3, 2023 01:48
@tdcmeehan
Copy link
Contributor

@wypb will you be creating the PR to bump the dependency in presto-hive-apache?

@wypb
Copy link
Contributor Author

wypb commented Nov 9, 2023

@tdcmeehan I've already created one(prestodb/presto-hive-apache#61), please help me review this, thank you.

@wypb wypb mentioned this pull request Nov 21, 2023
6 tasks
@wypb wypb force-pushed the iceberg_deletefile branch 2 times, most recently from 751ce06 to 71ac549 Compare December 5, 2023 08:00
@wypb wypb force-pushed the iceberg_deletefile branch from 860998f to f6b2b68 Compare December 5, 2023 09:02
@wypb
Copy link
Contributor Author

wypb commented Dec 5, 2023

HI @tdcmeehan I have synchronized the latest code of the master branch, and now the test cases can be run. Could you help me take a look again? thank you.

this(recordReader, orcDataSource, columns, typeManager, systemMemoryContext, stats, runtimeStats, nCopies(columns.size(), false));
}

public OrcBatchPageSource(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create an issue to add whatever is necessary for the corresponding selective page source?

@tdcmeehan
Copy link
Contributor

Thank you for following through this long PR and chasing all the loose ends @wypb!

@tdcmeehan tdcmeehan merged commit 61c7980 into prestodb:master Dec 5, 2023
@wypb wypb deleted the iceberg_deletefile branch December 6, 2023 00:50
@wanglinsong wanglinsong mentioned this pull request Feb 12, 2024
64 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

iceberg Apache Iceberg related

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Iceberg MoR support - reads

8 participants