Upgrade Delta Standalone to 3.0.0 #20256

Closed
dnskr wants to merge 1 commit into prestodb:master from dnskr:upgrade-delta-lake-to-3.0.0

@dnskr
Contributor

@dnskr dnskr commented Jul 9, 2023

This PR upgrades the Delta Standalone dependency to 3.0.0 for the Delta Lake connector.

Test plan - Existing tests

== RELEASE NOTES ==

Delta Lake Changes
* Upgrade Delta Standalone to 3.0.0

@dnskr dnskr requested a review from a team as a code owner July 9, 2023 14:41
@dnskr dnskr requested a review from presto-oss July 9, 2023 14:41
@dnskr dnskr force-pushed the upgrade-delta-lake-to-3.0.0 branch 2 times, most recently from 80cf66f to 4fc98f2 Compare July 9, 2023 20:00
@dnskr dnskr marked this pull request as draft July 9, 2023 21:11
@tdcmeehan tdcmeehan self-assigned this Jul 31, 2023
@agrawalreetika
Member

agrawalreetika commented Aug 16, 2023

Hi @dnskr,
I wanted to confirm: does this upgrade also fix the exception related to the _last_checkpoint file check that was mentioned in #18898?

@dnskr
Contributor Author

dnskr commented Aug 16, 2023

> Hi @dnskr, Wanted to confirm does this upgrade also fix _last_checkpoint file check-related exception which was mentioned here - #18898

I don't think so. Unfortunately, I'm not sure I can check it, because the Windows machine I use limits my ability to develop and debug Presto.

@dnskr dnskr force-pushed the upgrade-delta-lake-to-3.0.0 branch from 4fc98f2 to 5786fc7 Compare December 6, 2023 22:23
@dnskr dnskr marked this pull request as ready for review December 6, 2023 23:24
@tdcmeehan
Contributor

@dnskr have you done any end-to-end tests on this upgrade?

@agrawalreetika
Member

agrawalreetika commented Dec 8, 2023

> Hi @dnskr, Wanted to confirm does this upgrade also fix _last_checkpoint file check-related exception which was mentioned here - #18898

@dnskr, I tried a query with this upgrade. It looks like the _last_checkpoint file check exception is not handled here either. Just curious, do we know what Delta reader improvements Presto would get from this upgrade?

@dnskr
Contributor Author

dnskr commented Dec 9, 2023

@tdcmeehan Unfortunately, I cannot build Presto or run tests locally, because I use Windows. So I rely on the CI/CD pipelines passing and on the existing Delta tests to check that the upgrade is fine.

@agrawalreetika There is at least one performance improvement in Delta Standalone according to the release notes:

The key features in this release are:

  • Support for disabling Delta checkpointing during commits - For very large tables with millions of files, performing Delta checkpoints can become an expensive overhead during writes. Users can now disable this checkpointing by setting the hadoop configuration property io.delta.standalone.checkpointing.enabled to false. This is only safe and suggested to do if another job will periodically perform the checkpointing.
  • Performance improvement to snapshot initialization - When a delta table is loaded at a particular version, the snapshot must contain, at a minimum, the latest protocol and metadata. This PR improves the snapshot load performance for repeated table changes.
  • Support adding absolute paths to the Delta log - This now enables users to manually perform SHALLOW CLONEs and create Delta tables with external files.
  • Fix in schema evolution to prevent adding non-nullable columns to existing Delta tables
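
To make the first item concrete, here is a minimal sketch of how a writer could gate checkpointing on that property. Only the property name `io.delta.standalone.checkpointing.enabled` comes from the release notes. The plain `Map` standing in for the Hadoop configuration, the `shouldCheckpoint` helper, the assumed default of `true`, and the "every N commits" rule are illustrative assumptions, not the Delta Standalone API or its actual logic.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a plain Map stands in for the Hadoop configuration,
// and shouldCheckpoint is a hypothetical helper, not a Delta Standalone API.
class CheckpointFlagSketch {
    // Property name quoted in the release notes.
    static final String CHECKPOINTING_ENABLED = "io.delta.standalone.checkpointing.enabled";

    // Assumption: checkpointing is on unless the property is explicitly "false",
    // and a checkpoint is due every checkpointInterval commits.
    static boolean shouldCheckpoint(Map<String, String> conf, long committedVersion, int checkpointInterval) {
        if (checkpointInterval <= 0) {
            throw new IllegalArgumentException("checkpointInterval must be positive");
        }
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(CHECKPOINTING_ENABLED, "true"));
        return enabled && committedVersion > 0 && committedVersion % checkpointInterval == 0;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Property unset: the writer checkpoints when a checkpoint is due.
        System.out.println(shouldCheckpoint(conf, 20, 10));

        // Property set to false: the writer skips the checkpoint and leaves it to another job.
        conf.put(CHECKPOINTING_ENABLED, "false");
        System.out.println(shouldCheckpoint(conf, 20, 10));
    }
}
```

With the real library the property would be set on the Hadoop configuration passed to Delta Standalone. As the release notes say, this is only safe if another job periodically performs the checkpointing.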

@agrawalreetika @tdcmeehan I would like to be transparent about how much effort I put into these changes.
I upgraded Delta Standalone just to keep the dependency up to date, without expecting any significant improvements.
During the preliminary upgrade to 3.0.0rc1 I found and reported a minor bug, which is fixed in 3.0.0.
I would be happy if someone could take the opportunity to manage Delta Lake upgrades in a more careful way.

BTW, as far as I know, the next Delta Standalone release, 3.1.0, will be based on Delta Kernel, which might require more effort to upgrade to but could also provide more capabilities for the Delta Lake connector in Presto.

@dnskr dnskr changed the title from "[WIP] Upgrade Delta Standalone to 3.0.0" to "Upgrade Delta Standalone to 3.0.0" Dec 12, 2023
@dnskr dnskr force-pushed the upgrade-delta-lake-to-3.0.0 branch from 5786fc7 to 33e13c3 Compare January 9, 2024 20:43
Signed-off-by: dnskr <dnskrv88@gmail.com>
@dnskr dnskr force-pushed the upgrade-delta-lake-to-3.0.0 branch from 33e13c3 to 4e9da59 Compare February 3, 2024 12:38
@dnskr
Contributor Author

dnskr commented Jul 31, 2024

Closing in favor of #22596.

@dnskr dnskr closed this Jul 31, 2024