Conversation

Contributor

@yihua yihua commented Dec 14, 2023

Change Logs

This PR integrates the new native HFile reader with file reader factory.

Major changes are:

  • Moves the common logic for reading Avro records from HFiles out of HoodieAvroHFileReader into BaseHoodieAvroHFileReader (abstract class). HoodieAvroHFileReader now contains the new implementation based on the Hudi-native HFile reader. HoodieAvroHBaseHFileReader (code moved from the previous HoodieAvroHFileReader class) is the implementation based on the HBase HFile reader.
  • Adds a new implementation of BootstrapIndex.IndexReader in HFileBootstrapIndex using the Hudi-native HFile reader.
  • Adds a new config hoodie.hfile.use.built.in.reader (HoodieReaderConfig.USE_BUILT_IN_HFILE_READER) to control whether the built-in HFile reader is used to read HFiles. By default, it is set to true.
  • Refactors HoodieFileReaderFactory and its subclasses so that the Avro HFile reader can be instantiated with either the Hudi-native or the HBase-based HFile reader, depending on the Hudi config.
  • Fixes packaging and shading by adding com.google.protobuf:protobuf-java and the corresponding shading rule.
  • Refactors production code to use HoodieFileReaderFactory to get the reader for reading HFile, instead of directly instantiating HoodieAvroHFileReader.
  • Adds utilities and bloom filter constructors that operate on byte buffers, so that HoodieAvroHFileReader can use them easily with the Hudi-native HFile reader.
  • Removes direct usage of the HFile reader implementation classes; callers now obtain the HFile reader (BaseHoodieAvroHFileReader) from the file reader factory instead.
  • TestInLineFileSystemHFileInLining tests are refactored to run on both native-based and HBase-based HFile readers.
  • TestHoodieHFileReaderWriter tests are refactored to run on both native-based and HBase-based HFile readers.
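
The config-driven reader selection described in the list above can be pictured with a minimal sketch. This is not the code in this PR: HFileReaderFactorySketch, BaseReader, NativeReader and HBaseReader are hypothetical stand-ins for HoodieFileReaderFactory, BaseHoodieAvroHFileReader and its two implementations, and the use of java.util.Properties is an assumption made only to keep the example self-contained. Only the config key and its default of true come from the description.

```java
import java.util.Properties;

// Hypothetical stand-ins for illustration; these are NOT the real Hudi classes.
final class HFileReaderFactorySketch {
  // Config key and default value as stated in the PR description.
  static final String USE_BUILT_IN_HFILE_READER = "hoodie.hfile.use.built.in.reader";
  static final String DEFAULT_USE_BUILT_IN_HFILE_READER = "true";

  // Plays the role of the shared abstract base (BaseHoodieAvroHFileReader).
  abstract static class BaseReader {
    abstract String describe();
  }

  // Plays the role of the reader backed by the Hudi-native HFile reader.
  static final class NativeReader extends BaseReader {
    @Override
    String describe() {
      return "native";
    }
  }

  // Plays the role of the reader backed by the HBase HFile reader.
  static final class HBaseReader extends BaseReader {
    @Override
    String describe() {
      return "hbase";
    }
  }

  // Callers ask the factory for a reader and never name a concrete class.
  static BaseReader newReader(Properties props) {
    boolean useBuiltIn = Boolean.parseBoolean(
        props.getProperty(USE_BUILT_IN_HFILE_READER, DEFAULT_USE_BUILT_IN_HFILE_READER));
    return useBuiltIn ? new NativeReader() : new HBaseReader();
  }
}
```

Because callers depend only on the base type, the same test body can be run once per setting of the flag, which is the idea behind running the refactored tests against both readers.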

Impact

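HFiles are now read with the new Hudi-native HFile reader by default. Setting hoodie.hfile.use.built.in.reader to false switches back to the HBase-based HFile reader.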

Risk level

medium

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@yihua yihua changed the title [HUDI-7218][WIP] Integrate new HFile reader with file reader factory [HUDI-7218][Stacked on HUDI-7170][WIP] Integrate new HFile reader with file reader factory Dec 14, 2023
@yihua yihua force-pushed the HUDI-7218-new-hfile-reader-integration branch 4 times, most recently from 57dc2f4 to 67b0d70 Compare December 15, 2023 22:01
@yihua yihua marked this pull request as ready for review December 15, 2023 22:16
@yihua
Contributor Author

yihua commented Dec 15, 2023

This PR is still WIP. Sending the PR to Azure CI to run tests.

@yihua
Contributor Author

yihua commented Dec 15, 2023

@hudi-bot run azure

@yihua yihua force-pushed the HUDI-7218-new-hfile-reader-integration branch from 67b0d70 to 7591e10 Compare December 18, 2023 16:17
@yihua yihua changed the title [HUDI-7218][Stacked on HUDI-7170][WIP] Integrate new HFile reader with file reader factory [HUDI-7218][Stacked on HUDI-7170] Integrate new HFile reader with file reader factory Dec 18, 2023
@yihua
Contributor Author

yihua commented Dec 18, 2023

@hudi-bot run azure

@yihua yihua force-pushed the HUDI-7218-new-hfile-reader-integration branch 3 times, most recently from 6622e92 to f18e571 Compare December 19, 2023 14:34
@yihua yihua force-pushed the HUDI-7218-new-hfile-reader-integration branch 7 times, most recently from 3bc5c38 to be59e31 Compare January 18, 2024 03:00
@yihua yihua force-pushed the HUDI-7218-new-hfile-reader-integration branch from 421f1cc to 9c93ac1 Compare January 18, 2024 19:47
@yihua yihua changed the title [HUDI-7218][Stacked on HUDI-7170] Integrate new HFile reader with file reader factory [HUDI-7218] Integrate new HFile reader with file reader factory Jan 18, 2024
@yihua yihua force-pushed the HUDI-7218-new-hfile-reader-integration branch 5 times, most recently from 0f13cf9 to 4173eda Compare January 18, 2024 22:49
@yihua yihua force-pushed the HUDI-7218-new-hfile-reader-integration branch 4 times, most recently from 7462f45 to 3dfa478 Compare January 19, 2024 07:20
@yihua yihua force-pushed the HUDI-7218-new-hfile-reader-integration branch from 3dfa478 to 30f633f Compare January 19, 2024 08:38
Member

@vinothchandar vinothchandar left a comment

Took a skim. No major red flags. Land and proceed if this is blocking you

Contributor

@nsivabalan nsivabalan left a comment

half way through my review

@yihua yihua force-pushed the HUDI-7218-new-hfile-reader-integration branch from 30f633f to e0625ec Compare January 29, 2024 06:43
@yihua yihua force-pushed the HUDI-7218-new-hfile-reader-integration branch from e0625ec to 49e5fa7 Compare January 29, 2024 06:47
@apache apache deleted a comment from hudi-bot Jan 29, 2024
@hudi-bot
Collaborator

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

Projects

Status: ✅ Done

5 participants