
@yihua (Contributor) commented Dec 5, 2023

Change Logs

This PR adds a Hudi-native HFile reader implementation independent of HBase.

Motivation

Hudi uses the HFile format as the base file format for storing various metadata in the metadata table (MDT), e.g., file listing, column stats, and bloom filters, because the HFile format is optimized for range scans and point lookups. The HFile format was originally designed and implemented by HBase. Historically, Hudi has been tightly coupled with the Hadoop ecosystem. Even now, popular engines like Spark and Flink, and platforms like EMR, still rely on Hadoop dependencies for various functionality. So Hudi chose to use the HFile reader implementation provided by HBase directly.

This approach has a couple of problems:

  1. The required HBase dependencies make the bundle jar heavy and cause dependency conflicts, compatibility issues, and interference with actual HBase index usage in some environments (see, for example, [SUPPORT] Compatible with multiple HBASE version or hbase: 2.1.0-cdh6.3.2 #5372; [SUPPORT] HBase connection closed exception #6509; and [SUPPORT] Error "Could not create interface org.apache.hudi.org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceFactory Is the hadoop compatibility jar on the classpath?" While Deleting Data #7899). HBase does not provide a standalone module for the HFile reader and writer, so we have to add many HBase dependencies and custom shading rules.
  2. HBase is tightly coupled with the Hadoop ecosystem and cannot be used by a Hadoop-independent query engine such as Trino, which has its own TrinoFileSystem abstraction. Thus, with the current implementation, the Trino Hudi connector cannot read HFiles, and therefore cannot read the metadata table.

To address these two problems, one option is to port the relevant HFile reader code from HBase into the Hudi repo and maintain it there. We attempted this in #4695, but the port exceeds 60K lines of code. The main problem is that much of the included code is irrelevant to HFile, yet hard to trim because it is coupled with other HBase functionality.

This led us to a better first step: implement a Hudi-native HFile reader from scratch, based on what Hudi actually uses and independent of the HBase library. This PR does exactly that.

Approach

We follow HFile format version 3 and focus only on what Hudi uses on the reader side. We summarize a simplified HFile format for Hudi in hudi-io/hfile_format.md, and the Hudi-native HFile reader follows this format specification. As a result, the Hudi-native HFile reader can read the HFiles that Hudi writes with the HBase HFile writer.

The Hudi-native HFile reader provides almost the same semantics as the HBase HFile reader and scanner:

  • Every HBase HFile reader API used by Hudi has a similar or counterpart API in the Hudi-native HFile reader, so the upper layer can easily integrate the new HFile reader;
  • At its core, the Hudi-native HFile reader provides the seek API int seekTo(Key key), which supports multiple seeks with keys in sorted order; each seek moves the reader's cursor within the HFile only when necessary. This is essential for point lookups, range scans, and prefix lookups.

Implementation

This PR introduces a new hudi-io module for I/O-related functionality, and the Hudi-native HFile reader implementation lives in this new module.

A new interface, HFileReader, is defined to support reading HFiles with seeks. Its APIs are:

  • int seekTo(Key key): seeks to or just before the passed lookup key. The return code indicates whether the key is found and where the cursor points:
    • -1: the lookup key is less than the first key of the file. The cursor points to the first key of the file.
    • 0: the lookup key is found in the file. The cursor points to the matching key in the file.
    • 1: the lookup key is not found, but it is within the range of the file. The cursor points to the greatest key that is less than the lookup key.
    • 2: the lookup key is greater than the last key of the file, so EOF is reached. The cursor points to EOF.
  • boolean seekTo(): positions the cursor of this reader at the start of the file.
  • boolean next(): moves the cursor to the next entry in the file.
  • Option<KeyValue> getKeyValue(): returns the key-value pair at the current cursor position.
  • boolean isSeeked(): returns whether any of the seek calls has been invoked.
  • Option<byte[]> getMetaInfo(UTF8StringKey key): gets an info entry from the file info block of the HFile.
  • Option<ByteBuffer> getMetaBlock(String metaBlockName): gets the content of a meta block from the HFile.
  • long getNumKeyValueEntries(): returns the total number of key-value entries in the HFile.
  • void initializeMetadata(): initializes the metadata of the HFile before other read operations.
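To make the return codes of seekTo(Key key) concrete, here is a minimal Java sketch. SeekToContractSketch, currentKey(), and prefixScan() are hypothetical names, not part of hudi-io. The sketch assumes String keys held in a sorted in-memory array instead of HFile blocks, orders them with String.compareTo rather than byte comparison, and does not enforce forward-only seeking.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory stand-in for HFileReader that mimics the seekTo(key)
 * return codes over a sorted array of String keys. A real reader works on
 * HFile blocks and byte-level keys, and rejects backward seeks.
 */
public class SeekToContractSketch {
  private final String[] sortedKeys;
  private int cursor = -1; // -1 before any seek; sortedKeys.length means EOF

  public SeekToContractSketch(String[] sortedKeys) {
    this.sortedKeys = sortedKeys;
  }

  /** Positions the cursor at the first entry; false if there are no entries. */
  public boolean seekTo() {
    cursor = 0;
    return sortedKeys.length > 0;
  }

  /** Returns -1, 0, 1, or 2 as described for seekTo(Key key). */
  public int seekTo(String key) {
    int n = sortedKeys.length;
    if (n == 0 || key.compareTo(sortedKeys[n - 1]) > 0) {
      cursor = n; // past the last key: EOF (an empty file is treated the same way here)
      return 2;
    }
    if (key.compareTo(sortedKeys[0]) < 0) {
      cursor = 0; // before the first key: cursor on the first key
      return -1;
    }
    // Greatest key <= lookup key (linear scan from the start for brevity).
    int i = 0;
    while (i + 1 < n && sortedKeys[i + 1].compareTo(key) <= 0) {
      i++;
    }
    cursor = i;
    return sortedKeys[i].equals(key) ? 0 : 1;
  }

  /** Advances the cursor; returns false once EOF is reached. */
  public boolean next() {
    if (cursor < sortedKeys.length) {
      cursor++;
    }
    return cursor < sortedKeys.length;
  }

  /** Key at the cursor, or null at EOF (the real API returns Option<KeyValue>). */
  public String currentKey() {
    return cursor >= 0 && cursor < sortedKeys.length ? sortedKeys[cursor] : null;
  }

  /** Collects all keys that start with the prefix, driven by the seekTo return code. */
  public List<String> prefixScan(String prefix) {
    List<String> result = new ArrayList<>();
    int code = seekTo(prefix);
    if (code == 2) {
      return result; // every key is smaller than the prefix
    }
    if (code == 1 && !next()) {
      return result; // cursor was on a smaller key; step past it first
    }
    // Now the cursor is on the first key >= prefix.
    for (String k = currentKey(); k != null && k.startsWith(prefix); k = currentKey()) {
      result.add(k);
      next();
    }
    return result;
  }
}
```

The prefix scan shows why the return code matters: after a return of 1, the cursor sits on a key smaller than the prefix, so the caller must call next() before checking for matches, while after -1 or 0 the cursor is already on a candidate key.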

HFileReaderImpl implements HFileReader. Here are some important classes created to support HFileReaderImpl:

  • BlockIndexEntry: represents the index entry of a data (or meta) block in the Data (or Meta) Index stored in the ROOT_INDEX block.
  • HFileBlock: an abstract class representing a block in an HFile. It is extended by HFileDataBlock, HFileMetaBlock, HFileRootIndexBlock, and HFileFileInfoBlock for the different block types.
  • HFileBlockReader: reads and parses one or more HFile blocks based on the start and end offsets.
  • HFileCursor: stores the current position in the HFile and the key-value pair at that position. The same instance serves as the position cursor throughout HFile reading.
  • HFileTrailer: represents an HFile trailer, which is read first to understand the structure of the HFile.
  • KeyValue: represents a key-value pair in a data block.
  • Key: represents the key part only.
  • UTF8StringKey: represents a UTF-8 String key only, with no length information encoded. It is used for lookup keys.
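Key ordering is what makes seeks work. This sketch assumes HFile keys are ordered by unsigned, byte-by-byte comparison of their raw bytes, which is how HBase's Bytes comparator orders row keys; whether Hudi's Key and UTF8StringKey use exactly this comparison is an assumption here. The hypothetical Utf8KeySketch holds only the UTF-8 bytes of a lookup key, with no length prefix, and compares with Arrays.compareUnsigned (Java 9+).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical UTF-8 lookup key, assumed to be ordered the way HFile keys are:
 * unsigned lexicographic comparison of the raw bytes.
 */
public final class Utf8KeySketch implements Comparable<Utf8KeySketch> {
  private final byte[] bytes; // raw UTF-8 bytes; no length information encoded

  public Utf8KeySketch(String key) {
    this.bytes = key.getBytes(StandardCharsets.UTF_8);
  }

  public byte[] bytes() {
    return bytes;
  }

  @Override
  public int compareTo(Utf8KeySketch other) {
    // Unsigned: a byte such as 0xC3 must sort after 0x61 ('a'), unlike Java's signed byte.
    return Arrays.compareUnsigned(bytes, other.bytes);
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof Utf8KeySketch && Arrays.equals(bytes, ((Utf8KeySketch) o).bytes);
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(bytes);
  }
}
```

The comparison choice matters for non-ASCII keys. Java's String.compareTo orders by UTF-16 code units, so U+FF21 sorts after the supplementary character U+1F600, while their UTF-8 encodings (EF BC A1 vs. F0 9F 98 80) sort the other way. Signed byte comparison (Arrays.compare) would also put "é" (C3 A9) before "a" (61).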

The high-level algorithm of seekTo based on a lookup key is:

  1. If the lookup key is beyond the range of the current data block (i.e., greater than or equal to the first key of the next block, if present), binary-search the data block index to find the right data block to read, then move on to step 2.
  2. If the lookup key is within the range of the current data block (as determined by the data block index), read the key-value pairs sequentially until a matching key is found or a key greater than the lookup key is first encountered.
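The two steps above can be sketched in Java as follows. TwoLevelSeekSketch and its members are hypothetical: each "data block" is an in-memory array of sorted String keys, and the array of block first keys stands in for the data block index that the real reader builds from BlockIndexEntry records. The sketch assumes lookup keys arrive in ascending order and does not check for backward seeks.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the two-step seekTo(key) algorithm over in-memory
 * "data blocks". blockFirstKeys plays the role of the data block index.
 * Lookups must arrive in ascending order; backward seeks are not checked.
 */
public class TwoLevelSeekSketch {
  private final List<String[]> blocks;   // blocks in key order, each sorted
  private final String[] blockFirstKeys; // simulated data block index
  private int block = 0;                 // index of the currently loaded block
  private int pos = 0;                   // cursor position inside that block
  private boolean eof = false;
  private int blocksLoaded = 1;          // block 0 is treated as loaded up front

  public TwoLevelSeekSketch(List<String[]> blocks) {
    this.blocks = blocks;
    this.blockFirstKeys = blocks.stream().map(b -> b[0]).toArray(String[]::new);
  }

  public int seekTo(String key) {
    String[] lastBlock = blocks.get(blocks.size() - 1);
    if (key.compareTo(lastBlock[lastBlock.length - 1]) > 0) {
      eof = true; // beyond the last key of the file
      return 2;
    }
    eof = false;
    if (key.compareTo(blockFirstKeys[0]) < 0) {
      loadBlock(0); // before the first key: cursor on the first key
      return -1;
    }
    // Step 1: key at or past the next block's first key -> binary-search the index.
    if (block + 1 < blockFirstKeys.length && key.compareTo(blockFirstKeys[block + 1]) >= 0) {
      int idx = Arrays.binarySearch(blockFirstKeys, key);
      loadBlock(idx >= 0 ? idx : -idx - 2); // greatest block whose first key <= key
    }
    // Step 2: scan forward within the current block.
    String[] keys = blocks.get(block);
    while (pos + 1 < keys.length && keys[pos + 1].compareTo(key) <= 0) {
      pos++;
    }
    return keys[pos].equals(key) ? 0 : 1;
  }

  /** Switches to another block (a simulated block read) and resets the in-block cursor. */
  private void loadBlock(int i) {
    if (i != block) {
      block = i;
      blocksLoaded++;
    }
    pos = 0;
  }

  public String currentKey() {
    return eof ? null : blocks.get(block)[pos];
  }

  public int blocksLoaded() {
    return blocksLoaded;
  }
}
```

The blocksLoaded() counter illustrates the intent of step 1: a seek that lands in the current block only advances the in-block cursor, while a seek past the next block's first key jumps directly to the target block via binary search, skipping any blocks in between.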

As with the HBase HFile scanner, seekTo(key) does not support backward seeks, and the Hudi-native HFile reader throws an exception in this case. Before seeking backward, the caller must call seekTo() again to reposition the cursor at the beginning of the file.
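The forward-only contract can be sketched as below. ForwardOnlySeekSketch, seekExact(), and lookupAll() are hypothetical; in particular, this sketch treats a seek as backward when the lookup key is smaller than the previous lookup key since the last reset, which may differ from the exact check in the Hudi-native reader. Sorting a batch of lookup keys before seeking keeps every seek moving forward, and calling seekTo() resets the cursor so that an earlier key can be sought again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical reader over sorted String keys that, like the HBase scanner and
 * the Hudi-native reader, only seeks forward and needs seekTo() before seeking back.
 */
public class ForwardOnlySeekSketch {
  private final String[] sortedKeys;
  private int cursor = 0;           // only ever moves forward until a reset
  private String lastLookup = null; // previous lookup key since the last reset

  public ForwardOnlySeekSketch(String... sortedKeys) {
    this.sortedKeys = sortedKeys;
  }

  /** Resets the cursor to the start of the file; any key may be sought next. */
  public boolean seekTo() {
    cursor = 0;
    lastLookup = null;
    return sortedKeys.length > 0;
  }

  /** Returns true if the key exists; throws if the seek would go backward. */
  public boolean seekExact(String key) {
    if (lastLookup != null && key.compareTo(lastLookup) < 0) {
      throw new IllegalStateException(
          "Backward seek from " + lastLookup + " to " + key + "; call seekTo() first");
    }
    lastLookup = key;
    // Advance to the greatest key <= lookup key; never move the cursor back.
    while (cursor + 1 < sortedKeys.length && sortedKeys[cursor + 1].compareTo(key) <= 0) {
      cursor++;
    }
    return cursor < sortedKeys.length && sortedKeys[cursor].equals(key);
  }

  /** Looks up keys given in any order by sorting them so every seek moves forward. */
  public static List<String> lookupAll(ForwardOnlySeekSketch reader, List<String> keys) {
    List<String> sorted = new ArrayList<>(keys);
    Collections.sort(sorted);
    reader.seekTo(); // start from a clean position
    List<String> found = new ArrayList<>();
    for (String key : sorted) {
      if (reader.seekExact(key)) {
        found.add(key);
      }
    }
    return found;
  }
}
```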

Testing

Comprehensive unit tests for the new Hudi-native HFile reader have been added to TestHFileReader. The integration of the Hudi-native HFile reader in #10330 will also exercise the HFile reader functionality end-to-end.

Impact

Removes the dependency on HBase for reading HFiles, which makes engine integration much easier, along with the other benefits mentioned above.

Risk level

low

Documentation Update

We have documented the HFile format. We will update the RFC accordingly.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

yihua force-pushed the HUDI-7170-new-hfile-reader branch 2 times, most recently from 8550f66 to d132592, December 5, 2023 02:45
yihua force-pushed the HUDI-7170-new-hfile-reader branch 2 times, most recently from 596ec9f to c6833ec, December 7, 2023 22:56
@vinothchandar (Member) left a comment

Great job! Left some comments

@vinothchandar (Member) left a comment

I think we can make this a little more complete, so we don't need changes for other use cases down the line (e.g., colstats, RLI, ...).

yihua force-pushed the HUDI-7170-new-hfile-reader branch from 3636b93 to 9adc156, January 10, 2024 17:50
yihua changed the title from [HUDI-7170][WIP] Implement HFile reader independent of HBase to [HUDI-7170] Implement HFile reader independent of HBase, Jan 11, 2024
yihua marked this pull request as ready for review, January 11, 2024 03:30
yihua force-pushed the HUDI-7170-new-hfile-reader branch from 3934af1 to c5918ee, January 17, 2024 09:07
@yihua (Contributor, Author) commented Jan 17, 2024

The README has been updated with the HFile format description.

@vinothchandar (Member) left a comment

Looks in good shape. Can you clarify some of my questions on the code?

apache deleted two comments from hudi-bot, Jan 18, 2024
@hudi-bot (Collaborator)
CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@yihua (Contributor, Author) commented Jan 18, 2024

Azure CI is green.
[Screenshot of the Azure CI results, 2024-01-18 at 10:08:06]

yihua merged commit 48ce342 into apache:master, Jan 18, 2024
@pan3793 (Member) commented Feb 22, 2024

@yihua The question may be naive: is a native writer required to cut out the HBase deps eventually?

yihua added a commit that referenced this pull request Feb 27, 2024
This commit adds a Hudi-native HFile reader implementation independent of HBase.
@yihua (Contributor, Author) commented Mar 9, 2024

Quoting @pan3793: "@yihua the question may be naive: is a native writer required to cut out the HBase deps eventually?"

Sorry for the late reply. Yes, we'll also implement a native HFile writer. Eventually we'll remove the HBase dependencies and use the native HFile reader and writer. The HBase index still requires HBase dependencies, but those will be optional and not included in the bundle jar.


Projects: Status ✅ Done

4 participants