
Conversation

@yihua (Contributor) commented Mar 10, 2022

What is the purpose of the pull request

This PR upgrades the HBase version from 1.2.3 to 2.4.9, addresses issues in Hudi's HFile usage caused by HBase API changes, and shades HBase-related dependencies in Hudi bundles.

Brief change log

  • Upgrades the HBase version from 1.2.3 to 2.4.9 and Hadoop from 2.7.3 to 2.10.1, the minimum Hadoop version required according to the HBase documentation. HBase 2.x is not compatible with Hadoop 2.7 due to its use of new Hadoop APIs introduced in 2.8.
  • Extends HoodieHBaseKVComparator from CellComparatorImpl, a CellComparator implementation, instead of KeyValue.KVComparator, which is deprecated in 2.x (a minimal sketch of the comparator appears after the class-name list below).
  • Changes HFile reader and writer factory methods and how CellComparator is configured.
    • Adds HoodieHFileUtils to construct HFile.Reader in different ways.
    • For the HoodieHFileReader constructor that takes a byte array, the file system and a dummy path are passed in. The dummy path can be either the log file path or the base path of the table; it is not actually used for reading content and only serves as a reference.
  • Addresses the HBase 2.x conflict with HBase 1.x in Hive 2: In Hudi, we now include hudi-common/src/main/resources/hbase-site.xml containing the HBase default configs from the hbase-common 2.4.9 we use, to override the default configs loaded from an older HBase version's hbase-default.xml on the classpath and to ensure the correct defaults for Hudi's HBase usage. The Hive server loads all lib jars, including HBase jars (e.g., HBase 1.1.1) with their corresponding hbase-default.xml, into the classpath, and that conflicts with the hbase-default.xml in Hudi bundles (HBase 2.4.9). The exception is thrown as follows: Caused by: java.lang.RuntimeException: hbase-default.xml file seems to be for an older version of HBase (1.1.1), this version is 2.4.9. The logic causing this exception can be found in HBaseConfiguration::addHbaseResources(). To get around this issue, since HBase loads hbase-site.xml after hbase-default.xml, we provide hbase-site.xml from the bundle so that HBaseConfiguration picks it up and ensures the right defaults (a configuration-loading sketch appears after this change log).
  • Makes changes to the maven-shade-plugin configuration for packaging Hudi bundles
    • Adds necessary includes for HBase 2.x related dependencies
    • Adds ServicesResourceTransformer for shading services resource
    • Adds relocation for org.apache.commons.io., org.apache.hadoop.hbase., org.apache.hbase., and org.apache.htrace.
    • Adds relocation for classes from hbase-hadoop2-compat and hbase-hadoop-compat. These two artifacts contain classes that implement org.apache.hadoop.metrics2 classes from hadoop-common. We cannot shade everything in org.apache.hadoop.metrics2, which would cause NoClassDefFoundError for Hadoop classes, so we have to pick and choose the HBase-specific classes here.
    • Excludes **/*.proto and hbase-webapps/** from bundles
    • Removes unnecessary HBase dependencies from the bundle poms, which are already introduced through hudi-common
  • Note that hbase-shaded-server has been discontinued and hbase-shaded is maintained instead, which bundles all HBase-related libs, most of which we do not need. So the original hbase-* dependencies are used and we control the shading ourselves.
  • Backward compatibility (Hudi 0.11 reader reading tables written by previous releases): There are two places that read and write HFiles, HFileBootstrapIndex and HoodieHFileReader/HoodieHFileWriter. Note that HFileBootstrapIndex does not use HoodieHFileReader/HoodieHFileWriter. There are nuances here that affect backward compatibility, which are handled in this PR. As already mentioned, HBase 1.x uses KeyValue.KVComparator while HBase 2.x uses CellComparator, although the conceptual functionality is the same. The comparator class name is written to the trailer of the HFile. The table below shows the comparator class name written to the HFile trailer by different Hudi + HBase versions:
| Versions | Class name (abbr.) written by HFileBootstrapIndex | Class name (abbr.) written by HoodieHFileWriter |
| --- | --- | --- |
| Hudi 0.9.0 based on HBase 1.2.3 | HFileBootstrapIndex$HoodieKVComparator | hbase.KeyValue$KeyComparator |
| Hudi 0.10.0 based on HBase 1.2.3 | HFileBootstrapIndex$HoodieKVComparator | HoodieHBaseKVComparator |
| Hudi 0.11 (this PR) based on HBase 2.4.9 | HFileBootstrapIndex$HoodieKVComparator | HoodieHBaseKVComparator |

Full class name:
HFileBootstrapIndex$HoodieKVComparator: org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex$HoodieKVComparator
hbase.KeyValue$KeyComparator: org.apache.hadoop.hbase.KeyValue$KeyComparator
HoodieHBaseKVComparator: org.apache.hudi.io.storage.HoodieHBaseKVComparator
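
For reference, the comparator itself needs no custom logic. A minimal sketch (the actual class in this PR may differ slightly, e.g., in whether it implements Serializable) simply inherits the default cell ordering from CellComparatorImpl, so that only the class name recorded in the HFile trailer matters:

package org.apache.hudi.io.storage;

import java.io.Serializable;

import org.apache.hadoop.hbase.CellComparatorImpl;

// Sketch: keeps the default compare(Cell, Cell) behavior of CellComparatorImpl;
// Serializable is assumed here so the comparator can be shipped inside engine tasks.
public class HoodieHBaseKVComparator extends CellComparatorImpl implements Serializable {
}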

There are no compatibility issues with HoodieKVComparator and HoodieHBaseKVComparator given that we control the implementation. For org.apache.hadoop.hbase.KeyValue$KeyComparator, HBase already provides the conversion when retrieving the comparator class in FixedFileTrailer:

@SuppressWarnings("unchecked")
  private static Class<? extends CellComparator> getComparatorClass(String comparatorClassName)
    throws IOException {
    Class<? extends CellComparator> comparatorKlass;
    // for BC
    if (comparatorClassName.equals(KeyValue.COMPARATOR.getLegacyKeyComparatorName())
      || comparatorClassName.equals(KeyValue.COMPARATOR.getClass().getName())
      || (comparatorClassName.equals("org.apache.hadoop.hbase.CellComparator"))) {
      comparatorKlass = CellComparatorImpl.class;
    }

KeyValue.COMPARATOR.getLegacyKeyComparatorName() in the KeyValue class returns org.apache.hadoop.hbase.KeyValue$KeyComparator. The unit tests verify that an HFile written by the Hudi 0.9.0 HoodieHFileWriter can be read, with the comparator transformed to CellComparatorImpl.

@Deprecated
  public static class KVComparator implements RawComparator<Cell>, SamePrefixComparator<byte[]> {

    /**
     * The HFileV2 file format's trailer contains this class name.  We reinterpret this and
     * instantiate the appropriate comparator.
     * TODO: With V3 consider removing this.
     * @return legacy class name for FileFileTrailer#comparatorClassName
     */
    public String getLegacyKeyComparatorName() {
      return "org.apache.hadoop.hbase.KeyValue$KeyComparator";
    }
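
For comparator class names that are not special-cased, FixedFileTrailer falls back to resolving the stored name by reflection. A rough sketch of that fallback (not verbatim HBase source, but consistent with the stack trace shown further below):

// The trailer's comparator class name is loaded verbatim, so relocating
// org.apache.hadoop.hbase.KeyValue$KeyComparator in the bundle would make this
// lookup fail with ClassNotFoundException for metadata table HFiles written by Hudi 0.9.0.
try {
  comparatorKlass = (Class<? extends CellComparator>) Class.forName(comparatorClassName);
} catch (ClassNotFoundException e) {
  throw new IOException(e);
}
return comparatorKlass;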

When we package the Hudi bundles, we need to make sure that org.apache.hadoop.hbase.KeyValue$KeyComparator is not shaded, so as not to break this backward-compatibility logic in HBase. That's why this relocation rule is added:

<relocation>
  <pattern>org.apache.hadoop.hbase.</pattern>
  <shadedPattern>org.apache.hudi.org.apache.hadoop.hbase.</shadedPattern>
  <excludes>
    <exclude>org.apache.hadoop.hbase.KeyValue$KeyComparator</exclude>
  </excludes>
</relocation>

The bundle backward-compatibility logic is verified using a Java decompiler. Before excluding the class from shading, the new bundle cannot read a Hudi metadata table written by 0.9.0:

Caused by: org.apache.hudi.org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file file:/Users/ethan/Work/data/hudi/hudi_0_9_metadata_table/.hoodie/metadata/files/c626cae9-96eb-48c5-ab41-bef60c6037c0-0_0-94-187_003001.hfile
  at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileInfo.initTrailerAndContext(HFileInfo.java:355)
Caused by: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.KeyValue$KeyComparator
  at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.getComparatorClass(FixedFileTrailer.java:627)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.KeyValue$KeyComparator
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)

After excluding the class from shading, the new bundle can read the Hudi metadata table written by 0.9.0.

  • Forward compatibility (readers of previous Hudi releases reading tables written by the Hudi 0.11 writer): The data table bootstrap index is not affected, so forward compatibility holds there. The metadata table is affected, but its forward compatibility was already broken in 0.10, because a Hudi 0.9.0 reader cannot read a Hudi 0.10/0.11 metadata table due to the comparator class name change: the new comparator class HoodieHBaseKVComparator is not present in Hudi 0.9.0 and earlier. However, the metadata table is forward compatible from 0.10.0 onwards.
  • Adds new tests, as described below, to cover the changes in HoodieHFileReader.
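
Returning to the hbase-site.xml workaround in the change log above, the loading order is what makes it effective. A minimal sketch (assuming the standard HBaseConfiguration behavior; hbase.defaults.for.version is the property the version guard checks):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HBaseConfigLoadingSketch {
  public static void main(String[] args) {
    // HBaseConfiguration.create() adds hbase-default.xml first and hbase-site.xml second,
    // so keys in the bundled hbase-site.xml (generated from the hbase-common 2.4.9 defaults)
    // win over a stale hbase-default.xml elsewhere on the classpath, and the version check
    // in addHbaseResources() no longer fails.
    Configuration conf = HBaseConfiguration.create();
    System.out.println(conf.get("hbase.defaults.for.version"));
  }
}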

Verify this pull request

This PR adds new tests:

  • Restructures the tests around HoodieFileReader and HoodieFileWriter by adding an abstract class, TestHoodieReaderWriterBase, containing common tests for all file formats; adds more verification in TestHoodieReaderWriterBase to cover all APIs; and makes TestHoodieHFileReaderWriter and TestHoodieOrcReaderWriter extend TestHoodieReaderWriterBase.
  • Adds more tests in TestHoodieHFileReaderWriter to cover HFile specific read and write logic.
  • Adds HFile compatibility tests in TestHoodieHFileReaderWriter and test fixtures in resources. Three types of HFiles are added for each of the three Hudi versions (9 HFile fixtures in total), covering all cases where HFile is used.
    • HFile fixtures generated: (1) HFile based on simple schema generated from TestHoodieReaderWriterBase#testWriteReadPrimitiveRecord(); (2) HFile based on complex schema generated from TestHoodieReaderWriterBase#testWriteReadComplexRecord(); (3) HFile based on bootstrap index generated from TestBootstrapIndex#testBootstrapIndex().
    • Hudi versions: (1) Hudi 0.9.0 based on HBase 1.2.3 (2) Hudi 0.10.0 based on HBase 1.2.3 (3) Hudi 0.11 (this PR) based on HBase 2.4.9 (the nuances can be found in the above section)

This PR is also covered by existing unit, functional, and CI tests for the core write/read flows, and compatibility is verified with the Docker setup of Spark 2.4.4, Hadoop 2.8.4, and Hive 2.3.3.

Further compatibility testing:

  • Write-read backward compatibility: Write a Hudi table before this change with the metadata table enabled (0.9, 0.10, 0.11-this PR), so that the metadata table contains HFile base files and log files. Then, with this change, list partitions and files of the data table using the metadata table. There is no issue.
  • Write-read forward compatibility: Write a Hudi table with this change with the metadata table enabled (0.11-this PR). Read the table using older releases (0.9, 0.10). The table can be read.
  • Tested on both local and EMR 6.5.0.

Bundle jar:
The bundle jars are verified to ensure that HBase-related classes are properly shaded (jar tf | grep "hbase\|htrace\|metrics2\|commons.io") and that Google Guava is not present (jar tf | grep guava).

Unfortunately, the size of the bundle jars increases significantly due to the newly required packages (~16 MB additional for most of the bundles):

| Bundle jar file | Master | This PR |
| --- | --- | --- |
| hudi-flink-bundle_2.11-0.11.0-SNAPSHOT.jar | 31.4 MB | 47.2 MB |
| hudi-hadoop-mr-bundle-0.11.0-SNAPSHOT.jar | 17.4 MB | 31.5 MB |
| hudi-hive-sync-bundle-0.11.0-SNAPSHOT.jar | 3.7 MB | 3.7 MB |
| hudi-integ-test-bundle-0.11.0-SNAPSHOT.jar | 113.8 MB | 136.3 MB |
| hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar | 51.9 MB | 67.5 MB |
| hudi-presto-bundle-0.11.0-SNAPSHOT.jar | 19.3 MB | 33.3 MB |
| hudi-spark-bundle_2.11-0.11.0-SNAPSHOT.jar | 40.4 MB | 56.4 MB |
| hudi-timeline-server-bundle-0.11.0-SNAPSHOT.jar | 35.8 MB | 51.5 MB |
| hudi-trino-bundle-0.11.0-SNAPSHOT.jar | 19.3 MB | 35.3 MB |
| hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar | 43 MB | 59.0 MB |

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@yihua force-pushed the HUDI-1180-hbase-upgrade-ci-test-deps branch from cfd58a7 to 1b354cf on March 10, 2022 22:26
@yihua (Contributor, Author) commented Mar 10, 2022

@hudi-bot run azure

@yihua changed the title from "[WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9 (Simplify dependency changes)" to "[WIP][HUDI-1180] Upgrade HBase to 2.4.9" on Mar 11, 2022
Comment on lines 241 to 245
//LOG.error("\n\n ###### Stdout #######\n" + callback.getStdout().toString());
}
callback.getStderr().flush();
callback.getStdout().flush();
LOG.error("\n\n ###### Stdout #######\n" + callback.getStdout().toString());
Contributor Author:

This debugging code will be removed once the PR is close to landing.

@yihua changed the title from "[WIP][HUDI-1180] Upgrade HBase to 2.4.9" to "[HUDI-1180] Upgrade HBase to 2.4.9" on Mar 11, 2022
@yihua (Contributor, Author) commented Mar 11, 2022

@danny0405 @xiarixiaoyao @alexeykudinkin @umehrot2 PTAL after more changes to make HBase upgrade work with Spark 2.4.4, Hadoop 2.8.4, and Hive 2.3.3 (tested with Docker setup).

@danny0405 could you help verify whether the changes to hudi-flink-bundle are fine? I don't see any IT tests using Flink in the Docker setup, though the hudi-flink-bundle changes are similar to those in the other bundles, which are verified. Maybe we should add Flink ITs to the Docker setup as well.

@yihua force-pushed the HUDI-1180-hbase-upgrade-ci-test-deps branch 2 times, most recently from fe2f135 to 70c1029, on March 14, 2022 08:03
@nsivabalan added the priority:blocker (Production down; release blocker) label on Mar 15, 2022
@vinothchandar (Member) commented:

Write a hudi table before this change with metadata table enabled

Can we test the reverse? For example, writers may be upgraded while query engines may not be. So can we write a Hudi table with this change and then try to read it with older code?

@vinothchandar (Member) left a comment:

Took a pass. Wanted to clarify a few things. Also, make sure we handle upgrade/downgrade.

private final Configuration hadoopConf;
private final BloomFilter bloomFilter;
private final KeyValue.KVComparator hfileComparator;
private final CellComparator hfileComparator;
Member:

Are these backwards and forward compatible? i.e., is the KVComparator written into the HFile footer?

Member:

Per discussion, there is a potential issue of 0.11 queries not being able to read 0.9 tables, because the org.apache.hadoop.hbase.KeyValue.KVComparator class name is written into HFile 1.x (0.9 table) and we now shade HBase in 0.11, relocating it to org.apache.hudi.org.apache.hadoop.hbase.KeyValue.KVComparator.

If we ship the KVComparator class along with our jar, then we will be good.

Member:

it can just extend CellComparatorImpl

Contributor Author:

Actually, I took another approach here. In HBase, there is already backward-compatibility logic to transform the old comparator. When we package the Hudi bundles, we need to make sure that org.apache.hadoop.hbase.KeyValue$KeyComparator (written to the metadata table HFiles in Hudi 0.9.0) is not shaded, so as not to break that backward-compatibility logic in HBase. A new relocation rule is added for this. More details can be found in the PR description.

<relocation>
  <pattern>org.apache.hadoop.hbase.</pattern>
  <shadedPattern>org.apache.hudi.org.apache.hadoop.hbase.</shadedPattern>
  <excludes>
    <exclude>org.apache.hadoop.hbase.KeyValue$KeyComparator</exclude>
  </excludes>
</relocation>

Contributor Author:

In this way, we don't need to ship our own comparator in org.apache.hadoop.hbase and exclude that from shading.

} catch (IOException e) {
throw new HoodieException("Could not read min/max record key out of file information block correctly from path", e);
}
HFileInfo fileInfo = reader.getHFileInfo();
Member:

Can we UT this?

Contributor Author:

I added more unit tests around the HFile reader.

Map<byte[], byte[]> fileInfo = reader.loadFileInfo();
return new String[] { new String(fileInfo.get(KEY_MIN_RECORD.getBytes())),
new String(fileInfo.get(KEY_MAX_RECORD.getBytes()))};
} catch (IOException e) {
Member:

Good practice to wrap this in a HoodieException no?

Contributor:

IOException is a checked exception, so if the new HFile API doesn't throw it, we can't catch it here.

Contributor Author:

Yeah, the new API getHFileInfo() returns the stored file info directly and there is no exception we can catch here now.

From HFileReaderImpl:
  @Override
  public HFileInfo getHFileInfo() {
    return this.fileInfo;
  }

Contributor Author:

Generally, we should wrap it with HoodieException.


<include>org.apache.hbase:hbase-common</include>
<include>org.apache.hbase:hbase-client</include>
<include>org.apache.hbase:hbase-hadoop-compat</include>
Member:

Is this the absolute minimal set of artifacts needed?

Contributor:

No need to address this as part of this PR, but I actually want to suggest going one step further:
Since we mostly rely on HFile and the classes it depends on, can we try to filter out the packages whose removal won't break it?

My hunch is that we can greatly reduce the 16 MB overhead by just cleaning up all the stuff that is bolted onto HBase.

Member:

That's a good idea. In fact, I've tried it out, but it's a very manual, time-consuming process to verify; I gave up after a few failures. We also need to keep future upgrades in mind. But I would be very happy to reduce the bundle size in any way we can, and we should take another stab at this idea in the future.

Contributor Author:

Yeah, that's good to have. The problem, as @codope pointed out, is that such a process is time-consuming. For now, what I can say is that the newly added artifacts are necessary: I started with the old pom and incrementally added new artifacts as I saw NoClassDefFoundError exceptions, until every test passed.

One thing we may try later is to add and trim hudi-hbase-shaded by excluding transitives and only depend on hudi-hbase-shaded here.

Contributor:

Yeah, it's a tedious manual process for sure, but I think we can do it pretty fast: we just look at the packages imported by HFile, then at the files those import, and so on. After that we can run the tests to check whether we collected them properly.

The hypothesis is that this set should be reasonably bounded (why wouldn't it be?), so this iteration should be pretty fast.

Can you please create a task and link it here to follow-up?

Contributor Author:

HUDI-3674 is created as a follow-up.

<shadedPattern>${flink.bundle.shade.prefix}org.apache.avro.</shadedPattern>
</relocation>
<relocation>
<pattern>org.apache.commons.io.</pattern>
Member:

This reminds me: older bootstrap index files may have an unshaded key comparator class name saved within the HFile. Does that cause any issues? i.e., can we read such files?

Contributor Author:

I need to check that. This is another compatibility test.

Contributor Author:

Actually, bootstrap index HFiles always use our own comparator class under org.apache.hudi package so we're good here. The problem is with the metadata table HFiles in Hudi 0.9.0.

<pattern>com.fasterxml.jackson.</pattern>
<shadedPattern>${flink.bundle.shade.prefix}com.fasterxml.jackson.</shadedPattern>
</relocation>
<!-- The classes below in org.apache.hadoop.metrics2 package come from
Member:

why

Contributor Author:

There are classes from hbase-hadoop2-compat and hbase-hadoop-compat that implement org.apache.hadoop.metrics2 classes in hadoop-common. We cannot shade all classes in org.apache.hadoop.metrics2, which would cause NoClassDefFoundError for Hadoop classes, so we have to pick and choose the HBase-specific ones here.

<pattern>com.google.common.</pattern>
<shadedPattern>org.apache.hudi.com.google.common.</shadedPattern>
</relocation>
<!-- The classes below in org.apache.hadoop.metrics2 package come from
Member:

I assume all of this is repeated? Does Maven offer a way to reuse the includes? I am wondering if we can build a hudi-hbase-shaded package and simply include that everywhere; it would be easier to maintain.

Contributor:

+1

Contributor Author:

Yes, we can have hudi-hbase-shaded as a separate module. I'll see if this can be done soon; if not, I'll take it as a follow-up.

Contributor Author:

HUDI-3673 is created as a follow-up.


HBaseConfiguration::addHbaseResources(). To get around this issue,
since HBase loads "hbase-site.xml" after "hbase-default.xml", we
provide hbase-site.xml from the bundle so that HBaseConfiguration
can pick it up and ensure the right defaults.
Contributor:

Thanks for adding explainer!




@codope (Member) left a comment:

Great effort @yihua and ❤️ the PR description.

</exclusions>
</dependency>
<dependency>
<groupId>org.apache.zookeeper</groupId>
Member:

Why is this needed?

Contributor Author:

This is needed for TestSparkHoodieHBaseIndex.



@yihua force-pushed the HUDI-1180-hbase-upgrade-ci-test-deps branch from 70c1029 to 29100bd on March 17, 2022 20:01
@yihua (Contributor, Author) commented Mar 17, 2022

Write a hudi table before this change with metadata table enabled

can we test the reverse. For e.g writers may be upgraded, while query engines may not be. So can we write hudi table with this change and then try to read it with older code?

Good point. I'll take this up as well.

@yihua force-pushed the HUDI-1180-hbase-upgrade-ci-test-deps branch from 9f05eb9 to 8ebfdf9 on March 21, 2022 01:21
schema = new Schema.Parser().parse(new String(reader.loadFileInfo().get("schema".getBytes())));
schema = new Schema.Parser().parse(new String(reader.getHFileInfo().get(KEY_SCHEMA.getBytes())));
}
HFileScanner scanner = reader.getScanner(false, false);
Contributor:

@cuibo01 Thanks for raising this. Do you mean that we should use the close() method to properly close the reader when done reading?

yes, we should close scanner

Contributor Author:

This class is for tests only. Regarding the production code, HoodieHFileReader properly closes the HFile reader:

@Override
  public synchronized void close() {
    try {
      reader.close();
      reader = null;
      if (fsDataInputStream != null) {
        fsDataInputStream.close();
      }
      keyScanner = null;
    } catch (IOException e) {
      throw new HoodieIOException("Error closing the hfile reader", e);
    }
  }
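
A minimal sketch, for the test-only path, of also releasing the scanner (assuming HFileScanner is Closeable in HBase 2.x, which the suggestion above implies; names follow the snippet earlier in this thread):

HFileScanner scanner = reader.getScanner(false, false);
try {
  if (scanner.seekTo()) {
    do {
      Cell cell = scanner.getCell();
      // ... consume the cell ...
    } while (scanner.next());
  }
} finally {
  scanner.close();  // assumed available via Closeable in HBase 2.x
  reader.close();
}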

@codope (Member) left a comment:

Looks in good shape. Few minor comments/clarifications.

@Override
@Test
public void testWriteReadWithEvolvedSchema() throws Exception {
// Disable the test with evolved schema for HFile since it's not supported
Member:

Let's remove it if it's not needed or track it in a JIRA if you plan to add support later.

Contributor Author:

Tracked here: HUDI-3683

Contributor Author:

We need the override here to disable the test.

@ParameterizedTest
@ValueSource(strings = {
"/hudi_0_9_hbase_1_2_3", "/hudi_0_10_hbase_1_2_3", "/hudi_0_11_hbase_2_4_9"})
public void testHoodieHFileCompatibility(String hfilePrefix) throws IOException {
Member:

I like the way it's being tested, but we will need to update the fixtures as and when there are changes to the sources from which they were generated. That wouldn't be too frequent, but it's something to keep in mind.

Contributor Author:

Makes sense. The fixtures for older releases should not be changed given they are generated from public releases and should be used as is for compatibility tests.

}
@Override
public void testReaderFilterRowKeys() {
// TODO: fix filterRowKeys test for ORC due to a bug in ORC logic
Member:

What's the bug here? Are we tracking it?

Contributor Author:

Tracked here: HUDI-3682

LOG.info("Exit code for command : " + exitCode);
if (exitCode != 0) {
LOG.error("\n\n ###### Stdout #######\n" + callback.getStdout().toString());
//LOG.error("\n\n ###### Stdout #######\n" + callback.getStdout().toString());
Member:

Let's remove this condition, or maybe bring L243-L245 inside this conditional?

Contributor Author:

This is debugging code for showing logs when there are IT failures. I'll remove these changes.

@yihua force-pushed the HUDI-1180-hbase-upgrade-ci-test-deps branch from 2b54590 to 0733dcc on March 24, 2022 21:29
@hudi-bot (Collaborator) commented:

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build
