Merged

38 commits
d27b712
Moved `HoodieSpark3Analysis` and required classes along to "hudi-spar…
Aug 31, 2022
09d154e
Removed duplication from Spark 3.3 module
Aug 31, 2022
bc75ebf
Unified Scala library deps to not over-specify version in every module;
Aug 31, 2022
f3c9c6c
Moved `HoodieIdentifier` to "hudi-spark3-common";
Aug 31, 2022
5d66c01
Missing licenses
Aug 31, 2022
9259aa5
Provisioned new "hudi-spark3.2plus-common" project (to accommodate fo…
Aug 31, 2022
e229784
Moved classes pertaining to Spark >= 3.2 under "hudi-spark3.2plus-common"
Aug 31, 2022
5495b03
Fixed invalid module refs
Aug 31, 2022
693186c
Fixing module id
Aug 31, 2022
2b299b4
Removed unnecessary exclusions
Aug 31, 2022
9fc48f9
Cleaned up unused `HoodieSpark3ResolveReferences` rule
Aug 31, 2022
98e1832
Added `HoodieCatalogUtils` support into `SparkAdapter`
Aug 31, 2022
c6854f7
Fixed `HoodieCatalog` for Spark 3.3
Aug 31, 2022
9c4c641
Cleaned up duplicated `AlterTableCommand`;
Aug 31, 2022
19fe331
De-duplicated Spark's default source impls for Spark 3.2 and 3.3 (as `…
Aug 31, 2022
47aeb02
Fixed `HoodieSpark3_1AvroDeserializer` to properly propagate legacy r…
Aug 31, 2022
8c923b3
De-duped `TimeTravelRelation`
Aug 31, 2022
f798198
De-duped Hudi's `ParquetFileFormat` extensions for Spark >= 3.2
Aug 31, 2022
27efd99
Tidying up
Aug 31, 2022
5c80923
Missing licenses
Aug 31, 2022
204509c
Fixing invalid ref
Sep 1, 2022
8f7d56e
Tidying up
Sep 1, 2022
5b1d6ab
Fixed bundles to include new common module;
Sep 1, 2022
86d5562
Fixed invalid ref on "hbase-server"
Sep 1, 2022
1c7683a
Fixing compilation for Spark 3.x
Sep 1, 2022
bbba30a
Cleaning up "spark2" profile
Sep 2, 2022
4f8101f
Removed "hudi-cli" dep from "hudi-spark-examples"
Sep 2, 2022
bcab7cd
Exclude "hbase-client", "hbase-common" transitively pulled in from Hi…
Sep 2, 2022
c089a68
Added enforcer rules to make sure no transitive HBase deps are pulled…
Sep 2, 2022
c2e28ce
Excluding HBase deps pulled in by Hive in "hudi-spark-bundle"
Sep 2, 2022
5acd5b1
Consolidated exclusions in the root POM
Sep 2, 2022
282d49d
Restored "spark2" and "spark3" profiles (for BWC);
Sep 3, 2022
55d3249
Fixed all Hive deps to exclude any transitive HBase deps
Sep 9, 2022
64e9753
Tidying up
Sep 9, 2022
7f9e915
Fixing typo
Sep 9, 2022
80e9ea2
Tidying up
Sep 10, 2022
3858976
Fixed Hudi IT scripts to properly alias bundle jars (having the forma…
Sep 10, 2022
37309a8
Unraveled globbing and replaced w/ module-ref placeholders
Sep 14, 2022
2 changes: 1 addition & 1 deletion azure-pipelines.yml
@@ -72,7 +72,7 @@ parameters:
- '!hudi-utilities'

variables:
BUILD_PROFILES: '-Dscala-2.11 -Dspark2 -Dflink1.14'
BUILD_PROFILES: '-Dscala-2.11 -Dspark2.4 -Dflink1.14'
Member: Shall we update the docs/readme as well?

PLUGIN_OPTS: '-Dcheckstyle.skip=true -Drat.skip=true -Djacoco.skip=true -ntp -B -V -Pwarn-log -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugins.shade=warn -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugins.dependency=warn'
MVN_OPTS_INSTALL: '-DskipTests $(BUILD_PROFILES) $(PLUGIN_OPTS)'
MVN_OPTS_TEST: '-fae -Pwarn-log $(BUILD_PROFILES) $(PLUGIN_OPTS)'
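The spark2 profile is renamed to spark2.4 here. A local build mirroring the CI invocation might look like the sketch below; only the -Dspark2.4 flag comes from this diff, and the surrounding mvn command line is an assumption modeled on the BUILD_PROFILES variable above.

```shell
# Assemble the renamed profile flags as the CI variable above does.
# Only -Dspark2.4 is taken from this diff; the rest mirrors BUILD_PROFILES.
BUILD_PROFILES='-Dscala-2.11 -Dspark2.4 -Dflink1.14'

# The actual build would be invoked roughly like this (not run here):
echo "mvn clean install -DskipTests $BUILD_PROFILES"
```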
6 changes: 5 additions & 1 deletion hudi-cli/pom.xml
@@ -136,7 +136,11 @@
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>

<dependency>
<groupId>org.scala-lang.modules</groupId>
<artifactId>scala-collection-compat_${scala.binary.version}</artifactId>
</dependency>

<!-- Logging -->
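Dropping `<version>` from scala-library (and adding scala-collection-compat without one) only works if these artifacts are pinned centrally, presumably via dependencyManagement in the root POM, as the "Unified Scala library deps" commit suggests. A hedged sketch of what such entries could look like; the `${scala.collection.compat.version}` property name is illustrative, not taken from the actual Hudi root POM:

```xml
<!-- Hypothetical root-POM entries letting child modules omit <version>;
     the ${scala.collection.compat.version} property name is an assumption. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang.modules</groupId>
      <artifactId>scala-collection-compat_${scala.binary.version}</artifactId>
      <version>${scala.collection.compat.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```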
26 changes: 13 additions & 13 deletions hudi-client/hudi-flink-client/pom.xml
@@ -36,20 +36,20 @@

<dependencies>
<!-- Logging -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-1.2-api</artifactId>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-1.2-api</artifactId>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</dependency>

<!-- Hudi -->
<!-- Hudi -->
<dependency>
<groupId>org.apache.hudi</groupId>
<artifactId>hudi-client-common</artifactId>
6 changes: 5 additions & 1 deletion hudi-client/hudi-spark-client/pom.xml
@@ -34,7 +34,11 @@
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>

<dependency>
<groupId>org.scala-lang.modules</groupId>
<artifactId>scala-collection-compat_${scala.binary.version}</artifactId>
</dependency>

<!-- Logging -->
@@ -0,0 +1,24 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark.sql

/**
* NOTE: Since support for [[TableCatalog]] was only added in Spark 3, this trait
* is going to be an empty one simply serving as a placeholder (for compatibility w/ Spark 2)
*/
trait HoodieCatalogUtils {}
@@ -17,11 +17,11 @@

package org.apache.spark.sql

import org.apache.spark.sql.catalyst.{AliasIdentifier, TableIdentifier}
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
import org.apache.spark.sql.catalyst.plans.JoinType
import org.apache.spark.sql.catalyst.plans.logical.{Join, LogicalPlan}
import org.apache.spark.sql.catalyst.{AliasIdentifier, TableIdentifier}
import org.apache.spark.sql.internal.SQLConf

trait HoodieCatalystPlansUtils {
@@ -33,7 +33,7 @@ import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
import org.apache.spark.sql.execution.datasources.{FilePartition, FileScanRDD, LogicalRelation, PartitionedFile, SparkParsePartitionUtil}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{DataType, StructType}
import org.apache.spark.sql.{HoodieCatalystExpressionUtils, HoodieCatalystPlansUtils, Row, SparkSession}
import org.apache.spark.sql.{HoodieCatalogUtils, HoodieCatalystExpressionUtils, HoodieCatalystPlansUtils, Row, SparkSession}
import org.apache.spark.storage.StorageLevel

import java.util.Locale
@@ -44,13 +44,19 @@ import java.util.Locale
trait SparkAdapter extends Serializable {

/**
* Creates instance of [[HoodieCatalystExpressionUtils]] providing for common utils operating
* Returns an instance of [[HoodieCatalogUtils]] providing for common utils operating on Spark's
* [[TableCatalog]]s
*/
def getCatalogUtils: HoodieCatalogUtils

/**
* Returns an instance of [[HoodieCatalystExpressionUtils]] providing for common utils operating
* on Catalyst [[Expression]]s
*/
def getCatalystExpressionUtils: HoodieCatalystExpressionUtils

/**
* Creates instance of [[HoodieCatalystPlansUtils]] providing for common utils operating
* Returns an instance of [[HoodieCatalystPlansUtils]] providing for common utils operating
* on Catalyst [[LogicalPlan]]s
*/
def getCatalystPlanUtils: HoodieCatalystPlansUtils
@@ -23,9 +23,9 @@
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import org.apache.hadoop.hbase.exceptions.IllegalArgumentIOException;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.exception.HoodieException;
import org.apache.hudi.exception.HoodieIOException;
import org.apache.hudi.internal.schema.InternalSchema;
import org.apache.hudi.internal.schema.Type;
import org.apache.hudi.internal.schema.Types;
@@ -179,7 +179,7 @@ private static void toJson(Type type, JsonGenerator generator) throws IOException
if (!type.isNestedType()) {
generator.writeString(type.toString());
} else {
throw new IllegalArgumentIOException(String.format("cannot write unknown types: %s", type));
throw new HoodieIOException(String.format("cannot write unknown types: %s", type));
}
}
}
12 changes: 5 additions & 7 deletions hudi-examples/hudi-examples-spark/pom.xml
@@ -112,7 +112,11 @@
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>

<dependency>
<groupId>org.scala-lang.modules</groupId>
<artifactId>scala-collection-compat_${scala.binary.version}</artifactId>
</dependency>

<!-- Logging -->
@@ -134,12 +138,6 @@
</exclusions>
</dependency>

<dependency>
<groupId>org.apache.hudi</groupId>
<artifactId>hudi-cli</artifactId>
<version>${project.version}</version>
</dependency>

<dependency>
<groupId>org.apache.hudi</groupId>
<artifactId>hudi-client-common</artifactId>
6 changes: 5 additions & 1 deletion hudi-spark-datasource/hudi-spark-common/pom.xml
@@ -146,7 +146,11 @@
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>

<dependency>
<groupId>org.scala-lang.modules</groupId>
<artifactId>scala-collection-compat_${scala.binary.version}</artifactId>
</dependency>

<!-- Logging -->
55 changes: 12 additions & 43 deletions hudi-spark-datasource/hudi-spark/pom.xml
@@ -165,84 +165,53 @@
</build>

<dependencies>
<!-- Scala -->
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>

<!-- Logging -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-1.2-api</artifactId>
</dependency>

<!-- Hoodie -->
<dependency>
<groupId>org.apache.hudi</groupId>
<artifactId>hudi-client-common</artifactId>
<version>${project.version}</version>
</dependency>
<!-- Hoodie - Spark-->
<dependency>
<groupId>org.apache.hudi</groupId>
<artifactId>hudi-spark-client</artifactId>
<version>${project.version}</version>
</dependency>

<dependency>
<groupId>org.apache.hudi</groupId>
<artifactId>hudi-common</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hudi</groupId>
<artifactId>hudi-hadoop-mr</artifactId>
<artifactId>${hudi.spark.module}_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>

<!-- Hoodie - Other-->

<dependency>
<groupId>org.apache.hudi</groupId>
<artifactId>hudi-hive-sync</artifactId>
<artifactId>hudi-client-common</artifactId>
<version>${project.version}</version>
</dependency>

<dependency>
<groupId>org.apache.hudi</groupId>
<artifactId>hudi-sync-common</artifactId>
<artifactId>hudi-common</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hudi</groupId>
<artifactId>hudi-spark-common_${scala.binary.version}</artifactId>
<artifactId>hudi-hadoop-mr</artifactId>
<version>${project.version}</version>
<exclusions>
<exclusion>
<groupId>org.apache.curator</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>org.apache.hudi</groupId>
<artifactId>${hudi.spark.module}_${scala.binary.version}</artifactId>
<artifactId>hudi-hive-sync</artifactId>
<version>${project.version}</version>
<exclusions>
<exclusion>
<groupId>org.apache.hudi</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>org.apache.hudi</groupId>
<artifactId>${hudi.spark.common.module}</artifactId>
<artifactId>hudi-sync-common</artifactId>
<version>${project.version}</version>
<exclusions>
<exclusion>
<groupId>org.apache.hudi</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>

<!-- Fasterxml -->
2 changes: 1 addition & 1 deletion hudi-spark-datasource/hudi-spark/run_hoodie_app.sh
@@ -23,7 +23,7 @@ function error_exit {

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
#Ensure we pick the right jar even for hive11 builds
HUDI_JAR=`ls -c $DIR/../../packaging/hudi-spark-bundle/target/hudi-spark-bundle*.jar | grep -v sources | head -1`
HUDI_JAR=`ls -c $DIR/../../packaging/hudi-spark-bundle/target/hudi-spark*-bundle*.jar | grep -v sources | head -1`

if [ -z "$HADOOP_CONF_DIR" ]; then
echo "setting hadoop conf dir"
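The widened glob is needed because bundle jars can now carry the Spark version in their name (per the "Fixed Hudi IT scripts to properly alias bundle jars" commit). A quick sketch of the difference; the jar file names below are illustrative examples of the two naming schemes, not actual build output:

```shell
# Demonstrates why the widened glob is needed; the jar names are
# illustrative examples of the two naming schemes, not real build output.
tmpdir=$(mktemp -d)
touch "$tmpdir/hudi-spark-bundle_2.11-0.12.0.jar" \
      "$tmpdir/hudi-spark3.2-bundle_2.12-0.12.0.jar"

# Old pattern: misses the Spark-versioned bundle name
old_matches=$(ls "$tmpdir"/hudi-spark-bundle*.jar 2>/dev/null | wc -l)

# New pattern: matches both naming schemes
new_matches=$(ls "$tmpdir"/hudi-spark*-bundle*.jar 2>/dev/null | wc -l)

echo "old=$old_matches new=$new_matches"
rm -rf "$tmpdir"
```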
@@ -23,7 +23,7 @@ function error_exit {

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
#Ensure we pick the right jar even for hive11 builds
HUDI_JAR=`ls -c $DIR/../../packaging/hudi-spark-bundle/target/hudi-spark-bundle*.jar | grep -v sources | head -1`
HUDI_JAR=`ls -c $DIR/../../packaging/hudi-spark-bundle/target/hudi-spark*-bundle*.jar | grep -v sources | head -1`

if [ -z "$HADOOP_CONF_DIR" ]; then
echo "setting hadoop conf dir"
@@ -23,7 +23,7 @@ function error_exit {

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
#Ensure we pick the right jar even for hive11 builds
HUDI_JAR=`ls -c $DIR/../../packaging/hudi-spark-bundle/target/hudi-spark-bundle*.jar | grep -v sources | head -1`
HUDI_JAR=`ls -c $DIR/../../packaging/hudi-spark-bundle/target/hudi-spark*-bundle*.jar | grep -v sources | head -1`

if [ -z "$HADOOP_CONF_DIR" ]; then
echo "setting hadoop conf dir"
@@ -79,10 +79,6 @@ object HoodieAnalysis {
val spark3Analysis: RuleBuilder =
session => ReflectionUtils.loadClass(spark3AnalysisClass, session).asInstanceOf[Rule[LogicalPlan]]

val spark3ResolveReferencesClass = "org.apache.spark.sql.hudi.analysis.HoodieSpark3ResolveReferences"
Contributor: Maybe not related to the changes in this patch, but "HoodieSpark3Analysis" actually refers to 3.2 or greater, right? The naming is not right; I don't see it being used for 3.1 below.

Contributor: Maybe you can name the variables in L78, 79 accordingly.

Contributor (Author): This class is actually deleted.

val spark3ResolveReferences: RuleBuilder =
session => ReflectionUtils.loadClass(spark3ResolveReferencesClass, session).asInstanceOf[Rule[LogicalPlan]]

val resolveAlterTableCommandsClass =
if (HoodieSparkUtils.gteqSpark3_3)
"org.apache.spark.sql.hudi.Spark33ResolveHudiAlterTableCommand"
@@ -94,10 +90,10 @@
//
// It's critical for this rules to follow in this order, so that DataSource V2 to V1 fallback
// is performed prior to other rules being evaluated
rules ++= Seq(dataSourceV2ToV1Fallback, spark3Analysis, spark3ResolveReferences, resolveAlterTableCommands)
rules ++= Seq(dataSourceV2ToV1Fallback, spark3Analysis, resolveAlterTableCommands)

} else if (HoodieSparkUtils.gteqSpark3_1) {
val spark31ResolveAlterTableCommandsClass = "org.apache.spark.sql.hudi.Spark312ResolveHudiAlterTableCommand"
val spark31ResolveAlterTableCommandsClass = "org.apache.spark.sql.hudi.Spark31ResolveHudiAlterTableCommand"
Member: Is this a problem if not all writers and readers use the same Hudi version? Maybe not in this case, but just calling it out to think through the change of class names. I had encountered one issue while doing the HBase upgrade (however, that was because we actually wrote the KV comparator class name in data files).

Contributor (Author): Yeah, in this case class renames don't have any impact.

val spark31ResolveAlterTableCommands: RuleBuilder =
session => ReflectionUtils.loadClass(spark31ResolveAlterTableCommandsClass, session).asInstanceOf[Rule[LogicalPlan]]

6 changes: 0 additions & 6 deletions hudi-spark-datasource/hudi-spark2/pom.xml
@@ -166,12 +166,6 @@
</build>

<dependencies>
<!-- Scala -->
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>

<!-- Hoodie -->
<dependency>
@@ -44,6 +44,10 @@ import scala.collection.mutable.ArrayBuffer
*/
class Spark2Adapter extends SparkAdapter {

override def getCatalogUtils: HoodieCatalogUtils = {
throw new UnsupportedOperationException("Catalog utilities are not supported in Spark 2.x");
}

override def getCatalystExpressionUtils: HoodieCatalystExpressionUtils = HoodieSpark2CatalystExpressionUtils

override def getCatalystPlanUtils: HoodieCatalystPlansUtils = HoodieSpark2CatalystPlanUtils
5 changes: 0 additions & 5 deletions hudi-spark-datasource/hudi-spark3-common/pom.xml
@@ -158,11 +158,6 @@
</build>

<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala12.version}</version>
</dependency>

<dependency>
<groupId>org.apache.spark</groupId>