-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[WIP][HUDI-3088] Use Spark 3.2 as default Spark version #4752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/helpers/TestKafkaOffsetGen.java
Outdated
Show resolved
Hide resolved
|
@hudi-bot run azure |
| return records.stream().map(Helpers::toJsonString).toArray(String[]::new); | ||
| } | ||
|
|
||
| public static Tuple2<String, String>[] jsonifyRecordsByPartitions(List<HoodieRecord> records, int partitions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
java class org.apache.hudi.common.util.collection.Pair is preferred here over Tuple2
| public static Tuple2<String, String>[] jsonifyRecordsByPartitions(List<HoodieRecord> records, int partitions) { | ||
| Tuple2<String, String>[] data = new Tuple2[records.size()]; | ||
| for (int i = 0; i < records.size(); i++) { | ||
| int key = i % partitions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't it be better with org.apache.hudi.common.model.HoodieKey#recordKey from each HoodieRecord ?
e25be08 to
e7933bc
Compare
|
the first job passing..nice progress. there was a recent fix that might help pass some tests in spark 3.2. d971974 |
6677a97 to
097c4d8
Compare
| // So when returnNullIfNotFound is true, catch this exception. | ||
| if (!returnNullIfNotFound) { | ||
| throw e; | ||
| LOG.warn("Failed to get nested field Value ", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should not be changed.
631437f to
a2f0214
Compare
|
@hudi-bot run azure |
a1cb5d3 to
e0b8ffa
Compare
azure-pipelines.yml
Outdated
| MAVEN_CACHE_FOLDER: $(Pipeline.Workspace)/.m2/repository | ||
| MAVEN_OPTS: '-Dmaven.repo.local=$(MAVEN_CACHE_FOLDER) -Dcheckstyle.skip=true -Drat.skip=true -Djacoco.skip=true' | ||
| SPARK_VERSION: '2.4.4' | ||
| SPARK_VERSION: '3.2.0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to upgrade to 3.2.1 to align with master version
hudi-common/pom.xml
Outdated
| <dependency> | ||
| <groupId>org.apache.avro</groupId> | ||
| <artifactId>avro</artifactId> | ||
| <version>${avro.version}</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this? thought it inherits the same version from the root pom
| * Helper class to do common stuff across Avro. | ||
| */ | ||
| public class HoodieAvroUtils { | ||
| private static final Logger LOG = LogManager.getLogger(HoodieAvroUtils.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOG is not used. can we revert?
| Object value = record.get(DebeziumConstants.FLATTENED_OP_COL_NAME); | ||
| Object value; | ||
| if (record.getSchema().getField(DebeziumConstants.FLATTENED_OP_COL_NAME) == null) { | ||
| value = null; | ||
| } else { | ||
| value = record.get(DebeziumConstants.FLATTENED_OP_COL_NAME); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the new check does not look clean; it looks like whenever we try to get a column's value, we need to check its nullability? are we able to retrieve the value the without this kind of check
| new Schema.Field("id", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", JsonProperties.NULL_VALUE), | ||
| new Schema.Field("name", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", JsonProperties.NULL_VALUE), | ||
| new Schema.Field("age", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", JsonProperties.NULL_VALUE), | ||
| new Schema.Field("job", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", JsonProperties.NULL_VALUE) | ||
| new Schema.Field("id", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", null), | ||
| new Schema.Field("name", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", null), | ||
| new Schema.Field("age", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", null), | ||
| new Schema.Field("job", Schema.createUnion(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)), "", null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should prefer to use org.apache.avro.Schema.Field#NULL_DEFAULT_VALUE whenever we want to set default as null
| @Disabled | ||
| @Test | ||
| public void testMetadataBootstrapNonpartitionedCOW() throws Exception { | ||
| testBootstrapCommon(false, false, EffectiveMode.METADATA_BOOTSTRAP_MODE); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can disable for the whole class instead of annotating each method, and add a note in the annotation like @Disabled("<describe the reason>")
| scala.collection.immutable.List.empty(), | ||
| JavaConverters.collectionAsScalaIterableConverter(new ArrayList<String>()).asScala().toSeq(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you need a seq why not scala.collection.immutable.List.empty().toSeq() ?
hudi-utilities/pom.xml
Outdated
| <dependency> | ||
| <groupId>org.apache.avro</groupId> | ||
| <artifactId>avro</artifactId> | ||
| <version>1.8.2</version> | ||
| <scope>provided</scope> | ||
| </dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concern on the impact here: why we set an old version just for utilities?
| cleanupResources(); | ||
| } | ||
|
|
||
| @Disabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be helpful to set a reason
packaging/hudi-presto-bundle/pom.xml
Outdated
| <dependency> | ||
| <groupId>org.apache.avro</groupId> | ||
| <artifactId>avro</artifactId> | ||
| <version>${avro.version}</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same avro version question
e0b8ffa to
f89956b
Compare
7b9e92c to
76a681e
Compare
95832e1 to
2b87e93
Compare
|
Closing this one, in favor of #5402 which is going to have all changes for Hadoop/Hive/Spark 3.x upgrades. |
Tips
What is the purpose of the pull request
(For example: This pull request adds quick-start document.)
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.