Merged
Changes from all commits
124 commits
f96ba7a
[HUDI-3642] Handle NPE due to empty requested replacecommit metadata …
codope Mar 23, 2022
52f0498
Fixing non partitioned all files record in MDT (#5108)
nsivabalan Mar 24, 2022
a1c42fc
[minor] Checks the data block type for archived timeline (#5106)
danny0405 Mar 24, 2022
fe2c398
[HUDI-3689] Fix glob path and hive sync in deltastreamer tests (#5117)
codope Mar 24, 2022
ccc3728
[HUDI-3684] Fixing NPE in `ParquetUtils` (#5102)
Mar 24, 2022
b147065
[HUDI-3689] Remove Azure CI cache (#5121)
xushiyan Mar 24, 2022
686da41
[HUDI-3689] Fix UT failures in TestHoodieDeltaStreamer (#5120)
xushiyan Mar 24, 2022
44ab3b7
[HUDI-3706] Downgrade maven surefire and failsafe version (#5123)
yihua Mar 24, 2022
ff13665
[HUDI-3689] Fix delta streamer tests (#5124)
xushiyan Mar 24, 2022
4ddd094
[HUDI-3689] Disable flaky tests in TestHoodieDeltaStreamer (#5127)
yihua Mar 24, 2022
9b3dd2e
[HUDI-3624] Check all instants before starting a commit in metadata t…
yihua Mar 25, 2022
608d4bf
[HUDI-3638] Make ZookeeperBasedLockProvider serializable (#5112)
yihua Mar 25, 2022
5e86cdd
[HUDI-3701] Flink bulk_insert support bucket hash index (#5118)
danny0405 Mar 25, 2022
eaa4c4f
[HUDI-1180] Upgrade HBase to 2.4.9 (#5004)
yihua Mar 25, 2022
483ee84
[HUDI-3703] Reset taskID in restoreWriteMetadata (#5122)
yuzhaojing Mar 25, 2022
2fd9a4d
[HUDI-3580] Claim RFC number 48 for LogCompaction action RFC (#5128)
suryaprasanna Mar 25, 2022
8896864
[HUDI-3678] Fix record rewrite of create handle when 'preserveMetadat…
danny0405 Mar 25, 2022
8b38dde
[HUDI-3594] Supporting Composite Expressions over Data Table Columns …
Mar 25, 2022
f20c986
[HUDI-3711] Fix typo in MaxwellJsonKafkaSourcePostProcessor.Config#PR…
wangxianghu Mar 25, 2022
e5c3f90
[HUDI-3563] Make quickstart examples covered by CI tests (#5082)
XuQianJin-Stars Mar 25, 2022
12cc8e7
[MINOR] fix QuickstartUtils move (#5133)
XuQianJin-Stars Mar 25, 2022
51034fe
[HUDI-3396] Refactoring `MergeOnReadRDD` to avoid duplication, fetch …
Mar 25, 2022
0c09a97
[HUDI-3435] Do not throw exception when instant to rollback does not …
danny0405 Mar 26, 2022
57b4f39
[HUDI-3612] Clustering strategy should create new TypedProperties whe…
Mar 26, 2022
189d529
[HUDI-3709] Fixing `ParquetWriter` impls not respecting Parquet Max F…
Mar 26, 2022
4d940bb
[HUDI-3716] OOM occurred when use bulk_insert cow table with flink BU…
danny0405 Mar 27, 2022
484b340
[HUDI-3604] Adjust the order of timeline changes in rollbacks (#5114)
yihua Mar 27, 2022
85c4a6c
[MINOR] Relaxing cleaner and archival configs (#5142)
nsivabalan Mar 27, 2022
9da2dd4
[HUDI-3719] High performance costs of AvroSerizlizer in DataSource wr…
xiarixiaoyao Mar 27, 2022
f2a93ea
[HUDI-3724] Fixing closure of ParquetReader (#5141)
nsivabalan Mar 28, 2022
d31cde2
[MINOR] Fix call command parser use spark3.2 (#5144)
XuQianJin-Stars Mar 28, 2022
1d0f4cc
[HUDI-3538] Support Compaction Command Based on Call Procedure Comman…
huberylee Mar 28, 2022
2e2d08c
[HUDI-3539] Flink bucket index bucketID bootstrap optimization. (#5093)
minihippo Mar 28, 2022
4ed84b2
[HUDI-3720] Fix the logic of reattempting pending rollback (#5148)
yihua Mar 28, 2022
6ccbae4
[HUDI-2757] Implement Hudi AWS Glue sync (#5076)
xushiyan Mar 28, 2022
d074089
[HUDI-2566] Adding multi-writer test support to integ test (#5065)
nsivabalan Mar 28, 2022
72e0b52
[HUDI-3722] Fix truncate hudi table's error (#5140)
XuQianJin-Stars Mar 29, 2022
3bf9c5f
[HUDI-3728] Set the sort operator parallelism for flink bucket bulk i…
danny0405 Mar 29, 2022
8f8a815
[HUDI-2520] Fix drop table issue when sync to Hive (#5143)
leesf Mar 29, 2022
7c7ecb1
[HUDI-3736] Fix default dynamodblock url default value (#4967)
parisni Mar 29, 2022
1b2fb71
[MINOR] Move Experiemental to javadoc (#5161)
xushiyan Mar 29, 2022
fcb003e
[HUDI-3731] Fixing Column Stats Index record Merging sequence missing…
Mar 29, 2022
0802510
[HUDI-2520] Fix drop partition issue when sync to hive (#5147)
XuQianJin-Stars Mar 29, 2022
e5a2bae
[HUDI-3549] Removing dependency on "spark-avro" (#4955)
Mar 29, 2022
941c254
[HUDI-2520] Fix CTAS statment issue when sync to hive (#5145)
XuQianJin-Stars Mar 29, 2022
5c1b482
[HUDI-3741] Fix flink bucket index bulk insert generates too many sma…
danny0405 Mar 30, 2022
4fed8dd
[HUDI-3485] Adding scheduler pool configs for async clustering (#5043)
nsivabalan Mar 30, 2022
7fa3639
[HUDI-3745] Support for spark datasource options in S3EventsHoodieInc…
harsh1231 Mar 30, 2022
b9fbada
[minor] Follow 3178, fix the flink metadata table compaction (#5175)
danny0405 Mar 30, 2022
04478a4
[MINOR] Fix dates as per UTC in TestDataSkippingUtils (#5166)
codope Mar 30, 2022
8b796e9
[HUDI-3653] Cleaning up bespoke Column Stats Index implementation (#5…
Mar 30, 2022
eae8488
[HUDI-3647] HoodieMetadataTableValidator: check MDT was initialized a…
zhangyue19921010 Mar 30, 2022
2b60641
[HUDI-3635] Fix HoodieMetadataTableValidator around comparison of par…
zhangyue19921010 Mar 30, 2022
17d11f4
[MINOR] Repeated execution of update status (#5089)
cuibo01 Mar 30, 2022
31d4a16
[HUDI-3536] Add hudi-datahub-sync implementation (#5155)
xushiyan Mar 30, 2022
9ff6a48
[HUDI-3736] Fix null pointer when key not specified (#5167)
parisni Mar 30, 2022
2d73c8a
[HUDI-3355] Issue with out of order commits in the timeline when inge…
xiarixiaoyao Mar 30, 2022
9830005
[HUDI-3681] Provision additional hudi-spark-bundle with different ver…
yihua Mar 31, 2022
4fb1a59
[HUDI-3700] Add hudi-utilities-slim-bundle excluding hudi-spark-datas…
yihua Mar 31, 2022
d80c806
[MINOR] Fixing flakiness in TestHoodieSparkMergeOnReadTableRollback.t…
nsivabalan Mar 31, 2022
2c4554f
[HUDI-3750] Fix NPE when build HoodieFileIndex (#5134)
KnightChess Mar 31, 2022
2dbb273
[HUDI-3721] Delete MDT if necessary when trigger rollback to savepoin…
zhangyue19921010 Mar 31, 2022
f6ff95f
[MINOR][DOCS] Update hudi-utilities-slim-bundle docs (#5184)
yihua Mar 31, 2022
4569734
[HUDI-3713] Guarding archival for multi-writer (#5138)
nsivabalan Mar 31, 2022
ce45f7f
[HUDI-3692] MetadataFileSystemView includes compaction in timeline (#…
YuweiXiao Mar 31, 2022
3cdb590
[HUDI-3733] Adding HoodieFailedWritesCleaningPolicy for restore with …
nsivabalan Mar 31, 2022
80011df
[HUDI-3135] Make delete partitions lazy to be executed by the cleaner…
XuQianJin-Stars Mar 31, 2022
73a2109
[HUDI-3732] Fixing rollback validation (#5157)
nsivabalan Mar 31, 2022
7889c78
[HUDI-3729][SPARK] fixed the per regression by enable vectorizeReader…
xiarixiaoyao Mar 31, 2022
51a701c
[HUDI-3020] Utility to create manifest file (#5153)
codejoyan Mar 31, 2022
1da196c
[HUDI-2777] Improve HoodieSparkSqlWriter write performance (#5187)
liuhe0702 Mar 31, 2022
28dafa7
[HUDI-2488][HUDI-3175] Implement async metadata indexing (#4693)
codope Mar 31, 2022
a048e94
[HUDI-3743] Support DELETE_PARTITION for metadata table (#5169)
codope Apr 1, 2022
98b4e97
[HUDI-3406] Rollback incorrectly relying on FS listing instead of Com…
XuQianJin-Stars Apr 1, 2022
6df14f1
[HUDI-2752] The MOR DELETE block breaks the event time sequence of CD…
danny0405 Apr 1, 2022
23b3122
[HUDI-3769] Optimize the logs of HoodieMergeHandle and BufferedConnec…
dongkelun Apr 1, 2022
7dfb168
[HUDI-3763] Fixing hadoop conf class loading for inline reading (#5194)
nsivabalan Apr 1, 2022
dfdd2de
[HUDI-3225] [RFC-45] for async metadata indexing (#4640)
codope Apr 1, 2022
9275b8f
[HUDI-3468][RFC-49] Support sync with DataHub (#5022)
xushiyan Apr 1, 2022
444ff49
[RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolut…
xiarixiaoyao Apr 1, 2022
fb45fc9
[HUDI-3773] Fix parallelism used for metadata table bloom filter inde…
yihua Apr 2, 2022
b1e7e1f
[HUDI-3708] Fix failure with HoodieMetadataRecord due to schema compa…
yihua Apr 2, 2022
020786a
[HUDI-3451] Delete metadata table when the write client disables MDT …
zhangyue19921010 Apr 2, 2022
eef3f9c
[HUDI-3771] flink supports sync table information to aws glue (#5202)
todd5167 Apr 2, 2022
c19f505
[HUDI-3784] Improve docs and logs of HoodieMetadataTableValidator (#5…
yihua Apr 2, 2022
20964df
[HUDI-3357] MVP implementation of BigQuerySyncTool (#5125)
Apr 2, 2022
74eb09b
[HUDI-3776] Fix BloomIndex incorrectly using ColStats to lookup recor…
codope Apr 2, 2022
cc3737b
[HUDI-3664] Fixing Column Stats Index composition (#5181)
Apr 3, 2022
84064a9
[HUDI-3772] Fixing auto adjustment of lock configs for deltastreamer …
nsivabalan Apr 3, 2022
c34eb07
[MINOR] Reuse deleteMetadataTable for disabling metadata table (#5217)
yihua Apr 3, 2022
8add740
[HUDI-3534] [RFC-34] Added the implementation details for the BigQuer…
Apr 3, 2022
b28f0d6
[HUDI-3290] Different file formats for the partition metadata file. (…
prashantwason Apr 4, 2022
3449e86
[HUDI-3780] improve drop partitions (#5178)
XuQianJin-Stars Apr 5, 2022
325b3d6
[HUDI-3795] Fix hudi-examples checkstyle and maven enforcer error (#5…
XuQianJin-Stars Apr 5, 2022
3195f51
[HUDI-3748] write and select hudi table when enable hoodie.datasource…
YannByron Apr 5, 2022
92ca426
[HUDI-2319] dbt example models to demonstrate hudi dbt integration (#…
Apr 5, 2022
898be61
[HUDI-3782] Fixing table config when any of the index is disabled (#5…
codope Apr 6, 2022
8baeb81
[HUDI-3723] Fixed stack overflows in Record Iterators (#5235)
Apr 6, 2022
e96f08f
Moving to 0.12.0-SNAPSHOT on master branch.
xushiyan Apr 6, 2022
8683fb1
[HUDI-3800] Fixed preserve commit metadata for compaction for untouch…
nsivabalan Apr 6, 2022
7612549
[MINOR] Fixing build failure when using flink-1.13 (#5214)
BruceKellan Apr 6, 2022
9e87d16
[HUDI-3760] Adding capability to fetch Metadata Records by prefix (#…
Apr 6, 2022
ca27327
[HUDI-3340] Fix deploy_staging_jars for different profiles (#5240)
xushiyan Apr 6, 2022
939b3d1
[HUDI-3726] Switching from non-partitioned to partitioned key gen doe…
rkkalluri-dbx Apr 6, 2022
b2f09a1
[HUDI-3340] Fix deploy_staging_jars command (#5243)
xushiyan Apr 6, 2022
d43b4cd
[HUDI-3739] Fix handling of the `isNotNull` predicate in Data Skippin…
Apr 6, 2022
e33149b
[HUDI-3808] Flink bulk_insert timestamp(3) can not be read by Spark (…
danny0405 Apr 7, 2022
531381f
[HUDI-3096] fixed the bug that the cow table(contains decimalType) wr…
xiarixiaoyao Apr 7, 2022
9d744bb
[HUDI-3805] Delete existing corrupted requested rollback plan during …
yihua Apr 7, 2022
6a83964
[HUDI-3643] Fix hive count exception when the table is empty and the …
dongkelun Apr 7, 2022
b3c834a
[HUDI-3571] Spark datasource continuous ingestion tool (#5156)
nsivabalan Apr 7, 2022
cd2c346
[HUDI-3637] Exclude uncommitted log files from metadata table validat…
yihua Apr 7, 2022
ef06e4a
[HUDI-3810] Fixing lazy read for metadata log record readers (#5241)
nsivabalan Apr 7, 2022
672974c
[HUDI-3823] Fix hudi-hive-sync-bundle to include HBase dependencies a…
yihua Apr 8, 2022
df87095
[HUDI-3454] Fix partition name in all code paths for LogRecordScanner…
codope Apr 8, 2022
7a6272f
[HUDI-3781] fix spark delete sql can not delete record (#5215)
KnightChess Apr 8, 2022
67215ab
[HUDI-3827] Promote the inetAddress picking strategy for NetworkUtils…
danny0405 Apr 8, 2022
d7cc767
[HUDI-3825] Fixing non-partitioned table Partition Records persistenc…
Apr 8, 2022
26eb7b8
[HUDI-3571] Spark datasource continuous checkpoint should have own fs…
data-storyteller Apr 8, 2022
1cc7542
[MINOR] Update README of docker build setup (#5256)
yihua Apr 8, 2022
81b25c5
[HUDI-3825] Fixing Column Stats Index updating sequence (#5267)
Apr 9, 2022
5e65aef
[HUDI-3837] Fix license and rat check settings (#5273)
xushiyan Apr 9, 2022
3e97c88
[HUDI-3807] Add a new config to control the use of metadata index in …
yihua Apr 9, 2022
15c2645
[MINOR] Fix typos in the comments of HoodieMergeHandle (#5271)
dongkelun Apr 10, 2022
75 changes: 60 additions & 15 deletions .github/workflows/bot.yml
@@ -14,20 +14,53 @@ jobs:
build:
runs-on: ubuntu-latest
strategy:
max-parallel: 8
matrix:
include:
- scala: "scala-2.11"
spark: "spark2"
- scala: "scala-2.11"
spark: "spark2,spark-shade-unbundle-avro"
- scala: "scala-2.12"
spark: "spark3.1.x"
- scala: "scala-2.12"
spark: "spark3.1.x,spark-shade-unbundle-avro"
- scala: "scala-2.12"
spark: "spark3"
- scala: "scala-2.12"
spark: "spark3,spark-shade-unbundle-avro"
# Spark 2.4.4, scala 2.11
- scalaProfile: "scala-2.11"
sparkProfile: "spark2.4"
sparkVersion: "2.4.4"
flinkProfile: "flink1.13"

# Spark 2.4.4, scala 2.12
- scalaProfile: "scala-2.12"
sparkProfile: "spark2.4"
sparkVersion: "2.4.4"
flinkProfile: "flink1.14"

# Spark 3.1.x
- scalaProfile: "scala-2.12"
sparkProfile: "spark3.1"
sparkVersion: "3.1.0"
flinkProfile: "flink1.13"

- scalaProfile: "scala-2.12"
sparkProfile: "spark3.1"
sparkVersion: "3.1.1"
flinkProfile: "flink1.13"

- scalaProfile: "scala-2.12"
sparkProfile: "spark3.1"
sparkVersion: "3.1.2"
flinkProfile: "flink1.14"

- scalaProfile: "scala-2.12"
sparkProfile: "spark3.1"
sparkVersion: "3.1.3"
flinkProfile: "flink1.14"

# Spark 3.2.x
- scalaProfile: "scala-2.12"
sparkProfile: "spark3.2"
sparkVersion: "3.2.0"
flinkProfile: "flink1.13"

- scalaProfile: "scala-2.12"
sparkProfile: "spark3.2"
sparkVersion: "3.2.1"
flinkProfile: "flink1.14"

steps:
- uses: actions/checkout@v2
- name: Set up JDK 8
@@ -38,6 +71,18 @@ jobs:
architecture: x64
- name: Build Project
env:
SCALA_PROFILE: ${{ matrix.scala }}
SPARK_PROFILE: ${{ matrix.spark }}
run: mvn install -P "$SCALA_PROFILE,$SPARK_PROFILE" -DskipTests=true -Dmaven.javadoc.skip=true -B -V
SCALA_PROFILE: ${{ matrix.scalaProfile }}
SPARK_PROFILE: ${{ matrix.sparkProfile }}
SPARK_VERSION: ${{ matrix.sparkVersion }}
FLINK_PROFILE: ${{ matrix.flinkProfile }}
run:
mvn clean install -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -D"$FLINK_PROFILE" -Dspark.version="$SPARK_VERSION" -Pintegration-tests -DskipTests=true -B -V
- name: Quickstart Test
env:
SCALA_PROFILE: ${{ matrix.scalaProfile }}
SPARK_PROFILE: ${{ matrix.sparkProfile }}
SPARK_VERSION: ${{ matrix.sparkVersion }}
FLINK_PROFILE: ${{ matrix.flinkProfile }}
if: ${{ !startsWith(env.SPARK_VERSION, '3.2.') }} # skip test spark 3.2 before hadoop upgrade to 3.x
run:
mvn test -P "unit-tests" -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -D"$FLINK_PROFILE" -Dspark.version="$SPARK_VERSION" -DfailIfNoTests=false -pl hudi-examples/hudi-examples-flink,hudi-examples/hudi-examples-java,hudi-examples/hudi-examples-spark
49 changes: 22 additions & 27 deletions README.md
@@ -70,41 +70,36 @@ To build the Javadoc for all Java and Scala classes:
mvn clean javadoc:aggregate -Pjavadocs
```

### Build with Scala 2.12
### Build with different Spark versions

The default Scala version supported is 2.11. To build for Scala 2.12 version, build using `scala-2.12` profile
The default Spark version supported is 2.4.4. To build for different Spark versions and Scala 2.12, use the
corresponding profile

```
mvn clean package -DskipTests -Dscala-2.12
```

### Build with Spark 3

The default Spark version supported is 2.4.4. To build for different Spark 3 versions, use the corresponding profile
| Label | Artifact Name for Spark Bundle | Maven Profile Option | Notes |
|--|--|--|--|
| Spark 2.4, Scala 2.11 | hudi-spark2.4-bundle_2.11 | `-Pspark2.4` | For Spark 2.4.4, which is the same as the default |
| Spark 2.4, Scala 2.12 | hudi-spark2.4-bundle_2.12 | `-Pspark2.4,scala-2.12` | For Spark 2.4.4, which is the same as the default and Scala 2.12 |
| Spark 3.1, Scala 2.12 | hudi-spark3.1-bundle_2.12 | `-Pspark3.1` | For Spark 3.1.x |
| Spark 3.2, Scala 2.12 | hudi-spark3.2-bundle_2.12 | `-Pspark3.2` | For Spark 3.2.x |
| Spark 3, Scala 2.12 | hudi-spark3-bundle_2.12 | `-Pspark3` | This is the same as `Spark 3.2, Scala 2.12` |
| Spark, Scala 2.11 | hudi-spark-bundle_2.11 | Default | The default profile, supporting Spark 2.4.4 |
| Spark, Scala 2.12 | hudi-spark-bundle_2.12 | `-Pscala-2.12` | The default profile (for Spark 2.4.4) with Scala 2.12 |

For example,
```
# Build against Spark 3.2.1 (the default build shipped with the public Spark 3 bundle)
mvn clean package -DskipTests -Dspark3
# Build against Spark 3.2.x (the default build shipped with the public Spark 3 bundle)
mvn clean package -DskipTests -Pspark3.2

# Build against Spark 3.1.2
mvn clean package -DskipTests -Dspark3.1.x
```

### Build without spark-avro module

The default hudi-jar bundles spark-avro module. To build without spark-avro module, build using `spark-shade-unbundle-avro` profile
# Build against Spark 3.1.x
mvn clean package -DskipTests -Pspark3.1

# Build against Spark 2.4.4 and Scala 2.12
mvn clean package -DskipTests -Pspark2.4,scala-2.12
```
# Checkout code and build
git clone https://github.com/apache/hudi.git && cd hudi
mvn clean package -DskipTests -Pspark-shade-unbundle-avro

# Start command
spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
--packages org.apache.spark:spark-avro_2.11:2.4.4 \
--jars `ls packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```
### What about "spark-avro" module?

Starting from version 0.11, Hudi no longer requires `spark-avro` to be specified using `--packages`.
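
As an illustration of that change (a hedged sketch, not a command taken from this PR), a 0.11+ spark-shell session only needs the bundle jar and the Kryo serializer setting; the jar name below assumes a local `-Pspark3.2` build and follows the bundle naming in the table above.

```shell
# Hypothetical launch against a locally built Spark 3.2 bundle; adjust paths and versions to your build.
spark-3.2.1-bin-hadoop3.2/bin/spark-shell \
  --jars `ls packaging/hudi-spark-bundle/target/hudi-spark3.2-bundle_2.12-*.*.*-SNAPSHOT.jar` \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```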

## Running Tests

70 changes: 19 additions & 51 deletions azure-pipelines.yml
@@ -22,11 +22,11 @@ pool:
vmImage: 'ubuntu-18.04'

variables:
MAVEN_CACHE_FOLDER: $(Pipeline.Workspace)/.m2/repository
MAVEN_OPTS: '-Dmaven.repo.local=$(MAVEN_CACHE_FOLDER) -Dcheckstyle.skip=true -Drat.skip=true -Djacoco.skip=true'
MAVEN_OPTS: '-Dcheckstyle.skip=true -Drat.skip=true -Djacoco.skip=true'
SPARK_VERSION: '2.4.4'
HADOOP_VERSION: '2.7'
SPARK_ARCHIVE: spark-$(SPARK_VERSION)-bin-hadoop$(HADOOP_VERSION)
EXCLUDE_TESTED_MODULES: '!hudi-examples/hudi-examples-common,!hudi-examples/hudi-examples-flink,!hudi-examples/hudi-examples-java,!hudi-examples/hudi-examples-spark,!hudi-common,!hudi-flink-datasource/hudi-flink,!hudi-client/hudi-spark-client,!hudi-client/hudi-client-common,!hudi-client/hudi-flink-client,!hudi-client/hudi-java-client,!hudi-cli,!hudi-utilities,!hudi-sync/hudi-hive-sync'

stages:
- stage: test
@@ -35,23 +35,15 @@ stages:
displayName: UT FT common & flink & UT client/spark-client
timeoutInMinutes: '90'
steps:
- task: Cache@2
displayName: set cache
inputs:
key: 'maven | "$(Agent.OS)" | **/pom.xml'
restoreKeys: |
maven | "$(Agent.OS)"
maven
path: $(MAVEN_CACHE_FOLDER)
- task: Maven@3
displayName: maven install
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
goals: 'clean install'
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: Maven@3
displayName: UT common flink client/spark-client
inputs:
@@ -60,7 +52,7 @@
options: -Punit-tests -pl hudi-common,hudi-flink-datasource/hudi-flink,hudi-client/hudi-spark-client
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: Maven@3
displayName: FT common flink
inputs:
@@ -69,28 +61,20 @@
options: -Pfunctional-tests -pl hudi-common,hudi-flink-datasource/hudi-flink
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- job: UT_FT_2
displayName: FT client/spark-client
timeoutInMinutes: '90'
steps:
- task: Cache@2
displayName: set cache
inputs:
key: 'maven | "$(Agent.OS)" | **/pom.xml'
restoreKeys: |
maven | "$(Agent.OS)"
maven
path: $(MAVEN_CACHE_FOLDER)
- task: Maven@3
displayName: maven install
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
goals: 'clean install'
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: Maven@3
displayName: FT client/spark-client
inputs:
@@ -99,28 +83,20 @@
options: -Pfunctional-tests -pl hudi-client/hudi-spark-client
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- job: UT_FT_3
displayName: UT FT clients & cli & utilities & sync/hive-sync
timeoutInMinutes: '90'
steps:
- task: Cache@2
displayName: set cache
inputs:
key: 'maven | "$(Agent.OS)" | **/pom.xml'
restoreKeys: |
maven | "$(Agent.OS)"
maven
path: $(MAVEN_CACHE_FOLDER)
- task: Maven@3
displayName: maven install
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
goals: 'clean install'
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: Maven@3
displayName: UT clients & cli & utilities & sync/hive-sync
inputs:
@@ -129,7 +105,7 @@
options: -Punit-tests -pl hudi-client/hudi-client-common,hudi-client/hudi-flink-client,hudi-client/hudi-java-client,hudi-cli,hudi-utilities,hudi-sync/hudi-hive-sync
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: Maven@3
displayName: FT clients & cli & utilities & sync/hive-sync
inputs:
@@ -138,46 +114,38 @@
options: -Pfunctional-tests -pl hudi-client/hudi-client-common,hudi-client/hudi-flink-client,hudi-client/hudi-java-client,hudi-cli,hudi-utilities,hudi-sync/hudi-hive-sync
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- job: UT_FT_4
displayName: UT FT other modules
timeoutInMinutes: '90'
steps:
- task: Cache@2
displayName: set cache
inputs:
key: 'maven | "$(Agent.OS)" | **/pom.xml'
restoreKeys: |
maven | "$(Agent.OS)"
maven
path: $(MAVEN_CACHE_FOLDER)
- task: Maven@3
displayName: maven install
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
goals: 'clean install'
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: Maven@3
displayName: UT other modules
inputs:
mavenPomFile: 'pom.xml'
goals: 'test'
options: -Punit-tests -pl !hudi-common,!hudi-flink-datasource/hudi-flink,!hudi-client/hudi-spark-client,!hudi-client/hudi-client-common,!hudi-client/hudi-flink-client,!hudi-client/hudi-java-client,!hudi-cli,!hudi-utilities,!hudi-sync/hudi-hive-sync
options: -Punit-tests -pl $(EXCLUDE_TESTED_MODULES)
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: Maven@3
displayName: FT other modules
inputs:
mavenPomFile: 'pom.xml'
goals: 'test'
options: -Pfunctional-tests -pl !hudi-common,!hudi-flink-datasource/hudi-flink,!hudi-client/hudi-spark-client,!hudi-client/hudi-client-common,!hudi-client/hudi-flink-client,!hudi-client/hudi-java-client,!hudi-cli,!hudi-utilities,!hudi-sync/hudi-hive-sync
options: -Pfunctional-tests -pl $(EXCLUDE_TESTED_MODULES)
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- job: IT
displayName: IT modules
timeoutInMinutes: '90'
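
One note on the new `EXCLUDE_TESTED_MODULES` variable above: it simply centralizes the long `-pl` exclusion list used by the "other modules" jobs, so after Azure Pipelines expands `$(EXCLUDE_TESTED_MODULES)` the unit-test step effectively runs something along these lines (a sketch, with the `MAVEN_OPTS` flags shown inline):

```shell
# Sketch of the "UT other modules" step after variable expansion
mvn test -Punit-tests \
  -pl '!hudi-examples/hudi-examples-common,!hudi-examples/hudi-examples-flink,!hudi-examples/hudi-examples-java,!hudi-examples/hudi-examples-spark,!hudi-common,!hudi-flink-datasource/hudi-flink,!hudi-client/hudi-spark-client,!hudi-client/hudi-client-common,!hudi-client/hudi-flink-client,!hudi-client/hudi-java-client,!hudi-cli,!hudi-utilities,!hudi-sync/hudi-hive-sync' \
  -Dcheckstyle.skip=true -Drat.skip=true -Djacoco.skip=true
```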
18 changes: 14 additions & 4 deletions docker/README.md
@@ -51,9 +51,19 @@ mvn clean pre-integration-test -DskipTests -Ddocker.compose.skip=true -Ddocker.b
mvn clean pre-integration-test -DskipTests -Ddocker.compose.skip=true -Ddocker.build.skip=false -pl :hudi-hadoop-trinobase-docker -am
```

Alternatively, you can use `docker` cli directly under `hoodie/hadoop`. Note that, you need to manually name your local
image by using `-t` option to match the naming in the `pom.xml`, so that you can update the corresponding image
repository in Docker Hub (detailed steps in the next section).
Alternatively, you can use the `docker` CLI directly under `hoodie/hadoop` to build the images faster. If you use this
approach, first build the Hudi modules with the `integration-tests` profile as shown below, so that the latest Hudi jars
are copied into the corresponding docker folders, e.g., `$HUDI_DIR/docker/hoodie/hadoop/hive_base/target`, which each
image build requires. Otherwise, the `target/` folder may be missing and the `docker` CLI fails with:
`failed to compute cache key: "/target" not found: not found`.

```shell
mvn -Pintegration-tests clean package -DskipTests
```

Note that when building an image with the `docker` CLI, you need to name your local image manually with the `-t` option
so that it matches the naming in `pom.xml`; this lets you update the corresponding image repository on Docker Hub
(detailed steps in the next section).
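
For instance, a hypothetical `docker build` invocation could look like the following; the repository name is borrowed from the `docker push` example further below and must match the one declared in that image module's `pom.xml`.

```shell
# Run under the image module's directory (e.g. under hoodie/hadoop); the tag is optional, "latest" by default.
# Hypothetical example reusing the trinobase repository name shown in the push section below.
docker build -t apachehudi/hudi-hadoop_2.8.4-trinobase_368:latest .
```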

```shell
# Run under hoodie/hadoop, the <tag> is optional, "latest" by default
Expand Down Expand Up @@ -82,7 +92,7 @@ docker push apachehudi/hudi-hadoop_2.8.4-trinobase_368
You can also easily push the image to the Docker Hub using Docker Desktop app: go to `Images`, search for the image by
the name, and then click on the three dots and `Push to Hub`.

![Push to Docker Hub](push_to_docker_hub.png)
![Push to Docker Hub](images/push_to_docker_hub.png)

Note that you need to ask for permission to upload the Hudi Docker Demo images to the repositories.

2 changes: 1 addition & 1 deletion docker/demo/config/test-suite/cow-spark-long-running.yaml
@@ -38,7 +38,7 @@ dag_content:
first_delete:
config:
num_partitions_delete: 50
num_records_delete: 8000
num_records_delete: 4000
type: SparkDeleteNode
deps: first_upsert
second_validate:
@@ -60,7 +60,7 @@ dag_content:
first_delete:
config:
num_partitions_delete: 50
num_records_delete: 8000
num_records_delete: 4000
type: DeleteNode
deps: first_upsert
second_hive_sync:
@@ -54,7 +54,7 @@ dag_content:
first_delete:
config:
num_partitions_delete: 50
num_records_delete: 8000
num_records_delete: 4000
type: DeleteNode
deps: first_upsert
second_validate:
@@ -54,7 +54,7 @@ dag_content:
first_delete:
config:
num_partitions_delete: 50
num_records_delete: 8000
num_records_delete: 4000
type: DeleteNode
deps: first_upsert
second_validate: