Merged
143 commits
e00a904
[HUDI-3072] Fixing conflict resolution in transaction management code…
nsivabalan Jan 24, 2022
7bd389f
[MINOR] typo fix in BaseTableMetadata wrt spurious deletes handling (…
zhangyue19921010 Jan 24, 2022
87db4de
[MINOR] Add default value as null for S3 Incremental source propertie…
vinishjail97 Jan 24, 2022
1f7b6b2
[HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient (#3…
dongkelun Jan 24, 2022
6f10107
[HUDI-3306] Upgrade rocksdb version (#4663)
stym06 Jan 24, 2022
bc7882c
[HUDI-2872][HUDI-2646] Refactoring layout optimization (clustering) f…
Jan 24, 2022
26c3f79
[HUDI-3237] gracefully fail to change column data type (#4677)
YannByron Jan 25, 2022
bf409e8
[MINOR] Standardize HoodieSqlCommon.g4 file (#4582)
xuzifu666 Jan 25, 2022
920f459
[HUDI-1822] Rewriting rfc-27 for data skipping index (#4280)
nsivabalan Jan 25, 2022
78e6ab0
[HUDI-3217] Claim the number for RFC-46 (#4687)
Jan 25, 2022
9363804
[MINOR] Fixing serializability with ListingBasedRollbackRequest (#4655)
nsivabalan Jan 26, 2022
dd4ce1b
[HUDI-3328] Updating doap file for release 0.10.1 (#4689)
nsivabalan Jan 26, 2022
f87c473
[HUDI-2763] Metadata table records - support for key deduplication ba…
manojpec Jan 26, 2022
3f21e5f
[MINOR] Fixing serializability of SerializableHoodieRollbackRequest (…
nsivabalan Jan 26, 2022
4a9f826
[HUDI-3215] Solve UT for Spark 3.2 (#4565)
YannByron Jan 26, 2022
0bd38f2
[HUDI-2596] Make class names consistent in hudi-client (#4680)
xushiyan Jan 28, 2022
2b52a56
[HUDI-2688][RFC-40] A new Hudi connector for Trino (#3957)
codope Jan 28, 2022
e78b2f1
[HUDI-2943] Complete pending clustering before deltastreamer sync (#4…
codope Jan 29, 2022
c0e8b03
[HUDI-1977] Fix Hudi CLI tempview query issue (#4626)
peanut-chenzhong Jan 29, 2022
ed7aa13
[MINOR] Added log to debug checkpoint resumption when set to 0 (#4650)
h7kanna Jan 29, 2022
ecbad95
[HUDI-3253] preferred to use the table's own location (#4608)
YannByron Jan 29, 2022
d3cfe07
[HUDI-3318] [RFC-46] Optimize Record Payload handling (#4697)
Feb 1, 2022
4b388c1
[HUDI-3292] Enabling lazy read by default for log blocks during compa…
nsivabalan Feb 1, 2022
7ce0f45
[HUDI-2711] Fallback to fulltable scan for IncrementalRelation if und…
Feb 1, 2022
f140c58
[HUDI-3346] Fixing non existant marker dir handling in TwoToOnedowngr…
nsivabalan Feb 1, 2022
4e61e5c
[HUDI-3293] Fixing default value for clustering small file config to…
nsivabalan Feb 1, 2022
16138db
[HUDI-3368] Revert "[HUDI-3306] Upgrade rocksdb version (#4663)" (#4733)
nsivabalan Feb 1, 2022
72f7348
[HUDI-2589] RFC-37: Metadata table based bloom index (#3989)
manojpec Feb 1, 2022
caef3d5
[HUDI-3330] Remove fixture test tables for multi writer tests (#4704)
xushiyan Feb 2, 2022
a68e1dc
[HUDI-431] Adding support for Parquet in MOR `LogBlock`s (#4333)
Feb 2, 2022
819e801
[HUDI-3322][HUDI-3343] Fixing Metadata Table Records Duplication Issu…
Feb 2, 2022
d681824
[HUDI-3337] Fixing Parquet Column Range metadata extraction (#4705)
Feb 3, 2022
5927bdd
[HUDI-1295] Metadata Index - Bloom filter and Column stats index to s…
manojpec Feb 3, 2022
69dfcda
[HUDI-3191] Removing duplicating file-listing process w/in Hive's MOR…
Feb 3, 2022
b8601a9
[HUDI-2656] Generalize HoodieIndex for flexible record data type (#3893)
yihua Feb 4, 2022
0880a8a
[HUDI-3344] Standard format for HoodieDataSourceExample.scala (#4717)
dcoliversun Feb 7, 2022
de206ac
[HUDI-3369] New ScheduleAndExecute mode for HoodieCompactor and hudi-…
zhangyue19921010 Feb 7, 2022
538db18
[HUDI-2491] Expose HMS mode metastore uri config option for spark wri…
fuyun2024 Feb 7, 2022
24f738f
[HUDI-3360] Adding retries to deltastreamer for source errors (#4744)
nsivabalan Feb 7, 2022
773b317
[HUDI-2941] Show _hoodie_operation in spark sql results (#4649)
XuQianJin-Stars Feb 7, 2022
3f263b8
[HUDI-3206] Unify Hive's MOR implementations to avoid duplication (#4…
Feb 7, 2022
3bd8fc1
[HUDI-3058] Simplify Precommit file system view (#4570)
satishkotha Feb 7, 2022
8ab6f17
[HUDI-3373] Add zero value metrics for empty data source and PROMETHE…
vinishjail97 Feb 7, 2022
0ab1a8e
[HUDI-3312] Fixing spark yaml and adding hive validation to integ tes…
nsivabalan Feb 8, 2022
1636876
[HUDI-3320] Hoodie metadata table validator (#4721)
zhangyue19921010 Feb 8, 2022
ab73047
Adding support for custom scheduler configs with streaming sink (#4762)
nsivabalan Feb 8, 2022
6a32cfe
[HUDI-3091] Making SIMPLE index as the default index type (#4659)
nsivabalan Feb 8, 2022
60831d6
[HUDI-3361] Fixing missing begin checkpoint in HoodieIncremental pull…
nsivabalan Feb 8, 2022
973087f
[HUDI-3276] Rebased Parquet-based `FileInputFormat` impls to inherit …
Feb 8, 2022
464027e
[HUDI-3239] Convert `BaseHoodieTableFileIndex` to Java (#4669)
Feb 9, 2022
b3b4423
[HUDI-3389] Bump flink version to 1.14.3 (#4776)
danny0405 Feb 10, 2022
0ababcf
[HUDI-1847] Adding inline scheduling support for spark datasource pat…
nsivabalan Feb 10, 2022
e7ec3a8
[HUDI-2432] Adding restore.requested instant and restore plan for res…
nsivabalan Feb 10, 2022
d971974
[HUDI-3333] fix that getNestedFieldVal breaks with Spark 3.2 (#4783)
YannByron Feb 10, 2022
1c77859
[HUDI-3395] Allow pass rollbackUsingMarkers to Hudi CLI rollback comm…
zhedoubushishi Feb 10, 2022
2fe7a3a
[HUDI-2610] pass the spark version when sync the table created by spa…
YannByron Feb 10, 2022
ba4e732
[HUDI-2987] Update all deprecated calls to new apis in HoodieRecordPa…
nsivabalan Feb 11, 2022
10474e0
[HUDI-3402] Set TIMESTAMP_MICROS as the default value for hoodie.parq…
YannByron Feb 11, 2022
b431246
[HUDI-3338] Custom relation instead of HadoopFsRelation (#4709)
YannByron Feb 11, 2022
89ed6f0
[HUDI-3362] Fix restore to rollback pending clustering operations fol…
satishkotha Feb 11, 2022
9518f78
[HUDI-3413]fix jackson parse error when empty message from JsonKafkaS…
zhangxiang17 Feb 12, 2022
ce9762d
[MINOR] unused import (#4799)
wangxianghu Feb 12, 2022
6aba00e
[MINOR] Fix typos in Spark client related classes (#4781)
yihua Feb 13, 2022
55777fe
[HUDI-2413] fix Sql source's checkpoint issue (#3648)
fengjian428 Feb 14, 2022
76e2faa
[HUDI-3370] The files recorded in the commit may not match the actual…
zhangyue19921010 Feb 14, 2022
93ee09f
[HUDI-3412] TypedProperties no need to create new set when check key …
Feb 14, 2022
94806d5
[HUDI-3272] If `mode==ignore && tableExists`, do not execute write lo…
dongkelun Feb 14, 2022
5ca4480
[HUDI-3417] Switch AbstractTableFileSystemView#filterBaseFileAfterPen…
yuzhaojing Feb 14, 2022
0db1e97
[HUDI-3254] Introduce HoodieCatalog to manage tables for Spark Dataso…
leesf Feb 14, 2022
bcfd8ef
[MINOR] Prevent async service from starting twice (#4801)
xushiyan Feb 14, 2022
e639d99
[HUDI-1657] Fix the build on aarch64, Fedora 33 (#4617)
guyuqi Feb 14, 2022
0a97a98
[HUDI-3398] Fix TableSchemaResolver for all file formats and metadata…
zhangyue19921010 Feb 15, 2022
3b401d8
[HUDI-3200] deprecate hoodie.file.index.enable and unify to use BaseF…
YannByron Feb 15, 2022
27bd7b5
[HUDI-1576] Make archiving an async service (#4795)
xushiyan Feb 15, 2022
cb6ca7f
[HUDI-3204] fix problem that spark on TimestampKeyGenerator has no re…
YannByron Feb 15, 2022
fe02c64
fix build & ci (#4822)
YannByron Feb 15, 2022
538ec44
[HUDI-2931] Add config to disable table services (#4777)
xushiyan Feb 15, 2022
9a05940
[HUDI-3366] Remove hardcoded logic of disabling metadata table in tes…
yihua Feb 15, 2022
3363c66
[HUDI-3394] Check isWriteLockedByCurrentThread before unlock for InPr…
zhangyue19921010 Feb 16, 2022
aaddaf5
[HUDI-3280] Cleaning up Hive-related hierarchies after refactoring (#…
Feb 16, 2022
ba0afe1
[HUDI-3426] Sync datasource clustering config (#4828)
codope Feb 17, 2022
433c257
[HUDI-3442]Duplicate code calls for 'FlinkOptions.flatOptions' (#4832)
zhangxiang17 Feb 17, 2022
2844a77
[HUDI-3439] Remove the hive shade pattern for flink bundle jar (#4833)
danny0405 Feb 17, 2022
ed106f6
[HUDI-2809] Introduce a checksum mechanism for validating hoodie.prop…
codope Feb 18, 2022
de8161a
HoodieSortedMergeHandle#close write data disorder (#4841)
loukey-lj Feb 18, 2022
fba5822
[HUDI-3430] Fix Deltastreamer to properly shut down the services upon…
yihua Feb 18, 2022
5009138
[HUDI-3438] Avoid getSmallFiles if hoodie.parquet.small.file.limit is…
Feb 18, 2022
f15125c
[HUDI-3389] fix ColumnarArrayData ClassCastException issue (#4842)
stayrascal Feb 19, 2022
8327997
[HUDI-3446] Supports batch reader in BootstrapOperator#loadRecords (#…
cuibo01 Feb 19, 2022
66ac144
[MINOR] Moving spark scheduling configs out of DataSourceOptions (#4843)
nsivabalan Feb 20, 2022
0938f55
[HUDI-3458] Fix BulkInsertPartitioner generic type (#4854)
xushiyan Feb 20, 2022
359fbfd
[HUDI-2648] Retry FileSystem action instead of failed directly. (#3887)
zhangyue19921010 Feb 20, 2022
76b6ad6
[HUDI-2732][RFC-38] Spark Datasource V2 Integration (#3964)
leesf Feb 21, 2022
17cb5cb
[HUDI-3432] Fixing restore with metadata enabled (#4849)
nsivabalan Feb 21, 2022
d36fe24
[HUDI-3455] Fixing checkpoint management in hoodie incr source (#4850)
nsivabalan Feb 21, 2022
bf16bc1
[HUDI-349]: Added new cleaning policy based on number of hours (#3646)
pratyakshsharma Feb 21, 2022
801fdab
[HUDI-3042] Abstract Spark update Strategy to make code more clean an…
Feb 21, 2022
0c95018
[HUDI-3423] upgrade spark to 3.2.1 (#4815)
YannByron Feb 22, 2022
0dee8ed
[HUDI-2925] Fix duplicate cleaning of same files when unfinished clea…
prashantwason Feb 22, 2022
7e1ea06
[MINOR] Fix typos and improve docs in HoodieMetadataConfig (#4867)
yihua Feb 22, 2022
14dbbdf
[HUDI-2189] Adding delete partitions support to DeltaStreamer (#4787)
nsivabalan Feb 22, 2022
4d1f74e
[HUDI-3464] Fix wrong exception thrown from HiveSchemaProvider (#4865)
wangxianghu Feb 22, 2022
4affdd0
[HUDI-3461] The archived timeline for flink streaming reader should n…
danny0405 Feb 22, 2022
b87e95d
[HUDI-3476] Remove the shade pattern for parquet for flink bundle jar…
danny0405 Feb 22, 2022
9678c3f
[MINOR] Fixing checkpoint management in S3IncrSource (#4871)
nsivabalan Feb 22, 2022
01cbdde
Add hive-standalone-metastore dependency to hudi-flink-bundle module …
xiaozhch5 Feb 23, 2022
dabae80
[HUDI-3420] Remove duplicates type in HoodieClusteringGroup.avsc (#4808)
yuzhaojing Feb 23, 2022
4e8accc
[HUDI-3486] Fix wrong field order for constructing HoodieMetadataColu…
yihua Feb 23, 2022
2a93b8e
[HUDI-3489] Unify config to avoid duplicate code (#4883)
leesf Feb 23, 2022
62605be
[HUDI-3480][HUDI-3481] Enchancements to integ test suite (#4884)
nsivabalan Feb 23, 2022
943b997
[HUDI-3488] The flink small file list should exclude file slices with…
yanenze Feb 24, 2022
521338b
[HUDI-3161] Add Call Produce Command for Spark SQL (#4535)
XuQianJin-Stars Feb 24, 2022
85e8a5c
[HUDI-1296] Support Metadata Table in Spark Datasource (#4789)
Feb 24, 2022
aa1810d
[HUDI-3493] Not table to get execution plan (#4894)
XuQianJin-Stars Feb 25, 2022
3694485
[HUDI-3429] Support clustering scheduleAndExecute for hudi-cli and ad…
zhangyue19921010 Feb 25, 2022
45d1216
[HUDI-3401] fix NPE caused by incorrect beforeKeyGenClassName valida…
todd5167 Feb 25, 2022
a4ee746
[HUDI-3474] Add more document to Pipelines for the usage of this tool…
danny0405 Feb 25, 2022
7428100
[HUDI-3421]Pending clustering may break AbstractTableFileSystemView#g…
zhangyue19921010 Feb 25, 2022
b50f4b4
[HUDI-3042] Refactor clustering executors (#4847)
xushiyan Feb 25, 2022
92cdc59
[HUDI-3515] Making rdd unpersist optional at the end of writes (#4898)
scxwhite Feb 25, 2022
6a5cfb4
[MINOR] Fix table type in input format test (#4912)
codope Feb 25, 2022
1379300
[HUDI-3483] Adding insert override nodes to integ test suite and few …
nsivabalan Feb 26, 2022
c77b259
[HUDI-2439] Remove SparkBoundedInMemoryExecutor (#4860)
xushiyan Feb 26, 2022
2f99e84
[HUDI-3521] Fixing kakfa key and value serializer value type from cla…
nsivabalan Feb 27, 2022
d5444ff
[HUDI-3018] Adding validation to dataframe scheme to ensure reserved …
nsivabalan Feb 27, 2022
1932152
[MINOR] Change MINI_BATCH_SIZE to 2048 (#4862)
cuibo01 Feb 28, 2022
4a59876
[HUDI-2917] rollback insert data appended to log file when using Hbas…
nsivabalan Feb 28, 2022
8f1e4f5
[HUDI-3528] Fix String convert issue and overwrite putAll method in T…
stayrascal Feb 28, 2022
05e395a
[HUDI-3341] Fix log file reader for S3 with hadoop-aws 2.7.x (#4897)
yihua Feb 28, 2022
18dc89c
[HUDI-3450] Avoid passing empty string spark master to hudi cli (#4844)
zhedoubushishi Feb 28, 2022
44b8ab6
[HUDI-3418] Save timeout option for remote RemoteFileSystemView (#4809)
yuzhaojing Feb 28, 2022
257052a
[HUDI-3465] Add validation of column stats and bloom filters in Hoodi…
yihua Mar 1, 2022
f7088a9
[HUDI-3497] Adding Datatable validator tool (#4902)
nsivabalan Mar 1, 2022
a81a632
[HUDI-3441] Add support for "marker delete" in hudi-cli (#4922)
XuQianJin-Stars Mar 1, 2022
3fdc933
[HUDI-3516] Implement record iterator for HoodieDataBlock (#4909)
cuibo01 Mar 2, 2022
3cfb52c
[MINOR] fix get builtin function issue from Hudi catalog (#4917)
stayrascal Mar 2, 2022
3b2da9f
[HUDI-2631] In CompactFunction, set up the write schema each time wit…
yuzhaojing Mar 2, 2022
85f47b5
[HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reprodu…
Mar 2, 2022
10d866f
[HUDI-3315] RFC-35 Part-1 Support bucket index in Flink writer (#4679)
garyli1019 Mar 2, 2022
1d57bd1
[minor] Cosmetic changes following HUDI-3315 (#4934)
danny0405 Mar 2, 2022
f8945ec
[MINOR] Adding more test props to integ tests (#4935)
nsivabalan Mar 2, 2022
527bd34
[MINOR] RFC-38 markdown content error (#4933)
liujinhui1994 Mar 2, 2022
907e60c
[HUDI-3264]: made schema registry urls configurable with MTDS (#4779)
pratyakshsharma Mar 2, 2022
7 changes: 6 additions & 1 deletion .github/workflows/bot.yml
@@ -18,12 +18,16 @@ jobs:
include:
- scala: "scala-2.11"
spark: "spark2"
skipModules: ""
- scala: "scala-2.11"
spark: "spark2,spark-shade-unbundle-avro"
skipModules: ""
- scala: "scala-2.12"
spark: "spark3.1.x"
skipModules: "!hudi-spark-datasource/hudi-spark3"
- scala: "scala-2.12"
spark: "spark3.1.x,spark-shade-unbundle-avro"
skipModules: "!hudi-spark-datasource/hudi-spark3"
- scala: "scala-2.12"
spark: "spark3"
- scala: "scala-2.12"
@@ -40,4 +44,5 @@ jobs:
env:
SCALA_PROFILE: ${{ matrix.scala }}
SPARK_PROFILE: ${{ matrix.spark }}
run: mvn install -P "$SCALA_PROFILE,$SPARK_PROFILE" -DskipTests=true -Dmaven.javadoc.skip=true -B -V
SKIP_MODULES: ${{ matrix.skipModules }}
run: mvn install -P "$SCALA_PROFILE,$SPARK_PROFILE" -pl "$SKIP_MODULES" -DskipTests=true -Dmaven.javadoc.skip=true -B -V
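A minimal sketch of how the matrix entries above would expand into the `mvn install` invocation, written in Python purely for illustration. The `MATRIX` list and the `build_command` helper are hypothetical names, not part of the workflow, and the sketch assumes that a matrix entry without a `skipModules` key expands to an empty string.

```python
import shlex

# Hypothetical mirror of three matrix entries from the workflow diff above.
MATRIX = [
    {"scala": "scala-2.11", "spark": "spark2", "skipModules": ""},
    {"scala": "scala-2.12", "spark": "spark3.1.x",
     "skipModules": "!hudi-spark-datasource/hudi-spark3"},
    {"scala": "scala-2.12", "spark": "spark3"},  # no skipModules key
]


def build_command(entry):
    """Return the argument list of the `run:` step for one matrix entry."""
    profiles = f"{entry['scala']},{entry['spark']}"
    # Assumption: a missing skipModules key behaves like an empty string.
    skip_modules = entry.get("skipModules", "")
    return [
        "mvn", "install",
        "-P", profiles,
        "-pl", skip_modules,
        "-DskipTests=true", "-Dmaven.javadoc.skip=true", "-B", "-V",
    ]


if __name__ == "__main__":
    for entry in MATRIX:
        print(shlex.join(build_command(entry)))
```

In Maven's `-pl` syntax a leading `!` excludes the named module from the reactor, so the `spark3.1.x` entries skip `hudi-spark-datasource/hudi-spark3`. The other entries pass an empty `-pl` value; how Maven treats that is worth confirming in the CI logs.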
2 changes: 1 addition & 1 deletion README.md
@@ -83,7 +83,7 @@ mvn clean package -DskipTests -Dscala-2.12
The default Spark version supported is 2.4.4. To build for different Spark 3 versions, use the corresponding profile

```
# Build against Spark 3.2.0 (the default build shipped with the public Spark 3 bundle)
# Build against Spark 3.2.1 (the default build shipped with the public Spark 3 bundle)
mvn clean package -DskipTests -Dspark3

# Build against Spark 3.1.2
5 changes: 5 additions & 0 deletions doap_HUDI.rdf
@@ -81,6 +81,11 @@
<created>2021-12-08</created>
<revision>0.10.0</revision>
</Version>
<Version>
<name>Apache Hudi 0.10.1</name>
<created>2022-01-26</created>
<revision>0.10.1</revision>
</Version>
</release>
<repository>
<GitRepository>
16 changes: 8 additions & 8 deletions docker/demo/config/test-suite/cow-spark-long-running.yaml
@@ -13,13 +13,13 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
dag_name: cow-spark-long-running-multi-partitions.yaml
dag_rounds: 50
dag_intermittent_delay_mins: 1
dag_name: cow-spark-deltastreamer-long-running-multi-partitions.yaml
dag_rounds: 30
dag_intermittent_delay_mins: 0
dag_content:
first_insert:
config:
record_size: 1000
record_size: 200
num_partitions_insert: 50
repeat_count: 1
num_records_insert: 10000
@@ -33,12 +33,12 @@ dag_content:
deps: first_insert
first_validate:
config:
validate_hive: true
validate_hive: false
type: ValidateDatasetNode
deps: first_hive_sync
first_upsert:
config:
record_size: 1000
record_size: 200
num_partitions_insert: 50
num_records_insert: 300
repeat_count: 1
@@ -60,13 +60,13 @@ dag_content:
deps: first_delete
second_validate:
config:
validate_hive: true
validate_hive: false
delete_input_data: true
type: ValidateDatasetNode
deps: second_hive_sync
last_validate:
config:
execute_itr_count: 50
execute_itr_count: 30
validate_clean: true
validate_archival: true
type: ValidateAsyncOperations
6 changes: 3 additions & 3 deletions docker/demo/config/test-suite/cow-spark-simple.yaml
@@ -14,7 +14,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
dag_name: cow-spark-simple.yaml
dag_rounds: 2
dag_rounds: 1
dag_intermittent_delay_mins: 1
dag_content:
first_insert:
@@ -33,7 +33,7 @@ dag_content:
deps: first_insert
first_validate:
config:
validate_hive: true
validate_hive: false
type: ValidateDatasetNode
deps: first_hive_sync
first_upsert:
@@ -60,7 +60,7 @@ dag_content:
deps: first_delete
second_validate:
config:
validate_hive: true
validate_hive: false
delete_input_data: false
type: ValidateDatasetNode
deps: second_hive_sync
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
dag_name: cow-long-running-multi-partitions.yaml
dag_name: deltastreamer-long-running-multi-partitions.yaml
dag_rounds: 50
dag_intermittent_delay_mins: 1
dag_content:
@@ -76,7 +76,7 @@ dag_content:
deps: first_delete
second_validate:
config:
validate_hive: false
validate_hive: true
delete_input_data: true
type: ValidateDatasetNode
deps: second_hive_sync
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
dag_name: deltastreamer-long-running-multi-partitions.yaml
dag_rounds: 50
dag_intermittent_delay_mins: 1
dag_content:
first_insert:
config:
record_size: 1000
num_partitions_insert: 5
repeat_count: 1
num_records_insert: 1000
type: InsertNode
deps: none
second_insert:
config:
record_size: 1000
num_partitions_insert: 50
repeat_count: 1
num_records_insert: 10000
deps: first_insert
type: InsertNode
third_insert:
config:
record_size: 1000
num_partitions_insert: 2
repeat_count: 1
num_records_insert: 300
deps: second_insert
type: InsertNode
first_upsert:
config:
record_size: 1000
num_partitions_insert: 2
num_records_insert: 300
repeat_count: 1
num_records_upsert: 100
num_partitions_upsert: 1
type: UpsertNode
deps: third_insert
first_delete:
config:
num_partitions_delete: 50
num_records_delete: 8000
type: DeleteNode
deps: first_upsert
second_hive_sync:
config:
queue_name: "adhoc"
engine: "mr"
type: HiveSyncNode
deps: first_delete
second_validate:
config:
validate_hive: false
delete_input_data: true
type: ValidateDatasetNode
deps: second_hive_sync
last_validate:
config:
execute_itr_count: 50
validate_clean: true
validate_archival: true
type: ValidateAsyncOperations
deps: second_validate
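A minimal sketch, in Python, of how the `deps` chain in the new DAG file above resolves to an execution order. It is not the test suite's actual scheduler: the `DAG_DEPS` mapping is transcribed by hand from the YAML, `execution_order` is a hypothetical helper, and the handling of comma-separated `deps` values is an assumption.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Node name -> value of its `deps` key, transcribed from the YAML above.
DAG_DEPS = {
    "first_insert": "none",
    "second_insert": "first_insert",
    "third_insert": "second_insert",
    "first_upsert": "third_insert",
    "first_delete": "first_upsert",
    "second_hive_sync": "first_delete",
    "second_validate": "second_hive_sync",
    "last_validate": "second_validate",
}


def execution_order(deps):
    """Return node names ordered so that every node follows its dependencies."""
    graph = {}
    for node, dep in deps.items():
        # Assumption: "none" marks a root node; several parents would be
        # written as a comma-separated list.
        parents = [] if dep == "none" else [d.strip() for d in dep.split(",")]
        for parent in parents:
            if parent not in deps:
                raise ValueError(f"{node} depends on undefined node {parent}")
        graph[node] = set(parents)
    # TopologicalSorter expects a mapping of node -> predecessors.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, name in enumerate(execution_order(DAG_DEPS), start=1):
        print(step, name)
```

A check like this catches a `deps` value that points at a node the file does not define, which is easy to introduce when nodes are renamed or removed between DAG variants.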
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# to be used with test-aggressive-clean-archival.properties

dag_name: deltastreamer-long-running-multi-partitions.yaml
dag_rounds: 20
dag_intermittent_delay_mins: 1
dag_content:
first_insert:
config:
record_size: 1000
num_partitions_insert: 5
repeat_count: 1
num_records_insert: 1000
type: InsertNode
deps: none
second_insert:
config:
record_size: 1000
num_partitions_insert: 50
repeat_count: 1
num_records_insert: 10000
deps: first_insert
type: InsertNode
third_insert:
config:
record_size: 1000
num_partitions_insert: 2
repeat_count: 1
num_records_insert: 300
deps: second_insert
type: InsertNode
first_upsert:
config:
record_size: 1000
num_partitions_insert: 2
num_records_insert: 300
repeat_count: 1
num_records_upsert: 100
num_partitions_upsert: 1
type: UpsertNode
deps: third_insert
first_delete:
config:
num_partitions_delete: 50
num_records_delete: 8000
type: DeleteNode
deps: first_upsert
second_hive_sync:
config:
queue_name: "adhoc"
engine: "mr"
type: HiveSyncNode
deps: first_delete
second_validate:
config:
validate_hive: false
delete_input_data: false
type: ValidateDatasetNode
deps: second_hive_sync
last_validate:
config:
execute_itr_count: 20
validate_clean: true
validate_archival: true
type: ValidateAsyncOperations
deps: second_validate
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
dag_name: cow-long-running-example.yaml
dag_name: detlastreamer-long-running-example.yaml
dag_rounds: 50
dag_intermittent_delay_mins: 1
dag_content: