Merged
52 commits
d794f4f
[MINOR] Optimize code logic (#5499)
qianchutao May 5, 2022
abb4893
[HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …
guanziyue May 5, 2022
248b059
[HUDI-4042] Support truncate-partition for Spark-3.2 (#5506)
jinxing64 May 6, 2022
c319ee9
[HUDI-4017] Improve spark sql coverage in CI (#5512)
xushiyan May 6, 2022
52fe1c9
[HUDI-3675] Adding post write termination strategy to deltastreamer c…
nsivabalan May 6, 2022
9625d16
[HUDI-3849] AvroDeserializer supports AVRO_REBASE_MODE_IN_READ config…
cxzl25 May 7, 2022
80f9989
[MINOR] Fixing class not found when using flink and enable metadata t…
BruceKellan May 7, 2022
569a76a
[MINOR] fixing flaky tests in deltastreamer tests (#5521)
nsivabalan May 7, 2022
75eaa0b
[HUDI-4055] Refactor ratelimiter to avoid stack overflow (#5530)
guanziyue May 9, 2022
4c70840
[MINOR] Fixing close for HoodieCatalog's test (#5531)
XuQianJin-Stars May 9, 2022
6b47ef6
[HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOpti…
xicm May 9, 2022
6285a23
[HUDI-3995] Making perf optimizations for bulk insert row writer path…
nsivabalan May 9, 2022
6fd21d0
[HUDI-4044] When reading data from flink-hudi to external storage, th…
aliceyyan May 10, 2022
4258a71
[HUDI-4003] Try to read all the log file to parse schema (#5473)
lanyuanxiaoyao May 10, 2022
4a8589f
[HUDI-4038] Avoid calling `getDataSize` after every record written (#…
May 11, 2022
7f0c1f3
[HUDI-4079] Supports showing table comment for hudi with spark3 (#5546)
jinxing64 May 11, 2022
b10ca7e
[HUDI-4085] Fixing flakiness with parquet empty batch tests in TestHo…
nsivabalan May 11, 2022
ecd47e7
[HUDI-3963][Claim RFC number 53] Use Lock-Free Message Queue Improvin…
zhangyue19921010 May 12, 2022
0cec955
[HUDI-4018][HUDI-4027] Adding integ test yamls for immutable use-case…
nsivabalan May 13, 2022
701f8c0
[HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (#5528)
cuibo01 May 13, 2022
8ad0bb9
[MINOR] Fix a NPE for Option (#5461)
xccui May 13, 2022
7fb436d
[HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compact…
cuibo01 May 13, 2022
a704e37
[HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink (#5574)
cuibo01 May 13, 2022
5c4813f
[HUDI-4072] Fix NULL schema for empty batches in deltastreamer (#5543)
nsivabalan May 13, 2022
52e63b3
[HUDI-4097] add table info to jobStatus (#5529)
wqwl611 May 14, 2022
6e16e71
[HUDI-3980] Support kerberos hbase index (#5464)
xicm May 14, 2022
75f8476
[HUDI-4001] Filter the properties should not be used when create tabl…
dongkelun May 16, 2022
1fded18
fix hive sync no partition table error (#5585)
bettermouse May 16, 2022
61030d8
[HUDI-3123] consistent hashing index: basic write path (upsert/insert…
YuweiXiao May 16, 2022
43e0819
[HUDI-4098] Metadata table heartbeat for instant has expired, last he…
danny0405 May 16, 2022
a7a42e4
[HUDI-4103] [HUDI-4001] Filter the properties should not be used when…
dongkelun May 16, 2022
ad773b3
[HUDI-3654] Preparations for hudi metastore. (#5572)
minihippo May 17, 2022
fdd96cc
[HUDI-4104] DeltaWriteProfile includes the pending compaction file sl…
danny0405 May 17, 2022
d52d133
[HUDI-4101] BucketIndexPartitioner should take partition path for bet…
danny0405 May 17, 2022
d422f69
[HUDI-4087] Support dropping RO and RT table in DropHoodieTableComman…
jinxing64 May 17, 2022
99555c8
[HUDI-4110] Clean the marker files for flink compaction (#5604)
BruceKellan May 17, 2022
f8b9399
[MINOR] Fixing spark long running yaml for non-partitioned (#5607)
nsivabalan May 17, 2022
ebbe56e
[minor] Some code refactoring for LogFileComparator and Instant insta…
danny0405 May 18, 2022
f1f8a1a
[HUDI-4109] Copy the old record directly when it is chosen for mergin…
danny0405 May 18, 2022
a1017c6
Clean the marker files for flink compaction (#5611)
loukey-lj May 18, 2022
008616c
[HUDI-3942] [RFC-50] Improve Timeline Server (#5392)
yuzhaojing May 18, 2022
199f642
[HUDI-4111] Bump ANTLR runtime version in Spark 3.x (#5606)
cxzl25 May 18, 2022
551aa95
Revert "[HUDI-3870] Add timeout rollback for flink online compaction …
danny0405 May 18, 2022
6573469
[HUDI-4116] Unify clustering/compaction related procedures' output ty…
huberylee May 19, 2022
6f37863
[HUDI-4114] Remove the unnecessary fs view sync for BaseWriteClient#i…
danny0405 May 19, 2022
1da0b21
[HUDI-4119] the first read result is incorrect when Flink upsert- Kaf…
aliceyyan May 20, 2022
c7576f7
[HUDI-4130] Remove the upgrade/downgrade for flink #initTable (#5642)
danny0405 May 20, 2022
85b146d
[HUDI-3985] Refactor DLASyncTool to support read hoodie table as spar…
huberylee May 20, 2022
7d02b1f
[MINOR] Minor fixes to exception log and removing unwanted metrics fl…
nsivabalan May 20, 2022
2af9830
[HUDI-4122] Fix NPE caused by adding kafka nodes (#5632)
wangxianghu May 21, 2022
b5adba3
[MINOR] remove unused gson test dependency (#5652)
xushiyan May 21, 2022
8ec625d
[HUDI-3858] Shade javax.servlet for Spark bundle jar (#5295)
zhangyue19921010 May 21, 2022
8 changes: 8 additions & 0 deletions .github/workflows/bot.yml
@@ -59,3 +59,11 @@ jobs:
if: ${{ !endsWith(env.SPARK_PROFILE, '3.2') }} # skip test spark 3.2 before hadoop upgrade to 3.x
run:
mvn test -Punit-tests -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -D"$FLINK_PROFILE" -DfailIfNoTests=false -pl hudi-examples/hudi-examples-flink,hudi-examples/hudi-examples-java,hudi-examples/hudi-examples-spark
- name: Spark SQL Test
env:
SCALA_PROFILE: ${{ matrix.scalaProfile }}
SPARK_PROFILE: ${{ matrix.sparkProfile }}
FLINK_PROFILE: ${{ matrix.flinkProfile }}
if: ${{ !endsWith(env.SPARK_PROFILE, '2.4') }} # skip test spark 2.4 as it's covered by Azure CI
run:
mvn test -Punit-tests -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -D"$FLINK_PROFILE" '-Dtest=org.apache.spark.sql.hudi.Test*' -pl hudi-spark-datasource/hudi-spark
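The new "Spark SQL Test" step mirrors the existing step's gating but inverts the target: Spark 2.4 is skipped because Azure CI already covers it, while the Flink examples step above skips Spark 3.2 pending the Hadoop 3.x upgrade. A minimal Python sketch of that `endsWith`-based gating (the profile strings are illustrative; real values come from the CI matrix):

```python
# Sketch of the workflow's endsWith-based gating; profile names are
# illustrative examples, not the actual matrix values.
def runs_flink_examples(spark_profile: str) -> bool:
    # skip test spark 3.2 before hadoop upgrade to 3.x
    return not spark_profile.endswith("3.2")

def runs_spark_sql_tests(spark_profile: str) -> bool:
    # skip test spark 2.4 as it's covered by Azure CI
    return not spark_profile.endswith("2.4")

print(runs_spark_sql_tests("spark3.2"))  # True
print(runs_spark_sql_tests("spark2.4"))  # False
```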
53 changes: 53 additions & 0 deletions docker/demo/config/test-suite/deltastreamer-immutable-dataset.yaml
@@ -0,0 +1,53 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
dag_name: deltastreamer-immutable-dataset.yaml
dag_rounds: 5
dag_intermittent_delay_mins: 0
dag_content:
first_bulk_insert:
config:
record_size: 200
num_partitions_insert: 10
repeat_count: 3
num_records_insert: 5000
type: BulkInsertNode
deps: none
first_validate:
config:
validate_hive: false
delete_input_data: false
type: ValidateDatasetNode
deps: first_bulk_insert
first_insert:
config:
record_size: 200
num_partitions_insert: 10
repeat_count: 3
num_records_insert: 5000
type: InsertNode
deps: first_validate
second_validate:
config:
validate_hive: false
delete_input_data: false
type: ValidateDatasetNode
deps: first_insert
last_validate:
config:
execute_itr_count: 5
delete_input_data: true
type: ValidateAsyncOperations
deps: second_validate
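The DAG above chains bulk insert → validate → insert → validate → async-operations validation via each node's `deps`. A small sketch (assuming only the single-dependency `deps: <node>` / `deps: none` form used in these files, not the real test-suite parser) that checks such a chain is well-ordered:

```python
# Hypothetical validator: every node's `deps` must name an earlier node
# or be "none". Mirrors the dag_content of deltastreamer-immutable-dataset.yaml.
dag_content = {
    "first_bulk_insert": "none",
    "first_validate": "first_bulk_insert",
    "first_insert": "first_validate",
    "second_validate": "first_insert",
    "last_validate": "second_validate",
}

def deps_well_ordered(dag: dict) -> bool:
    seen = set()
    for name, dep in dag.items():
        if dep != "none" and dep not in seen:
            return False  # forward or dangling reference
        seen.add(name)
    return True

print(deps_well_ordered(dag_content))  # True
```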
38 changes: 38 additions & 0 deletions docker/demo/config/test-suite/deltastreamer-pure-bulk-inserts.yaml
@@ -0,0 +1,38 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
dag_name: deltastreamer-pure-bulk-inserts.yaml
dag_rounds: 10
dag_intermittent_delay_mins: 0
dag_content:
first_bulk_insert:
config:
record_size: 200
num_partitions_insert: 10
repeat_count: 3
num_records_insert: 5000
type: BulkInsertNode
deps: none
second_validate:
config:
validate_hive: false
delete_input_data: false
type: ValidateDatasetNode
deps: first_bulk_insert
last_validate:
config:
execute_itr_count: 10
type: ValidateAsyncOperations
deps: second_validate
38 changes: 38 additions & 0 deletions docker/demo/config/test-suite/deltastreamer-pure-inserts.yaml
@@ -0,0 +1,38 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
dag_name: deltastreamer-pure-inserts.yaml
dag_rounds: 10
dag_intermittent_delay_mins: 0
dag_content:
first_insert:
config:
record_size: 200
num_partitions_insert: 10
repeat_count: 3
num_records_insert: 5000
type: InsertNode
deps: none
second_validate:
config:
validate_hive: false
delete_input_data: false
type: ValidateDatasetNode
deps: first_insert
last_validate:
config:
execute_itr_count: 10
type: ValidateAsyncOperations
deps: second_validate
3 changes: 1 addition & 2 deletions docker/demo/config/test-suite/insert-overwrite.yaml
@@ -17,7 +17,6 @@ dag_name: simple-deltastreamer.yaml
dag_rounds: 1
dag_intermittent_delay_mins: 1
dag_content:

first_insert:
config:
record_size: 1000
@@ -91,4 +90,4 @@ dag_content:
validate_hive: false
delete_input_data: false
type: ValidateDatasetNode
-deps: third_upsert
+deps: third_upsert
2 changes: 1 addition & 1 deletion docker/demo/config/test-suite/multi-writer-1-ds.yaml
@@ -14,7 +14,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
dag_name: simple-deltastreamer.yaml
-dag_rounds: 3
+dag_rounds: 6
dag_intermittent_delay_mins: 0
dag_content:
first_insert:
52 changes: 52 additions & 0 deletions docker/demo/config/test-suite/multi-writer-1-sds.yaml
@@ -0,0 +1,52 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
dag_name: cow-spark-simple.yaml
dag_rounds: 6
dag_intermittent_delay_mins: 0
dag_content:
first_insert:
config:
record_size: 1000
num_partitions_insert: 1
repeat_count: 1
num_records_insert: 100000
start_partition: 1
type: SparkInsertNode
deps: none
first_upsert:
config:
record_size: 1000
num_partitions_insert: 1
num_records_insert: 50000
repeat_count: 1
num_records_upsert: 50000
num_partitions_upsert: 1
start_partition: 1
type: SparkUpsertNode
deps: first_insert
first_delete:
config:
num_partitions_delete: 0
num_records_delete: 10000
start_partition: 1
type: SparkDeleteNode
deps: first_upsert
second_validate:
config:
validate_hive: false
delete_input_data: true
type: ValidateDatasetNode
deps: first_delete
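Back-of-the-envelope arithmetic for one round of this writer's DAG, assuming (this is an interpretation, not confirmed by the config format) that `num_records_insert` adds new keys, `num_records_upsert` rewrites existing keys without changing the count, and `num_records_delete` removes keys:

```python
# Hypothetical net record count after one round of multi-writer-1-sds.yaml,
# under the key-semantics assumptions stated above.
inserts = 100_000      # first_insert: num_records_insert
inserts += 50_000      # first_upsert also carries num_records_insert: 50000
updates = 50_000       # num_records_upsert rewrites keys, no count change
deletes = 10_000       # first_delete: num_records_delete
net_records = inserts - deletes
print(net_records)  # 140000
```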
4 changes: 2 additions & 2 deletions docker/demo/config/test-suite/multi-writer-2-sds.yaml
@@ -14,8 +14,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
dag_name: cow-spark-simple.yaml
-dag_rounds: 3
-dag_intermittent_delay_mins: 0
+dag_rounds: 5
+dag_intermittent_delay_mins: 1
dag_content:
first_insert:
config:
52 changes: 52 additions & 0 deletions docker/demo/config/test-suite/multi-writer-3-sds.yaml
@@ -0,0 +1,52 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
dag_name: cow-spark-simple.yaml
dag_rounds: 4
dag_intermittent_delay_mins: 1
dag_content:
first_insert:
config:
record_size: 1000
num_partitions_insert: 1
repeat_count: 1
num_records_insert: 100000
start_partition: 20
type: SparkInsertNode
deps: none
first_upsert:
config:
record_size: 1000
num_partitions_insert: 1
num_records_insert: 50000
repeat_count: 1
num_records_upsert: 50000
num_partitions_upsert: 1
start_partition: 20
type: SparkUpsertNode
deps: first_insert
first_delete:
config:
num_partitions_delete: 0
num_records_delete: 10000
start_partition: 20
type: SparkDeleteNode
deps: first_upsert
second_validate:
config:
validate_hive: false
delete_input_data: true
type: ValidateDatasetNode
deps: first_delete
52 changes: 52 additions & 0 deletions docker/demo/config/test-suite/multi-writer-4-sds.yaml
@@ -0,0 +1,52 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
dag_name: cow-spark-simple.yaml
dag_rounds: 4
dag_intermittent_delay_mins: 1
dag_content:
first_insert:
config:
record_size: 1000
num_partitions_insert: 1
repeat_count: 1
num_records_insert: 100000
start_partition: 30
type: SparkInsertNode
deps: none
first_upsert:
config:
record_size: 1000
num_partitions_insert: 1
num_records_insert: 50000
repeat_count: 1
num_records_upsert: 50000
num_partitions_upsert: 1
start_partition: 30
type: SparkUpsertNode
deps: first_insert
first_delete:
config:
num_partitions_delete: 0
num_records_delete: 10000
start_partition: 30
type: SparkDeleteNode
deps: first_upsert
second_validate:
config:
validate_hive: false
delete_input_data: true
type: ValidateDatasetNode
deps: first_delete
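A pattern worth noting across the new multi-writer Spark configs: each writer pins a distinct `start_partition` (1, 20, and 30 in the files added above), so concurrent writers operate on disjoint partition ranges. A trivial sketch of that invariant:

```python
# start_partition values from the three new multi-writer Spark configs;
# disjoint values keep concurrent writers off each other's partitions.
start_partitions = {
    "multi-writer-1-sds.yaml": 1,
    "multi-writer-3-sds.yaml": 20,
    "multi-writer-4-sds.yaml": 30,
}
disjoint = len(set(start_partitions.values())) == len(start_partitions)
print(disjoint)  # True
```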