Merged
349 commits
8e13793
[HUDI-2841] Fixing lazy rollback for MOR with list based strategy (#4…
nsivabalan Nov 25, 2021
e0125a7
[HUDI-2801] Add Amazon CloudWatch metrics reporter (#4081)
umehrot2 Nov 25, 2021
6f5d8d0
[HUDI-2840] Fixed DeltaStreaemer to properly respect configuration pa…
Nov 25, 2021
8340ccb
[HUDI-2005] Removing direct fs call in HoodieLogFileReader (#3865)
nsivabalan Nov 25, 2021
38585e4
[HUDI-2851] Shade org.apache.hadoop.hive.ql.optimizer package for fli…
lsyldliu Nov 26, 2021
f5da9b5
[MINOR] Include hudi-aws in flink bundle jar (#4127)
danny0405 Nov 26, 2021
e554c7f
[HUDI-2852] Table metadata returns empty for non-exist partition (#4117)
minchowang Nov 26, 2021
e9efbdb
[HUDI-2863] Rename option 'hoodie.parquet.page.size' to 'write.parque…
danny0405 Nov 26, 2021
3d75aca
[HUDI-2850] Fixing Clustering CLI - schedule and run command fixes to…
manojpec Nov 26, 2021
5755ff2
[HUDI-2814] Addressing issues w/ Z-order Layout Optimization (#4060)
Nov 26, 2021
a88691f
[MINOR] Fixing test failure to fix CI build failure (#4132)
nsivabalan Nov 26, 2021
f8e0176
[HUDI-2861] Re-use same rollback instant time for failed rollbacks (#…
nsivabalan Nov 26, 2021
d1e83e4
[HUDI-2767] Enabling timeline-server-based marker as default (#4112)
yihua Nov 26, 2021
445208a
[HUDI-2845] Metadata CLI - files/partition file listing fix and new v…
manojpec Nov 26, 2021
8402cac
[HUDI-2848] Excluse guava from hudi-cli pom (#4100)
huleilei Nov 26, 2021
9028e6e
[HUDI-2864] Fix README and scripts with current limitations of hive s…
rmahindra123 Nov 26, 2021
257a6a7
[HUDI-2856] Bit cask disk map delete modified (#4116)
xuzifu666 Nov 26, 2021
9c059ef
[MINOR] Follow ups from HUDI-2861 (re-use same rollback instant for f…
nsivabalan Nov 27, 2021
3a8d64e
[HUDI-2868] Fix skipped HoodieSparkSqlWriterSuite (#4125)
xushiyan Nov 27, 2021
2c7656c
[HUDI-2475] [HUDI-2862] Metadata table creation and avoid bootstrappi…
manojpec Nov 27, 2021
780a2ac
[HUDI-2102] Support hilbert curve for hudi (#3952)
xiarixiaoyao Nov 27, 2021
a1d0ff4
Moving to 0.11.0-SNAPSHOT on master branch.
danny0405 Nov 27, 2021
eca1693
[MINOR] fix typo (#4140)
vortual Nov 28, 2021
52aae36
[MINOR] Fixing integ test suite for hudi-aws and archival validation …
nsivabalan Nov 29, 2021
38e75ea
Removing rfc from release package and fixing release validation scrip…
nsivabalan Nov 29, 2021
536af4b
[MINOR] Fix syntax error in create_source_release.sh (#4150)
danny0405 Nov 29, 2021
3433f00
[MINOR] Fix typo,rename 'getUrlEncodePartitoning' to 'getUrlEncodePar…
dongkelun Nov 30, 2021
a398aad
[HUDI-2642] Add support ignoring case in update sql operation (#3882)
dongkelun Nov 30, 2021
ea009b5
[HUDI-2891] Fix write configs for Java engine in Kafka Connect Sink (…
yihua Nov 30, 2021
24380c2
Revert "[HUDI-2855] Change the default value of 'PAYLOAD_CLASS_NAME' …
Dec 1, 2021
9b254b6
Revert "[HUDI-2856] Bit cask disk map delete modified (#4116)" (#4171)
yihua Dec 1, 2021
f4c25ba
[HUDI-2880] Fixing loading of props from default dir (#4167)
nsivabalan Dec 1, 2021
5284730
[HUDI-2881] Compact the file group with larger log files to reduce wr…
minihippo Dec 2, 2021
772f5ca
Fixed partitions produced by layout optimization in case order-by key…
Dec 2, 2021
61a03bc
[MINOR] Fix the wrong usage of timestamp length variable bug (#4179)
zzzhy Dec 2, 2021
91d2e61
[HUDI-2904] Fix metadata table archival overstepping between regular …
rmahindra123 Dec 2, 2021
934fe54
[HUDI-2914] Fix remote timeline server config for flink (#4191)
danny0405 Dec 3, 2021
f74b3d1
[minor] Refactor write profile to always generate fs view (#4198)
danny0405 Dec 3, 2021
0699521
[HUDI-2924] Refresh the fs view on successful checkpoints for write p…
danny0405 Dec 3, 2021
ca42724
[MINOR] use catalog schema if can not find table schema (#4182)
YannByron Dec 3, 2021
e483f7c
[HUDI-2902] Fixing populate meta fields with Hfile writers and Disabl…
nsivabalan Dec 3, 2021
bed7f98
[HUDI-2911] Removing default value for `PARTITIONPATH_FIELD_NAME` res…
Dec 3, 2021
2f96f43
Revert "[HUDI-2495] Resolve inconsistent key generation for timestamp…
YannByron Dec 3, 2021
383d5ed
[HUDI-2894][HUDI-2905] Metadata table - avoiding key lookup failures …
manojpec Dec 3, 2021
5616830
Revert "[HUDI-2489]Tuning HoodieROTablePathFilter by caching hoodieTa…
zhangyue19921010 Dec 4, 2021
a799fae
[MINOR] Mitigate CI jobs timeout issues (#4173)
xushiyan Dec 4, 2021
0fd6b2d
[HUDI-2933] DISABLE Metadata table by default (#4213)
vinothchandar Dec 4, 2021
94f45e9
[HUDI-2890] Kafka Connect: Fix failed writes and avoid table service …
rmahindra123 Dec 4, 2021
1d4fb82
[HUDI-2923] Fixing metadata table reader when metadata compaction is …
nsivabalan Dec 4, 2021
568181a
[HUDI-2934] Optimize RequestHandler code style
lsyldliu Dec 4, 2021
36b69d8
[HUDI-2935] Remove special casing of clustering in deltastreamer chec…
vinothchandar Dec 4, 2021
a8fb696
[HUDI-2877] Support flink catalog to help user use flink table conven…
lsyldliu Dec 5, 2021
63b1560
[HUDI-2937] Introduce a pulsar implementation of hoodie write commit …
XuQianJin-Stars Dec 5, 2021
734c9f5
[HUDI-2418] Support HiveSchemaProvider (#3671)
fengjian428 Dec 5, 2021
f0e46bf
[HUDI-2916] Add IssueNavigationLink for IDEA (#4192)
leesf Dec 6, 2021
84b531a
[HUDI-2900] Fix corrupt block end position (#4181)
lsyldliu Dec 6, 2021
57c4bf8
[HUDI-2876] for hive/presto hudi should remove the temp file which cr…
xiarixiaoyao Dec 6, 2021
2d66451
[MINOR] Fix partition path formatting in error log (#4168)
yihua Dec 6, 2021
4a437f2
[MINOR] Use maven-shade-plugin version for hudi-timeline-server-bundl…
zhedoubushishi Dec 6, 2021
6dab307
[MINOR] Remove redundant and conflicting spark-hive dependency (#4228)
codope Dec 7, 2021
e8473b9
[HUDI-2951] Disable remote view storage config for flink (#4237)
danny0405 Dec 7, 2021
c9e18d1
[HUDI-2942] add error message log in HoodieCombineHiveInputFormat (#4…
xuzifu666 Dec 8, 2021
c56d93e
[MINOR] Update DOAP with 0.10.0 Release (#4246)
danny0405 Dec 8, 2021
082faa3
[HUDI-2832][RFC-41] Proposal to integrate Hudi on Snowflake platform …
Dec 8, 2021
7c3f077
[HUDI-2964] Fixing aws lock configs to inherit from HoodieConfig (#4258)
nsivabalan Dec 9, 2021
bd08470
[HUDI-2957] Shade kryo jar for flink bundle jar (#4251)
danny0405 Dec 9, 2021
9c8ad0f
[HUDI-2665] Fix overflow of huge log file in HoodieLogFormatWriter (#…
guanziyue Dec 9, 2021
5ac9ce7
[MINOR] Fix Compile broken (#4263)
leesf Dec 9, 2021
f612a20
[HUDI-2779] Cache BaseDir if HudiTableNotFound Exception thrown (#4014)
Dec 9, 2021
68f8597
[HUDI-2966] Add TaskCompletionListener for HoodieMergeOnReadRDD to cl…
xiarixiaoyao Dec 9, 2021
3fb2f97
[MINOR] FAQ link in SUPPORT_REQUEST template (#4266)
Arun-kc Dec 9, 2021
8321d20
Claiming RFC for data skipping index for updated version (#4271)
nsivabalan Dec 10, 2021
ea154bc
Revert "Claiming RFC for data skipping index for updated version (#42…
nsivabalan Dec 10, 2021
456d74c
[HUDI-2901] Fixed the bug clustering jobs cannot running in parallel …
xiarixiaoyao Dec 10, 2021
c7473a7
[HUDI-2936] Add data count checks in async clustering tests (#4236)
codope Dec 10, 2021
f194566
[HUDI-2849] Improve SparkUI job description for write path (#4222)
YuweiXiao Dec 10, 2021
be36826
[HUDI-2952] Fixing metadata table for non-partitioned dataset (#4243)
nsivabalan Dec 10, 2021
3ad9b12
[HUDI-2912] Fix CompactionPlanOperator typo (#4187)
yuzhaojing Dec 10, 2021
3ce0526
Adding verbose output for metadata validate files command (#4166)
nsivabalan Dec 10, 2021
3ba2909
[HUDI-2892][BUG] Pending Clustering may stain the ActiveTimeLine and …
zhangyue19921010 Dec 10, 2021
72901a3
[HUDI-2784] Add a hudi-trino-bundle for Trino (#4279)
yihua Dec 10, 2021
2d864f7
[HUDI-2814] Make Z-index more generic Column-Stats Index (#4106)
Dec 10, 2021
c48a2a1
[HUDI-2527] Multi writer test with conflicting async table services (…
manojpec Dec 11, 2021
9797fdf
[HUDI-2974] Make the prefix for metrics name configurable (#4274)
rmahindra123 Dec 11, 2021
9bdcee0
[HUDI-2959] Fix the thread leak of cleaning service (#4252)
danny0405 Dec 11, 2021
2dcb3f0
[HUDI-2985] Shade jackson for hudi flink bundle jar (#4284)
danny0405 Dec 11, 2021
b5f05fd
[HUDI-2906] Add a repair util to clean up dangling data and log files…
yihua Dec 11, 2021
8dd0444
[HUDI-2984] Implement #close for AbstractTableFileSystemView (#4285)
danny0405 Dec 11, 2021
15444c9
[HUDI-2946] Upgrade maven plugins to be compatible with higher Java v…
zhedoubushishi Dec 12, 2021
b22c2c6
[HUDI-2938] Metadata table util to get latest file slices for reader/…
manojpec Dec 12, 2021
dd96129
[HUDI-2990] Sync to HMS when deleting partitions (#4291)
XuQianJin-Stars Dec 13, 2021
46de25d
[HUDI-2994] Add judgement to existed partitionPath in the catch code …
minchowang Dec 13, 2021
29bc5fd
[HUDI-2996] Flink streaming reader 'skip_compaction' option does not …
Fugle666 Dec 14, 2021
c8d6bd8
[HUDI-2997] Skip the corrupt meta file for pending rollback action (#…
danny0405 Dec 14, 2021
bc8bf04
[HUDI-2995] Enabling metadata table by default (#4295)
manojpec Dec 14, 2021
dbec6c5
[HUDI-3022] Fix NPE for isDropPartition method (#4319)
XuQianJin-Stars Dec 15, 2021
9a2030a
[HUDI-3024] Add explicit write handler for flink (#4329)
minchowang Dec 15, 2021
3b89457
[HUDI-3025] Add additional wait time for namenode availability during…
yihua Dec 15, 2021
27907de
[HUDI-3028] Use blob storage to speed up CI downloads (#4331)
xushiyan Dec 15, 2021
f5b07a7
[HUDI-2998] claiming rfc number for consistent hashing index (#4303)
YuweiXiao Dec 15, 2021
ea2eba1
[HUDI-3015] Implement #reset and #sync for metadata filesystem view (…
danny0405 Dec 16, 2021
a8a192a
[Minor] Catch and ignore all the exceptions in quietDeleteMarkerDir (…
zhangyue19921010 Dec 16, 2021
294d712
[HUDI-3001] Clean up the marker directory when finish bootstrap opera…
xiarixiaoyao Dec 16, 2021
7e7ad15
[HUDI-3043] Revert async cleaner leak commit to unblock CI failure (#…
nsivabalan Dec 17, 2021
d0087d4
[HUDI-3037] Add back remote view storage config for flink (#4338)
danny0405 Dec 17, 2021
e4cfb42
[HUDI-3046] Claim RFC number for RFC for Compaction / Clustering Serv…
yuzhaojing Dec 17, 2021
9246b16
[HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, wh…
xiarixiaoyao Dec 17, 2021
6eba834
[HUDI-3043] Adding some test fixes to continuous mode multi writer te…
nsivabalan Dec 17, 2021
7784249
[HUDI-2962] InProcess lock provider to guard single writer process wi…
manojpec Dec 18, 2021
4785244
[HUDI-3043] De-coupling multi writer tests (#4362)
nsivabalan Dec 18, 2021
d1d48ed
[HUDI-3029] Transaction manager: avoid deadlock when doing begin and…
manojpec Dec 18, 2021
733732b
[HUDI-3029] Transaction manager: avoid deadlock when doing begin and…
manojpec Dec 18, 2021
dc40397
[HUDI-3064] Fixing a bug in TransactionManager and FileSystemTestLock…
nsivabalan Dec 18, 2021
77abb5c
[HUDI-3054] Fixing default lock configs for FileSystemBasedLock and f…
nsivabalan Dec 18, 2021
f57e28f
[MINOR] Azure CI IT tasks clean up (#4337)
xushiyan Dec 19, 2021
bb99836
[HUDI-3052] Fix flaky testJsonKafkaSourceResetStrategy (#4381)
xushiyan Dec 19, 2021
478f9f3
[minor] fix NetworkUtils#getHostname (#4355)
danny0405 Dec 19, 2021
03f71ef
[HUDI-2970] Adding tests for archival of replace commit actions (#4268)
nsivabalan Dec 19, 2021
4a48f99
[HUDI-3064][HUDI-3054] FileSystemBasedLockProviderTestClass tryLock f…
manojpec Dec 19, 2021
3ca9210
remove unused import (#4349)
xuzifu666 Dec 20, 2021
f166dda
[MINOR] Remove unused method in HoodieActiveTimeline (#4401)
xuzifu666 Dec 20, 2021
982ae3d
[MINOR] Increasing CI timeout to 90 mins (#4407)
nsivabalan Dec 21, 2021
f3f6112
[HUDI-3070] Add rerunFailingTestsCount for flakly testes (#4398)
zhangyue19921010 Dec 21, 2021
32a44bb
[HUDI-2970] Add test for archiving replace commit (#4345)
xushiyan Dec 21, 2021
7d046f9
[HUDI-3008] Fixing HoodieFileIndex partition column parsing for neste…
harsh1231 Dec 14, 2021
92f54ce
[HUDI-3027] Update hudi-examples README.md (#4330)
Aimiyoo Dec 21, 2021
f1286c2
[HUDI-3032] Do not clean the log files right after compaction for met…
danny0405 Dec 22, 2021
15eb7e8
[HUDI-2547] Schedule Flink compaction in service (#4254)
yuzhaojing Dec 22, 2021
b5890cd
Merge pull request #4308 from harsh1231/HUDI-3008
xiarixiaoyao Dec 22, 2021
1a5f869
[HUDI-3011] Adding ability to read entire data with HoodieIncrSource …
nsivabalan Dec 22, 2021
5d93edc
[HUDI-3060] drop table for spark sql (#4364)
XuQianJin-Stars Dec 22, 2021
57f43de
[MINOR] Fix DedupeSparkJob typo (#4418)
Aimiyoo Dec 22, 2021
032b883
[HUDI-3014] Add table option to set utc timezone (#4306)
xuzifu666 Dec 23, 2021
4721073
[MINOR] Remove unused method in HoodieActiveTimeline (#4435)
xuzifu666 Dec 24, 2021
7b07aac
[HUDI-3101] Excluding compaction instants from pending rollback info …
danny0405 Dec 25, 2021
c81df99
[HUDI-3102] Do not store rollback plan in inflight instant (#4445)
danny0405 Dec 25, 2021
282aa68
[HUDI-3099] Purge drop partition for spark sql (#4436)
XuQianJin-Stars Dec 28, 2021
6409fc7
[HUDI-2374] Fixing AvroDFSSource does not use the overridden schema t…
harsh1231 Dec 28, 2021
1f7afba
[HUDI-3093] fix spark-sql query table that write with TimestampBasedK…
YannByron Dec 28, 2021
32505d5
[HUDI-3106] Fix HiveSyncTool not sync schema (#4452)
XuQianJin-Stars Dec 28, 2021
05942e0
[HUDI-2811] Support Spark 3.2 (#4270)
YannByron Dec 28, 2021
3d7a869
Fixing dynamoDbLockConfig required prop check (#4422)
nsivabalan Dec 28, 2021
9412281
[HUDI-2983] Remove Log4j2 transitive dependencies (#4281)
umehrot2 Dec 28, 2021
a29b27c
[MINOR] HoodieInstantTimeGenerator improve method used (#4462)
xuzifu666 Dec 29, 2021
504747e
[HUDI-3108] Fix Purge Drop MOR Table Cause error (#4455)
XuQianJin-Stars Dec 29, 2021
5c0e4ce
Revert "[HUDI-3043] Revert async cleaner leak commit to unblock CI fa…
nsivabalan Dec 30, 2021
674c149
[HUDI-3083] Support component data types for flink bulk_insert (#4470)
lsyldliu Dec 30, 2021
436becf
[HUDI-2675] Fix the exception 'Not an Avro data file' when archive an…
dongkelun Dec 30, 2021
0f0088f
[HUDI-3124] Bootstrap when timeline have completed instant (#4467)
yuzhaojing Dec 30, 2021
a4e622a
[HUDI-1951] Add bucket hash index, compatible with the hive bucket (#…
minihippo Dec 30, 2021
e88b5fd
[HUDI-3120] Cache compactionPlan in buffer (#4463)
yuzhaojing Dec 31, 2021
2444f40
[HUDI-3095] abstract partition filter logic to enable code reuse (#4454)
YuweiXiao Dec 31, 2021
ef9923f
[HUDI-3107]Fix HiveSyncTool drop partitions using JDBC or hivesql or …
zhangyue19921010 Dec 31, 2021
bfa169d
[HUDI-3040] Fix HoodieSparkBootstrapExample error info for usage (#4341)
Aimiyoo Jan 1, 2022
188d033
[HUDI-3134] Fix insert error after adding columns on Spark 3.2.0 (#4488)
leesf Jan 2, 2022
1622b52
[HUDI-3136] Fix merge/insert/show partitions error on Spark3.2 (#4490)
YannByron Jan 2, 2022
fe9406d
[HUDI-3131] fix ctas error in spark3.1.1 (#4476)
YannByron Jan 2, 2022
1e2d2c4
[HUDI-3138] Fix broken UT test for TestHiveSyncTool.testDropPartition…
zhangyue19921010 Jan 3, 2022
0273f2e
[MINOR] Update README.md (#4492)
xushiyan Jan 3, 2022
2b2ae34
[HUDI-2558] Fixing Clustering w/ sort columns with null values fails …
harsh1231 Jan 3, 2022
29ab6fb
[HUDI-3140] Fix bulk_insert failure on Spark 3.2.0 (#4498)
leesf Jan 4, 2022
7329d22
Adding tests to validate different key generators (#4473)
nsivabalan Jan 4, 2022
aaf5727
[HUDI-2774] Handle duplicate instants when fetching pending clusterin…
codope Jan 4, 2022
bf4e3d6
[HUDI-3141] Metadata merged log record reader - avoiding NullPointerE…
manojpec Jan 4, 2022
37b15ff
[HUDI-3147] Add endpoint_url to dynamodb lock provider (#4500)
parisni Jan 4, 2022
a66212d
[HUDI-2966] Closing LogRecordScanner in compactor (#4478)
nsivabalan Jan 5, 2022
0e297c0
[HUDI-3171] Sync empty table to hive metastore (#4511)
danny0405 Jan 5, 2022
75133f9
[HUDI-3170] Do not preserve filename when preserveCommitMetadata enab…
codope Jan 5, 2022
eee715b
[HUDI-3168] Fixing null schema with empty commit in incremental relat…
vinishjail97 Jan 5, 2022
205e48f
[HUDI-3132] Minor fixes for HoodieCatalog
lsyldliu Dec 31, 2021
50fa5a6
Update HiveIncrementalPuller to configure filesystem (#4431)
hehexiaoduantui Jan 6, 2022
b6891d2
[HUDI-44] Adding support to preserve commit metadata for compaction (…
nsivabalan Jan 6, 2022
2954027
[HUDI-52] Enabling savepoint and restore for MOR table (#4507)
nsivabalan Jan 6, 2022
8718c30
[HUDI-3165] Enabling InProcessLockProvider for all multi-writer tests…
nsivabalan Jan 6, 2022
f0c2912
[MINOR] Remove unused methods in HoodieColumnProjectionUtils (#4408)
xuzifu666 Jan 6, 2022
d7afc58
[HUDI-3118] Add default HUDI_DIR in setupKafka.sh (#4460)
cdmikechen Jan 6, 2022
b2b23f5
[HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with …
zhangyue19921010 Jan 7, 2022
2467c13
[HUDI-3100] Add config for hive conditional sync (#4440)
xushiyan Jan 7, 2022
76a7264
[HUDI-3188] Update quick start guide for Kafka Connect Sink for Hudi …
yihua Jan 7, 2022
b1df606
[MINOR] fix typos in DDLExecutor (#4534)
dongkelun Jan 7, 2022
2e561de
[HUDI-2947] Fixing checkpoint fetch in detlastreamer (#4485)
nsivabalan Jan 7, 2022
518488c
[HUDI-3185] HoodieConfig#getBoolean should return false when default …
codope Jan 7, 2022
4f6cdd7
[HUDI-3192] Spark metastore schema evolution broken (#4533)
dongkelun Jan 8, 2022
03a83ff
[HUDI-3195] optimize spark3 pom and modify build command (#4538)
YannByron Jan 8, 2022
8275499
[HUDI-2909] Handle logical type in TimestampBasedKeyGenerator (#4203)
codope Jan 8, 2022
46bb00e
[HUDI-3139] Shade htrace and parquet-avro in presto bundle (#4495)
codope Jan 8, 2022
98ec215
[HUDI-3178] Fixing metadata table compaction so as to not include unc…
nsivabalan Jan 8, 2022
0d8ca8d
[HUDI-3104] Kafka-connect support of hadoop config environments and p…
cdmikechen Jan 9, 2022
3679070
[HUDI-3125] spark-sql write timestamp directly (#4471)
YannByron Jan 9, 2022
cf362fb
[MINOR] Fix some code style issues based on check-style plugin (#4532)
zhangyue19921010 Jan 9, 2022
977d3c6
[HUDI-3157] Remove aws jars from hudi bundles (#4542)
Jan 9, 2022
604d988
[HUDI-3009] making some fixes to S3 incremental source (#4517)
nsivabalan Jan 9, 2022
e9a7f49
[HUDI-3112] Fix KafkaConnect cannot sync to Hive Problem (#4458)
cdmikechen Jan 9, 2022
56f93f4
Removing rollbacks instants from timeline for restore operation (#4518)
nsivabalan Jan 10, 2022
251d4eb
[HUDI-3030] InProcessLockPovider as default when any async servcies e…
manojpec Jan 10, 2022
bc95571
[HUDI-2735] Allow empty commits in Kafka Connect Sink for Hudi (#4544)
yihua Jan 10, 2022
7a8b94c
[HUDI-3180] Include files from completed commits while bootstrapping …
nsivabalan Jan 10, 2022
f230eca
[MINOR] Fix port number in setupKafka.sh (#4546)
yihua Jan 10, 2022
c8df9b0
[HUDI-3148] Create pushgateway client based on port (#4497)
low-on-mana Jan 10, 2022
f1e3762
[HUDI-2950] Addressing performance traps in Bulk Insert/Layout Optimi…
Jan 11, 2022
67ad499
Removing extraneous warn logs in ClusteringUtils (#4553)
nsivabalan Jan 11, 2022
f74cd57
[HUDI-3195] Fix spark 3 pom (#4554)
xushiyan Jan 11, 2022
c9bc626
[HUDI-3211] Claim RFC number for RFC for Hudi Connector for Presto (#…
7c00 Jan 11, 2022
4b2fd37
[MINOR] Remove unused static var in HoodieAvroWriteSupport (#4543)
xuzifu666 Jan 11, 2022
6cdcd89
[HUDI-3094] Unify Hive's InputFormat implementations to avoid duplica…
Jan 11, 2022
a392e9b
[HUDI-485] Corrected the check for incremental sql (#2768)
pratyakshsharma Jan 12, 2022
4b01119
[HUDI-3184] hudi-flink support timestamp-micros (#4548)
AirToSupply Jan 12, 2022
017ddbb
[MINOR] Fix typos (#4567)
dongkelun Jan 12, 2022
9fe28e5
[HUDI-3045] New clustering regex match config to choose partitions wh…
zhangyue19921010 Jan 12, 2022
2969fb3
[HUDI-3233] Make metadata commit synchronous for flink batch
todd5167 Jan 12, 2022
8a40d95
[HUDI-3225] Claim RFC-45 for async metadata indexing (#4569)
codope Jan 12, 2022
12e9577
[HUDI-3235] Fix ClassNotFoundException due to log4j-core dependency (…
codope Jan 12, 2022
397795c
[HUDI-3007] Fix issues in HoodieRepairTool (#4564)
yihua Jan 12, 2022
209f91c
[HUDI-3010] Unbundle parquet-avro and shade other dependencies in prs…
codope Jan 13, 2022
195dac9
[MINOR] Disable flaky tests to unlock CI (#4592)
codope Jan 14, 2022
5ce45c4
[HUDI-3172] Refactor hudi existing modules to make more code reuse in…
leesf Jan 14, 2022
7d163ee
[MINOR] Fix local flaky test in TestFSUtils (#4596)
yihua Jan 14, 2022
53f75f8
[HUDI-2785] Add Trino setup in Docker Demo (#4300)
yihua Jan 14, 2022
5e0171a
[HUDI-3198] Improve Spark SQL create table from existing hudi table (…
YannByron Jan 14, 2022
822230d
[MINOR] Optimize variable names and logs (#4581)
dongkelun Jan 16, 2022
28b3b6a
[MINOR] Remove org.apache.directory.api.util.Strings import (#4601)
0x3E6 Jan 16, 2022
d2dda55
[HUDI-2968] add UT for update/delete on non-pk condition (#4568)
YannByron Jan 16, 2022
ed92c21
[MINOR] Delete unused parameter in TablePathUtils (#4595)
Timzhang01 Jan 17, 2022
75caa7d
[HUDI-3179] Extracted common `AbstractHoodieTableFileIndex` to be sha…
Jan 17, 2022
36a9f63
[HUDI-3257] Excluding clustering instants from pending rollback info …
danny0405 Jan 17, 2022
d365337
[HUDI-3194] fix MOR snapshot query during compaction (#4540)
YuweiXiao Jan 17, 2022
20e7983
[HUDI-3252] Avoid creating empty requestedReplaceCommit in the startC…
dongkelun Jan 17, 2022
f184474
[HUDI-1558] Struct Stream Source Support Spark3 (#4586)
Jan 18, 2022
3d93e85
[MINOR] Minor improvement in JsonkafkaSource (#4620)
wangxianghu Jan 18, 2022
3b56320
[HUDI-3261] Read rt table by hive cli throw NoSuchMethodError (#4624)
EchoLee5 Jan 18, 2022
45f054f
[HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…
danny0405 Jan 18, 2022
a09c231
[HUDI-2903] get table schema from the last commit with data written (…
YannByron Jan 18, 2022
caeea94
[HUDI-3245] Convert uppercase letters to lowercase in storage configs…
cdmikechen Jan 18, 2022
4bea758
[HUDI-3191] Rebasing Hive's FileInputFormat onto `AbstractHoodieTable…
Jan 18, 2022
7647562
[HUDI-2833][Design] Merge small archive files instead of expanding in…
zhangyue19921010 Jan 19, 2022
db93ad2
[HUDI-3277] Filter non-parquet files in bootstrap procedure (#4639)
wangxianghu Jan 19, 2022
a08a2b7
[MINOR] Add instructions to build and upload Docker Demo images (#4612)
yihua Jan 20, 2022
31b57a2
[HUDI-3236] use fields'comments persisted in catalog to fill in schem…
YannByron Jan 20, 2022
b7a79aa
[HUDI-3283] Bootstrap support overwrite existing table (#4647)
wangxianghu Jan 20, 2022
14d08bb
[MINOR] Fix typo in the doc of BULK_INSERT_SORT_MODE (#4652)
wangxianghu Jan 20, 2022
a66004a
[HUDI-3285] Drop unused method SparkBootstrapCommitActionExecutor#han…
wangxianghu Jan 20, 2022
2071e3b
[HUDI-3250] Upgrade Presto docker image (#4646)
codope Jan 20, 2022
79bf6ab
[HUDI-3281][Performance]Tuning performance of getAllPartitionPaths AP…
zhangyue19921010 Jan 20, 2022
8547f11
[HUDI-3271] Code optimization and clean up unused code in HoodieSpark…
dongkelun Jan 20, 2022
4b90850
[HUDI-3268] Fix NPE while reading table with Spark datasource (#4630)
yihua Jan 21, 2022
64b1426
[minor] Fix hive-exec scope of flink bundle jar (#4664)
danny0405 Jan 23, 2022
56cd8ff
[HUDI-2837] Add support for using database name in incremental query …
dongkelun Jan 23, 2022
e72553a
[HUDI-3262] Fixing utilities and integ test suite bundle to include h…
nsivabalan Jan 23, 2022
f7a7796
[HUDI-1850][HUDI-3234] Fixing read of a empty table but with failed w…
nsivabalan Jan 23, 2022
cfde45b
[HUDI-3282] Fix delete exception for Spark SQL when sync Hive (#4644)
dongkelun Jan 23, 2022
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/SUPPORT_REQUEST.md
@@ -8,7 +8,7 @@ labels: question

**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)?
- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?

- Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.

8 changes: 8 additions & 0 deletions .github/workflows/bot.yml
@@ -18,8 +18,16 @@ jobs:
include:
- scala: "scala-2.11"
spark: "spark2"
- scala: "scala-2.11"
spark: "spark2,spark-shade-unbundle-avro"
- scala: "scala-2.12"
spark: "spark3.1.x"
- scala: "scala-2.12"
spark: "spark3.1.x,spark-shade-unbundle-avro"
- scala: "scala-2.12"
spark: "spark3"
- scala: "scala-2.12"
spark: "spark3,spark-shade-unbundle-avro"
steps:
- uses: actions/checkout@v2
- name: Set up JDK 8
3 changes: 2 additions & 1 deletion .gitignore
@@ -61,7 +61,8 @@ local.properties
# IntelliJ specific files/directories #
#######################################
.out
.idea
.idea/*
!.idea/vcs.xml
*.ipr
*.iws
*.iml
36 changes: 36 additions & 0 deletions .idea/vcs.xml

6 changes: 6 additions & 0 deletions NOTICE
@@ -159,3 +159,9 @@ its NOTICE file:
This product includes software developed at
StreamSets (http://www.streamsets.com/).

--------------------------------------------------------------------------------

This product includes code from hilbert-curve project
* Copyright https://github.com/davidmoten/hilbert-curve
* Licensed under the Apache-2.0 License

10 changes: 7 additions & 3 deletions README.md
@@ -51,7 +51,7 @@ Prerequisites for building Apache Hudi:
* Unix-like system (like Linux, Mac OS X)
* Java 8 (Java 9 or 10 may work)
* Git
* Maven
* Maven (>=3.3.1)

```
# Checkout code and build
@@ -78,12 +78,16 @@ The default Scala version supported is 2.11. To build for Scala 2.12 version, bu
mvn clean package -DskipTests -Dscala-2.12
```

### Build with Spark 3.0.0
### Build with Spark 3

The default Spark version supported is 2.4.4. To build for Spark 3.0.0 version, build using `spark3` profile
The default Spark version supported is 2.4.4. To build for different Spark 3 versions, use the corresponding profile

```
# Build against Spark 3.2.0 (the default build shipped with the public Spark 3 bundle)
mvn clean package -DskipTests -Dspark3

# Build against Spark 3.1.2
mvn clean package -DskipTests -Dspark3.1.x
```

### Build without spark-avro module
43 changes: 27 additions & 16 deletions azure-pipelines.yml
@@ -26,13 +26,14 @@ variables:
MAVEN_OPTS: '-Dmaven.repo.local=$(MAVEN_CACHE_FOLDER) -Dcheckstyle.skip=true -Drat.skip=true -Djacoco.skip=true'
SPARK_VERSION: '2.4.4'
HADOOP_VERSION: '2.7'
SPARK_HOME: $(Pipeline.Workspace)/spark-$(SPARK_VERSION)-bin-hadoop$(HADOOP_VERSION)
SPARK_ARCHIVE: spark-$(SPARK_VERSION)-bin-hadoop$(HADOOP_VERSION)

stages:
- stage: test
jobs:
- job: UT_FT_1
displayName: UT FT common & flink & UT client/spark-client
timeoutInMinutes: '90'
steps:
- task: Cache@2
displayName: set cache
@@ -47,7 +48,7 @@ stages:
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
options: -DskipTests
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
@@ -71,6 +72,7 @@ stages:
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
- job: UT_FT_2
displayName: FT client/spark-client
timeoutInMinutes: '90'
steps:
- task: Cache@2
displayName: set cache
@@ -85,7 +87,7 @@ stages:
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
options: -DskipTests
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
@@ -99,7 +101,8 @@ stages:
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
- job: UT_FT_3
displayName: UT FT cli & utilities & sync/hive-sync
displayName: UT FT clients & cli & utilities & sync/hive-sync
timeoutInMinutes: '90'
steps:
- task: Cache@2
displayName: set cache
@@ -114,30 +117,31 @@ stages:
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
options: -DskipTests
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
- task: Maven@3
displayName: UT cli & utilities & sync/hive-sync
displayName: UT clients & cli & utilities & sync/hive-sync
inputs:
mavenPomFile: 'pom.xml'
goals: 'test'
options: -Punit-tests -pl hudi-cli,hudi-utilities,hudi-sync/hudi-hive-sync
options: -Punit-tests -pl hudi-client/hudi-client-common,hudi-client/hudi-flink-client,hudi-client/hudi-java-client,hudi-cli,hudi-utilities,hudi-sync/hudi-hive-sync
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
- task: Maven@3
displayName: FT cli & utilities & sync/hive-sync
displayName: FT clients & cli & utilities & sync/hive-sync
inputs:
mavenPomFile: 'pom.xml'
goals: 'test'
options: -Pfunctional-tests -pl hudi-cli,hudi-utilities,hudi-sync/hudi-hive-sync
options: -Pfunctional-tests -pl hudi-client/hudi-client-common,hudi-client/hudi-flink-client,hudi-client/hudi-java-client,hudi-cli,hudi-utilities,hudi-sync/hudi-hive-sync
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
- job: UT_FT_4
displayName: UT FT other modules
timeoutInMinutes: '90'
steps:
- task: Cache@2
displayName: set cache
@@ -152,7 +156,7 @@ stages:
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
options: -DskipTests
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
@@ -161,7 +165,7 @@ stages:
inputs:
mavenPomFile: 'pom.xml'
goals: 'test'
options: -Punit-tests -pl !hudi-common,!hudi-flink,!hudi-client/hudi-spark-client,!hudi-cli,!hudi-utilities,!hudi-sync/hudi-hive-sync
options: -Punit-tests -pl !hudi-common,!hudi-flink,!hudi-client/hudi-spark-client,!hudi-client/hudi-client-common,!hudi-client/hudi-flink-client,!hudi-client/hudi-java-client,!hudi-cli,!hudi-utilities,!hudi-sync/hudi-hive-sync
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
@@ -170,16 +174,23 @@ stages:
inputs:
mavenPomFile: 'pom.xml'
goals: 'test'
options: -Pfunctional-tests -pl !hudi-common,!hudi-flink,!hudi-client/hudi-spark-client,!hudi-cli,!hudi-utilities,!hudi-sync/hudi-hive-sync
options: -Pfunctional-tests -pl !hudi-common,!hudi-flink,!hudi-client/hudi-spark-client,!hudi-client/hudi-client-common,!hudi-client/hudi-flink-client,!hudi-client/hudi-java-client,!hudi-cli,!hudi-utilities,!hudi-sync/hudi-hive-sync
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
- job: IT
steps:
- task: AzureCLI@2
displayName: Prepare for IT
inputs:
azureSubscription: apachehudici-service-connection
scriptType: bash
scriptLocation: inlineScript
inlineScript: |
echo 'Downloading $(SPARK_ARCHIVE)'
az storage blob download -c ci-caches -n $(SPARK_ARCHIVE).tgz -f $(Pipeline.Workspace)/$(SPARK_ARCHIVE).tgz --account-name apachehudici
tar -xvf $(Pipeline.Workspace)/$(SPARK_ARCHIVE).tgz -C $(Pipeline.Workspace)/
mkdir /tmp/spark-events/
- script: |
echo 'Downloading spark-$(SPARK_VERSION)-bin-hadoop$(HADOOP_VERSION)'
wget https://archive.apache.org/dist/spark/spark-$(SPARK_VERSION)/spark-$(SPARK_VERSION)-bin-hadoop$(HADOOP_VERSION).tgz -O $(Pipeline.Workspace)/spark-$(SPARK_VERSION).tgz
tar -xvf $(Pipeline.Workspace)/spark-$(SPARK_VERSION).tgz -C $(Pipeline.Workspace)/
mkdir /tmp/spark-events/
mvn $(MAVEN_OPTS) -Pintegration-tests verify
displayName: IT
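The unit-test and functional-test jobs above now carry the same nine-module `-pl` exclusion list, so the two copies have to be kept in sync by hand. Below is a minimal sketch that assumes you want to reproduce those jobs locally from a bash script rather than from the pipeline YAML. It keeps the modules in one array and derives Maven's `!module,!module` exclusion syntax from that array. The helper name `exclusion_list` is hypothetical, and the script only prints the Maven command instead of running it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Modules excluded from the unit-tests and functional-tests jobs (same order as the pipeline options).
EXCLUDED_MODULES=(
  hudi-common
  hudi-flink
  hudi-client/hudi-spark-client
  hudi-client/hudi-client-common
  hudi-client/hudi-flink-client
  hudi-client/hudi-java-client
  hudi-cli
  hudi-utilities
  hudi-sync/hudi-hive-sync
)

# Join module paths into Maven's exclusion syntax for -pl: "!a,!b,!c".
# printf reuses its format for every argument, which leaves a trailing comma that is then stripped.
# Expects at least one module.
exclusion_list() {
  local joined
  joined="$(printf '!%s,' "$@")"
  printf '%s\n' "${joined%,}"
}

PL_ARG="$(exclusion_list "${EXCLUDED_MODULES[@]}")"

# Print the command rather than running Maven.
echo "mvn test -Punit-tests -pl ${PL_ARG}"
```

Printing the command keeps the sketch free of side effects. Once the list looks right, you can swap the final `echo` for the actual `mvn` invocation.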
26 changes: 26 additions & 0 deletions conf/hudi-defaults.conf.template
@@ -0,0 +1,26 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Default system properties included when running Hudi jobs.
# This is useful for setting default environmental settings.

# Example:
# hoodie.datasource.hive_sync.jdbcurl jdbc:hive2://localhost:10000
# hoodie.datasource.hive_sync.use_jdbc true
# hoodie.datasource.hive_sync.support_timestamp false
# hoodie.index.type BLOOM
# hoodie.metadata.enable false
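Here is a minimal sketch of putting this template to use. The script writes a `hudi-defaults.conf` containing two of the example properties, using the same whitespace-separated `key value` layout, and then prints the entries that are not commented out. It assumes that Hudi picks the file up from the directory named by the `HUDI_CONF_DIR` environment variable, so verify that against the Hudi version you run. The temporary directory and the `effective_props` helper are illustrative only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative location; a real deployment would use a fixed config directory.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# Same "key value" layout as the template; values are single tokens here.
cat > "${CONF_DIR}/hudi-defaults.conf" <<'EOF'
# Example overrides
hoodie.index.type BLOOM
hoodie.metadata.enable false
EOF

# Assumption: Hudi reads hudi-defaults.conf from the directory in HUDI_CONF_DIR.
export HUDI_CONF_DIR="${CONF_DIR}"

# Print the effective (non-comment, non-blank) entries as key=value.
# Only the first two whitespace-separated fields are shown, so multi-word values are truncated.
effective_props() {
  grep -Ev '^[[:space:]]*(#|$)' "$1" | awk '{ print $1 "=" $2 }'
}

effective_props "${HUDI_CONF_DIR}/hudi-defaults.conf"
```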
5 changes: 5 additions & 0 deletions doap_HUDI.rdf
@@ -76,6 +76,11 @@
<created>2021-08-26</created>
<revision>0.9.0</revision>
</Version>
<Version>
<name>Apache Hudi 0.10.0</name>
<created>2021-12-08</created>
<revision>0.10.0</revision>
</Version>
</release>
<repository>
<GitRepository>
93 changes: 93 additions & 0 deletions docker/README.md
@@ -0,0 +1,93 @@
<!--
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
-->

# Docker Demo for Hudi

This repo contains the docker demo resources for building docker demo images, setting up the demo, and running Hudi in
the docker demo environment.

## Repo Organization

### Configs for assembling docker images - `/hoodie`

The `/hoodie` folder contains all the configs for assembling necessary docker images. The name and repository of each
docker image, e.g., `apachehudi/hudi-hadoop_2.8.4-trinobase_368`, is defined in the maven configuration file `pom.xml`.

### Docker compose config for the Demo - `/compose`

The `/compose` folder contains the YAML file to compose the Docker environment for running the Hudi Demo.

### Resources and Sample Data for the Demo - `/demo`

The `/demo` folder contains useful resources and sample data used for the Demo.

## Build and Test Images Locally

To build all docker images locally, you can run the script:

```shell
./build_local_docker_images.sh
```

To build a single image target, you can run:

```shell
mvn clean pre-integration-test -DskipTests -Ddocker.compose.skip=true -Ddocker.build.skip=false -pl :<image_target> -am
# For example, to build hudi-hadoop-trinobase-docker
mvn clean pre-integration-test -DskipTests -Ddocker.compose.skip=true -Ddocker.build.skip=false -pl :hudi-hadoop-trinobase-docker -am
```

Alternatively, you can use the `docker` CLI directly under `hoodie/hadoop`. Note that you need to manually name your
local image with the `-t` option so that it matches the naming in `pom.xml`. That way you can update the corresponding
image repository in Docker Hub (detailed steps in the next section).

```shell
# Run under hoodie/hadoop, the <tag> is optional, "latest" by default
docker build <image_folder_name> -t <hub-user>/<repo-name>[:<tag>]
# For example, to build trinobase
docker build trinobase -t apachehudi/hudi-hadoop_2.8.4-trinobase_368
```

After new images are built, you can run the following script to bring up the docker demo with your local images:

```shell
./setup_demo.sh dev
```

## Upload Updated Image to Repository on Docker Hub

Once you have built the updated image locally, you can push it to its corresponding repository in the Docker Hub
registry. The repository is identified by the image's name and tag:

```shell
docker push <hub-user>/<repo-name>:<tag>
# For example
docker push apachehudi/hudi-hadoop_2.8.4-trinobase_368
```
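When you update a single image, you can wrap the build and push steps in one helper. This is a hypothetical convenience script and is not part of the repository. `build_and_push` runs the same `docker build ... -t ...` and `docker push` commands shown above and is meant to be run from `hoodie/hadoop`. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Hudi): build one image folder under hoodie/hadoop
# and push it to Docker Hub. Source this file, then call build_and_push.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: build_and_push <image_folder_name> <hub-user>/<repo-name>[:<tag>]
# The image name must match the one defined in pom.xml.
build_and_push() {
  if [[ $# -ne 2 ]]; then
    echo "usage: build_and_push <image_folder_name> <hub-user>/<repo-name>[:<tag>]" >&2
    return 2
  fi
  local folder="$1" image="$2"
  # Stop before pushing if the build fails.
  run docker build "${folder}" -t "${image}" || return
  run docker push "${image}"
}
```

For example, after sourcing the script, `DRY_RUN=1 build_and_push trinobase apachehudi/hudi-hadoop_2.8.4-trinobase_368` shows what would run before anything is built or pushed.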

You can also push the image to Docker Hub using the Docker Desktop app: go to `Images`, search for the image by name,
click the three dots, and select `Push to Hub`.

![Push to Docker Hub](push_to_docker_hub.png)

Note that you need to ask for permission to upload the Hudi Docker Demo images to the repositories.

You can find more information in the [Docker Hub Repositories Manual](https://docs.docker.com/docker-hub/repos/).

## Docker Demo Setup

Please refer to the [Docker Demo Docs page](https://hudi.apache.org/docs/docker_demo).