
[Feature][Connector-V2] Optimize hudi sink #7662

Merged (45 commits) on Sep 19, 2024

Conversation

happyboy1024
Contributor

Purpose of this pull request

This PR mainly focuses on some optimizations of the Hudi sink. #7597
It also fixes a dependency problem when using S3 as storage, and fixes multiple-table writes on the Spark and Flink engines not executing savemode properly.

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

dengjunjie and others added 30 commits July 3, 2023 17:15
| name | type | required | default value |
|------|------|----------|---------------|
| min_commits_to_keep | Int | no | 20 |
| max_commits_to_keep | Int | no | 30 |
| common-options | config | no | - |
Base configuration:
Member


Please update zh docs

Contributor Author


Please update zh docs

done.

@Hisoka-X Hisoka-X added this to the 2.3.8 milestone Sep 14, 2024
@Hisoka-X Hisoka-X linked an issue Sep 14, 2024 that may be closed by this pull request
3 tasks
Member

@Hisoka-X Hisoka-X left a comment


Overall LGTM. cc @liugddx

"sh", "-c", "cd /tmp" + " && tar -zxvf " + NAMESPACE_TAR);
try {
Process process = processBuilder.start();
// 等待命令执行完成 (wait for the command to finish)
Member


Please remove Chinese

Contributor Author


Please remove Chinese

done.
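As an aside, the ProcessBuilder pattern under review can be sketched as a self-contained snippet (an illustrative rewrite, not the connector's actual code; `echo` stands in for the original tar extraction, and a Unix `sh` is assumed):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class RunCommand {
    // Start a shell command via ProcessBuilder and wait for it to finish,
    // mirroring the pattern in the diff above. Stdout is echoed so the
    // caller can see the command's output.
    static int run(String command) throws Exception {
        ProcessBuilder processBuilder = new ProcessBuilder("sh", "-c", command);
        processBuilder.redirectErrorStream(true); // merge stderr into stdout
        Process process = processBuilder.start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        // wait for the command to finish and return its exit code
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // "echo" stands in for the original tar extraction command
        int exitCode = run("echo extraction-finished");
        System.out.println("exit=" + exitCode);
    }
}
```

Draining stdout before `waitFor()` also avoids the process blocking on a full output buffer, which is a common pitfall with ProcessBuilder.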

@Hisoka-X Hisoka-X changed the title [Feature][Hudi Sink] Optimize hudi sink [Feature][Connector-V2] Optimize hudi sink Sep 18, 2024
Comment on lines +49 to +54
op_type="UPSERT"
table_dfs_path = "/tmp/hudi"
database = "st"
table_name = "st_test"
table_type="COPY_ON_WRITE"
record_key_fields="c_bigint"
Member


Without new config key op_type, database, table_type, record_key_fields, can the config execute normally?

Contributor Author


Without new config key op_type, database, table_type, record_key_fields, can the config execute normally?

Sorry, I did not consider the case where these items are empty; I have modified it.
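For reference, a minimal Hudi sink block using these keys might look like the following (a sketch assembled from the keys shown in the diff above; the surrounding `sink { Hudi { ... } }` structure follows the usual SeaTunnel HOCON config shape and is an assumption here):

```hocon
sink {
  Hudi {
    table_dfs_path = "/tmp/hudi"
    database = "st"
    table_name = "st_test"
    table_type = "COPY_ON_WRITE"
    record_key_fields = "c_bigint"
    op_type = "UPSERT"
  }
}
```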

@Hisoka-X
Member

Thanks @happyboy1024

Member

@liugddx liugddx left a comment


Thanks!

@liugddx liugddx merged commit 0d12520 into apache:dev Sep 19, 2024
7 checks passed
@happyboy1024 happyboy1024 deleted the hudi-sink branch September 20, 2024 00:46
@RoderickAdriance

I have found a problem when using the Hudi sink connector.
I use the Hudi sink connector to sink multiple MySQL source tables to Hudi.
I encounter an error when I use Spark SQL to query the data.
This issue occurs when MySQL data fields contain the decimal type.

[screenshot of the Spark SQL error]

@happyboy1024
Contributor Author

I have found a problem when using the Hudi sink connector. I use the Hudi sink connector to sink multiple MySQL source tables to Hudi. I encounter an error when I use Spark SQL to query the data. This issue occurs when MySQL data fields contain the decimal type.

[screenshot of the Spark SQL error]

You can temporarily set spark.sql.parquet.enableVectorizedReader=false in the Spark SQL environment. This error will be fixed in PR #7845.
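For anyone hitting this before #7845 lands, the setting can be applied either when launching spark-sql or inside a running session (a sketch; both forms use the standard Spark configuration mechanism):

```shell
# Disable the vectorized Parquet reader when launching the shell:
spark-sql --conf spark.sql.parquet.enableVectorizedReader=false

# Or set it inside an existing Spark SQL session:
# SET spark.sql.parquet.enableVectorizedReader=false;
```

Note this disables the vectorized reader for all Parquet scans in the session, which may slow down unrelated queries, so it is best treated as a temporary workaround.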

Development

Successfully merging this pull request may close these issues.

[Bug] [seatunnel-hadoop3-3.1.4-uber] This jar package is not up to date
5 participants