
[Feature][Connector-V2] Optimize hudi sink #7662

Merged (45 commits) on Sep 19, 2024

Conversation

happyboy1024
Contributor

Purpose of this pull request

This PR mainly focuses on some optimizations of the Hudi sink. #7597
It also fixes a dependency problem when using S3 as storage, and fixes multiple-table writes on the Spark and Flink engines not executing savemode properly.

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

dengjunjie and others added 30 commits July 3, 2023 17:15
| name | type | required | default value |
|------|------|----------|---------------|
| min_commits_to_keep | Int | no | 20 |
| max_commits_to_keep | Int | no | 30 |
| common-options | config | no | - |
Base configuration:
Member


Please update zh docs

Contributor Author


Please update zh docs

done.

@Hisoka-X Hisoka-X added this to the 2.3.8 milestone Sep 14, 2024
@Hisoka-X Hisoka-X linked an issue Sep 14, 2024 that may be closed by this pull request
3 tasks
Member

@Hisoka-X Hisoka-X left a comment


Overall LGTM. cc @liugddx

"sh", "-c", "cd /tmp" + " && tar -zxvf " + NAMESPACE_TAR);
try {
Process process = processBuilder.start();
// 等待命令执行完成 (wait for the command to finish)
Member


Please remove Chinese

Contributor Author


Please remove Chinese

done.
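As an aside, the ProcessBuilder pattern under review can be sketched as a self-contained snippet (an illustrative rewrite, not the connector's actual code; `echo` stands in for the original tar extraction, and a Unix `sh` is assumed):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class RunCommand {
    // Start a shell command via ProcessBuilder and wait for it to finish,
    // mirroring the pattern in the diff above. Stdout is echoed so the
    // caller can see the command's output.
    static int run(String command) throws Exception {
        ProcessBuilder processBuilder = new ProcessBuilder("sh", "-c", command);
        processBuilder.redirectErrorStream(true); // merge stderr into stdout
        Process process = processBuilder.start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        // wait for the command to finish and return its exit code
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // "echo" stands in for the original tar extraction command
        int exitCode = run("echo extraction-finished");
        System.out.println("exit=" + exitCode);
    }
}
```

Draining stdout before `waitFor()` also avoids the process blocking on a full output buffer, which is a common pitfall with ProcessBuilder.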

@Hisoka-X Hisoka-X changed the title [Feature][Hudi Sink] Optimize hudi sink [Feature][Connector-V2] Optimize hudi sink Sep 18, 2024
Comment on lines +49 to +54
op_type="UPSERT"
table_dfs_path = "/tmp/hudi"
database = "st"
table_name = "st_test"
table_type="COPY_ON_WRITE"
record_key_fields="c_bigint"
Member


Without new config key op_type, database, table_type, record_key_fields, can the config execute normally?

Contributor Author


Without new config key op_type, database, table_type, record_key_fields, can the config execute normally?

Sorry, I did not consider the case where these items are empty; I have modified it.
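For reference, a minimal Hudi sink block using these keys might look like the following (a sketch assembled from the keys shown in the diff above; the surrounding `sink { Hudi { ... } }` structure follows the usual SeaTunnel HOCON config shape and is an assumption here):

```hocon
sink {
  Hudi {
    table_dfs_path = "/tmp/hudi"
    database = "st"
    table_name = "st_test"
    table_type = "COPY_ON_WRITE"
    record_key_fields = "c_bigint"
    op_type = "UPSERT"
  }
}
```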

@Hisoka-X
Member

Thanks @happyboy1024

Member

@liugddx liugddx left a comment


Thanks!

@liugddx liugddx merged commit 0d12520 into apache:dev Sep 19, 2024
7 checks passed
@happyboy1024 happyboy1024 deleted the hudi-sink branch September 20, 2024 00:46
@RoderickAdriance

I have found a problem when using the Hudi sink connector.
I use the Hudi sink connector to sink multiple MySQL source tables to Hudi.
I encounter an error when I use Spark SQL to query the data.
This issue occurs when MySQL data fields contain the decimal type.

[screenshot of the Spark SQL error]

@happyboy1024
Contributor Author

I have found a problem when using the Hudi sink connector. I use the Hudi sink connector to sink multiple MySQL source tables to Hudi. I encounter an error when I use Spark SQL to query the data. This issue occurs when MySQL data fields contain the decimal type.

[screenshot of the Spark SQL error]

You can temporarily set spark.sql.parquet.enableVectorizedReader=false in the Spark SQL environment. This error will be fixed in PR #7845.
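For anyone hitting this before #7845 lands, the setting can be applied either when launching spark-sql or inside a running session (a sketch; both forms use the standard Spark configuration mechanism):

```shell
# Disable the vectorized Parquet reader when launching the shell:
spark-sql --conf spark.sql.parquet.enableVectorizedReader=false

# Or set it inside an existing Spark SQL session:
# SET spark.sql.parquet.enableVectorizedReader=false;
```

Note this disables the vectorized reader for all Parquet scans in the session, which may slow down unrelated queries, so it is best treated as a temporary workaround.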

Development

Successfully merging this pull request may close these issues.

[Bug] [seatunnel-hadoop3-3.1.4-uber] This jar package is not up to date
5 participants