[HUDI-5301] Spark SQL queries support setting parameters through set #7339
  .key("hoodie.query.use.database")
  .defaultValue(false)
  .withDocumentation("Whether to add database name to qualify table name when setting parameters in Spark SQL query")
Does this modification have something to do with the PR title?
@xiarixiaoyao The title does not mention this parameter because setting query parameters through `set` was not supported before this PR. The parameter is added mainly for consistency with the Hive incremental query option `HoodieHiveUtils.HOODIE_INCREMENTAL_USE_DATABASE`, and it covers the case where different databases contain tables with the same name.
I did not describe it in detail in the PR because I am not sure whether the community will approve this form of query. If necessary, I can add a detailed description to the PR. In addition, the test cases currently cover only incremental queries and no other query types. If necessary, I can add more detailed test cases.
"(obtain latest view, by merging base and (if any) log files)")

val QUERY_USE_DATABASE: ConfigProperty[Boolean] = ConfigProperty
  .key("hoodie.query.use.database")
Sorry, I am a little confused about this config and its use case.
@leesf This configuration is exercised in the test case. The main consideration is that different databases can contain tables with the same name, such as db1.table1 and db2.table1. Suppose both tables are queried in the same session and I only want to set the incremental query parameters for db1.table1:
set hoodie.table1.datasource.query.type=incremental;
set hoodie.table1.datasource.read.begin.instanttime=20221130163703640;
With these settings, although I only want to query db1.table1 incrementally, queries on db2.table1 also run incrementally. That is not the effect I expected, so I added this parameter:
set hoodie.query.use.database = true;
set hoodie.db1.table1.datasource.query.type=incremental;
set hoodie.db1.table1.datasource.read.begin.instanttime=20221130163703640;
This way, the incremental query applies only to db1.table1. This configuration is false by default, which is consistent with the Hive incremental query parameters.
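The scoping behaviour described in this comment can be illustrated with a minimal sketch. It is not the PR's implementation: the object `TableScopedConf`, the method `resolve`, and the idea of modelling the session settings as a plain `Map` are all hypothetical, chosen only to show how a table-qualified key would match with and without `hoodie.query.use.database`.

```scala
// Hypothetical sketch, NOT the code from this PR: it only models how
// table-scoped keys could be matched against a session's `set` values.
object TableScopedConf {
  // Key taken from the diff under review.
  val UseDatabaseKey = "hoodie.query.use.database"

  /** Returns the options that apply to `db.table`, with the table qualifier removed. */
  def resolve(conf: Map[String, String], db: String, table: String): Map[String, String] = {
    // The flag defaults to false, so a missing key means "match by table name only".
    val useDb = conf.get(UseDatabaseKey).exists(_.trim.equalsIgnoreCase("true"))
    val prefix = if (useDb) s"hoodie.$db.$table." else s"hoodie.$table."
    conf.collect {
      // e.g. hoodie.db1.table1.datasource.query.type -> hoodie.datasource.query.type
      case (k, v) if k.startsWith(prefix) => ("hoodie." + k.stripPrefix(prefix)) -> v
    }
  }
}
```

Under these assumptions, a session that sets only `hoodie.table1.datasource.query.type` resolves the same options for both `db1.table1` and `db2.table1`, while a session that sets `hoodie.query.use.database = true` together with `hoodie.db1.table1.*` keys resolves them for `db1.table1` only.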
The PR for the Hive incremental query: #4083
If it only affects incremental queries, maybe hoodie.query.incremental.database is a better name? Or does it also affect other query types? If so, we need to add more test cases.
It also affects other types of queries. I can add test cases of other query types.
@leesf Hello, I have added test cases of other query types
48d8e1e to 47c76c2
@dongkelun I'm not a big fan of querying Hudi tables in other modes by setting Spark conf. Maybe ...
yihua left a comment:
Closing this PR now since a better approach should be used. @dongkelun Please reopen if needed with revisions based on the suggestion. Thank you.
Change Logs
[HUDI-5301] Spark SQL queries support setting parameters through set
Impact
[HUDI-5301] Spark SQL queries support setting parameters through set
Risk level (write none, low medium or high below)
none
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change