Skip to content

Conversation

@dongkelun
Copy link
Contributor

@dongkelun dongkelun commented Nov 30, 2022

Change Logs

[HUDI-5301] Spark SQL queries support setting parameters through set

Impact

[HUDI-5301] Spark SQL queries support setting parameters through set

Risk level (write none, low medium or high below)

none

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@xiarixiaoyao xiarixiaoyao self-requested a review November 30, 2022 09:49
.key("hoodie.query.use.database")
.defaultValue(false)
.withDocumentation("Whether to add database name to qualify table name when setting parameters in Spark SQL query")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this modification have somethings to do with the pr title?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiarixiaoyao This title is not reflected because the form of set parameter is not supported previously. Adding this parameter is mainly consistent with Hive incremental query: ` HoodieHiveUtils.HOODIE_ INCREMENTAL_ USE_ DATABASE ', mainly considering the case that different databases have the same table name.

The reason why it is not described in detail in the PR is that it is uncertain whether the community will approve this form of query. If necessary, I can add a detailed description in the PR. In addition, only incremental queries are added to the test cases, excluding other query types. If necessary, I can add more detailed test cases

"(obtain latest view, by merging base and (if any) log files)")

val QUERY_USE_DATABASE: ConfigProperty[Boolean] = ConfigProperty
.key("hoodie.query.use.database")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I am a little confused about the config and the use case of this config.

Copy link
Contributor Author

@dongkelun dongkelun Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leesf This configuration is reflected in the test case. The main consideration is that if different databases have the same table name, such as db1.table1 and db2.table1, and if the two tables are queried in the same session at the same time, I only want to set the incremental query parameters of db1.table1:

set hoodie.table1.datasource.query.type=incremental;
set hoodie.table1.datasource.read.begin.instanttime=20221130163703640;

In this way, although I only want to query db1.table1 incrementally, I will also perform incremental queries when querying db2.table1. This is not the effect I expected, so I have this parameter:

set hoodie.query.use.database = true;
set hoodie.db1.table1.datasource.query.type=incremental;
set hoodie.db1.table1.datasource.read.begin.instanttime=20221130163703640;

In this way, we can only perform incremental queries on db1.table1. This configuration is false by default, which is consistent with the Hive incremental query parameters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR of Hive incremental query:#4083

Copy link
Contributor

@leesf leesf Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it only affects incremental query, maybe hoodie.query.incremental.database is a better name? or it is also affect other types of query? then we need to add more test cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also affects other types of queries. I can add test cases of other query types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leesf Hello, I have added test cases of other query types

@dongkelun dongkelun force-pushed the HUDI-5301 branch 2 times, most recently from 48d8e1e to 47c76c2 Compare December 3, 2022 02:33
@hudi-bot
Copy link
Collaborator

hudi-bot commented Dec 3, 2022

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@YannByron
Copy link
Contributor

@dongkelun I'm not a big fan of querying hudi table in other modes by setting spark conf. Maybe TableValuedFunction is a better way.

@nsivabalan nsivabalan added priority:blocker Production down; release blocker release-0.12.2 Patches targetted for 0.12.2 labels Dec 5, 2022
@codope codope added priority:critical Production degraded; pipelines stalled area:sql SQL interfaces and removed priority:blocker Production down; release blocker release-0.12.2 Patches targetted for 0.12.2 labels Dec 7, 2022
@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Feb 26, 2024
Copy link
Contributor

@yihua yihua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closing this PR now since a better should be used. @dongkelun Please reopen if needed with revisions based on the suggestion. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:sql SQL interfaces priority:critical Production degraded; pipelines stalled release-1.0.0 size:M PR with lines of changes in (100, 300]

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

9 participants