HIVE-26227: Add support of catalog related statements for Hive ql #3288
base: master
@pvary @deniskuzZ: Could you also review this PR?
Great work! This makes it much easier for us to manage catalogs with DDL.
,catalog.return_ratio
,catalog.return_rank
,catalog.currency_rank
,`catalog`.item
This is a backward-incompatible change. Could we make `catalog` a non-reserved keyword?
Makes sense, done.
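To illustrate why the reserved/non-reserved distinction matters for backward compatibility, here is a minimal Python sketch of identifier resolution. It is not Hive's parser: the keyword sets and the `parse_identifier` helper are hypothetical and exist only to show the behavior discussed in this thread.

```python
import re

# Toy keyword sets for illustration only (assumption); Hive's real lists
# are defined in its ANTLR grammar, not here.
RESERVED = {"SELECT", "FROM", "TABLE"}
NON_RESERVED = {"CATALOG", "CATALOGS", "COMMENT", "DATA"}

_WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_identifier(token, reserved=RESERVED):
    """Return the identifier name, or raise ValueError if the token cannot be one."""
    # A backtick-quoted token is always treated as an identifier.
    if len(token) > 2 and token[0] == "`" and token[-1] == "`":
        return token[1:-1]
    if not _WORD.fullmatch(token):
        raise ValueError(f"not an identifier: {token!r}")
    # Only reserved keywords are rejected; non-reserved ones fall through.
    if token.upper() in reserved:
        raise ValueError(f"reserved keyword needs backticks: {token!r}")
    return token
```

With `CATALOG` non-reserved, an existing query that uses `catalog` as a table alias keeps parsing. If it were added to the reserved set, the same query would need the backtick-quoted form.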
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
As Iceberg REST catalogs are popular, it would make sense to add a catalog type as well, such as HIVE_CATALOG, ICEBERG_REST_CATALOG, etc. The type must be mandatory when adding a new catalog.
@zratkai Does the catalog name meet the requirements? For example, hive corresponds to HIVE_CATALOG, and iceberg_rest corresponds to ICEBERG_REST_CATALOG.
@wecharyu, I think we need to provide catalog type & connection details:
see https://iceberg.apache.org/docs/1.4.0/flink-ddl/ TYPES:
@zhangbutao, @okumin WDYT?
I assume we'd like to implement something similar to a federated catalog of Glue Catalog, stored in HMS and accessible from Hive. For example, it provides S3 Table integration. It sounds nice. The type (Hive or Iceberg) plus properties makes sense for expressing arbitrary access to Iceberg REST catalogs.
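The "mandatory type plus free-form properties" idea from the comments above could be modeled roughly as follows. This is a hedged Python sketch, not anything in the PR: `CatalogType`, `CatalogSpec`, `make_catalog_spec`, and the `uri` property key are hypothetical names; only the two type names come from the discussion.

```python
from dataclasses import dataclass, field
from enum import Enum


class CatalogType(Enum):
    # The two type names mentioned in the discussion; the real set is undecided.
    HIVE_CATALOG = "HIVE_CATALOG"
    ICEBERG_REST_CATALOG = "ICEBERG_REST_CATALOG"


@dataclass
class CatalogSpec:
    name: str
    type: CatalogType
    # Connection details as arbitrary key/value pairs (keys are assumptions).
    properties: dict = field(default_factory=dict)


def make_catalog_spec(name, type_name, properties=None):
    """Build a CatalogSpec, rejecting a missing or unknown catalog type."""
    if not type_name:
        raise ValueError("catalog type is mandatory")
    try:
        catalog_type = CatalogType[type_name.upper()]
    except KeyError:
        raise ValueError(f"unknown catalog type: {type_name!r}") from None
    return CatalogSpec(name, catalog_type, dict(properties or {}))
```

For example, `make_catalog_spec("ice", "ICEBERG_REST_CATALOG", {"uri": "http://localhost:8181"})` would succeed, while omitting the type would fail fast instead of silently defaulting.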
@@ -932,7 +932,7 @@ nonReserved
 :
 KW_ABORT | KW_ADD | KW_ADMIN | KW_AFTER | KW_ANALYZE | KW_ARCHIVE | KW_ASC | KW_BEFORE | KW_BUCKET | KW_BUCKETS
 | KW_CASCADE | KW_CBO | KW_CHANGE | KW_CHECK | KW_CLUSTER | KW_CLUSTERED | KW_CLUSTERSTATUS | KW_COLLECTION | KW_COLUMNS
-| KW_COMMENT | KW_COMPACT | KW_COMPACTIONS | KW_COMPUTE | KW_CONCATENATE | KW_CONTINUE | KW_COST | KW_DATA | KW_DAY
+| KW_COMMENT | KW_COMPACT | KW_COMPACTIONS | KW_COMPUTE | KW_CONCATENATE | KW_CONTINUE | KW_COST | KW_DATA | KW_DAY | KW_CATALOG | KW_CATALOGS
I confirmed `CATALOG` is not a reserved word in SQL:2023 👍
POSTHOOK: query: DESC CATALOG test_cat
POSTHOOK: type: DESCCATALOG
POSTHOOK: Input: catalog:test_cat
#### A masked pattern was here ####
I lean slightly toward not masking this, although showing it might not be trivial.
It's masked because of the 'tmp' path; splitting the result across multiple lines could display some of the information in the test.
I am fine with this syntax. But this PR is really just a supplement to HIVE-18685: it adds SQL capabilities on top of HIVE-18685 to make HMS catalog operations easier.

@deniskuzZ @okumin If we're on the same page, we want a multi-catalog capability like Trino's, which is different from the HMS catalog of HIVE-18685. Multi-catalog can be used for federated queries through three-layer identifiers like catalog_name.dbName.tblName. For example: select * from hive_catalog.testhivedb.testhivetbl join iceberg_catalog.testdb.testicetbl on testhivetbl.id = testicetbl.id; We could also add other data sources to the multi-catalog, such as a JDBC catalog.

BTW, HIVE-24396 added the data connector, which can map a JDBC database instead of a JDBC table, but it cannot map all external databases. With multi-catalog, we could map all external databases at once, just like the Trino JDBC catalog.

I have not yet figured out how to achieve this multi-catalog ability, and I think it is beyond the scope of this PR. Of course, maybe we can implement multi-catalog on top of this PR and HIVE-18685. :)
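As a rough illustration of the three-layer identifiers mentioned above, the following Python sketch resolves a one-, two-, or three-part name into (catalog, database, table). It is an assumption-laden toy, not Hive's resolver: `resolve_table_name`, `TableName`, and the default names are hypothetical, and quoted identifiers containing dots are not handled.

```python
from typing import NamedTuple

# Assumed session defaults for this sketch.
DEFAULT_CATALOG = "hive"
DEFAULT_DATABASE = "default"


class TableName(NamedTuple):
    catalog: str
    database: str
    table: str


def resolve_table_name(name, current_catalog=DEFAULT_CATALOG,
                       current_database=DEFAULT_DATABASE):
    """Fill in missing catalog/database parts from the current session context."""
    parts = name.split(".")
    # Reject empty parts ("a..b") and names with more than three layers.
    if any(not p for p in parts) or len(parts) > 3:
        raise ValueError(f"invalid table name: {name!r}")
    if len(parts) == 3:
        return TableName(*parts)
    if len(parts) == 2:
        return TableName(current_catalog, parts[0], parts[1])
    return TableName(current_catalog, current_database, parts[0])
```

Under these assumptions, `hive_catalog.testhivedb.testhivetbl` resolves to itself, while a bare `testhivetbl` picks up the current catalog and database.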
Yes, I would pursue the multi-catalog capability. This PR mainly focuses on new SQL for catalog registration.
I would like a single source of the overview of the current strategy so that everyone, including new people, can understand it.
@deniskuzZ @zhangbutao @okumin Catalog type and properties are needed to support multiple catalogs in the Hive engine. I think it's better to raise a new PR for that.
Totally OK with that. I'll raise a new feature JIRA and add the subtasks discussed here.
What changes were proposed in this pull request?
Implement the DDL statements related to catalogs; see HIVE-26227 for the statements.
Why are the changes needed?
To support basic DDL operations on catalogs through Hive QL.
Does this PR introduce any user-facing change?
Yes, we should add these new statements to the DDL documentation.
How was this patch tested?
Added a qtest, catalog.q, which can be run with the command: mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=catalog.q