-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-15314: [C++][Java][FlightRPC] Add missing metadata on Arrow schemas returned by Flight SQL #11999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-15314: [C++][Java][FlightRPC] Add missing metadata on Arrow schemas returned by Flight SQL #11999
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename pull request title in the following format? or See also: |
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the metadata here sufficient to cover at least JDBC and ODBC?
java/flight/flight-sql/src/main/java/org/apache/arrow/flight/sql/FlightSqlColumnMetadata.java
Outdated
Show resolved
Hide resolved
java/flight/flight-sql/src/main/java/org/apache/arrow/flight/sql/FlightSqlColumnMetadata.java
Outdated
Show resolved
Hide resolved
Yes, we thought of this values thinking on both JDBC and ODBC. |
|
Just a heads up, this needs rebasing now (sorry for the churn). Given there are some format changes here, we can bundle this with the type info method vote. |
70d11ea to
5ff5df7
Compare
|
No problem. I've already rebased both PRs. |
…of the map in column metadata
5ff5df7 to
b172c92
Compare
|
Sure @lidavidm, let's combine both implementation and make one vote. By the way we don't have the Jira Ticket for this, where can I create the ticket? |
|
It can be created here: https://issues.apache.org/jira/secure/Dashboard.jspa Then just update the PR title as described above: #11999 (comment) |
|
|
|
Thanks for linking this up. BTW, are integration tests planned here too? |
|
@lidavidm Sorry, I've found an error here. I'm fixing it and I will let you know when I finish it |
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Just one little thing I noticed on a last pass.
|
@lidavidm I've added the integration test for both PRs and fix the nits on both PRs |
format/FlightSql.proto
Outdated
| * - return 4 (\b100) => [SQL_SUBQUERIES_IN_INS]; | ||
| * - return 5 (\b101) => [SQL_SUBQUERIES_IN_COMPARISONS, SQL_SUBQUERIES_IN_INS]; | ||
| * - return 6 (\b110) => [SQL_SUBQUERIES_IN_INS, SQL_SUBQUERIES_IN_EXISTS]; | ||
| * - return 6 (\b110) => [SQL_SUBQUERIES_IN_COMPARISONS, SQL_SUBQUERIES_IN_EXISTS]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change valid? Judging by the above, SQL_SUBQUERIES_IN_COMPARISONS is 0b001 while SQL_SUBQUERIES_IN_INS is 0b100.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this seems wrong. Additionally the enum values are sequential (0, 1, 2, 3) not bitmasks (unless those values are intended to be bit indices in which case we should document them as such).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jcralmeida any comment here? I forgot about this, let's fix these last few things and then kick off the vote
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry @lidavidm , I've changed back to how it was which was the correct one. I don't know how/why I've changed it
format/FlightSql.proto
Outdated
| * - ARROW:FLIGHT:SQL:DB_SCHEMA_NAME - Database schema name | ||
| * - ARROW:FLIGHT:SQL:TABLE_NAME - Table name | ||
| * - ARROW:FLIGHT:SQL:PRECISION - Column precision/size | ||
| * - ARROW:FLIGHT:SQL:SCALE - Column scale/decimal digits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are precision and scale useful? Can they annotate non-decimal columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also requested by the JDBC and ODBC, to be extracted by them via ResultSetMetadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we annotate this as (if applicable for the column type) or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes senses. Added
|
|
||
| /* | ||
| * Represents a SQL query. Used in the command member of FlightDescriptor | ||
| * for the following RPC calls: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this have the same description as CommandGetTables? Do we have two different messages for doing the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message itself is different, what is similar here is the response since the column metadata can be retrieved via both command
|
There was one doc lint hidden in a pipeline: I think other failures are flakes/unrelated. |
|
Sorry @lidavidm. There were some errors related to the cases of the parameter. I've already fixed them |
|
Hi @lidavidm I've got some failures on some CI jobs but according to I'm seeing they are unrelated to code. Is it right? |
|
I believe they're unrelated, I'll kick them to try again (the Java one is definitely unrelated) |
| * - ARROW:FLIGHT:SQL:CATALOG_NAME - Table's catalog name | ||
| * - ARROW:FLIGHT:SQL:DB_SCHEMA_NAME - Database schema name | ||
| * - ARROW:FLIGHT:SQL:TABLE_NAME - Table name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason these three are duplicated from the fields above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because they should unified in every place it appears
| * - ARROW:FLIGHT:SQL:IS_CASE_SENSITIVE - "1" indicates if the column is case sensitive, "0" otherwise. | ||
| * - ARROW:FLIGHT:SQL:IS_READ_ONLY - "1" indicates if the column is read only, "0" otherwise. | ||
| * - ARROW:FLIGHT:SQL:IS_SEARCHABLE - "1" indicates if the column is searchable via WHERE clause, "0" otherwise. | ||
| * The returned data should be ordered by catalog_name, db_schema_name, table_name, then table_type, followed by table_schema if requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ordering by table_schema is just a bytewise lexicographic ordering?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some questions below.
|
Benchmark runs are scheduled for baseline = acc6c2e and contender = c2fac05. c2fac05 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
…emas returned by Flight SQL This add an auxiliary class FlightSqlColumnMetadata (Java) and ColumnMetadata(CPP) meant to read and write known metadata for Arrow schema fields, such as: - CATALOG_NAME - SCHEMA_NAME - TABLE_NAME - PRECISION - SCALE - IS_AUTO_INCREMENT - IS_CASE_SENSITIVE - IS_READ_ONLY - IS_SEARCHABLE Closes apache#11999 from jcralmeida/flight-sql-column-metadata Authored-by: Jose Almeida <[email protected]> Signed-off-by: David Li <[email protected]>
This add an auxiliary class FlightSqlColumnMetadata (Java) and ColumnMetadata(CPP) meant to read and write known metadata for Arrow schema fields, such as: