Normalize ColumnMetadata to support case-sensitive column names#24983
Normalize ColumnMetadata to support case-sensitive column names#24983agrawalreetika merged 2 commits intoprestodb:masterfrom
Conversation
6d9e7da to
d94fc21
Compare
9877280 to
8bc7a44
Compare
|
@ethanyzhang imported this issue as lakehouse/presto #24983 |
8bc7a44 to
c1613fa
Compare
c1613fa to
f2eba02
Compare
|
@ethanyzhang imported this issue as lakehouse/tracker #24983 |
f2eba02 to
723f5c2
Compare
723f5c2 to
e206568
Compare
|
Do we need documentation? Maybe in https://prestodb.io/docs/current/connector/mysql.html#general-configuration-properties. Can you think of better, or other locations for documenting this? How does this |
e206568 to
92cf350
Compare
@steveburnett Its going to be catalog level property, in the last PR it was added for mysql. I have added the documentatioin for other JDBC connectors as well since its added as a config to others as well in this PR. Please check. |
steveburnett
left a comment
There was a problem hiding this comment.
LGTM! (docs)
Pull branch, local doc build, everything looks good. Thanks!
|
@hantangwangd @ZacBlanco @ScrapCodes @aaneja Could you please help me with the review of this PR whenever you get a chance, this is a follow-up PR for columns #24551 |
| return new ConnectorTableMetadata(tableName, table.getColumnsMetadata()); | ||
| List<ColumnMetadata> columns = table.getColumnsMetadata().stream() | ||
| .map(column -> normalizedColumnMetadata(session, column)) | ||
| .collect(toList()); |
There was a problem hiding this comment.
| .collect(toList()); | |
| .collect(toImmutableList()); |
| return new ConnectorTableMetadata(tableName, table.getColumnsMetadata()); | ||
| List<ColumnMetadata> columns = table.getColumnsMetadata().stream() | ||
| .map(column -> normalizedColumnMetadata(session, column)) | ||
| .collect(toList()); |
There was a problem hiding this comment.
| .collect(toList()); | |
| .collect(toImmutableList()); |
| return ColumnMetadata.builder() | ||
| .setName(normalizeIdentifier(session, columnMetadata.getName())) | ||
| .setType(columnMetadata.getType()) | ||
| .setHidden(columnMetadata.isHidden()) | ||
| .setNullable(columnMetadata.isNullable()) | ||
| .setComment(columnMetadata.getComment().orElse(null)) | ||
| .setProperties(columnMetadata.getProperties()) | ||
| .setExtraInfo(columnMetadata.getExtraInfo().orElse(null)) | ||
| .build(); |
There was a problem hiding this comment.
This seems like a reoccurring method. Wonder if we can add a utility method on ColumnMetadata which effectively does this and accepts a lambda or already-normalized name. Will reduce code duplication
| for (ColumnMetadata column : table.get().getColumnsMetadata()) { | ||
| List<ColumnMetadata> columns = table.get().getColumnsMetadata().stream() | ||
| .map(column -> normalizedColumnMetadata(session, column)) | ||
| .collect(toList()); |
There was a problem hiding this comment.
| .collect(toList()); | |
| .collect(toImmutableList()); |
| String normalizedName = normalizeIdentifier(session, column.getColumnName()); | ||
| return column.getColumnMetadata(normalizedName); |
There was a problem hiding this comment.
Debugging? Any reason we can't just inline?
| .setProperties(columnMetadata.getProperties()) | ||
| .setExtraInfo(columnMetadata.getExtraInfo().orElse(null)) | ||
| .build(); | ||
| } |
There was a problem hiding this comment.
another instance where utility method would help
| return tables.values().stream() | ||
| .filter(table -> prefix.matches(table.toSchemaTableName())) | ||
| .collect(toMap(MemoryTableHandle::toSchemaTableName, handle -> handle.toTableMetadata().getColumns())); | ||
| .collect(toMap(MemoryTableHandle::toSchemaTableName, handle -> toTableMetadata(handle, session).getColumns())); |
There was a problem hiding this comment.
| .collect(toMap(MemoryTableHandle::toSchemaTableName, handle -> toTableMetadata(handle, session).getColumns())); | |
| .collect(toImmutableMap(MemoryTableHandle::toSchemaTableName, handle -> toTableMetadata(handle, session).getColumns())); |
| String normalizedName = normalizeIdentifier(session, column.getName()); | ||
| return column.toColumnMetadata(normalizedName); | ||
| }) | ||
| .collect(toList()); |
There was a problem hiding this comment.
| .collect(toList()); | |
| .collect(toImmutableList()); |
|
|
||
| assertQueryFails("CREATE TABLE test (a integer, A integer)", | ||
| "line 1:31: Column name 'A' specified more than once"); | ||
| "Duplicate column name 'A'"); |
There was a problem hiding this comment.
Hmm, are we losing the line/column information from the error? I feel that is useful for large queries.
There was a problem hiding this comment.
So I remember it right since we are enabling case-senetive for column as well in this PR for mysql, this is coming from MySQL Exception, which is why I think this was modified -
SQL Error [1060] [42S21]: Duplicate column name 'A'
| query("CREATE TABLE " + CATALOG + ".\"" + SCHEMA_NAME + "\".\"" + TABLE_NAME_JOIN_LOWER + "\" AS " + | ||
| "SELECT d.* FROM " + CATALOG + ".\"" + SCHEMA_NAME + "\".\"" + TABLE_NAME + "\" d " + | ||
| "INNER JOIN " + CATALOG + ".\"" + SCHEMA_NAME + "\".\"" + TABLE_NAME + "\" m " + | ||
| query("CREATE TABLE " + CATALOG + "." + SCHEMA_NAME + "." + TABLE_NAME_0 + " (name VARCHAR(50), id INT)"); |
There was a problem hiding this comment.
Why are we changing all these tests? Seems like it is just th quote escaping. Does it really need to change?
There was a problem hiding this comment.
This I just found extra eascape not really neeede, so I just cleanup it up. I can revert if you think we should handle it separately?
There was a problem hiding this comment.
+1. Can we only modify the parts that are necessary due to this change? It's a little tricky to figure out which parts of tests apply to the current change. Or if the change of quote escaping for schema names and table names in the test cases are indeed necessary, could we extract those changes into a separate commit?
f53cc50 to
172ac47
Compare
|
@agrawalreetika I address the conflicts internally, you need to update your PR here. |
172ac47 to
8cc3ad5
Compare
ZacBlanco
left a comment
There was a problem hiding this comment.
LGTM, but just a few nits and one comment
presto-bigquery/src/main/java/com/facebook/presto/plugin/bigquery/BigQueryMetadata.java
Outdated
Show resolved
Hide resolved
presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraMetadata.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java
Outdated
Show resolved
Hide resolved
0ce4e53 to
a3cecfe
Compare
a3cecfe to
6a05e77
Compare
ZacBlanco
left a comment
There was a problem hiding this comment.
Thanks for the changes @agrawalreetika ! LGTM
hantangwangd
left a comment
There was a problem hiding this comment.
Thanks for the work. LGTM!
Description
Follow up of #24551
Improves identifier handling (column name) to align with SQL standards for better compatibility with case-sensitive and case-normalizing databases, while minimizing SPI-breaking changes.
Motivation and Context
RFC details - prestodb/rfcs#36
Currently, column names are lowercased at the SPI level (ColumnMetadata.java#L45). Removing this generic lowercase conversion will require updates to normalize column names via the metadata API in each connector.
Impact
Improves identifier handling (column name) to align with SQL standards for better compatibility with case-sensitive and case-normalizing databases, while minimizing SPI-breaking changes.
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.