Normalize ColumnMetadata to support case-sensitive column names #24983

Merged
agrawalreetika merged 2 commits into prestodb:master from agrawalreetika:mixed-case-v2-columns-2 on Jul 17, 2025

Conversation

Member

@agrawalreetika agrawalreetika commented Apr 25, 2025

Description

Follow up of #24551

Improves identifier handling (column name) to align with SQL standards for better compatibility with case-sensitive and case-normalizing databases, while minimizing SPI-breaking changes.

Motivation and Context

RFC details - prestodb/rfcs#36

Currently, column names are lowercased at the SPI level (ColumnMetadata.java#L45). Removing this generic lowercase conversion will require updates to normalize column names via the metadata API in each connector.
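To make the change concrete, here is a minimal sketch of moving normalization out of the column descriptor and into the connector. `SimpleColumn`, `ConnectorNormalizationSketch`, and both normalizers are hypothetical stand-ins, not the Presto SPI; the diff in this PR calls `normalizeIdentifier(session, name)` for the same purpose.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical stand-in for a column descriptor; not the real ColumnMetadata.
final class SimpleColumn
{
    private final String name;

    SimpleColumn(String name)
    {
        // No lowercasing here: the name is stored exactly as given.
        this.name = name;
    }

    String getName()
    {
        return name;
    }
}

final class ConnectorNormalizationSketch
{
    // Legacy-style behavior: fold every identifier to lowercase.
    static final UnaryOperator<String> LOWERCASING = name -> name.toLowerCase(Locale.ENGLISH);

    // Case-sensitive behavior: keep the identifier untouched.
    static final UnaryOperator<String> PRESERVING = name -> name;

    // Each connector decides how names are normalized when it builds its column list.
    static List<String> columnNames(List<SimpleColumn> columns, UnaryOperator<String> normalizer)
    {
        return columns.stream()
                .map(column -> normalizer.apply(column.getName()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<SimpleColumn> columns = List.of(new SimpleColumn("OrderId"), new SimpleColumn("orderid"));
        System.out.println(columnNames(columns, LOWERCASING));
        System.out.println(columnNames(columns, PRESERVING));
    }
}
```

With the lowercasing normalizer the two columns become indistinguishable; with the preserving one they stay distinct, which is the case a case-sensitive backend needs.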

Impact

Improves identifier handling (column name) to align with SQL standards for better compatibility with case-sensitive and case-normalizing databases, while minimizing SPI-breaking changes.

Test Plan

  • Existing UT passing
  • Added support for MySQL, with new UTs for MySQL when mixed-case support is enabled

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add case-sensitive support for column names. It can be enabled for JDBC-based connectors by setting `case-sensitive-name-matching=true` at the catalog level.
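
As a rough illustration of the catalog-level setting, the sketch below parses an example catalog properties file and reads the flag. The file contents, the `example-host` URL, and the `isCaseSensitive` helper are assumptions made for illustration; only the `case-sensitive-name-matching` key comes from the release note, and treating a missing key as `false` is likewise an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class CatalogPropertySketch
{
    // Example contents of a catalog file such as etc/catalog/mysql.properties (illustrative values).
    static final String EXAMPLE_CATALOG = String.join("\n",
            "connector.name=mysql",
            "connection-url=jdbc:mysql://example-host:3306",
            "case-sensitive-name-matching=true");

    // Hypothetical helper: reads the flag, assuming a missing key means "false".
    static boolean isCaseSensitive(String catalogFileContents)
    {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(catalogFileContents));
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Boolean.parseBoolean(properties.getProperty("case-sensitive-name-matching", "false"));
    }

    public static void main(String[] args)
    {
        System.out.println(isCaseSensitive(EXAMPLE_CATALOG));
    }
}
```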

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Apr 25, 2025
@agrawalreetika agrawalreetika force-pushed the mixed-case-v2-columns-2 branch 8 times, most recently from 6d9e7da to d94fc21 Compare May 7, 2025 16:22
@agrawalreetika agrawalreetika force-pushed the mixed-case-v2-columns-2 branch 10 times, most recently from 9877280 to 8bc7a44 Compare May 14, 2025 03:03
@prestodb-ci
Contributor

@ethanyzhang imported this issue as lakehouse/presto #24983

@agrawalreetika agrawalreetika force-pushed the mixed-case-v2-columns-2 branch from 8bc7a44 to c1613fa Compare May 22, 2025 17:17
@agrawalreetika agrawalreetika force-pushed the mixed-case-v2-columns-2 branch from c1613fa to f2eba02 Compare May 30, 2025 07:16
@prestodb-ci
Contributor

@ethanyzhang imported this issue as lakehouse/tracker #24983

@agrawalreetika agrawalreetika force-pushed the mixed-case-v2-columns-2 branch from f2eba02 to 723f5c2 Compare June 9, 2025 17:00
@agrawalreetika agrawalreetika changed the title from "[DO NOT REVIEW] Mixed case v2 columns 2" to "Normalize ColumnMetadata to support case-sensitive column names" Jul 1, 2025
@agrawalreetika agrawalreetika marked this pull request as ready for review July 1, 2025 13:33
@agrawalreetika agrawalreetika requested a review from ScrapCodes July 1, 2025 13:47
@agrawalreetika agrawalreetika force-pushed the mixed-case-v2-columns-2 branch from 723f5c2 to e206568 Compare July 1, 2025 13:58
@steveburnett
Contributor

Do we need documentation? Maybe in https://prestodb.io/docs/current/connector/mysql.html#general-configuration-properties. Can you think of better, or other locations for documenting this?

How does this case-sensitive-name-matching config property interact with the existing case-insensitive-name-matching MySQL config property?

@agrawalreetika
Member Author

> Do we need documentation? Maybe in https://prestodb.io/docs/current/connector/mysql.html#general-configuration-properties. Can you think of better, or other locations for documenting this?
>
> How does this case-sensitive-name-matching config property interact with the existing case-insensitive-name-matching MySQL config property?

@steveburnett It's going to be a catalog-level property; in the last PR it was added for MySQL. I have added the documentation for the other JDBC connectors as well, since this PR adds it as a config for them too. Please check.

steveburnett previously approved these changes Jul 1, 2025
Contributor

@steveburnett steveburnett left a comment

LGTM! (docs)

Pull branch, local doc build, everything looks good. Thanks!

@agrawalreetika
Member Author

agrawalreetika commented Jul 2, 2025

@hantangwangd @ZacBlanco @ScrapCodes @aaneja Could you please help review this PR whenever you get a chance? This is a follow-up to #24551, covering columns.

@agrawalreetika agrawalreetika requested a review from aaneja July 2, 2025 05:08
- return new ConnectorTableMetadata(tableName, table.getColumnsMetadata());
+ List<ColumnMetadata> columns = table.getColumnsMetadata().stream()
+ .map(column -> normalizedColumnMetadata(session, column))
+ .collect(toList());
Contributor

Suggested change
- .collect(toList());
+ .collect(toImmutableList());
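
A short sketch of why an immutable collector is the safer choice here. The suggestion uses Guava's `toImmutableList()`; to keep the example dependency-free it substitutes the JDK's `Collectors.toUnmodifiableList()`, which is a different collector with a similar read-only guarantee. The class and method names are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class CollectorSketch
{
    // Collectors.toList() makes no immutability guarantee, so callers may be able to mutate the result.
    static List<String> mutableNames()
    {
        return Stream.of("id", "Name").collect(Collectors.toList());
    }

    // JDK stand-in for Guava's toImmutableList(): the result rejects modification.
    static List<String> readOnlyNames()
    {
        return Stream.of("id", "Name").collect(Collectors.toUnmodifiableList());
    }

    // Returns true if adding to the list throws UnsupportedOperationException.
    static boolean rejectsAdd(List<String> names)
    {
        try {
            names.add("extra");
            return false;
        }
        catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(rejectsAdd(mutableNames()));
        System.out.println(rejectsAdd(readOnlyNames()));
    }
}
```

Returning a read-only list from a metadata method means a caller cannot accidentally alter the column list that other code relies on.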

- return new ConnectorTableMetadata(tableName, table.getColumnsMetadata());
+ List<ColumnMetadata> columns = table.getColumnsMetadata().stream()
+ .map(column -> normalizedColumnMetadata(session, column))
+ .collect(toList());
Contributor

Suggested change
- .collect(toList());
+ .collect(toImmutableList());

Comment on lines 129 to 137
return ColumnMetadata.builder()
.setName(normalizeIdentifier(session, columnMetadata.getName()))
.setType(columnMetadata.getType())
.setHidden(columnMetadata.isHidden())
.setNullable(columnMetadata.isNullable())
.setComment(columnMetadata.getComment().orElse(null))
.setProperties(columnMetadata.getProperties())
.setExtraInfo(columnMetadata.getExtraInfo().orElse(null))
.build();
Contributor

This seems like a recurring method. I wonder if we can add a utility method on ColumnMetadata that effectively does this and accepts a lambda or an already-normalized name. It would reduce code duplication.
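
One possible shape for such a helper, sketched on a simplified stand-in class. `ColumnInfo` and its `withNormalizedName` method are hypothetical and mirror only a few of the fields copied in the builder call above; this is not the actual `ColumnMetadata` API.

```java
import java.util.Locale;
import java.util.Objects;
import java.util.function.UnaryOperator;

// Simplified, hypothetical stand-in for ColumnMetadata with a few of its fields.
final class ColumnInfo
{
    private final String name;
    private final String type;
    private final boolean nullable;
    private final String comment; // may be null

    ColumnInfo(String name, String type, boolean nullable, String comment)
    {
        this.name = Objects.requireNonNull(name, "name is null");
        this.type = Objects.requireNonNull(type, "type is null");
        this.nullable = nullable;
        this.comment = comment;
    }

    String getName() { return name; }
    String getType() { return type; }
    boolean isNullable() { return nullable; }
    String getComment() { return comment; }

    // Copies every field and replaces only the name, so callers do not repeat the builder chain.
    ColumnInfo withNormalizedName(UnaryOperator<String> normalizer)
    {
        return new ColumnInfo(normalizer.apply(name), type, nullable, comment);
    }
}

final class ColumnInfoDemo
{
    public static void main(String[] args)
    {
        ColumnInfo original = new ColumnInfo("OrderId", "bigint", false, "primary key");
        ColumnInfo lowered = original.withNormalizedName(n -> n.toLowerCase(Locale.ENGLISH));
        System.out.println(lowered.getName() + " " + lowered.getType());
    }
}
```

Accepting a function rather than a session keeps the helper free of connector-specific types; a variant taking an already-normalized `String` would work just as well.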

- for (ColumnMetadata column : table.get().getColumnsMetadata()) {
+ List<ColumnMetadata> columns = table.get().getColumnsMetadata().stream()
+ .map(column -> normalizedColumnMetadata(session, column))
+ .collect(toList());
Contributor

Suggested change
- .collect(toList());
+ .collect(toImmutableList());

Comment on lines 186 to 187
String normalizedName = normalizeIdentifier(session, column.getColumnName());
return column.getColumnMetadata(normalizedName);
Contributor

Debugging? Any reason we can't just inline?

.setProperties(columnMetadata.getProperties())
.setExtraInfo(columnMetadata.getExtraInfo().orElse(null))
.build();
}
Contributor

Another instance where a utility method would help.

  return tables.values().stream()
  .filter(table -> prefix.matches(table.toSchemaTableName()))
- .collect(toMap(MemoryTableHandle::toSchemaTableName, handle -> handle.toTableMetadata().getColumns()));
+ .collect(toMap(MemoryTableHandle::toSchemaTableName, handle -> toTableMetadata(handle, session).getColumns()));
Contributor

Suggested change
- .collect(toMap(MemoryTableHandle::toSchemaTableName, handle -> toTableMetadata(handle, session).getColumns()));
+ .collect(toImmutableMap(MemoryTableHandle::toSchemaTableName, handle -> toTableMetadata(handle, session).getColumns()));

String normalizedName = normalizeIdentifier(session, column.getName());
return column.toColumnMetadata(normalizedName);
})
.collect(toList());
Contributor

Suggested change
- .collect(toList());
+ .collect(toImmutableList());


  assertQueryFails("CREATE TABLE test (a integer, A integer)",
- "line 1:31: Column name 'A' specified more than once");
+ "Duplicate column name 'A'");
Contributor

Hmm, are we losing the line/column information from the error? I feel that is useful for large queries.

Member Author

@agrawalreetika agrawalreetika Jul 3, 2025

If I remember right, since this PR also enables case-sensitive column names for MySQL, this message now comes from the MySQL exception, which is why I think it was modified:

SQL Error [1060] [42S21]: Duplicate column name 'A'
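
The behavior change can be illustrated with a small duplicate check. When names are folded to lowercase first, `a` and `A` collide and the engine can reject the statement itself; when case is preserved they look distinct to the engine, so the statement reaches MySQL, which treats column names case-insensitively and raises the error quoted above. `findDuplicate` and the normalizers are hypothetical helpers, not Presto code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.function.UnaryOperator;

final class DuplicateColumnSketch
{
    // Returns the first column whose normalized name was already seen, if any.
    static Optional<String> findDuplicate(List<String> columnNames, UnaryOperator<String> normalizer)
    {
        Set<String> seen = new HashSet<>();
        for (String name : columnNames) {
            if (!seen.add(normalizer.apply(name))) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args)
    {
        List<String> columns = List.of("a", "A");
        // Lowercasing: the engine-side check sees a duplicate.
        System.out.println(findDuplicate(columns, name -> name.toLowerCase(Locale.ENGLISH)));
        // Case preserved: no duplicate is seen at this level.
        System.out.println(findDuplicate(columns, name -> name));
    }
}
```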

query("CREATE TABLE " + CATALOG + ".\"" + SCHEMA_NAME + "\".\"" + TABLE_NAME_JOIN_LOWER + "\" AS " +
"SELECT d.* FROM " + CATALOG + ".\"" + SCHEMA_NAME + "\".\"" + TABLE_NAME + "\" d " +
"INNER JOIN " + CATALOG + ".\"" + SCHEMA_NAME + "\".\"" + TABLE_NAME + "\" m " +
query("CREATE TABLE " + CATALOG + "." + SCHEMA_NAME + "." + TABLE_NAME_0 + " (name VARCHAR(50), id INT)");
Contributor

Why are we changing all these tests? Seems like it is just the quote escaping. Does it really need to change?

Member Author

I found that the extra escaping was not really needed, so I just cleaned it up. I can revert it if you think we should handle it separately.

Member

+1. Can we only modify the parts that are necessary due to this change? It's a little tricky to figure out which parts of the tests apply to the current change. Or, if the change of quote escaping for schema names and table names in the test cases is indeed necessary, could we extract those changes into a separate commit?

@agrawalreetika agrawalreetika force-pushed the mixed-case-v2-columns-2 branch 2 times, most recently from f53cc50 to 172ac47 Compare July 3, 2025 10:16
@ethanyzhang
Copy link
Contributor

@agrawalreetika I addressed the conflicts internally; you need to update your PR here.

@agrawalreetika agrawalreetika force-pushed the mixed-case-v2-columns-2 branch from 172ac47 to 8cc3ad5 Compare July 8, 2025 20:14
@agrawalreetika agrawalreetika requested a review from ZacBlanco July 10, 2025 17:09
Contributor

@ZacBlanco ZacBlanco left a comment

LGTM, but just a few nits and one comment

@agrawalreetika agrawalreetika force-pushed the mixed-case-v2-columns-2 branch 2 times, most recently from 0ce4e53 to a3cecfe Compare July 15, 2025 08:51
@agrawalreetika agrawalreetika force-pushed the mixed-case-v2-columns-2 branch from a3cecfe to 6a05e77 Compare July 15, 2025 11:28
@agrawalreetika agrawalreetika requested a review from ZacBlanco July 15, 2025 13:09
Contributor

@ZacBlanco ZacBlanco left a comment

Thanks for the changes @agrawalreetika ! LGTM

Member

@hantangwangd hantangwangd left a comment

Thanks for the work. LGTM!

@agrawalreetika agrawalreetika merged commit e9bba9d into prestodb:master Jul 17, 2025
109 checks passed
@prestodb-ci prestodb-ci mentioned this pull request Jul 28, 2025
6 tasks

Labels

from:IBM PR from IBM

6 participants