-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat(cassandra): Enable case-sensitive support #25690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
44c3dcc to
73ec9cd
Compare
agrawalreetika
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the changes and adding the support.
I think currently, schemas are still getting converted to lowercase? https://github.com/bibith4/presto/blob/73ec9cd4a49982120eb3af5e781a943baf755205/presto-cassandra/src/main/java/com/facebook/presto/cassandra/CassandraMetadata.java#L101
Please verify and make the changes along with the tests accordingly.
5dd62cf to
c81c097
Compare
@agrawalreetika Added the requested changes. Please check |
| session.execute("CREATE KEYSPACE test_keyspace WITH REPLICATION = {'class':'SimpleStrategy', 'replication_factor': 1}"); | ||
| assertContainsEventually(() -> getQueryRunner().execute("SHOW SCHEMAS FROM cassandra"), resultBuilder(getSession(), createUnboundedVarcharType()) | ||
| .row("test_keyspace") | ||
| .build(), new Duration(1, MINUTES)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add schema creation for same KEYSPACE in upper case as well and then check for lowercase and upper cae for in testShowSchemas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the documenation details as well for new config - https://prestodb.io/docs/current/connector/cassandra.html#configuration-properties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the documenation details as well for new config - https://prestodb.io/docs/current/connector/cassandra.html#configuration-properties
Yes, please add documentation. You can use the MySQL doc as a model where the case-sensitive-name-matching property is documented in the last row of the table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agrawalreetika @steveburnett updated the document with new config property. Please check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add schema creation for same KEYSPACE in upper case as well and then check for lowercase and upper cae for in testShowSchemas
@agrawalreetika Added the required changes. Please check
c81c097 to
6b547ef
Compare
steveburnett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the docs! Minor nits.
6b547ef to
58971f4
Compare
steveburnett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, new local doc build, looks good. Thanks!
imjalpreet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bibith4, some changes appear to be in the wrong commit. Please check and update the commits to include only relevant changes. It would really help in the review process. Thank you!
@imjalpreet As discussed created a seperate PR for Enable and fix all Cassandra connector tests in CI(#26022 (comment)). |
aa7e9d7 to
a1fbb34
Compare
|
@imjalpreet Updated the PR to include only mixed-case support for the Cassandra change. Can you please take a look. |
| session.execute("DROP KEYSPACE keyspace_1"); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bibith4 Can you add a test method for ALTER operations as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agrawalreetika Cassandra does not support ALTER operations
imjalpreet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @bibith4.
| try { | ||
| for (String tableName : cassandraSession.getCaseSensitiveTableNames(schemaName)) { | ||
| tableNames.add(new SchemaTableName(schemaName, tableName.toLowerCase(ENGLISH))); | ||
| String finalTableName = normalizeIdentifier(session, tableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| String finalTableName = normalizeIdentifier(session, tableName); | |
| String normalizedTableName = normalizeIdentifier(session, tableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjalpreet Changed as per suggestion. Please check
| for (CassandraColumnHandle columnHandle : table.getColumns()) { | ||
| columnHandles.put(CassandraCqlUtils.cqlNameToSqlName(columnHandle.getName()).toLowerCase(ENGLISH), columnHandle); | ||
| String columnName = CassandraCqlUtils.cqlNameToSqlName(columnHandle.getName()); | ||
| String finalColumnName = normalizeIdentifier(session, columnName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| String finalColumnName = normalizeIdentifier(session, columnName); | |
| String normalizedColumnName = normalizeIdentifier(session, columnName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjalpreet Changed as per suggestion. Please check
| ImmutableMap.Builder<String, ColumnHandle> columnHandles = ImmutableMap.builder(); | ||
| for (CassandraColumnHandle columnHandle : table.getColumns()) { | ||
| columnHandles.put(CassandraCqlUtils.cqlNameToSqlName(columnHandle.getName()).toLowerCase(ENGLISH), columnHandle); | ||
| String columnName = CassandraCqlUtils.cqlNameToSqlName(columnHandle.getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: static import for cqlNameToSqlName
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjalpreet Changed as per suggestion. Please check
| if (result != null) { | ||
| throw new PrestoException( | ||
| NOT_SUPPORTED, | ||
| format("More than one keyspace has been found for the case insensitive schema name: %s -> (%s, %s)", | ||
| caseInsensitiveSchemaName, result.getName(), keyspace.getName())); | ||
| format("More than one keyspace has been found for the schema name: %s -> (%s, %s)", | ||
| caseSensitiveSchemaName, result.getName(), keyspace.getName())); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this check when caseSensitiveNameMatchingEnabled is true? I think it can only happen if we match case-insensitively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjalpreet Changed to check only if caseSensitiveNameMatchingEnabled is false
| keyspace.getTables().stream(), | ||
| keyspace.getMaterializedViews().stream()) | ||
| .filter(table -> table.getName().equalsIgnoreCase(caseInsensitiveTableName)) | ||
| .filter(table -> caseSensitiveNameMatchingEnabled ? table.getName().equals(caseSensitiveTableName) : table.getName().equalsIgnoreCase(caseSensitiveTableName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe we can make the matchesSchemaName method generic and use it for both table and schema checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjalpreet Changed to make the method generic
| if (columnMap.containsKey(columnNameKey)) { | ||
| throw new PrestoException( | ||
| NOT_SUPPORTED, | ||
| format("More than one column has been found for the case insensitive column name: %s -> (%s, %s)", | ||
| lowercaseName, lowercaseNameToColumnMap.get(lowercaseName).getName(), column.getName())); | ||
| columnNameKey, columnMap.get(columnNameKey).getName(), column.getName())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here. Do we need this when caseSensitiveNameMatchingEnabled is true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjalpreet Changed to check only if caseSensitiveNameMatchingEnabled is false
| { | ||
| private CassandraServer server; | ||
| private CassandraSession session; | ||
| private static final String KEYSPACE = "test_connetor"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| private static final String KEYSPACE = "test_connetor"; | |
| private static final String KEYSPACE = "test_connector"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjalpreet Corrected. Please check
| assertQueryFails("CREATE TABLE test_connector.TEST_CREATEAS_FAIL_Join AS SELECT c.custkey, o.orderkey FROM " + | ||
| "tpch.customer c INNER JOIN tpch.ORDERS1 o ON c.custkey = o.custkey WHERE c.mktsegment = 'BUILDING'", "Table cassandra.tpch.ORDERS1 does not exist"); //failure scenario since tpch.ORDERS1 doesn't exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the significance of this test? Shouldn't we test with just ORDERS rather than ORDERS1 as lowercase orders exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjalpreet Corrected to use ORDERS. Please check
| "tpch.customer Cus INNER JOIN tpch.orders Ord ON Cus.custkey = Ord.custkey WHERE Cus.mktsegment = 'BUILDING'"); | ||
| assertTrue(getQueryRunner().tableExists(session, "Test_CreateAs_Mixed_Join")); | ||
| } | ||
| finally { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you missed dropping TEST_CREATEAS_Join
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjalpreet added drop statement for TEST_CREATEAS_Join
| assertQuery("SELECT COUNT(*) FROM test_connetor.test_select", "VALUES 2"); | ||
| } | ||
| finally { | ||
| getQueryRunner().execute(session, "DROP TABLE IF EXISTS TEST_SELECT"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We created lowercase TEST_SELECT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjalpreet Corrected. Please check
a1fbb34 to
6dfb570
Compare
imjalpreet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @bibith4.
Changes LGTM now, last few nits.
| format("More than one keyspace has been found for the case insensitive schema name: %s -> (%s, %s)", | ||
| caseInsensitiveSchemaName, result.getName(), keyspace.getName())); | ||
| format("More than one keyspace has been found for the schema name: %s -> (%s, %s)", | ||
| caseSensitiveSchemaName, result.getName(), keyspace.getName())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think we should print the lowercased schema name in the error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjalpreet Corrected. Please check
| format("More than one table has been found for the case insensitive table name: %s -> (%s)", | ||
| caseInsensitiveTableName, tableNames)); | ||
| caseSensitiveTableName, tableNames)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed this the last time, similar to schema and column names, this is also only needed when caseSensitiveNameMatchingEnabled is false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, ignore this. Its already handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But print the lowercased table name in error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjalpreet Corrected. Please check
| private static void checkColumnNames(List<ColumnMetadata> columns) | ||
| { | ||
| Map<String, ColumnMetadata> lowercaseNameToColumnMap = new HashMap<>(); | ||
| Map<String, ColumnMetadata> columnMap = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe this line change can now be reverted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjalpreet Corrected. Please check
5cbeef3 to
2aa6330
Compare
2aa6330 to
d3cb81e
Compare
d3cb81e to
22b61df
Compare
imjalpreet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @bibith4.
LGTM.
| connectorProperties = new HashMap<>(ImmutableMap.copyOf(connectorProperties)); | ||
| connectorProperties.putIfAbsent("cassandra.contact-points", server.getHost()); | ||
| connectorProperties.putIfAbsent("cassandra.native-protocol-port", Integer.toString(server.getPort())); | ||
| connectorProperties.putIfAbsent("cassandra.allow-drop-table", "true"); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these related to the case sensitivity changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were already being used. The change is to add them to the newly introduced connectorProperties map.
## Description Enable case sensitivity support in Cassandra connector ## Motivation and Context Cassandra connector always treated table and column names as lowercase. With this change, the connector can preserve case sensitivity, allowing the creation of tables and columns with the same name but different casing, depending on the configuration setting. ## Impact Supports creating tables and columns with the same name in different cases
Description
Enable case sensitivity support in Cassandra connector
Motivation and Context
Cassandra connector always treated table and column names as lowercase. With this change, the connector can preserve case sensitivity, allowing the creation of tables and columns with the same name but different casing, depending on the configuration setting.
Impact
Supports creating tables and columns with the same name in different cases
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.