feat(plugin-kafka): Enable case-senstive support for Kafka connector #26023
feat(plugin-kafka): Enable case-senstive support for Kafka connector #26023agrawalreetika merged 1 commit intoprestodb:masterfrom
Conversation
Reviewer's GuideThis PR introduces support for case-sensitive identifier matching in the Kafka connector by adding a new config flag, implementing a normalization path in metadata, and updating test runners and configs to exercise the new behavior. Entity relationship diagram for case-sensitive name matching configerDiagram
KAFKA_CONNECTOR_CONFIG {
boolean caseSensitiveNameMatchingEnabled
}
KAFKA_METADATA {
boolean caseSensitiveNameMatching
}
KAFKA_CONNECTOR_CONFIG ||--o| KAFKA_METADATA : "configures"
Class diagram for updated KafkaConnectorConfig and KafkaMetadataclassDiagram
class KafkaConnectorConfig {
- List<File> resourceConfigFiles
- boolean caseSensitiveNameMatchingEnabled
+ boolean isCaseSensitiveNameMatching()
+ KafkaConnectorConfig setCaseSensitiveNameMatching(boolean)
}
class KafkaMetadata {
- String connectorId
- boolean hideInternalColumns
- TableDescriptionSupplier tableDescriptionSupplier
- boolean caseSensitiveNameMatching
+ String normalizeIdentifier(ConnectorSession, String)
}
KafkaMetadata --> KafkaConnectorConfig : uses
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
38a7b36 to
c52341b
Compare
bc3e15d to
e89bd50
Compare
|
Thanks for adding the doc! I'll review when you mark this PR ready for review. |
There was a problem hiding this comment.
Hey there - I've reviewed your changes - here's some feedback:
- Add the new
case-sensitive-name-matchingproperty to the Kafka connector documentation so users know how to enable this behavior. - Consider normalizing schema identifiers as well as table identifiers to ensure consistent case sensitivity across all metadata lookups.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Add the new `case-sensitive-name-matching` property to the Kafka connector documentation so users know how to enable this behavior.
- Consider normalizing schema identifiers as well as table identifiers to ensure consistent case sensitivity across all metadata lookups.
## Individual Comments
### Comment 1
<location> `presto-kafka/src/main/java/com/facebook/presto/kafka/KafkaMetadata.java:314-318` </location>
<code_context>
}
+
+ @Override
+ public String normalizeIdentifier(ConnectorSession session, String identifier)
+ {
+ return caseSensitiveNameMatching ? identifier : identifier.toLowerCase(ENGLISH);
</code_context>
<issue_to_address>
**suggestion:** Locale usage in normalization could be clarified.
If identifiers may include non-ASCII characters, ensure this normalization meets all requirements, or document any limitations.
```suggestion
/**
* Normalizes the identifier for case-insensitive matching.
* <p>
* If {@code caseSensitiveNameMatching} is false, the identifier is converted to lower case using {@link Locale#ENGLISH}.
* Note: This normalization is suitable for ASCII identifiers. Identifiers containing non-ASCII characters may not be normalized as expected.
* If your use case requires support for non-ASCII identifiers, consider using {@link Locale#ROOT} or a more robust normalization strategy.
*
* @param session the connector session
* @param identifier the identifier to normalize
* @return the normalized identifier
*/
@Override
public String normalizeIdentifier(ConnectorSession session, String identifier)
{
// Using Locale.ENGLISH for normalization; see method documentation for limitations.
return caseSensitiveNameMatching ? identifier : identifier.toLowerCase(ENGLISH);
}
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
steveburnett
left a comment
There was a problem hiding this comment.
LGTM! (docs)
Pull branch, local doc build, looks good. Thanks!
agrawalreetika
left a comment
There was a problem hiding this comment.
Thanks for making the changes.
Could you please Test class as well ocvering different scenarios?
presto-kafka/src/main/java/com/facebook/presto/kafka/KafkaMetadata.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/com/facebook/presto/kafka/KafkaMetadata.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/com/facebook/presto/kafka/KafkaMetadata.java
Outdated
Show resolved
Hide resolved
be33346 to
dc77654
Compare
presto-kafka/src/test/java/com/facebook/presto/kafka/TestKafkaIntegrationMixedCase.java
Outdated
Show resolved
Hide resolved
818907c to
29bfd29
Compare
f9147e7 to
2f2edfb
Compare
| assertQueryFails(session, "DESCRIBE Orders", ".*"); | ||
| assertQueryFails(session, "DESCRIBE oRdErS", ".*"); | ||
| assertQueryFails(session, "DESCRIBE TPCH.orders", ".*"); | ||
| assertQueryFails(session, "DESCRIBE tpch.ORDERS", ".*"); |
There was a problem hiding this comment.
Can you add a table with some different case columns and test DESCRIBE on Metadata?
There was a problem hiding this comment.
I tried creating a view but the Kafka connector doesn’t support it I hit this error:
Caused by: com.facebook.presto.spi.PrestoException: This connector does not support creating views
at com.facebook.presto.spi.connector.ConnectorMetadata.createView(ConnectorMetadata.java:608)
There was a problem hiding this comment.
Not view, I meant having table with different casing on columns and then run DESCRIBE to match the metadata
| assertQueryFails(session, "SELECT count(*) FROM Orders o1 JOIN orders o2 ON o1.orderkey = o2.orderkey", "Table kafka.tpch.Orders does not exist"); | ||
| assertQueryFails(session, "SELECT count(*) FROM orders o1 JOIN Orders o2 ON o1.orderkey = o2.orderkey", "Table kafka.tpch.Orders does not exist"); | ||
| } | ||
| } |
There was a problem hiding this comment.
Also it would be good to add tests with same name with different casing for schemas & tables names
There was a problem hiding this comment.
The schema and table casing scenarios are is covered in testSelect(), including cases like tpch.orders, TPCH.orders, and tpch.ORDERS. What additional test cases would you like me to add?
presto-kafka/src/main/java/com/facebook/presto/kafka/KafkaMetadata.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/com/facebook/presto/kafka/KafkaMetadata.java
Outdated
Show resolved
Hide resolved
| { | ||
| assertTrue(getQueryRunner().tableExists(session, "orders")); | ||
|
|
||
| assertFalse(getQueryRunner().tableExists(session, "ORDERS")); |
There was a problem hiding this comment.
@Mariamalmesfer Can we create tables with named "ORDERS","Orders" and test the success scenario?
There was a problem hiding this comment.
Kafka connector doesn’t support CREATE TABLE , tables come from the (e.g., tpch/orders.json).
78543de to
c9c6b48
Compare
| "('TOTALPRICE','double')) AS t(column_name, data_type)"); | ||
| } | ||
| finally { | ||
| // No cleanup needed for read-only DESCRIBE operations |
There was a problem hiding this comment.
If no cleanup then why are we using try-finally here?
There was a problem hiding this comment.
I removed it, it was there expected error message in the DESCRIBE . I found it and fixed it now.
| assertQuerySucceeds(session, "SELECT o.orderkey FROM orders o LIMIT 1"); | ||
|
|
||
| assertQueryFails(session, "SELECT COUNT(*) FROM Orders WHERE OrderKey > 100", "Table kafka.tpch.Orders does not exist"); | ||
| assertQueryFails(session, "SELECT * FROM TPCH.Orders", "Schema TPCH does not exist"); |
There was a problem hiding this comment.
There is also ORDERS table you are creating, lets add tests for that as well?
| } | ||
|
|
||
| @Test | ||
| public void testSelectAndDescribeForOrdersAndORDERS() |
There was a problem hiding this comment.
This is a little confusing, I think we could have overall 2 tests for testing all tables -
- testSelect
- testDescribe
presto-kafka/src/test/java/com/facebook/presto/kafka/TestKafkaIntegrationMixedCase.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/test/java/com/facebook/presto/kafka/KafkaQueryRunner.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/test/java/com/facebook/presto/kafka/TestKafkaDistributed.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/test/java/com/facebook/presto/kafka/util/TestUtils.java
Outdated
Show resolved
Hide resolved
steveburnett
left a comment
There was a problem hiding this comment.
LGTM! (docs)
Pull updated branch, new local doc build. Looks good, thanks!
Description
This PR adds support for case-sensitive identifiers in the Kafka connector.
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.