Skip to content

feat(plugin-kafka): Enable case-senstive support for Kafka connector #26023

Merged
agrawalreetika merged 1 commit intoprestodb:masterfrom
Mariamalmesfer:kafka-mixedcase
Oct 29, 2025
Merged

feat(plugin-kafka): Enable case-senstive support for Kafka connector #26023
agrawalreetika merged 1 commit intoprestodb:masterfrom
Mariamalmesfer:kafka-mixedcase

Conversation

@Mariamalmesfer
Copy link
Contributor

@Mariamalmesfer Mariamalmesfer commented Sep 11, 2025

Description

This PR adds support for case-sensitive identifiers in the Kafka connector.

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

==RELEASE NOTE ==

Kafka Connector Changes
* Add mixed case support for Kafka connector.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Sep 11, 2025
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Sep 11, 2025

Reviewer's Guide

This PR introduces support for case-sensitive identifier matching in the Kafka connector by adding a new config flag, implementing a normalization path in metadata, and updating test runners and configs to exercise the new behavior.

Entity relationship diagram for case-sensitive name matching config

erDiagram
    KAFKA_CONNECTOR_CONFIG {
        boolean caseSensitiveNameMatchingEnabled
    }
    KAFKA_METADATA {
        boolean caseSensitiveNameMatching
    }
    KAFKA_CONNECTOR_CONFIG ||--o| KAFKA_METADATA : "configures"
Loading

Class diagram for updated KafkaConnectorConfig and KafkaMetadata

classDiagram
    class KafkaConnectorConfig {
        - List<File> resourceConfigFiles
        - boolean caseSensitiveNameMatchingEnabled
        + boolean isCaseSensitiveNameMatching()
        + KafkaConnectorConfig setCaseSensitiveNameMatching(boolean)
    }
    class KafkaMetadata {
        - String connectorId
        - boolean hideInternalColumns
        - TableDescriptionSupplier tableDescriptionSupplier
        - boolean caseSensitiveNameMatching
        + String normalizeIdentifier(ConnectorSession, String)
    }
    KafkaMetadata --> KafkaConnectorConfig : uses
Loading

File-Level Changes

Change Details Files
Implement case-sensitive identifier normalization in metadata
  • Add caseSensitiveNameMatching field to KafkaMetadata and initialize from config
  • Introduce normalizeIdentifier(session, identifier) to apply lowercase normalization when disabled
  • Apply normalizeIdentifier to tableName in getTableHandle and listTables
presto-kafka/src/main/java/com/facebook/presto/kafka/KafkaMetadata.java
Add case-sensitive matching config option
  • Introduce caseSensitiveNameMatchingEnabled property with getter and setter
  • Annotate setter with @config("case-sensitive-name-matching") and description
presto-kafka/src/main/java/com/facebook/presto/kafka/KafkaConnectorConfig.java
Propagate connectorProperties in KafkaQueryRunner
  • Add connectorProperties parameter to createKafkaQueryRunner overloads
  • Pass connectorProperties to createCatalog calls
presto-kafka/src/test/java/com/facebook/presto/kafka/KafkaQueryRunner.java
Update tests for new property and runner signature
  • Include case-sensitive-name-matching in TestKafkaConnectorConfig default and explicit mappings
  • Adjust TestKafkaDistributed to pass empty connectorProperties in runner creation
presto-kafka/src/test/java/com/facebook/presto/kafka/TestKafkaConnectorConfig.java
presto-kafka/src/test/java/com/facebook/presto/kafka/TestKafkaDistributed.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@Mariamalmesfer Mariamalmesfer force-pushed the kafka-mixedcase branch 2 times, most recently from 38a7b36 to c52341b Compare September 15, 2025 10:04
@steveburnett
Copy link
Contributor

Please include documentation. See #25863 and #26038 for examples.

@Mariamalmesfer Mariamalmesfer force-pushed the kafka-mixedcase branch 2 times, most recently from bc3e15d to e89bd50 Compare September 16, 2025 11:59
@steveburnett
Copy link
Contributor

Thanks for adding the doc! I'll review when you mark this PR ready for review.

@Mariamalmesfer Mariamalmesfer marked this pull request as ready for review September 17, 2025 12:06
@prestodb-ci prestodb-ci requested review from a team, NivinCS and infvg and removed request for a team September 17, 2025 12:06
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Add the new case-sensitive-name-matching property to the Kafka connector documentation so users know how to enable this behavior.
  • Consider normalizing schema identifiers as well as table identifiers to ensure consistent case sensitivity across all metadata lookups.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Add the new `case-sensitive-name-matching` property to the Kafka connector documentation so users know how to enable this behavior.
- Consider normalizing schema identifiers as well as table identifiers to ensure consistent case sensitivity across all metadata lookups.

## Individual Comments

### Comment 1
<location> `presto-kafka/src/main/java/com/facebook/presto/kafka/KafkaMetadata.java:314-318` </location>
<code_context>
     }
+
+    @Override
+    public String normalizeIdentifier(ConnectorSession session, String identifier)
+    {
+        return caseSensitiveNameMatching ? identifier : identifier.toLowerCase(ENGLISH);
</code_context>

<issue_to_address>
**suggestion:** Locale usage in normalization could be clarified.

If identifiers may include non-ASCII characters, ensure this normalization meets all requirements, or document any limitations.

```suggestion
    /**
     * Normalizes the identifier for case-insensitive matching.
     * <p>
     * If {@code caseSensitiveNameMatching} is false, the identifier is converted to lower case using {@link Locale#ENGLISH}.
     * Note: This normalization is suitable for ASCII identifiers. Identifiers containing non-ASCII characters may not be normalized as expected.
     * If your use case requires support for non-ASCII identifiers, consider using {@link Locale#ROOT} or a more robust normalization strategy.
     *
     * @param session the connector session
     * @param identifier the identifier to normalize
     * @return the normalized identifier
     */
    @Override
    public String normalizeIdentifier(ConnectorSession session, String identifier)
    {
        // Using Locale.ENGLISH for normalization; see method documentation for limitations.
        return caseSensitiveNameMatching ? identifier : identifier.toLowerCase(ENGLISH);
    }
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

steveburnett
steveburnett previously approved these changes Sep 17, 2025
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull branch, local doc build, looks good. Thanks!

Copy link
Member

@agrawalreetika agrawalreetika left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes.

Could you please Test class as well ocvering different scenarios?

@Mariamalmesfer Mariamalmesfer force-pushed the kafka-mixedcase branch 6 times, most recently from 818907c to 29bfd29 Compare September 29, 2025 20:01
@Mariamalmesfer Mariamalmesfer force-pushed the kafka-mixedcase branch 14 times, most recently from f9147e7 to 2f2edfb Compare October 13, 2025 09:18
assertQueryFails(session, "DESCRIBE Orders", ".*");
assertQueryFails(session, "DESCRIBE oRdErS", ".*");
assertQueryFails(session, "DESCRIBE TPCH.orders", ".*");
assertQueryFails(session, "DESCRIBE tpch.ORDERS", ".*");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a table with some different case columns and test DESCRIBE on Metadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried creating a view but the Kafka connector doesn’t support it I hit this error:

Caused by: com.facebook.presto.spi.PrestoException: This connector does not support creating views
	at com.facebook.presto.spi.connector.ConnectorMetadata.createView(ConnectorMetadata.java:608)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not view, I meant having table with different casing on columns and then run DESCRIBE to match the metadata

assertQueryFails(session, "SELECT count(*) FROM Orders o1 JOIN orders o2 ON o1.orderkey = o2.orderkey", "Table kafka.tpch.Orders does not exist");
assertQueryFails(session, "SELECT count(*) FROM orders o1 JOIN Orders o2 ON o1.orderkey = o2.orderkey", "Table kafka.tpch.Orders does not exist");
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it would be good to add tests with same name with different casing for schemas & tables names

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema and table casing scenarios are is covered in testSelect(), including cases like tpch.orders, TPCH.orders, and tpch.ORDERS. What additional test cases would you like me to add?

{
assertTrue(getQueryRunner().tableExists(session, "orders"));

assertFalse(getQueryRunner().tableExists(session, "ORDERS"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mariamalmesfer Can we create tables with named "ORDERS","Orders" and test the success scenario?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kafka connector doesn’t support CREATE TABLE , tables come from the (e.g., tpch/orders.json).

@Mariamalmesfer Mariamalmesfer force-pushed the kafka-mixedcase branch 2 times, most recently from 78543de to c9c6b48 Compare October 16, 2025 13:00
"('TOTALPRICE','double')) AS t(column_name, data_type)");
}
finally {
// No cleanup needed for read-only DESCRIBE operations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no cleanup then why are we using try-finally here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it, it was there expected error message in the DESCRIBE . I found it and fixed it now.

assertQuerySucceeds(session, "SELECT o.orderkey FROM orders o LIMIT 1");

assertQueryFails(session, "SELECT COUNT(*) FROM Orders WHERE OrderKey > 100", "Table kafka.tpch.Orders does not exist");
assertQueryFails(session, "SELECT * FROM TPCH.Orders", "Schema TPCH does not exist");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also ORDERS table you are creating, lets add tests for that as well?

}

@Test
public void testSelectAndDescribeForOrdersAndORDERS()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little confusing, I think we could have overall 2 tests for testing all tables -

  • testSelect
  • testDescribe

Copy link
Member

@agrawalreetika agrawalreetika left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes lgtm now

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local doc build. Looks good, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants