Is there a way to map a topic name to a valid Hive table name? #155
Comments
@rmoff are That said, we should be able to add an optional topic mapping index, so I'll label this as an enhancement. |
Thanks @rmoff for clarifying. I think this is good to keep open as an enhancement then. If you would like to implement it, please submit a PR and I'm happy to do an initial review.
@rmoff just an FYI: #137 became #164. We had re-forked locally, so we needed to open a new PR to address the review comments. Regarding configuration-based table name mapping, it should be fairly straightforward to add once #164 is merged. To resolve table names containing dots, I added a method to io.confluent.connect.hdfs.hive.HiveMetaStore that everything calls to resolve the final Hive table name, which keeps the logic in one place rather than repeating it. Currently it just does a very basic replacement of dots.
But you can easily enhance io.confluent.connect.hdfs.hive.HiveMetaStore to take a map of table name translations/mappings on construction or via config, and extend this method to also do a table name mapping lookup.
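To make the idea above concrete, here is a minimal sketch of a single resolution point that checks a mapping first and falls back to dot replacement. HiveTableNameResolver and its methods are hypothetical names, not the actual HiveMetaStore API, and replacing dots with underscores is an assumption (the comment above only says dots are replaced).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the connector's HiveMetaStore API.
final class HiveTableNameResolver {
    private final Map<String, String> mappings;

    // The mappings would come from connector config or be passed in on construction.
    HiveTableNameResolver(Map<String, String> mappings) {
        this.mappings = new HashMap<>(mappings);
    }

    // An explicit mapping wins; otherwise fall back to basic dot replacement.
    // Using '_' as the replacement character is an assumption.
    String resolve(String topic) {
        String mapped = mappings.get(topic);
        return mapped != null ? mapped : topic.replace('.', '_');
    }
}
```

With a mapping from ORCL.SOE.LOGON to soe_logon (made-up names), resolve would return soe_logon for that topic and a_b_c for an unmapped topic a.b.c.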
Great stuff, thanks @ig-michaelpearce
@cotedm I have plenty of interest, but ATM little time :) So go ahead and have a crack, I'd be happy to try and validate any fix you make.
I made an implementation of a topic to Hive name mapping here: https://github.com/mafc/kafka-connect-hdfs/tree/topic_map. I'm not sure if I should make a pull request off it since it uses Java 8 features and also contains a bump of the Avro version to 1.8.1.
This should also be doable via SMTs, right? I think RegexRouter would work here?
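As a rough illustration of the regex-based renaming being suggested, the sketch below rewrites a topic name only when the whole name matches a pattern. It uses plain java.util.regex rather than the Connect API; TopicRenameSketch is a hypothetical name, and treating this as equivalent to what RegexRouter does is an assumption that would need checking against the connector.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of regex-based topic renaming; not the Connect SMT itself.
final class TopicRenameSketch {
    // Rewrite the topic only if the entire name matches; otherwise keep it unchanged.
    static String rename(String topic, Pattern regex, String replacement) {
        Matcher matcher = regex.matcher(topic);
        return matcher.matches() ? matcher.replaceFirst(replacement) : topic;
    }
}
```

For example, the pattern (.*)\.(.*)\.(.*) with replacement $1_$2_$3 would turn a three-part dotted name into an underscore-separated one, and leave names without two dots alone.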
+1 for this enhancement. One thing that #164 does not handle is topic names that start with an underscore. The reason is that Hive does not consider (Parquet) files with a leading underscore. This is another use case where topic mapping would be desirable.
How about adding two config If it looks ok, I would like to create a PR to implement it. I tried |
I have data coming through from GoldenGate Kafka Connector, which takes the fully-qualified Oracle table name as the topic name. From what I can see there is no way to override this.
When kafka-connect-hdfs tries to create a Hive table, it fails.
In the Elasticsearch connector there is a topic.index.map configuration to address this exact scenario (source topic names being invalid target objects). Is there an equivalent for the HDFS connector (or another suitable workaround)?
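For illustration, an option in the style of topic.index.map could be parsed roughly as below. This is a sketch that assumes a comma-separated list of topic:table pairs; that value format and the name TopicTableMapParser are assumptions, not an existing option of the HDFS connector.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for a "topic:table,topic:table" style value (assumed format).
final class TopicTableMapParser {
    static Map<String, String> parse(String value) {
        Map<String, String> result = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return result; // no mappings configured
        }
        for (String rawEntry : value.split(",")) {
            String entry = rawEntry.trim();
            int sep = entry.indexOf(':');
            // Reject entries with a missing separator, topic, or table name.
            if (sep <= 0 || sep == entry.length() - 1) {
                throw new IllegalArgumentException("Expected topic:table, got: " + entry);
            }
            result.put(entry.substring(0, sep).trim(), entry.substring(sep + 1).trim());
        }
        return result;
    }
}
```

The resulting map could then feed whatever component resolves the Hive table name for a topic.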