Bwilliams fix for topics with dot #73

bwilliams42

If a topic name contains a ".", extracting the offset from the file name breaks. I've refactored the extraction logic to strip the file extension first and then split on "+", the separator used to build the file name.
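
A minimal sketch of the extraction approach described above, not the code from this PR. It assumes committed file names look like topic+partition+startOffset+endOffset.extension and always carry an extension; that layout, the class name and the method name are assumptions made for illustration.

```java
// Hypothetical helper; the names and the assumed file-name layout are illustrative only.
class OffsetFromFilename {

    // Assumed layout: <topic>+<partition>+<startOffset>+<endOffset>.<extension>
    static long extractEndOffset(String filename) {
        // Strip the extension first, using the LAST dot, so dots inside
        // the topic name are left alone.
        int lastDot = filename.lastIndexOf('.');
        String base = lastDot >= 0 ? filename.substring(0, lastDot) : filename;

        // '+' is a regex metacharacter, so it has to be escaped for split().
        String[] parts = base.split("\\+");

        // The end offset is the last '+'-separated field; leading zeros parse fine.
        return Long.parseLong(parts[parts.length - 1]);
    }
}
```

Under these assumptions, OffsetFromFilename.extractEndOffset("my.topic+0+0000000000+0000000099.avro") yields 99, because only the trailing ".avro" is removed before splitting. Splitting on "+" is safe with respect to the topic because legal Kafka topic names contain only ASCII letters, digits, ".", "_" and "-".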

@ConfluentJenkins
Contributor

Can one of the admins verify this patch?

@ghost

ghost commented Jun 9, 2016

Hey @bwilliams42,
thank you for your Pull Request.

It looks like you haven't signed our Contributor License Agreement yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence.
Wikipedia

You can read and sign our full Contributor License Agreement here.

Once you've signed, reply with [clabot:check] to prove it.

Appreciation of efforts,

clabot

@bwilliams42
Author

[clabot:check]

@ghost

ghost commented Jun 9, 2016

@confluentinc It looks like @bwilliams42 just signed our Contributor License Agreement. 👍

Always at your service,

clabot

@ewencp
Contributor

ewencp commented Jun 9, 2016

@bwilliams42 This looks like it might be a duplicate of #70?

@bwilliams42
Author

@ewencp you are correct. Since that's already merged, I'll close this. Thanks.

bwilliams42 closed this Jun 9, 2016

@michaelandrepearce

We took the latest master, built it and ran it, and we still seem to hit an issue if the topic has dots in its name:

2016-10-06 14:37:29,966 - ERROR [pool-1-thread-1:WorkerSinkTask@401] - Task avro-hdfs-connector-1-2 threw an uncaught and unrecoverable exception
java.lang.RuntimeException: java.util.concurrent.ExecutionException: io.confluent.connect.hdfs.errors.HiveMetaStoreException: Invalid table
at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:226)
at io.confluent.connect.hdfs.HdfsSinkTask.put(HdfsSinkTask.java:103)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:381)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:227)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:170)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:142)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: io.confluent.connect.hdfs.errors.HiveMetaStoreException: Invalid table
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:220)
... 12 more
Caused by: io.confluent.connect.hdfs.errors.HiveMetaStoreException: Invalid table
at io.confluent.connect.hdfs.hive.HiveMetaStore.createTable(HiveMetaStore.java:200)
at io.confluent.connect.hdfs.avro.AvroHiveUtil.createTable(AvroHiveUtil.java:51)
at io.confluent.connect.hdfs.TopicPartitionWriter$1.call(TopicPartitionWriter.java:620)
at io.confluent.connect.hdfs.TopicPartitionWriter$1.call(TopicPartitionWriter.java:617)
... 4 more
Caused by: InvalidObjectException(message:TEST.CONNECT.2 is not a valid object name)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:29974)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:29951)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result.read(ThriftHiveMetastore.java:29877)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1075)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1061)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2050)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:669)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:657)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152)
at com.sun.proxy.$Proxy51.createTable(Unknown Source)
at io.confluent.connect.hdfs.hive.HiveMetaStore$5.call(HiveMetaStore.java:185)
at io.confluent.connect.hdfs.hive.HiveMetaStore$5.call(HiveMetaStore.java:182)
at io.confluent.connect.hdfs.hive.HiveMetaStore.doAction(HiveMetaStore.java:87)
at io.confluent.connect.hdfs.hive.HiveMetaStore.createTable(HiveMetaStore.java:193)
... 7 more
2016-10-06 14:37:29,967 - ERROR [pool-1-thread-1:WorkerSinkTask@402] - Task is being killed and will not recover until manually restarted
2016-10-06 14:37:30,494 - INFO [pool-1-thread-1:WorkerSinkTask@261] - WorkerSinkTask{id=avro-hdfs-connector-1-2} Committing offsets
2016-10-06 14:37:30,509 - ERROR [pool-1-thread-1:WorkerTask@142] - Task avro-hdfs-connector-1-2 threw an uncaught and unrecoverable exception
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:403)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:227)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:170)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:142)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

@michaelandrepearce

michaelandrepearce commented Oct 10, 2016

It looks like something as simple as altering these methods in HiveMetaStore.java to replace "." with "_" would make Hive happy.

hivemetastore-patch.diff.txt

@bwilliams42
Author

@michaelandrepearce Hmm, did you try this fork?

@ig-michaelpearce
Contributor

Sorry, no. We had seen that this is a duplicate of #70, which got merged to trunk, and we tried that from master as noted, so this comment is probably better suited to that PR. That said, looking at the code in this branch, it also doesn't seem to address the issue that HCatalog/Hive cannot have a table name with dots in it (currently, table name = topic name).

We've supplied our current workaround for this in #137, which simply makes the table name in HCatalog/Hive the same as the topic name but with "." replaced by "_".
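
The workaround described above can be sketched roughly as follows. This is not the patch from #137 or the attached diff; the class and method names are hypothetical, and the only behaviour taken from the discussion is the replacement of "." with "_".

```java
// Hypothetical helper; not part of kafka-connect-hdfs.
class HiveTableNames {

    // Derive a Hive table name from a Kafka topic name by replacing every
    // "." with "_", since the metastore rejects names such as TEST.CONNECT.2.
    static String tableNameFor(String topic) {
        // String.replace(char, char) replaces all occurrences, no regex involved.
        return topic.replace('.', '_');
    }
}
```

With this mapping, the topic TEST.CONNECT.2 from the stack trace becomes TEST_CONNECT_2. One design caveat: two distinct topics such as a.b and a_b map to the same table name, so a deployment relying on this sketch would need to rule out such pairs.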
