Hive metastore exception fail some task of connector #218
Hello, I have the same issue (see the trace below). Could someone give some advice and an explanation? Thanks a lot! [2018-01-12 14:00:00,264] ERROR Task SYS_4G_PCMD_RAW-7 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerSinkTask:404)
We have also been experiencing this issue on a single Kafka Connect instance running in distributed mode (by itself). It seems to occur only when a message arrives with a new value for the field the topic is being partitioned on. Kafka 0.10.1.1 & Confluent 3.1.2
Let me give you more information about the issue: we have 5 connectors, each one listening to one topic, and 8 workers writing in parallel. The issue does not happen with every connector, only with one of them. It is really problematic because, in our distributed-worker setup, the connector is repeatedly rebalanced, and during rebalancing the workers stop writing. That introduces delay into the writing process, which is bad, since in our case we have to make sure the data is available in less than 5 minutes.
Could someone help? The issue has been known since the beginning of August 2017.
We are also seeing this issue. It is a critical one because of the data loss happening in the connector. We were able to find the missing records under .Trash/ in HDFS, located under the same folder structure prefixed with .Trash/. We are thinking we could catch the exception thrown in addHivePartition(final String location) and handle it so that the data ends up in the desired folder rather than in .Trash (a rough sketch of that idea follows below). In some cases, if we increase rotate.schedule.interval.ms to more than an hour, we do not see the movement as often, but the exception still occurs and the data is sometimes moved to .Trash. Please let me know if you can think of any other options we can try here @Cricket007 @ewencp @kkonstantine @maxzheng
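For illustration, a minimal sketch of that idea, assuming the HiveMetaStore.addPartition(database, table, path) signature visible in the trace below; SafeHivePartitioner and addPartitionSafely are hypothetical names, not connector code:

```java
import io.confluent.connect.hdfs.errors.HiveMetaStoreException;
import io.confluent.connect.hdfs.hive.HiveMetaStore;

// Hypothetical wrapper (not part of the connector): catch the metastore
// failure so the sink task survives, instead of letting it kill the worker.
public class SafeHivePartitioner {
    private final HiveMetaStore hiveMetaStore;

    public SafeHivePartitioner(HiveMetaStore hiveMetaStore) {
        this.hiveMetaStore = hiveMetaStore;
    }

    public void addPartitionSafely(String database, String table, String location) {
        try {
            hiveMetaStore.addPartition(database, table, location);
        } catch (HiveMetaStoreException e) {
            // e.g. "Failed to move to trash: hdfs://..." -- record the
            // location and carry on; the partition can be re-added later.
            System.err.println("addPartition failed for " + location + ": " + e.getMessage());
        }
    }
}
```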
Hi,
We run our connectors in distributed mode with the task count set to 16. Some of the tasks then begin failing one by one with a Hive metastore error (roughly 2 or 3 tasks fail per day), but the connector as a whole keeps going because not all of them fail.
We can restart each failed task (see the example below), but we want to find the root cause. Has anyone seen this error before, or can anyone offer some advice? Thanks.
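For reference, a minimal sketch of restarting a single failed task through the standard Kafka Connect REST API (POST /connectors/{name}/tasks/{id}/restart); the worker address, connector name, and task id below are assumptions for illustration:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class RestartFailedTask {
    public static void main(String[] args) throws Exception {
        // Hypothetical worker address, connector name, and task id --
        // adjust to your cluster; the endpoint is the Connect REST API.
        URL url = new URL("http://localhost:8083/connectors/bid_parquet_prod_v00/tasks/0/restart");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        // HTTP 204 (No Content) means the worker accepted the restart request.
        System.out.println("Restart returned HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}
```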
The failed tasks report errors like:

```
[2017-08-08 20:49:06,966] ERROR Task bid_parquet_prod_v00-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerSinkTask:449)
java.lang.RuntimeException: java.util.concurrent.ExecutionException: io.confluent.connect.hdfs.errors.HiveMetaStoreException: Hive MetaStore exception
at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:226)
at io.confluent.connect.hdfs.HdfsSinkTask.put(HdfsSinkTask.java:103)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:429)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:250)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:179)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:148)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:139)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:182)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: io.confluent.connect.hdfs.errors.HiveMetaStoreException: Hive MetaStore exception
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:220)
... 12 more
Caused by: io.confluent.connect.hdfs.errors.HiveMetaStoreException: Hive MetaStore exception
at io.confluent.connect.hdfs.hive.HiveMetaStore.addPartition(HiveMetaStore.java:109)
at io.confluent.connect.hdfs.TopicPartitionWriter$3.call(TopicPartitionWriter.java:662)
at io.confluent.connect.hdfs.TopicPartitionWriter$3.call(TopicPartitionWriter.java:659)
... 4 more
Caused by: MetaException(message:Got exception: java.io.IOException Failed to move to trash: hdfs://nameservice1/topics/prod/bid_parquet_prod_v00/topic_name/year=2017/month=06/day=21/hour=14)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$append_partition_by_name_with_environment_context_result$append_partition_by_name_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:51637)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$append_partition_by_name_with_environment_context_result$append_partition_by_name_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:51596)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$append_partition_by_name_with_environment_context_result.read(ThriftHiveMetastore.java:51519)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_append_partition_by_name_with_environment_context(ThriftHiveMetastore.java:1667)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.append_partition_by_name_with_environment_context(ThriftHiveMetastore.java:1651)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.appendPartition(HiveMetaStoreClient.java:606)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.appendPartition(HiveMetaStoreClient.java:600)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152)
at com.sun.proxy.$Proxy52.appendPartition(Unknown Source)
at io.confluent.connect.hdfs.hive.HiveMetaStore$1.call(HiveMetaStore.java:97)
at io.confluent.connect.hdfs.hive.HiveMetaStore$1.call(HiveMetaStore.java:91)
at io.confluent.connect.hdfs.hive.HiveMetaStore.doAction(HiveMetaStore.java:87)
at io.confluent.connect.hdfs.hive.HiveMetaStore.addPartition(HiveMetaStore.java:103)
... 6 more
[2017-08-08 20:49:06,967] ERROR Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerSinkTask:450)
```