Kafka Connect HDFS is running in distributed mode (the same problem was also observed in standalone mode).
Sometimes after Kafka Connect HDFS is restarted, small (4-byte) Parquet files appear in some landing directories on HDFS.
Kafka Connect HDFS then tries to read those invalid Parquet files and can no longer start.
Kafka Connect HDFS version: 3.3.0
Stacktrace:
ERROR Exception on topic partition table_name-2: (io.confluent.connect.hdfs.TopicPartitionWriter)
java.io.IOException: Could not read footer: java.lang.RuntimeException: hdfs://staging/staging/raw/kafka_topics/table_name/partition=2/table_name+2+0010255764+0010265763.parquet is not a Parquet file (too small)
at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:247)
at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:188)
at org.apache.parquet.hadoop.ParquetReader.<init>(ParquetReader.java:114)
at org.apache.parquet.hadoop.ParquetReader.<init>(ParquetReader.java:47)
at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:192)
at io.confluent.connect.hdfs.parquet.ParquetFileReader.getSchema(ParquetFileReader.java:43)
at io.confluent.connect.hdfs.TopicPartitionWriter.write(TopicPartitionWriter.java:275)
at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:234)
at io.confluent.connect.hdfs.HdfsSinkTask.put(HdfsSinkTask.java:103)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:435)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:251)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:180)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:148)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:190)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: hdfs://staging/staging/raw/kafka_topics/table_name/partition=2/table_name+2+0010255764+0010265763.parquet is not a Parquet file (too small)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:412)
at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)
at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)
... 4 more
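For context, ParquetFileReader.readFooter rejects any file shorter than the 4-byte leading PAR1 magic plus the 4-byte footer length plus the 4-byte trailing magic, so a 4-byte file (magic only, no footer) fails with exactly this "too small" error. As a possible workaround before restarting the connector, a cleanup pass like the sketch below could quarantine such truncated files using the Hadoop FileSystem API; the root path, size threshold, and class name here are illustrative assumptions, not part of the connector.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class TruncatedParquetCleaner {
    // Smallest file ParquetFileReader accepts: leading magic (4) + footer length (4)
    // + trailing magic (4). Anything shorter cannot contain a footer.
    private static final long MIN_VALID_SIZE = 12;

    public static void main(String[] args) throws Exception {
        // Root of the connector's landing directories; adjust to your layout.
        Path root = new Path(args.length > 0 ? args[0]
                : "hdfs://staging/staging/raw/kafka_topics");
        FileSystem fs = FileSystem.get(root.toUri(), new Configuration());

        RemoteIterator<LocatedFileStatus> it = fs.listFiles(root, true);
        while (it.hasNext()) {
            LocatedFileStatus status = it.next();
            Path p = status.getPath();
            if (p.getName().endsWith(".parquet") && status.getLen() < MIN_VALID_SIZE) {
                // Move the file aside rather than delete it, so it can be inspected later.
                Path quarantine = new Path(p.getParent(), p.getName() + ".truncated");
                System.out.println("Quarantining " + p + " (" + status.getLen() + " bytes)");
                fs.rename(p, quarantine);
            }
        }
        fs.close();
    }
}
```

Note that the connector derives committed offsets from the file names, so whether quarantined files should be removed entirely or replayed depends on your recovery strategy.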