hdfs connector not consuming kafka topic #162
@heifrank I see two things I would change here to troubleshoot:
@cotedm Thanks for your reply.
It's weird that my offset.flush.interval.ms is set to 10000, yet no commit message showed up within that interval. I waited about 3 minutes and checked Kafka Manager, but didn't find any consumer consuming my topic. However, using the connect-file-sink properties works fine.
@heifrank do you see any directories created in HDFS? If the connection to HDFS is working properly, you should see a
@cotedm the /topics and /logs directories are created, but /logs is empty. /topics only has a /+tmp directory which is empty too. So it can't be a permission issue. Two things I notice:
@heifrank I think the problem then is that the connector doesn't currently have a good built-in way for you to write JSON data. It's planned future work (see #74), but there are a lot of decisions to be made for it to really work for everyone. In the meantime, you can dump your JSON data directly into HDFS via the connector's pluggable format. There is a connector built off of this one that has a SourceFormat that could work for you. Note that Hive integration won't work if you do this, but you can get your JSON data over this way. If you are wondering, the easiest way to plug this in is to build that connector's jar file and add the jar to the classpath. Then update the
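For illustration only, a minimal sketch of what the sink properties might look like once an alternative format jar is on the classpath; format.class is the HDFS connector's pluggable-format setting, while the connector name and the format class name below are placeholders, not anything stated in this thread:
name=hdfs-json-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=test_kafka_docs2
hdfs.url=hdfs://10.103.18.9:9000
flush.size=1000
# placeholder: replace with the fully qualified format class from the jar you built
format.class=com.example.format.JsonFormat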
I have the same problem with JSON-format records. Like heifrank said, it works fine with Avro format and Schema Registry, but once I changed to JSON, data could be written to the topic but couldn't be consumed by the kafka-hdfs-sink connector. See the error log below:
[2017-04-04 13:49:49,835] ERROR Failed to convert config data to Kafka Connect format: (org.apache.kafka.connect.storage.KafkaConfigBackingStore:440)
@xiabai84 Your problem seems unrelated. It looks like you have data in the Kafka Connect config topic that is in a different format than expected. This could happen if you had multiple Kafka Connect clusters trying to write to the same topic (with different internal converter configs), or a misconfigured worker. I'd suggest opening a new issue, as this is unrelated to the original issue in this thread.
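As a side note (my own sketch, not from this thread): the internal converter settings are standard Kafka Connect worker properties and have to match on every worker that shares the same internal topics, for example:
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false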
@heifrank Note that
in the log. The lack of these messages suggests you're not hitting any of the conditions that trigger rotating files. If you can't get any of these to trigger, you could also try the
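As a rough check (an assumption on my part, not a recommendation from this thread), lowering the commit/rotation thresholds in the sink properties is one way to see whether files get committed at all; flush.size and rotate.interval.ms are standard HDFS connector settings:
# commit a file after only 10 records instead of 1000
flush.size=10
# also rotate files based on time (milliseconds), in addition to record count
rotate.interval.ms=60000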
I have the same problem as @heifrank. I am using the latest confluentinc/cp-kafka-connect Docker container. The connector job seems to be stuck with
As you can see, I also tried ParquetFormat; the result is the same.
Still facing the same problem as @heifrank and @baluchicken. @ewencp, any workaround or suggestion would be appreciated. hdfs config
connector config
My schema with payload
Problem statement: 1. If I enable schema
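For context, a minimal sketch of the envelope that the JsonConverter expects when schemas.enable=true; the field names here are made up, and only the schema/payload wrapper itself is the converter's requirement:
{"schema": {"type": "struct", "name": "example_record", "optional": false, "fields": [{"field": "id", "type": "int32"}, {"field": "name", "type": "string"}]}, "payload": {"id": 1, "name": "example"}}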
Same problem exists for me:
@gwenshap, @ewencp, I am also having the same issue. I'm unable to figure out the exact error and fix it. Kindly help in resolving. ERROR: {
Same problem here, all schemas disabled in the Kafka HDFS connector, but receiving error message:
Hi all,
I ran into a problem where the HDFS connector is not consuming my Kafka topic, while connect-file-sink does. After I started the job, I waited about 3 minutes but no commit log ever showed up, so I killed the job with Ctrl+C. I noticed an error log appeared. The details are shown below.
The runtime log is as follows:
hdfs.url = hdfs://10.103.18.9:9000
hdfs.authentication.kerberos = false
hive.metastore.uris =
partition.field.name = date
kerberos.ticket.renew.period.ms = 3600000
shutdown.timeout.ms = 3000
partitioner.class = io.confluent.connect.hdfs.partitioner.FieldPartitioner
storage.class = io.confluent.connect.hdfs.storage.HdfsStorage
path.format = (io.confluent.connect.hdfs.HdfsSinkConnectorConfig:135)
[2017-01-10 12:37:10,018] INFO Hadoop configuration directory (io.confluent.connect.hdfs.DataWriter:94)
[2017-01-10 12:37:10,295] WARN Unable to load native-hadoop library for your platform... using builtin-java classes where applicable (org.apache.hadoop.util.NativeCodeLoader:62)
[2017-01-10 12:37:10,901] INFO Started recovery for topic partition test_kafka_docs2-2 (io.confluent.connect.hdfs.TopicPartitionWriter:193)
[2017-01-10 12:37:10,912] INFO Finished recovery for topic partition test_kafka_docs2-2 (io.confluent.connect.hdfs.TopicPartitionWriter:208)
[2017-01-10 12:37:10,912] INFO Started recovery for topic partition test_kafka_docs2-1 (io.confluent.connect.hdfs.TopicPartitionWriter:193)
[2017-01-10 12:37:10,916] INFO Finished recovery for topic partition test_kafka_docs2-1 (io.confluent.connect.hdfs.TopicPartitionWriter:208)
[2017-01-10 12:37:10,917] INFO Started recovery for topic partition test_kafka_docs2-0 (io.confluent.connect.hdfs.TopicPartitionWriter:193)
[2017-01-10 12:37:10,921] INFO Finished recovery for topic partition test_kafka_docs2-0 (io.confluent.connect.hdfs.TopicPartitionWriter:208)
[2017-01-10 12:37:10,921] INFO Started recovery for topic partition test_kafka_docs2-5 (io.confluent.connect.hdfs.TopicPartitionWriter:193)
[2017-01-10 12:37:10,925] INFO Finished recovery for topic partition test_kafka_docs2-5 (io.confluent.connect.hdfs.TopicPartitionWriter:208)
[2017-01-10 12:37:10,925] INFO Started recovery for topic partition test_kafka_docs2-4 (io.confluent.connect.hdfs.TopicPartitionWriter:193)
[2017-01-10 12:37:10,929] INFO Finished recovery for topic partition test_kafka_docs2-4 (io.confluent.connect.hdfs.TopicPartitionWriter:208)
[2017-01-10 12:37:10,929] INFO Started recovery for topic partition test_kafka_docs2-3 (io.confluent.connect.hdfs.TopicPartitionWriter:193)
[2017-01-10 12:37:10,932] INFO Finished recovery for topic partition test_kafka_docs2-3 (io.confluent.connect.hdfs.TopicPartitionWriter:208)
[2017-01-10 12:37:10,932] INFO Sink task org.apache.kafka.connect.runtime.WorkerSinkTask@47c548fc finished initialization and start (org.apache.kafka.connect.runtime.WorkerSinkTask:155)
^C[2017-01-10 12:40:13,518] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect:68)
[2017-01-10 12:40:13,527] INFO Stopped ServerConnector@3382f8ae{HTTP/1.1}{0.0.0.0:8083} (org.eclipse.jetty.server.ServerConnector:306)
[2017-01-10 12:40:13,539] INFO Stopped o.e.j.s.ServletContextHandler@2974f221{/,null,UNAVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:865)
[2017-01-10 12:40:13,541] INFO Herder stopping (org.apache.kafka.connect.runtime.standalone.StandaloneHerder:62)
[2017-01-10 12:40:13,541] INFO Stopping task hdfs-sink3-0 (org.apache.kafka.connect.runtime.Worker:305)
[2017-01-10 12:40:13,543] INFO Starting graceful shutdown of thread WorkerSinkTask-hdfs-sink3-0 (org.apache.kafka.connect.util.ShutdownableThread:119)
[2017-01-10 12:40:18,544] INFO Forcing shutdown of thread WorkerSinkTask-hdfs-sink3-0 (org.apache.kafka.connect.util.ShutdownableThread:141)
[2017-01-10 12:40:18,546] ERROR Graceful stop of task org.apache.kafka.connect.runtime.WorkerSinkTask@47c548fc failed. (org.apache.kafka.connect.runtime.Worker:312)
[2017-01-10 12:40:18,564] INFO Stopping connector hdfs-sink3 (org.apache.kafka.connect.runtime.Worker:226)
[2017-01-10 12:40:18,565] INFO Stopped connector hdfs-sink3 (org.apache.kafka.connect.runtime.Worker:240)
[2017-01-10 12:40:18,565] INFO Herder stopped (org.apache.kafka.connect.runtime.standalone.StandaloneHerder:77)
[2017-01-10 12:40:18,565] INFO Worker stopping (org.apache.kafka.connect.runtime.Worker:115)
[2017-01-10 12:40:18,566] INFO Stopped FileOffsetBackingStore (org.apache.kafka.connect.storage.FileOffsetBackingStore:61)
[2017-01-10 12:40:18,566] INFO Worker stopped (org.apache.kafka.connect.runtime.Worker:155)
[2017-01-10 12:40:18,566] INFO Kafka Connect stopped (org.apache.kafka.connect.runtime.Connect:74)
The command I use to start the job is ./bin/connect-standalone etc/kafka/connect-standalone.properties etc/kafka-connect-hdfs/quickstart-hdfs.properties
The connect-standalone.properties is as follows:
bootstrap.servers=10.103.17.106:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
The quickstart-hdfs.properties is as follows:
name=hdfs-sink3
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=test_kafka_docs2
#topics=test_hdfs_kafka
hdfs.url=hdfs://10.103.18.9:9000
flush.size=1000
Could anyone give some suggestions? Thanks.