I have a topic containing roughly 42 million messages that I want to dump to HDFS. At first I had only one worker instance with the following connector:
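(The connector was created through the Connect REST API with something like the request below; the connector name, paths and sizes here are placeholders, not my exact configuration.)
# Hypothetical example only - placeholder values, not the actual connector config
curl -X POST -H "Content-Type: application/json" http://connect-host:8083/connectors -d '{
  "name": "hdfs-sink-topicname",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "3",
    "topics": "topicname",
    "hdfs.url": "hdfs://hdfsurl",
    "logs.dir": "/hdfs/path/topicname/logs",
    "flush.size": "10000"
  }
}'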
I had the feeling that it was not consuming enough from the topic (on and off, low volume), so I followed these steps to add two other worker instances (the corresponding calls are sketched after the list):
Paused the connector through the REST API
Stopped the running instance
Started the three worker instances sequentially (Docker containers, same CONNECT_GROUP_ID)
Resumed the connector through the REST API
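Concretely, the pause/resume calls and the worker containers looked roughly like this; the hostnames, connector name and topic names are placeholders, and several required CONNECT_* settings (converters, advertised host name, etc.) are omitted:
# Pause before the restart, resume once all three workers are up (placeholder names)
curl -X PUT http://connect-host:8083/connectors/hdfs-sink-topicname/pause
curl -X PUT http://connect-host:8083/connectors/hdfs-sink-topicname/resume

# Each worker container joins the same Connect group (assuming the confluentinc/cp-kafka-connect image)
docker run -d \
  -e CONNECT_BOOTSTRAP_SERVERS=kafka:9092 \
  -e CONNECT_GROUP_ID=connect-cluster \
  -e CONNECT_CONFIG_STORAGE_TOPIC=connect-configs \
  -e CONNECT_OFFSET_STORAGE_TOPIC=connect-offsets \
  -e CONNECT_STATUS_STORAGE_TOPIC=connect-status \
  confluentinc/cp-kafka-connect
# (key/value converter and rest.advertised.host.name settings omitted for brevity)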
It has been running for more than an hour, and what I see now is that only one worker is sometimes actually committing to HDFS, in an irregular way: a few thousand messages over 1 or 2 minutes, then nothing for 5 to 10 minutes. Most of the time all three workers are just outputting log lines like these:
....
[2017-08-31 13:46:32,845] INFO Started recovery for topic partition topicname-12 (io.confluent.connect.hdfs.TopicPartitionWriter)
[2017-08-31 13:46:34,138] INFO Finished recovery for topic partition topicname-15 (io.confluent.connect.hdfs.TopicPartitionWriter)
[2017-08-31 13:46:34,138] INFO Started recovery for topic partition topicname-14 (io.confluent.connect.hdfs.TopicPartitionWriter)
[2017-08-31 13:46:34,165] INFO Finished recovery for topic partition topicname-7 (io.confluent.connect.hdfs.TopicPartitionWriter)
.....
On one of the workers I also get a few of these:
[2017-08-31 13:48:05,417] INFO Cannot acquire lease on WAL hdfs://hdfsurl///hdfs/path/topicname/logs/topicname/12/log (io.confluent.connect.hdfs.wal.FSWAL)
[2017-08-31 13:48:06,285] INFO Cannot acquire lease on WAL hdfs://hdfsurl///hdfs/path/topicname/logs/topicname/8/log (io.confluent.connect.hdfs.wal.FSWAL)
...
To me it looks like a lot of rebalances are happening, and I don't understand why it is mainly one worker that is writing to HDFS. I thought 42M messages would have been ingested much faster through Connect.
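(I suppose I can double-check which worker each task is assigned to via the status endpoint, with something like the call below; the connector name and host are placeholders:)
# Shows connector/task state and the worker each task runs on (placeholder names)
curl http://connect-host:8083/connectors/hdfs-sink-topicname/status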
Right now the lag is decreasing very slowly, and it doesn't look like it will reach 0 anytime soon.
Do you have any guidance on how to properly configure a cluster of worker instances? Do you see anything weird in my configuration?
Thanks.
Actually, looking at the lag on my topic partitions, it seems that this one worker that was writing to HDFS did consume all of the messages for the partitions assigned to it. 9 partitions have been completely consumed (the topic has 28 partitions); the others still have a lag of ~1M. So it confirms that only one worker is doing its job in my cluster...
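(I'm reading the per-partition lag with the consumer groups tool, along these lines; I'm assuming the sink's consumer group follows the default connect-<connector-name> naming, and the broker address and connector name are placeholders:)
# Per-partition current offset, log-end offset and lag for the sink's consumer group
kafka-consumer-groups --bootstrap-server kafka:9092 \
  --describe --group connect-hdfs-sink-topicname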
talnicolas changed the title from "Tasks repartition with multiple workers" to "Problem consuming topic with multiple workers" on Aug 31, 2017