Problem consuming topic with multiple workers #223

Closed
talnicolas opened this issue Aug 31, 2017 · 2 comments
talnicolas commented Aug 31, 2017

Hi,

I have a topic containing roughly 42 million messages that I want to dump to HDFS. At first I had only one worker instance, with the following connector configuration:

"config": {
  "connector.class": "HdfsSinkConnector",
  "topics.dir": "/hdfs/path/topicname",
  "hadoop.conf.dir": "/etc/hadoop/conf",
  "flush.size": "50000",
  "schema.compatibility": "BACKWARD",
  "tasks.max": "16",
  "timezone": "UTC",
  "connect.hdfs.principal": "someuser",
  "connect.hdfs.keytab": "somekeytab",
  "topics": "topicname",
  "hdfs.url": "hdfs://hdfsurl/",
  "hive.database": "databasename",
  "rotate.interval.ms": "15000",
  "hdfs.authentication.kerberos": "true",
  "hive.metastore.uris": "thrift://metastoreuri:9083",
  "logs.dir": "/hdfs/path/topicname/logs",
  "partition.field.name": "transactiondateid",
  "hive.integration": "true",
  "partitioner.class": "mycustompartitioner",
  "hdfs.namenode.principal": "hdfs/principal",
  "name": "connectorname",
  "hive.conf.dir": "/etc/hive/conf",
  "rotate.schedule.interval.ms": "15000"
}
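
For reference, a minimal sketch of how such a config can be submitted to the Connect REST API (the worker host/port, the use of the Python `requests` library, and the truncated property list are assumptions; I'm not claiming this is exactly how the connector was created):

import requests

# Assumed Connect worker REST endpoint; adjust host/port as needed.
CONNECT_URL = "http://localhost:8083"

connector = {
    "name": "connectorname",
    "config": {
        "connector.class": "HdfsSinkConnector",
        "topics": "topicname",
        "tasks.max": "16",
        "flush.size": "50000",
        "hdfs.url": "hdfs://hdfsurl/",
        # ... remaining properties from the config shown above ...
    },
}

# POST /connectors creates the connector on whichever worker is the group leader.
resp = requests.post(f"{CONNECT_URL}/connectors", json=connector)
resp.raise_for_status()
print(resp.json())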

I had the feeling that it was not consuming enough from the topic (on and off, low volume), so I followed these steps to add two more worker instances (a sketch of the pause/resume calls follows the list):

  • Paused the connector through the REST API
  • Stopped the running instance
  • Started the three worker instances sequentially (Docker containers, same CONNECT_GROUP_ID)
  • Resumed the connector through the REST API
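
A minimal sketch of those pause/resume calls against the Connect REST API (host/port and the `requests` library are assumptions):

import requests

CONNECT_URL = "http://localhost:8083"  # assumed worker host/port
CONNECTOR = "connectorname"

# Pause the connector before restarting the workers.
requests.put(f"{CONNECT_URL}/connectors/{CONNECTOR}/pause").raise_for_status()

# ... stop the old worker, start the three new workers (same Connect group id) ...

# Resume the connector once the new workers have joined the group.
requests.put(f"{CONNECT_URL}/connectors/{CONNECTOR}/resume").raise_for_status()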

It has been running for more than an hour, and what I see now is that only one worker is occasionally committing to HDFS, and in an irregular way: a few thousand messages over 1 or 2 minutes, then nothing for 5 to 10 minutes. Most of the time all three workers just output lines like these in their logs:

....
[2017-08-31 13:46:32,845] INFO Started recovery for topic partition topicname-12 (io.confluent.connect.hdfs.TopicPartitionWriter)
[2017-08-31 13:46:34,138] INFO Finished recovery for topic partition topicname-15 (io.confluent.connect.hdfs.TopicPartitionWriter)
[2017-08-31 13:46:34,138] INFO Started recovery for topic partition topicname-14 (io.confluent.connect.hdfs.TopicPartitionWriter)
[2017-08-31 13:46:34,165] INFO Finished recovery for topic partition topicname-7 (io.confluent.connect.hdfs.TopicPartitionWriter)
.....

On one of the workers I also get a few of these:

[2017-08-31 13:48:05,417] INFO Cannot acquire lease on WAL hdfs://hdfsurl///hdfs/path/topicname/logs/topicname/12/log (io.confluent.connect.hdfs.wal.FSWAL)
[2017-08-31 13:48:06,285] INFO Cannot acquire lease on WAL hdfs://hdfsurl///hdfs/path/topicname/logs/topicname/8/log (io.confluent.connect.hdfs.wal.FSWAL)
...

To me it looks like a lot of rebalances are happening, and I don't understand why it is mainly one worker that is writing to HDFS. I thought that 42M messages would have been ingested much faster through Connect.

Right now the lag is decreasing very, very slowly, and it doesn't look like it will reach 0 anytime soon.

Do you have any guidance on how to properly configure a cluster of worker instances? Do you see anything wrong with my configuration?

Thanks.

talnicolas (Author) commented:

Actually, looking at the lag on my topic partitions, it seems that the one worker that was writing to HDFS did consume all of the messages for the partitions assigned to it: 9 partitions have been completely consumed (the topic has 28 partitions), while the others still have a lag of ~1M. So it confirms that only one worker is doing its job in my cluster...
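
For what it's worth, a minimal sketch (same assumed host/port and `requests` library as above) of checking which worker each task is running on via the Connect status endpoint, which is one way to confirm an uneven assignment:

import requests

CONNECT_URL = "http://localhost:8083"  # assumed worker host/port

# GET /connectors/{name}/status lists every task with its state and owning worker.
status = requests.get(f"{CONNECT_URL}/connectors/connectorname/status").json()
for task in status["tasks"]:
    print(task["id"], task["state"], task["worker_id"])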

talnicolas changed the title from "Tasks repartition with multiple workers" to "Problem consuming topic with multiple workers" on Aug 31, 2017
codenamelxl commented:

It might be related to this: #142 :-?
