Allow Apache Kafka scaler to scale using sum of lag for all topics within a consumer group #2409

PaulLiang1 · 2021-12-18T00:34:04Z

Allow the Kafka scaler to use the sum of lag across all topic partitions in the consumer group when no topic is supplied.
This is useful when the consumer is subscribed to multiple topics.
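
For illustration, a minimal sketch of the idea, not the actual code in pkg/scalers/kafka_scaler.go: when no topic is given, lag is summed over every topic partition the consumer group has committed offsets for. The type `partitionOffsets` and the function `totalGroupLag` are hypothetical names, and offsets are passed in as plain maps rather than fetched from a broker.

```go
package main

import "fmt"

// partitionOffsets maps topic -> partition -> offset.
// Hypothetical type used only for this illustration.
type partitionOffsets map[string]map[int32]int64

// totalGroupLag sums (latest - committed) over the topic partitions the
// consumer group has committed offsets for. An empty topic means "all topics";
// a non-empty topic restricts the sum to that topic.
func totalGroupLag(committed, latest partitionOffsets, topic string) int64 {
	var total int64
	for t, parts := range committed {
		if topic != "" && t != topic {
			continue
		}
		for p, c := range parts {
			l, ok := latest[t][p]
			if !ok {
				continue // no end offset known for this partition
			}
			if lag := l - c; lag > 0 {
				total += lag
			}
		}
	}
	return total
}

func main() {
	committed := partitionOffsets{
		"orders":   {0: 90, 1: 100},
		"payments": {0: 40},
	}
	latest := partitionOffsets{
		"orders":   {0: 100, 1: 100},
		"payments": {0: 70},
	}
	fmt.Println(totalGroupLag(committed, latest, ""))       // all topics: 40
	fmt.Println(totalGroupLag(committed, latest, "orders")) // single topic: 10
}
```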

Checklist

Commits are signed with Developer Certificate of Origin (DCO - learn more)
Tests have been added
A PR is opened to update our Helm chart (repo) (if applicable, ie. when deployment manifests are modified)
A PR is opened to update the documentation on (repo) (if applicable) docs: update Apache Kafka Scaler Doc for multi topic lag keda-docs#613
Changelog has been updated

JorTurFer · 2021-12-20T07:43:46Z

/run-e2e kafka.test*
Update: You can check the progress here

JorTurFer

I'm not an expert in Kafka but LGTM.
Could you add a unit test checking the new metric name please?
You can add the test case here

CHANGELOG.md

pkg/scalers/kafka_scaler.go

JorTurFer · 2021-12-20T08:21:18Z

Hi @PaulLiang1,
the e2e test is failing, could you review it? You can find the logs here

JorTurFer · 2021-12-20T11:08:04Z

/run-e2e kafka.test*
Update: You can check the progress here

PaulLiang1 · 2021-12-20T11:33:36Z

Hi @JorTurFer, thanks for your help.
The test failed again. I will debug it further and update the ticket once I have more info.

JorTurFer · 2021-12-20T11:52:51Z

Thanks @PaulLiang1,
No rush at all, whenever you have some time.

PaulLiang1 · 2021-12-20T14:26:44Z

Hi @JorTurFer, it turns out I had previously misunderstood lagThreshold.
I've updated the e2e test. Would you mind triggering another run for me? Thanks.

JorTurFer · 2021-12-20T14:41:51Z

/run-e2e kafka.test*
Update: You can check the progress here

PaulLiang1 · 2021-12-20T22:55:36Z

Hi @JorTurFer, the tests passed. Would you mind taking another look at the PR? Thanks.

JorTurFer

LGTM!
Only a little suggestion

pkg/scalers/kafka_scaler.go

JorTurFer · 2021-12-22T12:53:57Z

/run-e2e kafka.test*
Update: You can check the progress here

JorTurFer

LGTM!
Thanks for this contribution ❤️
(in any case, let's wait until another pair of eyes takes a look at this)

bpinske · 2021-12-23T22:45:08Z

Any particular reason you chose to implement this, rather than setting up different kafka triggers for each topic individually, all within the same scaledObject?

I have no idea what your applications do, but I'd imagine the different kafka topics to be producing different messages that get consumed in very different ways. Is it really valuable to be having a mixed measure that doesn't discriminate based on what the messages really are?

I do have consumers that consume from multiple topics, and I've had good success with tracking each topic separately with different thresholds set to account for different volumes/computational intensity to process the different message types.

PaulLiang1 · 2021-12-23T23:49:40Z

Thanks for the discussion.

Context:

We had a homogeneous topic setup: a 1:1 relationship between topic and event type.
The specific app we are talking about here needs to consume ~60 types of events which, from the business's perspective, are of similar "importance".
These events are also ingested from multiple clusters in different geographic locations and processed in a central aggregation cluster via MirrorMaker.
By default, MirrorMaker prefixes the mirrored (destination) topic with the source cluster, so with ~60 topics of different event types across 8 regions, the total number of topics could reach 480.
In the actual deployment we do bulkhead the application into different consumer groups, deployments, etc. for DR reasons, but from the business's perspective they are equally important.

...setting up different kafka triggers for each topic individually...

Due to the number of topics it needs to consume, listing all of them in the trigger became an operational burden, especially since the consumer could be using wildcards to "auto discover" topics.
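
To make the scale concrete, here is a small sketch that enumerates mirrored topic names for 8 source clusters and 60 event types. The cluster and topic names are made up, and the "." separator is an assumption for illustration; the actual naming depends on the MirrorMaker version and its replication policy configuration.

```go
package main

import "fmt"

// mirroredTopics builds "<cluster><sep><topic>" for every cluster/topic pair.
// Hypothetical helper; the separator is an assumption, not confirmed by the thread.
func mirroredTopics(clusters, topics []string, sep string) []string {
	out := make([]string, 0, len(clusters)*len(topics))
	for _, c := range clusters {
		for _, t := range topics {
			out = append(out, c+sep+t)
		}
	}
	return out
}

func main() {
	clusters := make([]string, 8)
	for i := range clusters {
		clusters[i] = fmt.Sprintf("region-%d", i+1)
	}
	topics := make([]string, 60)
	for i := range topics {
		topics[i] = fmt.Sprintf("event-type-%d", i+1)
	}
	all := mirroredTopics(clusters, topics, ".")
	fmt.Println(len(all), all[0]) // 480 region-1.event-type-1
}
```

Each of those names would otherwise need its own trigger entry, and the list would have to be kept in sync with whatever the consumer's wildcard subscription discovers.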

...Is it really valuable to be having a mixed measure that doesn't discriminate based on what the messages really are...

It would be great if that decision could be made by the user rather than enforced on them.
Though it could be a good indicator that the docs should be updated to highlight the differences between these approaches.

...I've had good success with tracking each topic separately with different thresholds set to account for different volumes...

This change does not forbid that; it provides an alternative for people to adapt to their scenario.

bpinske · 2021-12-24T00:16:36Z

I'll admit I never considered the possibility of somebody consuming from 480 topics at once :)

It would definitely be good to emphasize in the docs the autodiscovery behaviour that results from leaving the topic string empty. I've had problems before in almost the inverse situation where I was accidentally scaling based off of the total lag of all consumer groups on a topic.

One limitation I can think of is that the autodiscovery is limited to a single Kafka cluster. All topics you discover from must be present within the same cluster. You could, of course, just supply multiple kafka scaler triggers each autodiscovering a different cluster if necessary. Just something else to maybe highlight in the docs.

I suppose the other thing to be aware of is that with 480 topics, any significant number of partitions per topic will quickly explode to a very large total number of partitions that must be queried. There is currently another active PR to ensure querying brokers is concurrent. By the time this PR gets included in the next public release, concurrency should alleviate any performance concerns there.
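
As a rough sketch of why concurrency helps here (this is not the implementation from the other PR), lag lookups for many topics can be issued in parallel and summed once they all return. `lagFetcher` and `sumLagConcurrently` are hypothetical names, and the fetch function stands in for a real broker round trip.

```go
package main

import (
	"fmt"
	"sync"
)

// lagFetcher returns the lag for one topic. Hypothetical stand-in for a
// broker round trip.
type lagFetcher func(topic string) (int64, error)

// sumLagConcurrently fetches lag for every topic in its own goroutine and
// returns the sum. If any fetch fails, it returns 0 and the first error seen.
func sumLagConcurrently(topics []string, fetch lagFetcher) (int64, error) {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		total    int64
		firstErr error
	)
	for _, t := range topics {
		wg.Add(1)
		go func(topic string) {
			defer wg.Done()
			lag, err := fetch(topic)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			total += lag
		}(t)
	}
	wg.Wait()
	if firstErr != nil {
		return 0, firstErr
	}
	return total, nil
}

func main() {
	// Fake per-topic lag values in place of real broker queries.
	fake := map[string]int64{"orders": 10, "payments": 30, "refunds": 2}
	topics := []string{"orders", "payments", "refunds"}
	total, err := sumLagConcurrently(topics, func(topic string) (int64, error) {
		return fake[topic], nil
	})
	fmt.Println(total, err)
}
```

A real scaler would likely bound the number of in-flight requests rather than start one goroutine per topic; that is left out to keep the sketch short.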

The PR seems sane to me.

PaulLiang1 · 2021-12-24T00:40:55Z

It would definitely be good to emphasize in the docs the autodiscovery behaviour that results from leaving the topic string empty. I've had problems before in almost the inverse situation where I was accidentally scaling based off of the total lag of all consumer groups on a topic.

PR for doc: kedacore/keda-docs#613
But it did not highlight the difference between auto discovery vs multi-trigger;

... limited to a single Kafka cluster...

In the example provided above, topics from other geo locations are mirrored into a single aggregation cluster, and the consumer only consumes from that single cluster.

... just supply multiple kafka scaler triggers each autodiscovering a different cluster if necessary...

Correct. This change does not forbid that.

zroubalik · 2022-01-03T11:00:09Z

But it did not highlight the difference between auto discovery vs multi-trigger;

@PaulLiang1 do you think you can add a few words about this to the docs, so users are aware of the consequences?

zroubalik

@PaulLiang1 Could you please rebase your PR, there are conflicts because of #2409

Thanks!

PaulLiang1 · 2022-01-09T23:12:16Z

Sure, sorry, I'm just back from holidays. I will work on it in the next few days.

Signed-off-by: Jinli Liang <[email protected]>

PaulLiang1 · 2022-01-11T01:45:30Z

Hi @zroubalik

The changes have been rebased. Would you mind kicking off another run of the integration tests for me? Thanks.
The GitHub Actions failure does not seem to be related to my change;
it complains that "The unauthenticated git protocol on port 9418 is no longer supported."
What would be the best course of action to address this?
The docs have been updated with extra info. Would you mind taking a look?

zroubalik · 2022-01-11T09:57:53Z

/run-e2e kafka.test*
Update: You can check the progress here

zroubalik

LGTM

Once the Doc PR is fixed, we can merge this one. Great job @PaulLiang1

shaswa · 2022-04-27T02:51:37Z

@PaulLiang1 Thank you for this! What would be the maximum replicas? Is it the sum of partitions of all the topics in a consumer group?

For example. Consumer group has Topic 1 with 10 partitions, and Topic 2 with 6 partitions. Will it scale up to 16 pods or 10 pods?

Edit: Oops, just found the answer to my question as soon as I posted this!

PaulLiang1 · 2022-04-27T03:33:22Z

@shaswa

when allowIdleConsumers=true it scales to totalCGLag/desiredLag replicas
when allowIdleConsumers=false the replica count is capped at the total number of partitions in the consumer group, i.e. min(totalCGLag/desiredLag, totalNbOfPartitionsInCG); using the example outlined, the maximum would be 16

ref:

https://github.com/kedacore/keda/pull/2409/files#diff-03cf62956adcec86c0717723ab146472f0979221b0e5942a6e5460fa6d5e4f08R452-R456
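
The partition cap can be sketched roughly as follows. This is an illustration rather than a copy of the referenced lines: `effectiveLag` and `desiredReplicas` are hypothetical helpers, and the ceiling division is an assumption about how an average-value metric is typically turned into a replica count by the autoscaler.

```go
package main

import "fmt"

// effectiveLag returns the lag value used for scaling. When idle consumers are
// not allowed, the lag is clamped so that lag/lagThreshold never exceeds the
// total partition count. Assumes lagThreshold > 0.
func effectiveLag(totalLag, lagThreshold, totalPartitions int64, allowIdleConsumers bool) int64 {
	if !allowIdleConsumers && totalLag/lagThreshold > totalPartitions {
		return totalPartitions * lagThreshold
	}
	return totalLag
}

// desiredReplicas is ceil(lag / lagThreshold). Assumption: this mirrors how a
// target average value is usually converted to replicas.
func desiredReplicas(lag, lagThreshold int64) int64 {
	return (lag + lagThreshold - 1) / lagThreshold
}

func main() {
	// Topic 1 has 10 partitions and Topic 2 has 6, so 16 in the consumer group.
	const partitions, threshold, lag = 16, 10, 500

	capped := effectiveLag(lag, threshold, partitions, false)
	fmt.Println(desiredReplicas(capped, threshold)) // 16

	uncapped := effectiveLag(lag, threshold, partitions, true)
	fmt.Println(desiredReplicas(uncapped, threshold)) // 50
}
```

With a small backlog (for example a total lag of 35 and a threshold of 10) neither branch hits the cap and both settings ask for 4 replicas.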

PaulLiang1 requested a review from a team as a code owner December 18, 2021 00:34

PaulLiang1 mentioned this pull request Dec 18, 2021

docs: update Apache Kafka Scaler Doc for multi topic lag kedacore/keda-docs#613

Merged

1 task

JorTurFer requested changes Dec 20, 2021

larvinloy reviewed Dec 20, 2021

CHANGELOG.md (outdated, resolved)

larvinloy reviewed Dec 20, 2021

pkg/scalers/kafka_scaler.go (outdated, resolved)

PaulLiang1 force-pushed the allow-kafka-scaler-to-scale-without-a-specific-topic branch from 68650ca to d745707 Compare December 20, 2021 10:41

PaulLiang1 requested a review from JorTurFer December 20, 2021 22:55

PaulLiang1 mentioned this pull request Dec 22, 2021

Kafka scaler: concurrent offset fetches #2405

Merged

5 tasks

JorTurFer requested a review from a team December 22, 2021 07:44

JorTurFer requested changes Dec 22, 2021

pkg/scalers/kafka_scaler.go (outdated, resolved)

JorTurFer requested a review from a team December 22, 2021 07:56

PaulLiang1 requested review from JorTurFer and removed request for a team December 22, 2021 12:14

JorTurFer approved these changes Dec 22, 2021

JorTurFer requested a review from a team December 22, 2021 12:55

zroubalik reviewed Jan 3, 2022

zroubalik added this to the v2.6.0 milestone Jan 3, 2022

loicmathieu mentioned this pull request Jan 6, 2022

More flexible Kafka topic configuration #2447

Open

PaulLiang1 added 3 commits January 11, 2022 11:25

allow kafka scaler to lag for all topics within a consumer group

329c8bd

Signed-off-by: Jinli Liang <[email protected]>

update e2e test

41d7ad2

Signed-off-by: Jinli Liang <[email protected]>

address comment

f8b6735

Signed-off-by: Jinli Liang <[email protected]>

PaulLiang1 force-pushed the allow-kafka-scaler-to-scale-without-a-specific-topic branch from 36a3ae4 to f8b6735 Compare January 11, 2022 00:27

PaulLiang1 requested review from JorTurFer and zroubalik January 11, 2022 01:46

zroubalik approved these changes Jan 11, 2022

zroubalik merged commit 92c75bc into kedacore:main Jan 12, 2022
