
scaling job trigger should still be "active" if trigger value cannot be obtained temporarily #2561

Closed
taylorchu opened this issue Jan 26, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@taylorchu

taylorchu commented Jan 26, 2022

Report

We sometimes see the job is under-provisioned for a brief period because the metrics-api endpoint times out.

This report uses the metrics-api scaler, but the same issue should apply to other scalers as well.

Expected Behavior

httpLog.Error(err, fmt.Sprintf("Error when checking metric value: %s", err))

It is common in KEDA that a trigger is reported as inactive when its value cannot be obtained, but in this case the failure is only temporary.

Actual Behavior

The scaler ignores this trigger and under-provisions the job.

Steps to Reproduce the Problem

  1. Specify a metrics-api endpoint (a minimal stand-in endpoint is sketched below).
  2. Shut down the metrics-api endpoint after some successful requests.
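
A minimal stand-in for the metrics-api endpoint in step 1, assuming the trigger is pointed at it with a JSON value path such as "value" (the path, port, and JSON shape here are assumptions for illustration, not anything KEDA mandates). Stopping this process after a few successful scrapes reproduces step 2.

```go
// fake_metrics.go: tiny HTTP endpoint that serves a constant metric value.
// Point a metrics-api trigger at http://<host>:8080/metric, then stop the
// process after a few successful requests to simulate the outage.
package main

import (
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/metric", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`{"value": 42}`))
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```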

Logs from KEDA operator

│ {"level":"error","ts":1643138171.2917557,"logger":"metrics_api_scaler","msg":"Error when checking metric value: Get \"http://xxxxx\": dial tcp xxxxxxxx:80: connec │

KEDA Version

2.5.0

Kubernetes Version

1.21

Platform

Google Cloud

Scaler Details

metrics-api

Anything else?

No response

@taylorchu taylorchu added the bug Something isn't working label Jan 26, 2022
@taylorchu
Author

  1. Can we keep using the last trigger value for a configurable error duration (sketched below)? Similar to ScaledObject, we could add a fallback duration to the fallback config, and then expose that fallback config in ScaledJob.
  2. Alternatively, expose the raw HPA config as in ScaledObject; we could make the HPA scale down slowly to wait for recovery.
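
A rough sketch of the first idea in plain Go, deliberately not tied to KEDA's internals: remember the last successfully fetched value and keep reporting it while errors persist, but only for a configurable fallback duration. All names here are illustrative.

```go
package fallback

import (
	"fmt"
	"time"
)

// LastKnownValue reports the most recent successful metric reading while
// fetch errors last no longer than FallbackFor; once that window is
// exceeded the error is propagated and the trigger can be treated as inactive.
type LastKnownValue struct {
	FallbackFor time.Duration
	value       float64
	lastSuccess time.Time
}

func (l *LastKnownValue) Get(fetch func() (float64, error)) (float64, error) {
	v, err := fetch()
	if err == nil {
		l.value, l.lastSuccess = v, time.Now()
		return v, nil
	}
	if !l.lastSuccess.IsZero() && time.Since(l.lastSuccess) < l.FallbackFor {
		return l.value, nil // temporary outage: reuse the last good value
	}
	return 0, fmt.Errorf("fallback window exceeded: %w", err)
}
```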

@JorTurFer
Member

Hi @taylorchu
Did you try increasing the HTTP timeout? That could improve the behavior when the upstream responds slowly. Regarding the fallback config for ScaledJob, I don't have any strong opinion, @kedacore/keda-core-contributors?

Regarding your second point, I don't follow: what do you mean by raw HPA config? A ScaledJob doesn't use an HPA; it's the operator itself that manages the jobs, not the HPA Controller.

@taylorchu
Author

Increasing the timeout won't help because the endpoint is restarting or has been autoscaled down. It really needs a retry.

Ignore my second point; I thought it used an HPA.
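
A minimal sketch of the retry idea from the comment above, again independent of KEDA internals; the attempt count and backoff are placeholders.

```go
package retry

import (
	"fmt"
	"time"
)

// WithRetry retries fetch a few times with exponential backoff so that a
// brief restart of the metrics endpoint does not immediately make the
// trigger look inactive.
func WithRetry(attempts int, backoff time.Duration, fetch func() (float64, error)) (float64, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		v, err := fetch()
		if err == nil {
			return v, nil
		}
		lastErr = err
		if i < attempts-1 {
			time.Sleep(backoff)
			backoff *= 2 // back off a little longer before the next attempt
		}
	}
	return 0, fmt.Errorf("metric fetch failed after %d attempts: %w", attempts, lastErr)
}
```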

@JorTurFer
Member

JorTurFer commented Feb 10, 2022

I think this can be closed after #2604.
There was an unexpected behavior that scaled to zero on an upstream error.
WDYT @taylorchu?

@taylorchu
Author

I just updated KEDA to the latest version 3 hours ago; I will reply later if it is still a problem. It happens roughly once every 2 weeks.

Repository owner moved this from Backlog to Ready To Ship in Roadmap - KEDA Core Feb 11, 2022
@tomkerkhove tomkerkhove moved this from Ready To Ship to Done in Roadmap - KEDA Core Aug 3, 2022