@gabemontero gabemontero commented May 3, 2024

finally caught a rash of pending taskruns live and realized that while the number of pending taskruns oscillated, it was not the same taskruns each time,

so created a new metric for taskruns that works off the same taskrun being flagged on consecutive scans, and then a similar one for pipelineruns

did not add a name label, but the controller's cache, which is reset on each scan, tracks the specific runs
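
To illustrate the consecutive-scan idea described above, here is a minimal sketch. It is not the code from this PR: `runKey`, `scanTracker`, and `Scan` are hypothetical names, and the sketch assumes the cache is simply replaced with the current scan's flagged runs, so a run only counts when it shows up on back-to-back scans.

```go
package main

import "fmt"

// runKey identifies a run by namespace and name (hypothetical type for this sketch).
type runKey struct {
	Namespace string
	Name      string
}

// scanTracker remembers only the runs flagged on the previous scan.
type scanTracker struct {
	previous map[runKey]struct{}
}

func newScanTracker() *scanTracker {
	return &scanTracker{previous: map[runKey]struct{}{}}
}

// Scan takes the runs flagged on the current scan and returns how many of
// them were also flagged on the previous scan. The cache is then replaced
// with the current set, so stale entries never carry over.
func (t *scanTracker) Scan(flagged []runKey) int {
	current := make(map[runKey]struct{}, len(flagged))
	repeated := 0
	for _, k := range flagged {
		if _, seen := current[k]; seen {
			continue // ignore duplicates within a single scan
		}
		current[k] = struct{}{}
		if _, ok := t.previous[k]; ok {
			repeated++
		}
	}
	t.previous = current
	return repeated
}

func main() {
	t := newScanTracker()
	a := runKey{Namespace: "team-a", Name: "build-1"}
	b := runKey{Namespace: "team-a", Name: "build-2"}

	fmt.Println(t.Scan([]runKey{a}))    // 0: nothing was flagged before
	fmt.Println(t.Scan([]runKey{a, b})) // 1: only a was flagged last time
	fmt.Println(t.Scan([]runKey{b}))    // 1: b was flagged last time
}
```

With this shape the gauge value stays at zero when the pending count merely oscillates across different runs, which is the behavior the description is after; the individual run names live in the cache rather than in a metric label.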

also made the reason/message check more flexible: either one may be set

and added an env var based filter specification to rule out konflux development namespaces, which are more likely to have erroneous, unanticipated TaskRun construction issues
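
A rough sketch of what an env var based namespace filter could look like follows. Everything here is an assumption for illustration: the variable name `EXAMPLE_IGNORED_NAMESPACE_REGEX`, the choice of a regular expression as the filter format, the `-dev$` pattern, and the helper names are all hypothetical and not taken from this PR.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// filterEnvVar is a hypothetical variable name used only in this sketch.
const filterEnvVar = "EXAMPLE_IGNORED_NAMESPACE_REGEX"

// parseNamespaceFilter compiles a filter expression. An empty expression
// yields a nil filter, which means no namespace is skipped.
func parseNamespaceFilter(expr string) (*regexp.Regexp, error) {
	if expr == "" {
		return nil, nil
	}
	return regexp.Compile(expr)
}

// loadNamespaceFilter reads the expression from the environment.
func loadNamespaceFilter() (*regexp.Regexp, error) {
	return parseNamespaceFilter(os.Getenv(filterEnvVar))
}

// skipNamespace reports whether runs in ns should be left out of the metric.
func skipNamespace(filter *regexp.Regexp, ns string) bool {
	return filter != nil && filter.MatchString(ns)
}

func main() {
	// Assumed pattern: treat namespaces ending in "-dev" as development ones.
	os.Setenv(filterEnvVar, "-dev$")

	filter, err := loadNamespaceFilter()
	if err != nil {
		fmt.Println("invalid namespace filter:", err)
		return
	}
	fmt.Println(skipNamespace(filter, "team-a-dev"))    // true
	fmt.Println(skipNamespace(filter, "team-a-tenant")) // false
}
```

Treating an unset variable as "filter nothing" keeps the metric's default behavior unchanged for deployments that never set it.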

rh-pre-commit.version: 2.2.0
rh-pre-commit.check-secrets: ENABLED

@enarha @jkhelil fyi / ptal

@savitaashture - a peek into the "too soon to upstream" type of metrics I'm building for Konflux to satisfy app sre level SLI/SLA wrt metrics, alerts, docs

Once I feel really good about metrics in this component, I try to move them to tektoncd/pipelines

@gabemontero gabemontero requested review from enarha and jkhelil May 3, 2024 19:42
@gabemontero gabemontero changed the title tweaks to 'taskrun_pod_create_not_attempted_count' after konflux prod testing PLNSRVCE-1692: tweaks to 'taskrun_pod_create_not_attempted_count' after konflux prod testing May 3, 2024
@gabemontero

/hold

@gabemontero

closing out in favor of a new PR off the same branch

@gabemontero gabemontero closed this May 8, 2024
@gabemontero

I thought git would allow that, but it doesn't ... reopening / rewording

@gabemontero gabemontero reopened this May 8, 2024
@gabemontero gabemontero changed the title PLNSRVCE-1692: tweaks to 'taskrun_pod_create_not_attempted_count' after konflux prod testing PLNSRVCE-1692: overhaui taskrun deadlocked to work of same taskrun flagged on consecutive scans; add similar metric for pipelineruns May 8, 2024
@gabemontero gabemontero changed the title PLNSRVCE-1692: overhaui taskrun deadlocked to work of same taskrun flagged on consecutive scans; add similar metric for pipelineruns PLNSRVCE-1692: overhaul taskrun controller deadlocked to work of same taskrun flagged on consecutive scans; add similar metric for pipelinerun controller May 8, 2024
@gabemontero gabemontero force-pushed the deadlock-stat-handle-pending branch 2 times, most recently from 4526319 to 2483899 Compare May 9, 2024 15:27
@gabemontero gabemontero force-pushed the deadlock-stat-handle-pending branch from 2483899 to 62fbf30 Compare May 9, 2024 17:41
@gabemontero gabemontero merged commit d010978 into openshift-pipelines:main May 13, 2024
@gabemontero gabemontero deleted the deadlock-stat-handle-pending branch May 13, 2024 12:56