This repository was archived by the owner on Jan 28, 2022. It is now read-only.

Delays in reconciliation under load #140

@stuartleeks

(Pre-emptive shout-out to @EliiseS, @storey247 and @lawrencegripper as this work has been a group effort)
As mentioned in #131, we have been running load tests against the operator. Our initial load run shows elevated work-queue latency and a steadily increasing work-queue depth.

[Image: work-queue latency and depth graphs from the initial load run]

Note that the latency histogram buckets are 0.1s, 1s and 10s, so a value of 10 on the graph effectively means somewhere between 1s and 10s.
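To illustrate why the graph cannot distinguish 1.5s from 9.9s, here is a minimal sketch that maps an observed latency to the bucket that bounds it. It assumes Prometheus-style cumulative buckets keyed by an upper bound (`le`), using the 0.1s/1s/10s boundaries above. The `bucketUpperBound` helper is hypothetical and is not part of the operator or of any metrics library.

```go
package main

import (
	"fmt"
	"sort"
)

// latencyBuckets holds the histogram upper bounds (in seconds) described
// in the issue. This is an assumption for illustration only.
var latencyBuckets = []float64{0.1, 1, 10}

// bucketUpperBound (hypothetical helper) returns the smallest bucket upper
// bound that is >= the observed latency. In a cumulative histogram, this
// bucket and every larger one are incremented by the observation.
// ok is false when the value exceeds every bound (the implicit +Inf bucket).
func bucketUpperBound(latencySeconds float64) (upper float64, ok bool) {
	// SearchFloat64s returns the smallest index i with latencyBuckets[i] >= x.
	i := sort.SearchFloat64s(latencyBuckets, latencySeconds)
	if i == len(latencyBuckets) {
		return 0, false
	}
	return latencyBuckets[i], true
}

func main() {
	for _, observed := range []float64{0.05, 0.5, 1.5, 9.9, 12} {
		if ub, ok := bucketUpperBound(observed); ok {
			fmt.Printf("%.2fs -> le=%g\n", observed, ub)
		} else {
			fmt.Printf("%.2fs -> le=+Inf\n", observed)
		}
	}
}
```

Both 1.5s and 9.9s land in the `le=10` bucket, which is why a reading of 10 only bounds the latency to the (1s, 10s] range.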

Looking at the metrics for the mock API that we're using for the load tests, its response times look fairly constant:

[Image: mock API response-time metrics during the load run]

What the mock API metrics do show are periods with no requests reaching the API at all, and these gaps become more pronounced as the test load ramps up.

Based on this, our hypothesis was that something is causing the reconciliation loops to block.
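To make that reasoning concrete, here is a deliberately simplified, hypothetical discrete-time model. It is not the operator's code. Each tick, a fixed number of items is added to a queue, and up to `workers` items are processed unless the workers are blocked during that tick. The function name `simulateQueueDepth` and all of the numbers are assumptions chosen for illustration. The model shows how blocked workers produce ticks with zero downstream calls even when each call is fast, and how the queue depth ratchets upward once the reduced effective throughput falls below the arrival rate.

```go
package main

import "fmt"

// simulateQueueDepth is a hypothetical model of a work queue. On each tick,
// `arrivals` items are enqueued, then up to `workers` items are processed
// unless blocked(tick) reports that the workers are stalled.
// It returns the queue depth after each tick and how many items were
// processed per tick (i.e. how many calls would reach the downstream API).
func simulateQueueDepth(ticks, arrivals, workers int, blocked func(tick int) bool) (depth, processed []int) {
	queue := 0
	for t := 0; t < ticks; t++ {
		queue += arrivals
		done := 0
		if !blocked(t) {
			// Process as many items as there are workers, bounded by the queue.
			done = workers
			if queue < done {
				done = queue
			}
		}
		queue -= done
		depth = append(depth, queue)
		processed = append(processed, done)
	}
	return depth, processed
}

func main() {
	// Assumed scenario: 2 new items per tick and 3 workers, so capacity
	// exceeds load. The workers, however, are blocked on every other tick.
	depth, processed := simulateQueueDepth(8, 2, 3, func(t int) bool { return t%2 == 1 })
	fmt.Println("depth:    ", depth)
	fmt.Println("processed:", processed)

	// Same load with no blocking: the queue never builds up.
	steady, _ := simulateQueueDepth(8, 2, 3, func(int) bool { return false })
	fmt.Println("no blocking depth:", steady)
}
```

With blocking, the `processed` series alternates between busy ticks and ticks with no calls, mirroring the gaps in the mock API metrics, while `depth` trends upward. Without blocking, the same load drains every tick.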
