Vulnerability scanning encounters "etcdserver: request is too large" #899

FrederikNJS · 2022-01-10T18:33:18Z

What steps did you take and what happened:

I installed the Starboard Operator using the Helm chart and allowed it to run on my entire cluster. Some of the vulnerability scan jobs get stuck, and the starboard-operator logs messages about "etcdserver: request is too large". Here's a complete log line:

{"level":"error","ts":1641839401.717973,"logger":"controller.job","msg":"Reconciler error","reconciler group":"batch","reconciler kind":"Job","name":"scan-vulnerabilityreport-68cbdf566b","namespace":"starboard-system","error":"etcdserver: request is too large","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:227"}

I suspect that some of my images have so many vulnerabilities that their reports exceed the size etcd accepts. Being able to store these reports anyway, so I can track the vulnerabilities down, would be really nice.

What did you expect to happen:

I expected Starboard to be able to store the vulnerability reports properly.

Anything else you would like to add:

This already seems to have been discussed in #208, where some information was stripped out of the VulnerabilityReport and the issue was then closed as "too unlikely", even though the problem still occurs for me.

My complete values for the Helm chart are:

targetNamespaces: ""
trivy:
  githubToken: <REDACTED>
  resources:
    limits:
      memory: 1000Mi
    requests:
      memory: 1000Mi

Environment:

Helm chart version: 0.8.2
Starboard version (use starboard version): 0.13.2
Kubernetes version (use kubectl version): 1.19.7


FrederikNJS · 2022-01-10T23:08:23Z

Additionally, I can see that these stuck vulnerability scans count against the scanJobsConcurrentLimit, and the operator doesn't give up on them when the scanJobTimeout expires either.

My scanJobTimeout is set to the default of 5 minutes, yet I have seen jobs stuck for more than an hour, clogging up the system and blocking other scans from starting.

FrederikNJS · 2022-01-10T23:19:37Z

As a workaround, I have limited Trivy's severity to only HIGH,CRITICAL. This cuts down the number of vulnerabilities written into the VulnerabilityReport and, in turn, makes the reports small enough to save to etcd. It seems to work nicely; however, it would still be nice to be able to save all the vulnerabilities to get a complete overview.
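To illustrate why this workaround helps, here is a minimal Go sketch. The Vulnerability struct, filterBySeverity and jsonSize are hypothetical simplifications, not Starboard's API; in practice the filtering is done by Trivy's own severity setting, and this only shows how dropping the lower severities shrinks the serialized report.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Vulnerability is a hypothetical, simplified stand-in for one entry of a
// VulnerabilityReport; the real type carries many more fields.
type Vulnerability struct {
	ID       string `json:"vulnerabilityID"`
	Severity string `json:"severity"`
	Title    string `json:"title"`
}

// filterBySeverity keeps only the entries whose severity appears in the
// comma-separated list, e.g. "HIGH,CRITICAL" (case-insensitive).
func filterBySeverity(vulns []Vulnerability, severities string) []Vulnerability {
	allowed := make(map[string]bool)
	for _, s := range strings.Split(severities, ",") {
		allowed[strings.ToUpper(strings.TrimSpace(s))] = true
	}
	var kept []Vulnerability
	for _, v := range vulns {
		if allowed[strings.ToUpper(v.Severity)] {
			kept = append(kept, v)
		}
	}
	return kept
}

// jsonSize returns the length in bytes of the JSON encoding of v.
func jsonSize(v any) int {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return len(b)
}

func main() {
	// Synthetic data: 1000 entries spread evenly over five severities.
	levels := []string{"UNKNOWN", "LOW", "MEDIUM", "HIGH", "CRITICAL"}
	var all []Vulnerability
	for i := 0; i < 1000; i++ {
		all = append(all, Vulnerability{
			ID:       fmt.Sprintf("CVE-2021-%05d", i),
			Severity: levels[i%len(levels)],
			Title:    "synthetic example entry",
		})
	}
	kept := filterBySeverity(all, "HIGH,CRITICAL")
	fmt.Printf("all severities: %d entries, %d bytes\n", len(all), jsonSize(all))
	fmt.Printf("HIGH,CRITICAL:  %d entries, %d bytes\n", len(kept), jsonSize(kept))
}
```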

danielpacak · 2022-01-19T08:21:15Z

👋 @FrederikNS Thank you for the feedback. This is a well-known limitation of Starboard (and of K8s with its default etcd storage) right now, and we do not implement any fallback strategy. Do you have any ideas about what we could do in such a case?

BTW, could you share the image and its size, or at least the total number of vulnerabilities found by Trivy, that causes this error?

Arabus · 2022-01-24T11:03:37Z

For the specific case where the report is too large, I propose at least storing everything except the vulnerabilities list and adding an annotation, e.g. starboard.aquasecurity.github.io/report-too-large=true, that one could filter and monitor for.

On a more general level, I assume compressing the vulnerabilities field could do the trick (Helm went that way early on). Of course, this would require some changes throughout the complete tooling stack.

A more "advanced" change would be to allow storing the reports in a database, at least for operator deployments; maybe something memcached-compatible, considering that the reports can be ephemeral. TBH, abusing the etcd resource store for this kind of data sounds like malpractice altogether.

Another way might be to provide a report consumer that collects and stores the reports and serves them on request, e.g. via a web interface.

bgoareguer · 2022-01-31T16:45:08Z

I like the current behavior of having the reports in the same namespace as the resources they relate to. It makes it possible to use RBAC to restrict access to those reports (which may contain sensitive information).

If the reports are stored outside of Etcd, we need to make sure that we cannot access all the reports with a single set of credentials.

An idea would be to create a set of credentials in each namespace. Using the credentials from a namespace should only give access to the reports related to that namespace.
This kind of behavior could work well with MinIO, where we could have one bucket per namespace. The credentials stored in a namespace would then only give access to the bucket corresponding to that namespace.

danielpacak added this to the Release v0.15.0 milestone Jan 12, 2022

NissesSenap mentioned this issue Jan 14, 2022

Starboard scan for more severitys XenitAB/terraform-modules#514

Merged

danielpacak removed this from the Release v0.15.0 milestone Jan 19, 2022

danielpacak added the 🚀 enhancement New feature or request label Jan 19, 2022

bgoareguer mentioned this issue Apr 14, 2022

integration with postee #1095

Open
