OCPBUGS-79467: Change default log level from DEBUG to INFO by gcs278 · Pull Request #1402 · openshift/cluster-ingress-operator

gcs278 · 2026-03-24T18:56:03Z

Summary

Fixes excessive log volume by changing the cluster-ingress-operator's default log level from DEBUG to INFO.

Problem

The operator was using zap.NewDevelopment() which defaults to DEBUG level, causing ~2,000 debug logs per hour with Sail Library integration. This created significant noise in production log aggregation systems.

Changes

Changed logger configuration (pkg/log/log.go):
- Use NewDevelopmentConfig() with explicit INFO level
- Keeps console encoding for human-readable logs
```
config := zap.NewDevelopmentConfig()
config.Level = zap.NewAtomicLevelAt(zapcore.InfoLevel)
```

Promoted important debug logs to INFO (pkg/operator/controller/gateway-status/controller.go):
- "no service was found for gateway" (V(1) → INFO)
- "no dnsrecords found for gateway" (V(1) → INFO)
- These messages are important for troubleshooting Gateway deployment issues

Added INFO logs alongside event recorder calls (7 controller files):
- Controller-runtime echoes recorder.Event() calls to DEBUG logs
- After switching to INFO level, these operational events became invisible in logs
- Added explicit log.Info() calls for 16 events including certificate lifecycle, IngressController admission, and DNS warnings
- Note: Events still exist as Kubernetes Events (visible via oc get events)

Implementation Approach

We chose to keep NewDevelopmentConfig() (with INFO level) instead of NewProduction() to maintain the human-readable console logging format.

NewProduction would use JSON encoding:

{"level":"info","ts":1774378939.054826,"logger":"operator.gatewayclass_controller","msg":"..."}

Our approach keeps console format:

2026-03-24T19:04:06.963Z  INFO  operator.gatewayclass_controller  msg...

Console format is more appropriate for OpenShift operators where cluster admins often debug via oc logs rather than log aggregation systems. This preserves familiar formatting while reducing log volume.

Impact

- Before: ~2,000 debug logs/hour with Sail Library
- After: Estimated <200 INFO+ logs/hour
- Sail Library debug logs (drift detection, ownership checks) will no longer appear
- Important operational messages remain visible
- Log format unchanged - same human-readable console format as before

Testing

✅ Build verification: make build passed
✅ Unit tests: make test passed
✅ Code verification: make verify passed

Related

Fixes: OCPBUGS-79467

🤖 Generated with Claude Code via /jira:solve OCPBUGS-79467 origin

coderabbitai · 2026-03-24T18:56:12Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: aabf73cb-cc60-4dba-a342-a49947bfa8be

📥 Commits

Reviewing files that changed from the base of the PR and between 08e4271 and 9fe57a5.

📒 Files selected for processing (8)

pkg/operator/controller/canary-certificate/controller.go
pkg/operator/controller/canary/daemonset.go
pkg/operator/controller/certificate-publisher/publish_ca.go
pkg/operator/controller/certificate-publisher/publish_certs.go
pkg/operator/controller/certificate/ca.go
pkg/operator/controller/certificate/default_cert.go
pkg/operator/controller/dns/controller.go
pkg/operator/controller/ingress/controller.go

✅ Files skipped from review due to trivial changes (4)

pkg/operator/controller/canary-certificate/controller.go
pkg/operator/controller/dns/controller.go
pkg/operator/controller/certificate/default_cert.go
pkg/operator/controller/ingress/controller.go

🚧 Files skipped from review as they are similar to previous changes (3)

pkg/operator/controller/canary/daemonset.go
pkg/operator/controller/certificate/ca.go
pkg/operator/controller/certificate-publisher/publish_ca.go

📝 Walkthrough

Walkthrough

Logger construction was changed to use zap.NewDevelopmentConfig() with config.Level set to zap.NewAtomicLevelAt(zapcore.InfoLevel) before config.Build(...). Multiple controllers had informational logging added or elevated: gateway-status, canary (daemonset and certificate), canary-certificate, certificate-publisher (CA and certs), certificate (CA and default cert), DNS, and ingress. Several messages were moved from verbose to unconditional log.Info. No exported function or type signatures were changed.

Miciah · 2026-03-24T20:04:50Z

/assign

Miciah · 2026-03-24T20:19:24Z

We chose to keep NewDevelopmentConfig() (with INFO level) instead of NewProduction() to maintain the human-readable console logging format. While NewProduction() optimizes for machine parsing with JSON encoding and includes log sampling, console format is more appropriate [...]

This preserves the familiar log format (ISO8601 timestamps, uppercase levels, console encoding) while reducing log volume by filtering DEBUG messages.

It is interesting that NewProduction() uses a different log format. Out of curiosity, do you have a sample of the log output with NewProduction()? In any case, I agree with the choice to preserve the log format and just suppress the excessive log messages.

/approve
/lgtm

Please do make sure to take a look at the CI artifacts once they are available, and compare the ingress-operator log output with the same log in the CI artifacts for some pre-#1354 PR.

openshift-ci · 2026-03-24T20:20:02Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [Miciah]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gcs278 · 2026-03-24T21:22:51Z

Thanks @Miciah - Updated the PR description to also include that.

I took a look at the artifacts and did a quick comparison - it looks like the debug logs are gone. I also tested locally with Sail Library to make sure it stops logging at the Debug level.

We do get some event recorder logs removed from controller runtime DEBUG logs too I noticed:

2026-03-24T16:24:37.196Z      DEBUG   operator.init.events    record/event.go:418     created canary certificate      {"type": "Normal", "object": {"kind":"Secret","namespace":"openshift-ingress-canary","name":"canary-serving-cert","uid":"5761d0e3-61c9-47aa-a561-9d1128c6b1a5","apiVersion":"v1","resourceVersion":"22909"}, "reason": "CreatedCanaryCertificate"}

Let me know if you think that's an issue, but my assumption is that it's DEBUG, and not intended for production visibility.

gcs278 · 2026-03-24T23:51:49Z

Infrastructure flake in creating a test pod...I wonder if we need to add a retry if creates are timing out:

=== RUN   TestAll/serial/TestCanaryRoute
    canary_test.go:47: Checking that the default ingresscontroller is ready...
    canary_test.go:57: Getting the default ingresscontroller deployment...
    canary_test.go:63: Getting the canary route...
    canary_test.go:86: Failed to create pod openshift-ingress-canary/canary-route-check: Internal error occurred: admission plugin "image.openshift.io/ImagePolicy" failed to complete

/test e2e-aws-operator

Miciah · 2026-03-25T03:29:06Z

We do get some event recorder logs removed from controller runtime DEBUG logs too I noticed:
2026-03-24T16:24:37.196Z      DEBUG   operator.init.events    record/event.go:418     created canary certificate      {"type": "Normal", "object": {"kind":"Secret","namespace":"openshift-ingress-canary","name":"canary-serving-cert","uid":"5761d0e3-61c9-47aa-a561-9d1128c6b1a5","apiVersion":"v1","resourceVersion":"22909"}, "reason": "CreatedCanaryCertificate"}
Let me know if you think that's an issue, but my assumption is that it's DEBUG, and not intended for production visibility.

Losing the "created canary certificate" log message isn't ideal. We should still get the event, but it is useful to get the log message as well.

From a quick look at the code, it appears that we are using the deprecated event broadcaster implementation, and it logs events with V(1).Info():

cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/recorder/recorder.go

Line 116 in 0b27052

    
           p.logger.V(1).Info(e.Message, "type", e.Type, "object", e.InvolvedObject, "reason", e.Reason)

In contrast, the new event broadcaster implementation just uses Info():

cluster-ingress-operator/vendor/k8s.io/client-go/tools/events/event_broadcaster.go

Line 338 in 0b27052

    
           logger.Info("Event occurred", "object", klog.KRef(event.Regarding.Namespace, event.Regarding.Name), "kind", event.Regarding.Kind, "apiVersion", event.Regarding.APIVersion, "type", event.Type, "reason", event.Reason, "action", event.Action, "note", event.Note)

So I think updating to the new event recorder might get us back the log message for the "created canary certificate" event. The following patch should suffice.

Details

diff --git a/pkg/operator/controller/canary-certificate/controller.go b/pkg/operator/controller/canary-certificate/controller.go
index 0f20eac15..7593d0f88 100644
--- a/pkg/operator/controller/canary-certificate/controller.go
+++ b/pkg/operator/controller/canary-certificate/controller.go
@@ -18,7 +18,7 @@ import (
 	"k8s.io/apimachinery/pkg/api/errors"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/types"
-	"k8s.io/client-go/tools/record"
+	"k8s.io/client-go/tools/events"
 
 	"sigs.k8s.io/controller-runtime/pkg/client"
 	"sigs.k8s.io/controller-runtime/pkg/controller"
@@ -48,7 +48,7 @@ type Config struct {
 type reconciler struct {
 	config   Config
 	client   client.Client
-	recorder record.EventRecorder
+	recorder events.EventRecorder
 }
 
 // New creates the canary certificate controller
@@ -61,7 +61,7 @@ func New(mgr manager.Manager, config Config) (controller.Controller, error) {
 	reconciler := &reconciler{
 		config:   config,
 		client:   mgr.GetClient(),
-		recorder: mgr.GetEventRecorderFor(canaryCertControllerName),
+		recorder: mgr.GetEventRecorder(canaryCertControllerName),
 	}
 	c, err := controller.New(canaryCertControllerName, mgr, controller.Options{Reconciler: reconciler})
 	if err != nil {
@@ -206,7 +206,7 @@ func (r *reconciler) createCanaryCertificate(ctx context.Context, certificate *c
 		return err
 	}
 
-	r.recorder.Event(certificate, "Normal", "CreatedCanaryCertificate", "created canary certificate")
+	r.recorder.Eventf(certificate, nil, corev1.EventTypeNormal, "Created", "CreatedCanaryCertificate", "Created canary certificate")
 	return nil
 }
 
@@ -220,7 +220,7 @@ func (r *reconciler) updateCanaryCertificate(ctx context.Context, current, desir
 	if err := r.client.Update(ctx, updated); err != nil {
 		return false, err
 	}
-	r.recorder.Event(updated, "Normal", "UpdatedCanaryCertificate", "updated canary certificate")
+	r.recorder.Eventf(updated, nil, corev1.EventTypeNormal, "Updated", "UpdatedCanaryCertificate", "updated canary certificate")
 	return true, nil
 }

@coderabbitai, is the above analysis correct?

coderabbitai · 2026-03-25T03:31:32Z

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the deprecated recorder implementation
cat vendor/sigs.k8s.io/controller-runtime/pkg/internal/recorder/recorder.go | grep -n "V(1)\|Info\|func.*Event" | head -40

# Check the new event broadcaster
cat vendor/k8s.io/client-go/tools/events/event_broadcaster.go | grep -n "Info\|func.*Event\|log\." | head -40

# Check the canary-certificate controller
cat pkg/operator/controller/canary-certificate/controller.go

# Check if mgr.GetEventRecorder exists (new API)
grep -rn "GetEventRecorder\b" vendor/sigs.k8s.io/controller-runtime/ | head -20

# Check if mgr.GetEventRecorderFor exists (old API)
grep -rn "GetEventRecorderFor" vendor/sigs.k8s.io/controller-runtime/ | head -20

# Check events.EventRecorder interface Eventf signature
grep -n "Eventf\|EventRecorder" vendor/k8s.io/client-go/tools/events/interfaces.go 2>/dev/null || \
grep -rn "type EventRecorder interface" vendor/k8s.io/client-go/tools/events/ | head -10

# Look at the EventRecorder interface definition
find vendor/k8s.io/client-go/tools/events/ -name "*.go" | xargs grep -n "type EventRecorder\|Eventf(" | head -20

Length of output: 9763

🏁 Script executed:

#!/bin/bash
# Fetch the deprecated recorder implementation from GitHub
curl -s "https://raw.githubusercontent.com/openshift/cluster-ingress-operator/0b27052d94b042d1f492788406a530087e943590/vendor/sigs.k8s.io/controller-runtime/pkg/internal/recorder/recorder.go" | grep -n "V(1)\|\.Info\|func.*Event\|log\." | head -40

echo "---NEW BROADCASTER---"
# Fetch the new event broadcaster from GitHub
curl -s "https://raw.githubusercontent.com/openshift/cluster-ingress-operator/0b27052d94b042d1f492788406a530087e943590/vendor/k8s.io/client-go/tools/events/event_broadcaster.go" | grep -n "\.Info\|log\.\|func.*Event" | head -40

echo "---MANAGER INTERFACE---"
# Check if GetEventRecorder (new) exists on the manager interface
curl -s "https://raw.githubusercontent.com/openshift/cluster-ingress-operator/0b27052d94b042d1f492788406a530087e943590/vendor/sigs.k8s.io/controller-runtime/pkg/manager/manager.go" | grep -n "GetEventRecorder\|EventRecorder" | head -20

echo "---EVENTS EVENTRECORDER INTERFACE---"
# Check events.EventRecorder interface definition
curl -s "https://raw.githubusercontent.com/openshift/cluster-ingress-operator/0b27052d94b042d1f492788406a530087e943590/vendor/k8s.io/client-go/tools/events/interfaces.go" | grep -n "Eventf\|EventRecorder\|type " | head -30

Length of output: 5182

🏁 Script executed:

#!/bin/bash
# Check manager interface more thoroughly
curl -s "https://raw.githubusercontent.com/openshift/cluster-ingress-operator/0b27052d94b042d1f492788406a530087e943590/vendor/sigs.k8s.io/controller-runtime/pkg/manager/manager.go" | grep -n "GetEvent\|EventRecorder\|Recorder" | head -30

echo "---MANAGER INTERFACE FULL---"
curl -s "https://raw.githubusercontent.com/openshift/cluster-ingress-operator/0b27052d94b042d1f492788406a530087e943590/vendor/sigs.k8s.io/controller-runtime/pkg/manager/manager.go" | grep -n "func\|interface\|Manager" | head -50

echo "---RECORDER PROVIDER FULL CONTEXT around line 115-135---"
curl -s "https://raw.githubusercontent.com/openshift/cluster-ingress-operator/0b27052d94b042d1f492788406a530087e943590/vendor/sigs.k8s.io/controller-runtime/pkg/internal/recorder/recorder.go" | sed -n '100,175p'

echo "---EVENTS EVENTRECORDER INTERNAL INTERFACE---"
# Check the internal events package that events.EventRecorder aliases
curl -s "https://raw.githubusercontent.com/openshift/cluster-ingress-operator/0b27052d94b042d1f492788406a530087e943590/vendor/k8s.io/client-go/tools/events/interfaces.go" | head -50

Length of output: 9148

@Miciah: The root cause analysis is correct — the deprecated record.EventRecorder path logs events at V(1), which is now suppressed under the new INFO default. However, the proposed fix has a subtle issue.

The new broadcaster in controller-runtime's Provider also logs at V(1), not plain Info().

Looking at vendor/sigs.k8s.io/controller-runtime/pkg/internal/recorder/recorder.go, the getBroadcaster() function sets up event watchers for both broadcasters:

// init deprecated broadcaster
p.deprecatedBroadcaster.StartEventWatcher(
    func(e *corev1.Event) {
        p.logger.V(1).Info(e.Message, "type", e.Type, "object", e.InvolvedObject, "reason", e.Reason)
    })

// init NEW broadcaster — also V(1)!
stopWatcher, err := p.broadcaster.StartEventWatcher(func(event runtime.Object) {
    e, isEvt := event.(*eventsv1.Event)
    if isEvt {
        p.logger.V(1).Info(e.Note, "type", e.Type, "object", e.Related, "action", e.Action, "reason", e.Reason)
    }
})

The event_broadcaster.go#L338 line you referenced (logger.Info(...)) is inside the StartLogging() method, but controller-runtime's Provider does not call StartLogging() — it uses StartEventWatcher() with its own V(1) callback for both the old and new broadcasters.

So switching from GetEventRecorderFor() to GetEventRecorder() would:

✅ Use the non-deprecated API (beneficial in itself)
❌ Still log the "created canary certificate" event message at V(1) — it would remain suppressed at the INFO default level

There is also a potential compilation issue to verify: mgr.GetEventRecorder() may not be exposed on the manager.Manager interface (the interface definition in manager.go showed no EventRecorder-related methods), whereas GetEventRecorderFor() is definitely available.

To actually restore the log message at INFO level, the simplest approach would be to add an explicit log.Info() call alongside the recorder.Event() call:

func (r *reconciler) createCanaryCertificate(ctx context.Context, certificate *corev1.Secret) error {
    if err := r.client.Create(ctx, certificate); err != nil {
        return err
    }
    r.recorder.Event(certificate, "Normal", "CreatedCanaryCertificate", "created canary certificate")
    log.Info("created canary certificate", "secret", certificate.Name)
    return nil
}

This keeps the Kubernetes event (for kubectl get events) and also logs the message at INFO level regardless of the broadcaster implementation.
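
The suppression described in this comment can be modeled in a few lines. The sketch below uses only the standard library and assumes the operator bridges logr to zap through zapr, whose convention is that logr V(n) maps to zap level -n. That assumption is consistent with the V(1) event message appearing as DEBUG in the log line quoted earlier in the thread. The function names are illustrative.

```go
package main

import "fmt"

// Numeric values of zap's DebugLevel and InfoLevel.
const (
	debugLevel = -1
	infoLevel  = 0
)

// zapLevelForVerbosity models the zapr convention (assumed here):
// logr verbosity V(n) is emitted at zap level -n.
func zapLevelForVerbosity(v int) int {
	return -v
}

// enabled reports whether a message logged at logr verbosity v passes
// a logger whose minimum zap level is threshold.
func enabled(threshold, v int) bool {
	return zapLevelForVerbosity(v) >= threshold
}

func main() {
	fmt.Println(enabled(debugLevel, 1)) // true: V(1) was visible under the old DEBUG default
	fmt.Println(enabled(infoLevel, 1))  // false: V(1) is filtered under the new INFO default
	fmt.Println(enabled(infoLevel, 0))  // true: plain Info() still passes
}
```

Under this model, both broadcaster callbacks in controller-runtime's Provider land at zap level -1, so neither recorder API brings the message back on its own; an unconditional log.Info() does.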

gcs278 · 2026-03-25T13:34:34Z

/hold
While I try to understand how to preserve event logging.

Miciah · 2026-03-25T14:44:22Z

While I try to understand how to preserve event logging.

I'm fine with @coderabbitai's suggestion to add a log.Info accompanying the r.recorder.Event.

coderabbitai · 2026-03-25T14:48:31Z

🧩 Analysis chain

Add log.Info(...) after each r.recorder.Event(...) call in pkg/operator/controller/canary-certificate/controller.go to preserve visibility of certificate events at INFO level.

🏁 Script executed:

#!/bin/bash
# Read the canary-certificate controller to confirm logger usage and exact context
cat pkg/operator/controller/canary-certificate/controller.go

Length of output: 9281

🏁 Script executed:

#!/bin/bash
# Get the exact lines around recorder.Event calls
grep -n "recorder.Event\|log\.\|import\|\"log\"\|logf\|logr" pkg/operator/controller/canary-certificate/controller.go | head -40

Length of output: 446

@Miciah, @gcs278: Here's the concrete change to add log.Info alongside both r.recorder.Event calls:

diff --git a/pkg/operator/controller/canary-certificate/controller.go b/pkg/operator/controller/canary-certificate/controller.go
--- a/pkg/operator/controller/canary-certificate/controller.go
+++ b/pkg/operator/controller/canary-certificate/controller.go
@@ -207,6 +207,7 @@ func (r *reconciler) createCanaryCertificate(ctx context.Context, certificate *c
 	}
 
 	r.recorder.Event(certificate, "Normal", "CreatedCanaryCertificate", "created canary certificate")
+	log.Info("created canary certificate", "secret", certificate.Name)
 	return nil
 }
 
@@ -220,6 +221,7 @@ func (r *reconciler) updateCanaryCertificate(ctx context.Context, current, desir
 	}
 	r.recorder.Event(updated, "Normal", "UpdatedCanaryCertificate", "updated canary certificate")
+	log.Info("updated canary certificate", "secret", updated.Name)
 	return true, nil
 }

This uses the existing package-level log variable (already scoped to canaryCertControllerName) and follows the same structured-logging convention used elsewhere in the file. No new imports are needed.
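
Because the same pairing of recorder.Event and log.Info recurs across several controllers, one possible variation is a small helper that issues both calls. This is a sketch only and not what the PR does: recordAndLog, eventRecorder, infoLogger, and the fakes are hypothetical names. The two interfaces are standard-library stand-ins that model just the Event method of record.EventRecorder and the Info method of the logr-based logger.

```go
package main

import "fmt"

// eventRecorder is a stand-in for record.EventRecorder (Event method only).
type eventRecorder interface {
	Event(object any, eventtype, reason, message string)
}

// infoLogger is a stand-in for the operator's logr-based logger (Info method only).
type infoLogger interface {
	Info(msg string, keysAndValues ...any)
}

// recordAndLog (hypothetical helper) emits the Kubernetes event and then
// logs the same message at INFO with the object's name attached.
func recordAndLog(r eventRecorder, l infoLogger, object any, name, eventtype, reason, message string) {
	r.Event(object, eventtype, reason, message)
	l.Info(message, "name", name)
}

// fakeRecorder and fakeLogger capture calls so the sketch runs without a cluster.
type fakeRecorder struct{ events []string }

func (f *fakeRecorder) Event(_ any, eventtype, reason, message string) {
	f.events = append(f.events, eventtype+" "+reason+" "+message)
}

type fakeLogger struct{ lines []string }

func (f *fakeLogger) Info(msg string, _ ...any) {
	f.lines = append(f.lines, msg)
}

func main() {
	rec, lg := &fakeRecorder{}, &fakeLogger{}
	recordAndLog(rec, lg, nil, "canary-serving-cert",
		"Normal", "CreatedCanaryCertificate", "created canary certificate")
	fmt.Println(rec.events[0])
	fmt.Println(lg.lines[0])
}
```

Explicit paired calls, as in the diff above, keep each call site greppable; a helper trades that for less repetition.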

coderabbitai

🧹 Nitpick comments (1)

pkg/operator/controller/ingress/controller.go (1)
341-342: Consider adding IngressController identity to the log message.

The log lacks context about which IngressController was admitted. Adding the namespace and name would improve log traceability, consistent with other log statements in this file (e.g., lines 391 and 414).
♻️ Suggested improvement
-		log.Info("ingresscontroller admitted")
+		log.Info("ingresscontroller admitted", "namespace", ingress.Namespace, "name", ingress.Name)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/controller/ingress/controller.go` around lines 341 - 342, The
log message "ingresscontroller admitted" should include the IngressController
identity for traceability: update the log.Info call that currently logs
admission (and keep r.recorder.Event as-is) to include the ingress name (e.g.,
use ingress.Name or a formatted identifier from the ingress object) so the
message matches other contextual logs in this file (see existing logs around the
admission path such as those at lines referencing the same ingress variable).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 6521cccd-b77e-4373-9016-81fb31b76596

📥 Commits

Reviewing files that changed from the base of the PR and between 0b27052 and 34e6c81.

📒 Files selected for processing (8)

pkg/operator/controller/canary-certificate/controller.go
pkg/operator/controller/canary/daemonset.go
pkg/operator/controller/certificate-publisher/publish_ca.go
pkg/operator/controller/certificate-publisher/publish_certs.go
pkg/operator/controller/certificate/ca.go
pkg/operator/controller/certificate/default_cert.go
pkg/operator/controller/dns/controller.go
pkg/operator/controller/ingress/controller.go

✅ Files skipped from review due to trivial changes (2)

pkg/operator/controller/canary-certificate/controller.go
pkg/operator/controller/dns/controller.go

gcs278 · 2026-03-27T13:47:36Z

Hypershift failures
/retest

openshift-ci-robot · 2026-03-27T15:05:05Z

/retest-required

Remaining retests: 0 against base HEAD d16dd45 and 2 for PR HEAD 9fe57a5 in total

rhamini3 · 2026-03-27T19:03:06Z

/retest-required

openshift-ci-robot · 2026-03-27T20:04:14Z

/retest-required

Remaining retests: 0 against base HEAD dde7a93 and 1 for PR HEAD 9fe57a5 in total

gcs278 · 2026-03-29T22:45:43Z

Looks like we had one success recently for hypershift-e2e-aks. I think it's throttling related, so before things get crazy again:
/test hypershift-e2e-aks

gcs278 · 2026-03-30T02:42:30Z

Infra failures
/retest

openshift-ci-robot · 2026-03-30T06:04:43Z

/retest-required

Remaining retests: 0 against base HEAD 019903c and 0 for PR HEAD 9fe57a5 in total

openshift-ci-robot · 2026-03-30T09:04:56Z

/hold

Revision 9fe57a5 was retested 3 times: holding

gcs278 · 2026-03-30T13:07:41Z

e2e-aws-operator failed with:

=== NAME  TestAll/parallel/TestManagedDNSToUnmanagedDNSIngressController
    util_test.go:860: waiting for loadbalancer domain a95a49f4b86944c668c1ad577eabfe57-1701475338.us-east-1.elb.amazonaws.com to resolve...
    util_test.go:860: waiting for loadbalancer domain a95a49f4b86944c668c1ad577eabfe57-1701475338.us-east-1.elb.amazonaws.com to resolve...
    unmanaged_dns_test.go:146: loadbalancer domain a95a49f4b86944c668c1ad577eabfe57-1701475338.us-east-1.elb.amazonaws.com was unable to resolve: timed out waiting for the condition

Not related - it's a DNS resolution timeout. I'll keep an eye on these failures.

/test e2e-aws-operator

gcs278 · 2026-03-30T13:07:45Z

/unhold

openshift-ci-robot · 2026-03-30T18:30:15Z

/retest-required

Remaining retests: 0 against base HEAD d212fd1 and 2 for PR HEAD 9fe57a5 in total

gcs278 · 2026-03-30T19:00:04Z

install/infra failure
/test e2e-gcp-operator

gcs278 · 2026-03-30T19:00:43Z

The e2e-aws-ovn-hypershift-conformance pod resize failure is expected, and we have the OK to override:
/override ci/prow/e2e-aws-ovn-hypershift-conformance

openshift-ci · 2026-03-30T19:00:58Z

@gcs278: Overrode contexts on behalf of gcs278: ci/prow/e2e-aws-ovn-hypershift-conformance

gcs278 · 2026-03-30T19:27:13Z

/hold

Going to wait for #1404 to merge. There's no dependency, but that one has priority, and if this merges first, we have to spin CI in #1404 again 😵

rikatz · 2026-03-30T23:55:56Z

/hold cancel
/retest-required

gcs278 · 2026-03-31T01:50:11Z

teardown failure
/test e2e-hypershift

lihongan · 2026-03-31T04:16:09Z

/retest-required

lihongan · 2026-03-31T08:40:18Z

/test e2e-aws-ovn-upgrade
/test e2e-gcp-operator
/test hypershift-e2e-aks

rikatz · 2026-03-31T09:18:28Z

/retest-required

lihongan · 2026-03-31T09:24:49Z

/test e2e-aws-ovn

The "HAProxy router should pass the http2 tests" test failed.

lihongan · 2026-03-31T11:28:36Z

/test e2e-aws-ovn-upgrade

openshift-ci · 2026-03-31T13:30:51Z

@gcs278: all tests passed!

openshift-ci-robot · 2026-03-31T13:31:45Z

@gcs278: Jira Issue Verification Checks: Jira Issue OCPBUGS-79467
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-79467 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

openshift-merge-robot · 2026-04-01T17:34:13Z

Fix included in accepted release 4.22.0-0.nightly-2026-04-01-092906

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 24, 2026

openshift-ci-robot added jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Mar 24, 2026

openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Mar 24, 2026

gcs278 force-pushed the fix-OCPBUGS-79467 branch 3 times, most recently from 2691c56 to 0b27052 Compare March 24, 2026 19:13

gcs278 marked this pull request as ready for review March 24, 2026 19:21

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 24, 2026

openshift-ci bot requested review from frobware and rikatz March 24, 2026 19:21

openshift-ci bot assigned Miciah Mar 24, 2026

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 24, 2026

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 24, 2026

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 25, 2026

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 25, 2026

coderabbitai bot reviewed Mar 25, 2026

View reviewed changes

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 30, 2026

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 30, 2026

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 30, 2026

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 30, 2026

openshift-merge-bot bot merged commit 2f0bb7f into openshift:master Mar 31, 2026
19 checks passed
