Fix Kubernetes load balancing GOAWAY errors by buffering request body#60695
Conversation
Force-pushed from 84ecb8c to a47cc32.
Force-pushed from ce110e7 to 0eff6f5.
How much disk space and memory can be consumed at any given time? Is this guaranteed to only ever happen in the agent, or will we also buffer things in the proxy? Is the temporary path guaranteed to be ephemeral disk storage in the teleport-kube-agent deployments?
Force-pushed from 4571de1 to 34fd919.
@espadolini Based on our Slack discussions, buffering to disk is removed, and a weighted semaphore for in-memory buffering is added.
Force-pushed from b893409 to d34e1eb.
```go
// RetryBufferTotal
if f.RetryBufferTotal <= 0 {
	if env := os.Getenv(envRetryBufferTotal); env != "" {
		if val, err := strconv.ParseInt(env, 10, 64); err == nil && val > 0 {
			f.RetryBufferTotal = val
		}
	}
}
if f.RetryBufferTotal <= 0 {
	f.RetryBufferTotal = defaultRetryBufferTotal
}
// RetryBufferPerRequest
if f.RetryBufferPerRequest <= 0 {
	if env := os.Getenv(envRetryBufferPerRequest); env != "" {
		if val, err := strconv.ParseInt(env, 10, 64); err == nil && val > 0 {
			f.RetryBufferPerRequest = val
		}
	}
}
```
In what scenarios would users need to alter these values? If we are allowing users to edit these fields, should we also allow them to explicitly opt out of this change?
RetryBufferTotal would be increased if requests are being blocked at a high rate. RetryBufferPerRequest would be increased if large payloads are not allowed to be transferred. That scenario seems highly unlikely though.
I added the ability to disable the feature by setting RetryBufferTotal / TELEPORT_UNSTABLE_KUBE_RETRY_BUFFER_TOTAL to zero or a negative value.
After some searching, GCP and Azure don't explicitly document use of the --goaway-chance flag. The issue may be specific to AWS, so allowing the feature to be disabled may benefit some customers.
Is it possible to detect that a Kubernetes cluster is configured with a 0% goaway chance and disable this feature for requests to it?

I don't think so. The Kubernetes
```go
// If the connection closed in the middle of sending
// due to a GOAWAY, read remaining data for a retry attempt.
remaining, readErr := io.ReadAll(rb.src)
if readErr != nil && !errors.Is(readErr, io.EOF) && rb.readErr == nil {
	rb.readErr = readErr
}
if len(remaining) > 0 {
	rb.buf.Write(remaining)
}

// Close source and mark as done.
err := rb.src.Close()
rb.src = nil
rb.cond.Broadcast()
```
We don't need to read the full request here, and we definitely must not block Close on I/O from the original client while we buffer the request, especially because we might not retry at all, depending on why the body is getting closed before reaching the end. The second body can just read from the buffer until the end of the buffer, then read (and buffer) from the source.
```go
		"error", err,
	)
}
rt.semaphore.Release(req.ContentLength)
```
This is still happening while Body and GetBody exist and rb.buf is holding on to its buffer.
Changed to release in a runtime finalizer.
Using a finalizer (which shouldn't be used because Go 1.24+ has cleanups, and cleanups shouldn't be used either) just means that we are going to hold on to the buffer and the memory even while we're streaming the response back to the client, and for an arbitrary amount of time after that, up to forever, since finalizers and cleanups are outright not guaranteed to actually run.
To solve this and the full read on close problem we should probably have a background goroutine that reads from the original body and fills in the buffer and some refcounting of bodies to close the original body, drop the buffer and release the semaphore when all bodies are closed and the call to the inner RoundTrip returns.
(technically there should also be a guarantee that the original body is guaranteed not to be interacted with anymore after the response body is closed but that is extreme pedantry and we are the only users of this anyway)
By the way, this is literally the "output log and streaming" portion of the backend coding challenge, with cleanup at the end and with the constraint of having to use io.ReadClosers for readers.
espadolini left a comment
I don't think it's ok to have this enabled by default in v18, it can be a significant increase in memory usage depending on the workload.
For those rare requests that hit the GOAWAY condition, can't we just return a 429 with a `Retry-After` header?
Force-pushed from 5f8d054 to bb0a216.
This is an attempt to fix #57766. When a request is terminated because the upstream Kubernetes API Server GOAWAY chance is exceeded, clients are informed to retry by replying with a 429 status code and a Retry-After header. This deviates from the approaches taken in #57881 and #60695 to favor simplicity and avoid buffering request data in a teleport process. The downside to this approach is that it requires clients to properly handle retry requests.
* Kubernetes: Handle GOAWAY requests This is an attempt to address #57766. When a request is terminated because the upstream Kubernetes API Server GOAWAY chance is exceeded, clients are informed to retry by replying with a 429 status code and a Retry-After header. This deviates from the approaches taken in #57881 and #60695 to favor simplicity and avoid buffering request data in a teleport process. The downside to this approach is that it requires clients to properly handle retry requests. * Populate GOAWAY response body (#61264) Follow up to #61142 which sets the response body so that clients which only look at the reason and not the headers will behave appropriately.
Kubernetes API servers send HTTP/2 GOAWAY errors to redistribute load across replicas for up to 2% of requests. Setting the request's `GetBody` function enables automatic retries.

For HTTP/2 requests, `GetBody` is set, and request bodies are incrementally buffered and accumulated as reads occur. In the case of GOAWAY errors occurring mid-send, body buffering is completed before closing. HTTP/1.1 protocol upgrades are not buffered, since they wouldn't receive HTTP/2 GOAWAY errors.

A weighted semaphore limits total concurrent buffering to prevent OOM. The default global memory limit is 500 MiB and can be adjusted with an environment variable. Each request body is limited to 50 MiB by default, also adjustable with an environment variable.

In this PR:

- Added `retryableTransport` and `retryBuffer` enabling incremental request body buffering
- Added a weighted semaphore limiting total concurrent buffer size to 500 MiB by default
- Added a per-request buffer size limit with a 50 MiB default
- Added tunable parameters `RetryBufferTotal` and `RetryBufferPerRequest`
- Added environment variable `TELEPORT_UNSTABLE_KUBE_RETRY_BUFFER_TOTAL`
- Added environment variable `TELEPORT_UNSTABLE_KUBE_RETRY_BUFFER_PER_REQ`
- Added unit tests

Fixes #57766
Changelog: Fixed intermittent connection errors when accessing Kubernetes clusters, particularly EKS 1.27+
Manual Testing
A `test-load-balance` app was written to reproduce the load balancing `GOAWAY` errors with Kubernetes. Three manual tests were run and compared.

- Test run (no Teleport): load balancing `GOAWAY` errors were seen.
- Test run (Teleport, no fix): load balancing `GOAWAY` errors were seen.
- Test run (Teleport, fix applied): no load balancing `GOAWAY` errors were seen.

Test Runs
Latency Comparison
Test app
A `test-load-balance` app exercises Kubernetes operations to display load balancing `goaway-chance` errors. It records statistics for display and test comparison.

Kind config
A `kind` configuration file sets the load balancing `goaway-chance` to the maximum of 2%. The actual chance per request is randomized, with 2% being the maximum probability.

Teleport config
The Teleport config file used during testing.
Teleport role config
A Teleport role config file granting Kubernetes administrator permissions which enable the test app to run.
Manual Testing
Environment: macOS, Kind, Kubernetes v1.34.0

1. Kind setup

The intent is to set up a local Kubernetes cluster with the load balancing option `--goaway-chance=0.02` turned on.

- Create the cluster with `kind` and the `kind.yaml` configuration file.
- Verify the API server runs with the `--goaway-chance=0.02` flag.
- Create the test namespace:

```
> kubectl create namespace teleport-test
namespace/teleport-test created
```

2. Test run (no Teleport)
The intent is to see load balancing errors with the `test-load-balance` app and Kubernetes. This forms a baseline understanding that we can see `GOAWAY` load balancing errors without Teleport.

- Run the `test-load-balance` app. Load balancing `GOAWAY` errors are seen.

3. Test run (Teleport, no fix)
The intent is to see load balancing errors through Teleport, without the bug fix. We'll compare this test run with a run with the bug fix.
```
sudo rm -rf /var/lib/teleport
sudo mkdir -p -m0700 /var/lib/teleport
sudo chown $USER /var/lib/teleport
```

- Teleport built from the `master` branch. No bug fix present.

Teleport started with the `teleport.yaml` config file, and is configured with auth + proxy + kube agent.

Looking at the Teleport terminal output, Kubernetes health checks show the local kind cluster `kind-load-balance` is healthy.

- Log in to the Teleport Kubernetes cluster `kind-load-balance`.
- Run the `test-load-balance` app. Load balancing `GOAWAY` errors are seen.

4. Test run (Teleport, fix applied)
The intent is to see no load balancing errors through Teleport when the bug fix is applied. The existing backend database is reused.
- Teleport built from the `rana/kube-retryable-transport` branch.

Teleport started with the same `teleport.yaml` config file.

Looking at the Teleport terminal output, Kubernetes health checks show the local kind cluster `kind-load-balance` is healthy.

- Run the `test-load-balance` app 5 times. No load balancing `GOAWAY` errors were seen. The last run is shown.

```
./test-load-balance compare \
  -files baseline.json,without-fix.json,with-fix-run5.json \
  -markdown > test-results.md
```