feat: move the EPP build docker file #3555
Conversation
Signed-off-by: Anna Tchernych <[email protected]>
Walkthrough
Introduces a multi-stage Dockerfile for building and running the GAIE EPP binary, updates the GAIE build script to use GAIE paths and copy a Dockerfile artifact, and applies a patch adding Dynamo-based pre-request and KV scorer plugins plus a version bump to v0.5.1.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant CLI as Build Script (GAIE)
    participant Git as Patch Set
    participant Docker as Docker Builder
    participant Img as GAIE EPP Image
    CLI->>CLI: Prepare GAIE dirs (lib/include)
    CLI->>CLI: Copy headers, libs, Dockerfile.epp
    CLI->>Git: Apply epp-v0.5.1-dyn2.patch
    alt Patch applied
        CLI-->>CLI: Proceed
    else Patch already applied / skipped
        CLI-->>CLI: Continue without error
    end
    CLI->>Docker: docker build -f container/Dockerfile.epp
    Docker->>Docker: Build stage (golang 1.24) compile epp
    Docker->>Docker: Runtime stage (ubuntu) add deps, user
    Docker-->>Img: Produce GAIE EPP image
```

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant Gateway as Inference Gateway
    participant PreReq as Pre-Request Plugin (dynamo_inject_workerid)
    participant Router as Router
    participant Scorer as KV Scorer (dynamo_kv_scorer)
    participant Backend
    Client->>Gateway: Request
    Gateway->>PreReq: Execute pre-request
    note right of PreReq: Inject worker ID from Dynamo
    PreReq-->>Gateway: Mutated request
    Gateway->>Router: Route candidates
    Router->>Scorer: Score candidates (KV-aware)
    note right of Scorer: Uses KV signals for scoring
    Scorer-->>Router: Ranked candidates
    Router->>Backend: Dispatch
    Backend-->>Client: Response
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
deploy/inference-gateway/epp-patches/epp-v0.5.1-2/epp-v0.5.1-dyn2.patch (2)
Lines 27-38: Dockerfile name mismatch: the Makefile expects Dockerfile.dynamo but the repo/script provide Dockerfile.epp. The target uses "-f Dockerfile.dynamo" while the artifact is Dockerfile.epp, so builds will fail.
Options:
- Rename/copy the file as Dockerfile.dynamo, or
- Change the Makefile target to use container/Dockerfile.epp, e.g.:
```diff
-	$(IMAGE_BUILD_CMD) -f Dockerfile.dynamo -t $(IMAGE_TAG) \
+	$(IMAGE_BUILD_CMD) -f container/Dockerfile.epp -t $(IMAGE_TAG) \
```
Keep the build script and Makefile consistent.
Lines 803-806: Per-request state stored on the Server struct causes data races and cross-request leakage. workerIDHint/tokenDataHint live on Server and are mutated per request, yet Process can handle multiple streams concurrently; this invites races and wrong associations across requests.
Refactor to store per-request state:
- Keep hints in a per-stream/request context struct and pass to handler phases, or
- Use gRPC stream context to carry values between header/body phases, or
- Echo via ext_proc dynamic metadata between phases.
Avoid shared mutable fields on Server for request-local data.
Also applies to: 765-792
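The first option above can be sketched in a few lines; all type and function names here are hypothetical stand-ins, not the actual EPP symbols:

```go
package main

import "fmt"

// requestState holds hints for a single ext_proc stream, so nothing
// request-scoped lives on the shared Server struct.
type requestState struct {
	WorkerIDHint  string
	TokenDataHint []byte
}

// handleHeaders allocates fresh state per stream instead of mutating
// shared Server fields.
func handleHeaders(workerID string) *requestState {
	return &requestState{WorkerIDHint: workerID}
}

// handleBody reads the hint from the stream-local state, so concurrent
// streams can never observe each other's hints.
func handleBody(s *requestState) string {
	return fmt.Sprintf("routing with worker %q", s.WorkerIDHint)
}

func main() {
	s := handleHeaders("worker-42")
	fmt.Println(handleBody(s))
}
```

Passing the state struct through the header/body phases keeps the Server struct free of request-local mutation.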
deploy/inference-gateway/build-epp-dynamo.sh (1)
Lines 86-96: Patch path likely incorrect. The repo path is deploy/inference-gateway/epp-patches/epp-v0.5.1-2/epp-v0.5.1-dyn2.patch, but the script uses epp-patches/v0.5.1-2/...
```diff
-PATCH_FILE="${DYNAMO_DIR}/deploy/inference-gateway/epp-patches/v0.5.1-2/epp-v0.5.1-dyn2.patch"
+PATCH_FILE="${DYNAMO_DIR}/deploy/inference-gateway/epp-patches/epp-v0.5.1-2/epp-v0.5.1-dyn2.patch"
```
🧹 Nitpick comments (3)
deploy/inference-gateway/epp-patches/epp-v0.5.1-2/epp-v0.5.1-dyn2.patch (2)
Lines 615-645: Base64 token_data decode silently ignores malformed inputs. Non-blocking, but consider emitting a debug log when decode/unmarshal fails to aid diagnosis.
Lines 172-174: Nit: flag name in error message. The message uses "model-server-metrics-scheme"; the actual flag is "modelServerMetricsScheme". Align for clarity.

```diff
-		return fmt.Errorf("unexpected %q value for %q flag, it can only be set to 'http' or 'https'", *modelServerMetricsScheme, "model-server-metrics-scheme")
+		return fmt.Errorf("unexpected %q value for %q flag, it can only be set to 'http' or 'https'", *modelServerMetricsScheme, "modelServerMetricsScheme")
```

deploy/inference-gateway/build-epp-dynamo.sh (1)
Line 17: Harden script error handling. Add -u and pipefail to catch unset variables and pipeline failures.

```diff
-set -e # Exit on any error
+set -euo pipefail # Exit on error, unset vars, and pipeline errors
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- container/Dockerfile.epp (1 hunks)
- deploy/inference-gateway/build-epp-dynamo.sh (3 hunks)
- deploy/inference-gateway/epp-patches/epp-v0.5.1-2/epp-v0.5.1-dyn2.patch (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: trtllm (amd64)
- GitHub Check: sglang
- GitHub Check: vllm (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (4)
container/Dockerfile.epp (1)
Lines 51-52: Verify the ldflags package path for CommitSHA/BuildRef. The path points to pkg/epp/metrics, but the PR changes show metrics code under pkg/epp/backend/metrics. Ensure the target variables actually exist to avoid "symbol not found" or unused ldflags.
If the variables live elsewhere, update the path accordingly.
deploy/inference-gateway/epp-patches/epp-v0.5.1-2/epp-v0.5.1-dyn2.patch (2)
Lines 1306-1324: KV router pipeline global: confirm thread-safety of the C API. The pipeline is a shared global used under RLock; multiple concurrent Score calls can hit dynamo_query_worker_selection_and_annotate. Verify the C pipeline is thread-safe; otherwise, serialize calls or create per-goroutine instances.
If not thread-safe, protect with a mutex around the C call or implement a worker pool.
Also applies to: 1476-1527
Lines 949-970: Prefix indexer Get: LGTM. Deep-copying the pod set under RLock avoids races. Good fix.
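The pattern being praised, reduced to a self-contained sketch (type names are illustrative, not the indexer's real types):

```go
package main

import (
	"fmt"
	"sync"
)

// podIndex returns a copy of its pod set from Get, taken under RLock,
// so callers never alias the internal map.
type podIndex struct {
	mu   sync.RWMutex
	pods map[string]struct{}
}

func (x *podIndex) Get() map[string]struct{} {
	x.mu.RLock()
	defer x.mu.RUnlock()
	out := make(map[string]struct{}, len(x.pods))
	for k := range x.pods {
		out[k] = struct{}{}
	}
	return out
}

func main() {
	idx := &podIndex{pods: map[string]struct{}{"pod-a": {}}}
	snapshot := idx.Get()
	delete(snapshot, "pod-a")   // mutate the copy...
	fmt.Println(len(idx.Get())) // ...the index is untouched: prints 1
}
```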
deploy/inference-gateway/build-epp-dynamo.sh (1)
Lines 37-48: Validate the produced static lib name. The script copies libdynamo_llm_capi.a after building "-p libdynamo_llm". Ensure the crate actually produces libdynamo_llm_capi.a (staticlib); if not, build the correct crate/package.
If needed:
```diff
-cargo build --release -p libdynamo_llm
+cargo build --release -p libdynamo_llm_capi
```
Or adjust the copied filename to match the Cargo output.
Also applies to: 58-59
ENV GOOS=linux
ENV GOARCH=amd64
# be explicit; helps cgo when linking libstdc++
Hard-coded GOARCH breaks multi-arch builds with CGO
GOARCH=amd64 forces amd64 even when building for arm64 via buildx. With CGO, this will fail or produce wrong-arch binaries.
Apply one of these fixes:
- Prefer: derive from BuildKit args.
+ARG TARGETOS
+ARG TARGETARCH
-ENV GOOS=linux
-ENV GOARCH=amd64
+ENV GOOS=${TARGETOS}
+ENV GOARCH=${TARGETARCH}
- Or simplest: remove both GOOS/GOARCH lines and let buildx select per-platform images (builder is linux already).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```dockerfile
ARG TARGETOS
ARG TARGETARCH
ENV GOOS=${TARGETOS}
ENV GOARCH=${TARGETARCH}
# be explicit; helps cgo when linking libstdc++
```
🤖 Prompt for AI Agents
In container/Dockerfile.epp around lines 15 to 17 the Dockerfile hard-sets ENV
GOOS=linux and ENV GOARCH=amd64 which forces amd64 output and breaks
multi-arch/CGO builds; either remove these ENV lines so buildx/BuildKit can pick
the platform automatically, or change them to derive from build-time args (use
ARG TARGETOS and ARG TARGETARCH, then set ENV GOOS=${TARGETOS} and ENV
GOARCH=${TARGETARCH}) so cross-builds produce correct-arch binaries.
if [[ ! -f "${GAIE_DIR}/Dockerfile.epp" ]]; then
    echo "Docker.epp file copy failed!"
    exit 1
fi
Makefile/Dockerfile mismatch will break image build
- Script copies Dockerfile.epp; Makefile target uses Dockerfile.dynamo.
Option A (script-side fix):
-cp "${DYNAMO_DIR}/container/Dockerfile.epp" "${GAIE_DIR}"
+cp "${DYNAMO_DIR}/container/Dockerfile.epp" "${GAIE_DIR}/Dockerfile.dynamo"
@@
-echo "Building the custom EPP image for GAIE..."
+echo "Building the custom EPP image for GAIE..."
make dynamo-image-local-load
Option B: adjust Makefile to use container/Dockerfile.epp (see prior comment).
Ensure the existence check matches the chosen name.
Also applies to: 99-101
🤖 Prompt for AI Agents
In deploy/inference-gateway/build-epp-dynamo.sh around lines 72-75 and 99-101
the script checks for Dockerfile.epp while the Makefile target expects
Dockerfile.dynamo; make the names consistent by either (A) changing the script
to use Dockerfile.dynamo everywhere (update copy commands, existence checks and
error messages to Dockerfile.dynamo) or (B) update the Makefile to reference
container/Dockerfile.epp; ensure the chosen filename is used in all copy/exists
checks and error outputs so the image build cannot fail due to a naming
mismatch.
+	modelServerMetricsPath = flag.String("modelServerMetricsPath", "/metrics", "Path to scrape metrics from pods")
+	modelServerMetricsScheme = flag.String("modelServerMetricsScheme", "http", "Scheme to scrape metrics from pods")
+	modelServerMetricsHttpsInsecureSkipVerify = flag.Bool("modelServerMetricsHttpsInsecureSkipVerify", true, "When using 'https' scheme for 'modelServerMetricsScheme', configure 'InsecureSkipVerify' (default to true)")
+
Defaulting HTTPS metrics scraping to InsecureSkipVerify=true weakens security
Insecure by default is risky, even in-cluster. Default should be false; allow opting-in via flag/env.
- modelServerMetricsHttpsInsecureSkipVerify = flag.Bool("modelServerMetricsHttpsInsecureSkipVerify", true, "When using 'https' scheme for 'modelServerMetricsScheme', configure 'InsecureSkipVerify' (default to true)")
+ modelServerMetricsHttpsInsecureSkipVerify = flag.Bool("modelServerMetricsHttpsInsecureSkipVerify", false, "When using 'https' scheme for 'modelServerMetricsScheme', configure 'InsecureSkipVerify' (default false)")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
	modelServerMetricsPath   = flag.String("modelServerMetricsPath", "/metrics", "Path to scrape metrics from pods")
	modelServerMetricsScheme = flag.String("modelServerMetricsScheme", "http", "Scheme to scrape metrics from pods")
	modelServerMetricsHttpsInsecureSkipVerify = flag.Bool("modelServerMetricsHttpsInsecureSkipVerify", false, "When using 'https' scheme for 'modelServerMetricsScheme', configure 'InsecureSkipVerify' (default false)")
```
🤖 Prompt for AI Agents
In deploy/inference-gateway/epp-patches/epp-v0.5.1-2/epp-v0.5.1-dyn2.patch
around lines 100 to 103, the flag modelServerMetricsHttpsInsecureSkipVerify is
defaulting to true which weakens security; change its default to false and keep
the flag so operators can opt-in explicitly, and update the flag description to
state that it defaults to false and should only be used when necessary, and
ensure any documentation or env-var mapping reflects the new default.
+	var metricsHttpClient *http.Client
+	if *modelServerMetricsScheme == "https" {
+		metricsHttpClient = &http.Client{
+			Transport: &http.Transport{
+				TLSClientConfig: &tls.Config{
+					InsecureSkipVerify: *modelServerMetricsHttpsInsecureSkipVerify,
+				},
+			},
+		}
+	} else {
+		metricsHttpClient = http.DefaultClient
+	}
+
No HTTP client timeouts for metrics scraping
Using a client without timeouts risks hangs under network issues.
Add a sane Timeout (and optionally dial/read timeouts):
+ "time"
@@
- metricsHttpClient = &http.Client{
+ metricsHttpClient = &http.Client{
+ Timeout: 5 * time.Second,
Transport: &http.Transport{
TLSClientConfig: &tls.Config{
InsecureSkipVerify: *modelServerMetricsHttpsInsecureSkipVerify,
},
},
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// … elsewhere in the file, in your import block:
import (
	// …
	"time"
)

// …
var metricsHttpClient *http.Client
if *modelServerMetricsScheme == "https" {
	metricsHttpClient = &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				InsecureSkipVerify: *modelServerMetricsHttpsInsecureSkipVerify,
			},
		},
	}
} else {
	metricsHttpClient = http.DefaultClient
}
```
🤖 Prompt for AI Agents
In deploy/inference-gateway/epp-patches/epp-v0.5.1-2/epp-v0.5.1-dyn2.patch
around lines 134 to 146, the metricsHttpClient is created without any timeouts
which can cause the process to hang on network issues; update the client
creation to set a sensible Timeout on the http.Client (e.g., a few seconds) and
configure Transport timeouts by supplying a net.Dialer with a Dial/Context
timeout and setting TLSHandshakeTimeout, ResponseHeaderTimeout, and
IdleConn/ExpectContinue timeouts on http.Transport so scraping requests fail
fast and resources are reclaimed promptly.
+ - --v
+ - "{{ .Values.inferenceExtension.logVerbosity | default "3" }}"
+ - --grpcPort
- "9002"
- -grpcHealthPort
- "9003"
- -metricsPort
- "9090"
+ - -configFile
+ - "config/{{ .Values.inferenceExtension.pluginsConfigFile }}"
# https://pkg.go.dev/flag#hdr-Command_line_flag_syntax; space is only for non-bool flags
- - "-enablePprof={{ .Values.inferenceExtension.enablePprof }}"
+ - "--enablePprof={{ .Values.inferenceExtension.enablePprof }}"
+ - "--modelServerMetricsPath={{ .Values.inferenceExtension.modelServerMetricsPath }}"
+ - "--modelServerMetricsScheme={{ .Values.inferenceExtension.modelServerMetricsScheme }}"
+ - "--modelServerMetricsHttpsInsecureSkipVerify={{ .Values.inferenceExtension.modelServerMetricsHttpsInsecureSkipVerify }}"
{{- if eq (.Values.inferencePool.modelServerType | default "vllm") "triton-tensorrt-llm" }}
Flag dash style: prefer single dash for consistency
Args use "--v", "--enablePprof", etc. Go's standard flag package treats one and two leading minus signs as equivalent, so these still parse; the surrounding args use single dash, though, so align the style.
Update chart args back to single dash:
- - --v
+ - -v
- - "--enablePprof={{ .Values.inferenceExtension.enablePprof }}"
+ - "-enablePprof={{ .Values.inferenceExtension.enablePprof }}"
- - "--modelServerMetricsPath={{ .Values.inferenceExtension.modelServerMetricsPath }}"
+ - "-modelServerMetricsPath={{ .Values.inferenceExtension.modelServerMetricsPath }}"
- - "--modelServerMetricsScheme={{ .Values.inferenceExtension.modelServerMetricsScheme }}"
+ - "-modelServerMetricsScheme={{ .Values.inferenceExtension.modelServerMetricsScheme }}"
- - "--modelServerMetricsHttpsInsecureSkipVerify={{ .Values.inferenceExtension.modelServerMetricsHttpsInsecureSkipVerify }}"
+ - "-modelServerMetricsHttpsInsecureSkipVerify={{ .Values.inferenceExtension.modelServerMetricsHttpsInsecureSkipVerify }}"
🤖 Prompt for AI Agents
deploy/inference-gateway/epp-patches/epp-v0.5.1-2/epp-v0.5.1-dyn2.patch around
lines 367–383: the Helm chart args use GNU-style double-dash flags; the Go
stdlib flag package accepts one or two dashes equivalently, but for consistency change all double-dash flags back to
single-dash form (e.g. "--v" -> "-v", "--enablePprof=..." -> "-enablePprof=...",
"--configFile" -> "-configFile", and the three modelServerMetrics flags ->
"-modelServerMetricsPath=...", "-modelServerMetricsScheme=...",
"-modelServerMetricsHttpsInsecureSkipVerify=...") and ensure non-bool flags
supply their values using the "=" form (no space) so the Go flag parser reads
them correctly.
+	ModelServerMetricsPort   int32
+	ModelServerMetricsPath   string
+	ModelServerMetricsScheme string
+
+	Client *http.Client
 }
🧩 Analysis chain
Nil client risk in PodMetricsClientImpl and missing default scheme
If Client is nil, p.Client.Do will panic. Also, empty scheme yields malformed URLs.
type PodMetricsClientImpl struct {
@@
- Client *http.Client
+ Client *http.Client
}
@@
- resp, err := p.Client.Do(req)
+ client := p.Client
+ if client == nil {
+ client = http.DefaultClient
+ }
+ resp, err := client.Do(req)
@@
- return fmt.Sprintf("%s://%s:%d%s", p.ModelServerMetricsScheme, pod.Address, p.ModelServerMetricsPort, p.ModelServerMetricsPath)
+ scheme := p.ModelServerMetricsScheme
+ if scheme == "" {
+ scheme = "http"
+ }
+ return fmt.Sprintf("%s://%s:%d%s", scheme, pod.Address, p.ModelServerMetricsPort, p.ModelServerMetricsPath)
Search other initializations to ensure Client is always set.
Also applies to: 834-835
🏁 Script executed:

```shell
#!/bin/bash
rg -nP 'PodMetricsClientImpl\s*{[^}]*}' -C2
```

Length of output: 634
Enforce non-nil HTTP client and default scheme in PodMetricsClientImpl
- Use http.DefaultClient when p.Client is nil to avoid panics in PodMetricsClientImpl.Do
- Default an empty ModelServerMetricsScheme to "http" to prevent malformed URLs
@@ PodMetricsClientImpl.Do(req)
- resp, err := p.Client.Do(req)
+ client := p.Client
+ if client == nil {
+ client = http.DefaultClient
+ }
+ resp, err := client.Do(req)
@@ URL formatting in PodMetricsClientImpl
- return fmt.Sprintf("%s://%s:%d%s", p.ModelServerMetricsScheme, pod.Address, p.ModelServerMetricsPort, p.ModelServerMetricsPath)
+ scheme := p.ModelServerMetricsScheme
+ if scheme == "" {
+ scheme = "http"
+ }
+ return fmt.Sprintf("%s://%s:%d%s", scheme, pod.Address, p.ModelServerMetricsPort, p.ModelServerMetricsPath)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
	ModelServerMetricsPort   int32
	ModelServerMetricsPath   string
	ModelServerMetricsScheme string

	Client *http.Client
}

// In PodMetricsClientImpl.Do:
func (p *PodMetricsClientImpl) Do(req *http.Request) (*http.Response, error) {
	// enforce non-nil HTTP client
	client := p.Client
	if client == nil {
		client = http.DefaultClient
	}
	resp, err := client.Do(req)
	return resp, err
}

// Wherever you format the metrics URL (e.g. in PodMetricsClientImpl):
func (p *PodMetricsClientImpl) buildMetricsURL(pod Pod) string {
	// default empty scheme to "http"
	scheme := p.ModelServerMetricsScheme
	if scheme == "" {
		scheme = "http"
	}
	return fmt.Sprintf("%s://%s:%d%s", scheme, pod.Address, p.ModelServerMetricsPort, p.ModelServerMetricsPath)
}
```
 	return &MaxScorePicker{
 		typedName:         plugins.TypedName{Type: MaxScorePickerType, Name: MaxScorePickerType},
 		maxNumOfEndpoints: maxNumOfEndpoints,
+		randomGenerator:   rand.New(rand.NewSource(time.Now().UnixNano())),
 	}
 }

 // MaxScorePicker picks pod(s) with the maximum score from the list of candidates.
 type MaxScorePicker struct {
 	typedName plugins.TypedName
-	maxNumOfEndpoints int // maximum number of endpoints to pick
+	maxNumOfEndpoints int        // maximum number of endpoints to pick
+	randomGenerator   *rand.Rand // randomGenerator for randomly picking an endpoint on tie-break
 }
rand.Rand is not concurrency-safe; Pickers may be called in parallel
Replacing rand.Shuffle (package-level, locked) with p.randomGenerator.Shuffle introduces data races if Pick is invoked concurrently.
Use the package-level RNG or guard with a mutex:
- p.randomGenerator.Shuffle(len(scoredPods), func(i, j int) {
+ rand.Shuffle(len(scoredPods), func(i, j int) {
scoredPods[i], scoredPods[j] = scoredPods[j], scoredPods[i]
})
Do similarly in RandomPicker.
Also applies to: 1124-1129
Overview:
DEP-504
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)