feat: add AIPerf Kubernetes Deployment Enhancement doc #44
Conversation
```bash
# - 200 worker pods (100K / 500 connections per worker)
# - 50 record processor pods
```
Are these going to be the default numbers?
It will scale based on the given concurrency. However, the exact defaults are a little TBD. The final goal is to have something based on CPU usage, but the first implementation will be formula based. It can always be manually set, as it is now, using CLI args or potentially env vars.
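For illustration, a minimal sketch of the formula-based scaling described above, assuming a 500-connections-per-worker default and the 4:1 worker-to-record-processor ratio mentioned later in this thread (both values are assumptions, since the defaults are TBD):

```python
import math

def compute_replicas(concurrency: int,
                     connections_per_worker: int = 500,
                     workers_per_record_processor: int = 4) -> tuple[int, int]:
    """Derive pod counts from the requested concurrency."""
    workers = math.ceil(concurrency / connections_per_worker)
    record_processors = math.ceil(workers / workers_per_record_processor)
    return workers, record_processors

# 100K concurrency -> the 200 workers / 50 record processors quoted above.
print(compute_replicas(100_000))  # (200, 50)
```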
```bash
# 4. Runs benchmark for 5 minutes
# 5. Retrieves results to local ./artifacts/ directory
# 6. Cleans up all Kubernetes resources
# 7. Displays metrics summary in terminal
```
Will the information be stored locally in an output-dir or just printed in terminal?
Yes, it will be saved to files locally in the artifacts dir as well. Metrics summaries, such as the CSV and JSON, are already published via ZMQ; for larger output files we plan to implement a file retrieval system.
How does it retrieve the result?
```bash
# Use custom namespace (won't auto-delete)
aiperf profile --kubernetes --kubernetes-namespace my-benchmark ...
```
What if `my-benchmark` doesn't yet exist? Will this create it?
Hmm, good question. I think in this case we should create it, and then auto-remove it? Thoughts?
That makes sense to me, and with a log notifying the user that that's the case -- the namespace they specified doesn't yet exist, so you'll create it for this aiperf run but will clean it up after
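For reference, a rough sketch of that create-if-missing flow using the official kubernetes Python client (the function name and the caller-side cleanup contract are hypothetical):

```python
from kubernetes import client, config

def ensure_namespace(name: str) -> bool:
    """Return True if the namespace was created and should be cleaned up after the run."""
    config.load_kube_config()
    api = client.CoreV1Api()
    try:
        api.read_namespace(name)
        return False  # already existed; leave it untouched on teardown
    except client.ApiException as exc:
        if exc.status != 404:
            raise
    print(f"Namespace {name!r} does not exist; creating it for this aiperf run "
          "and cleaning it up afterwards.")
    api.create_namespace(
        client.V1Namespace(metadata=client.V1ObjectMeta(name=name))
    )
    return True
```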
```bash
  --streaming \
  -u http://my-llm-service.default.svc.cluster.local:8080 \
  -m my-llm-model \
  --concurrency 100000 \
```
Will this support multiple concurrencies? Will it support SLAs/isl/osl?
The current plan is to just be a 1:1 implementation of the existing aiperf features, but run on Kubernetes instead of multiprocessing. Additional features may come later. Are you referring to sweeps, or to any of the work you have been doing? Ideally we would like to find a way to integrate things once we have baseline support going.
One thing that I have in my sweeps is a concurrency loop; it would be cool to have that inherently in AIPerf! But yeah, that can be a later thing.
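A hedged sketch of such a loop as a thin wrapper over the existing CLI (the flag set mirrors the examples in this proposal; values are illustrative):

```python
import subprocess

# Sweep a few concurrency levels by re-invoking the CLI per level.
for concurrency in (1_000, 10_000, 100_000):
    subprocess.run(
        [
            "aiperf", "profile",
            "--kubernetes",
            "-u", "http://my-llm-service.default.svc.cluster.local:8080",
            "-m", "my-llm-model",
            "--concurrency", str(concurrency),
        ],
        check=True,  # abort the sweep if one run fails
    )
```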
I think one question here is whether, as we add those other features, they will "just work" in the distributed setting, or whether each feature will need to be designed for distributed too. The latter would be unfortunate.
@itay Yes, everything should "just work": even single-node aiperf already uses this exact same architecture, just with Python multiprocessing instead of pods. The only things that would need custom work are features that introduce something special, like an output file that would need to be retrieved (though those could still utilize the base k8s implementation for results gathering).
Thanks - that's helpful. I think that should be taken as a key design principle for AIPerf, so that we don't get into sticky situations.
- Scale horizontally based on target concurrency requirements (1 to N replicas)
- Each pod can handle concurrent connections up to the configured `AIPERF_HTTP_CONNECTION_LIMIT`

#### Record Processor (Scalable Pods)
Should this be a sidecar to the worker pods, so that the scaling is the same?
Record processors are designed as separate entities in the single-node instance to prevent resource starvation of the workers, who are doing time-sensitive work; the spike in CPU utilization from tokenization of the results could otherwise cause jitter. Record processors are less time-critical, as they can take their time working through the ZMQ queue. Right now the recommendation is 4 workers to 1 record processor, but that could change based on testing.
The single-node AIPerf already handles this 4:1 approach, and in theory it will just work the same way on k8s. If needed, we can also consider using Pod Affinity to co-locate them.
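If co-location turns out to matter, a podAffinity term like this one (kubernetes Python client objects; the `app: aiperf-worker` label is an assumption) would prefer scheduling record processors onto the same nodes as workers:

```python
from kubernetes import client

# Soft affinity: prefer, but don't require, landing next to worker pods.
affinity = client.V1Affinity(
    pod_affinity=client.V1PodAffinity(
        preferred_during_scheduling_ignored_during_execution=[
            client.V1WeightedPodAffinityTerm(
                weight=100,
                pod_affinity_term=client.V1PodAffinityTerm(
                    label_selector=client.V1LabelSelector(
                        match_labels={"app": "aiperf-worker"}
                    ),
                    topology_key="kubernetes.io/hostname",
                ),
            )
        ]
    )
)
```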
They can have dedicated capacity even as sidecars, as requests/limits are per container.
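For reference, per-container requests/limits with the kubernetes Python client look like this (values are illustrative, not recommendations):

```python
from kubernetes import client

# Reserved capacity for a record processor container, sidecar or not.
resources = client.V1ResourceRequirements(
    requests={"cpu": "500m", "memory": "512Mi"},
    limits={"cpu": "1", "memory": "1Gi"},
)
```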
### Network Configuration
- System controller pod exposes ZMQ proxy ports via Kubernetes services
- All service pods connect to system controller services using Kubernetes DNS
- Each singleton service pod exposes its own service endpoint for direct communication
What does this mean?
- System Controller Service (`aiperf-system-controller`) exposes all ZMQ proxy ports:
  - 5661-5666 (dataset, event bus, raw inference proxies)
- Timing Manager Service (`timing-manager`) exposes:
  - 5562 (credit_drop - PUSH socket that it BINDS)
  - 5563 (credit_return - PULL socket that it BINDS)
- Records Manager Service (`records-manager`) exposes:
  - 5557 (records - PULL socket that it BINDS)
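As an example, exposing the records manager's bound PULL port as a ClusterIP service via the kubernetes Python client could look like this (the namespace and selector label are assumptions):

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()
core.create_namespaced_service(
    namespace="aiperf",
    body=client.V1Service(
        metadata=client.V1ObjectMeta(name="records-manager"),
        spec=client.V1ServiceSpec(
            type="ClusterIP",
            selector={"app": "records-manager"},  # matches the records manager pod
            ports=[client.V1ServicePort(name="records", port=5557, target_port=5557)],
        ),
    ),
)
```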
```bash
# Run 100K concurrent connections against inference service
aiperf profile \
  --kubernetes \
```
How does it know which Kubernetes to talk to?
Good question. I think I should move up the explanation of that; it's hidden in the advanced features. By default it will use your local kubeconfig file, or you can specify a custom one:

```bash
# Use custom kubeconfig (defaults to ~/.kube/config)
aiperf profile --kubernetes --kubeconfig ~/.kube/prod-cluster ...
```
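Internally, that selection could be as simple as this sketch using the kubernetes Python client (the fallback mirrors kubectl's default location):

```python
import os
from kubernetes import config

def load_cluster_config(kubeconfig: str | None = None) -> None:
    # An explicit --kubeconfig value wins; otherwise fall back to ~/.kube/config.
    config.load_kube_config(
        config_file=kubeconfig or os.path.expanduser("~/.kube/config")
    )
```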
## Artifact and Export File Retrieval

AIPerf generates output files including metrics exports (JSON, CSV) and logs that users need to access after benchmark completion. In the Kubernetes deployment, these files are generated by the Records Manager pod and must be retrieved to the user's local filesystem via the Kubernetes Python API.
Same here re: not using the Kubernetes API to move files around.
Gotcha. I think an HTTP API implementation would be ideal. I can add that in. Thanks!
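A rough client-side sketch of that HTTP-based retrieval (the endpoint path, port, and file names are hypothetical; the Records Manager would serve its artifacts directory over HTTP):

```python
import pathlib
import requests

def fetch_artifacts(base_url: str, files: list[str], out_dir: str = "./artifacts") -> None:
    """Download benchmark artifacts from the Records Manager to a local directory."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name in files:
        resp = requests.get(f"{base_url}/artifacts/{name}", timeout=30)
        resp.raise_for_status()
        (out / name).write_bytes(resp.content)

fetch_artifacts(
    "http://records-manager.aiperf.svc.cluster.local:8080",
    ["profile_export.json", "profile_export.csv"],  # hypothetical file names
)
```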
**Reason Rejected:**

* Per-service pods provide maximum flexibility and isolation
Flexibility is not free.
**Reason Rejected:**

* Per-service pods provide maximum flexibility and isolation
* Resource requirements for singleton services are sufficiently small that co-location benefits are minimal
Your service pods amount to 7 CPUs and 8GB of RAM. Even if you co-locate a lot of pods, you're looking at a minimum of a c7g.2xl just to barely fit.
* Limited real-time control and monitoring during benchmark execution
* Difficult to implement dynamic scaling based on runtime metrics
* Reduced flexibility for interactive benchmark sessions
* May not support complex coordination patterns required by AIPerf
What's an example?
```
└──────────────────────────────────────────────────────────────────┘
```

# Alternate Solutions
What about using an Operator and a custom resource to define a AIPerfJob?
@itay this does sound like a valid option; I will investigate it. Do you recommend any open source operators we can leverage for this, or does dynamo-runtime have/use one we can leverage? I found this one: https://github.com/nolar/kopf, though I'm not sure if you have experience with it.
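For a sense of what that could look like, a minimal kopf handler for a hypothetical AIPerfJob custom resource (the group, version, and spec fields are all assumptions):

```python
import kopf

@kopf.on.create("aiperf.nvidia.com", "v1alpha1", "aiperfjobs")
def create_fn(spec, name, namespace, logger, **kwargs):
    concurrency = spec.get("concurrency", 1000)
    logger.info(f"Launching AIPerf benchmark {name} with concurrency={concurrency}")
    # The operator would then create the system controller, worker, and
    # record processor pods, much like KubernetesServiceManager would.
    return {"phase": "Running"}  # stored on the resource's status
```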
Ignoring the specific technical implementation of the Operator, it might be worthwhile sketching out the entire user flow/journey and see whether we think it's worthwhile to pursue.
### Service Exposure
The system controller pod exposes ZMQ proxy endpoints via Kubernetes ClusterIP service:
Note: I need to add the network service pieces for the records manager and timing manager exposed ports.
Summary
This proposal outlines the enhancement of AIPerf to support distributed deployment on Kubernetes clusters. The enhancement enables AIPerf to generate significantly higher concurrent loads by distributing work across multiple pods in a Kubernetes cluster, overcoming single-node performance limitations. The solution adopts a true per-service pod architecture where each AIPerf service runs in its own dedicated pod, enabling independent scaling and resource allocation.
AIPerf currently supports only single-node multiprocess deployment. This enhancement proposes implementing the existing `KubernetesServiceManager` stub to enable distributed deployment while maintaining full compatibility with existing service management patterns, ZMQ communication protocols, and configuration systems.
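As a closing illustration, a rough sketch of what implementing that stub could look like (the interface shown is assumed, not taken from the AIPerf source):

```python
from kubernetes import client, config

class KubernetesServiceManager:
    """Launches AIPerf services as Kubernetes pods instead of local processes."""

    def __init__(self, namespace: str = "aiperf", kubeconfig: str | None = None):
        config.load_kube_config(config_file=kubeconfig)
        self.core = client.CoreV1Api()
        self.namespace = namespace

    def start_service(self, name: str, image: str, replicas: int = 1) -> None:
        # One pod per replica, labeled so services can select them.
        for i in range(replicas):
            pod = client.V1Pod(
                metadata=client.V1ObjectMeta(name=f"{name}-{i}", labels={"app": name}),
                spec=client.V1PodSpec(
                    containers=[client.V1Container(name=name, image=image)]
                ),
            )
            self.core.create_namespaced_pod(self.namespace, pod)
```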