From 1a241df6157a3385d0a5b71efcb0337c38b61e0c Mon Sep 17 00:00:00 2001
From: atchernych
Date: Fri, 1 Aug 2025 17:54:17 -0700
Subject: [PATCH 1/2] docs: Dyn 591 (#2247)

Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Co-authored-by: Anish <80174047+athreesh@users.noreply.github.com>
---
 components/backends/trtllm/deploy/README.md |  1 +
 components/backends/vllm/deploy/README.md   |  1 +
 docs/examples/README.md                     |  9 ++-
 docs/guides/dynamo_deploy/README.md         | 81 +++++++++++++++++++--
 4 files changed, 85 insertions(+), 7 deletions(-)
 create mode 100644 components/backends/trtllm/deploy/README.md
 create mode 100644 components/backends/vllm/deploy/README.md

diff --git a/components/backends/trtllm/deploy/README.md b/components/backends/trtllm/deploy/README.md
new file mode 100644
index 0000000000..1829d46c6a
--- /dev/null
+++ b/components/backends/trtllm/deploy/README.md
@@ -0,0 +1 @@
+This folder contains deployment examples for the TRTLLM inference backend.
\ No newline at end of file
diff --git a/components/backends/vllm/deploy/README.md b/components/backends/vllm/deploy/README.md
new file mode 100644
index 0000000000..5d7b0e2db5
--- /dev/null
+++ b/components/backends/vllm/deploy/README.md
@@ -0,0 +1 @@
+This folder contains examples for the VLLM inference backend.
\ No newline at end of file
diff --git a/docs/examples/README.md b/docs/examples/README.md
index f9e22535d8..560360cd62 100644
--- a/docs/examples/README.md
+++ b/docs/examples/README.md
@@ -40,9 +40,14 @@ kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}

 You can use `kubectl get dynamoGraphDeployment -n ${NAMESPACE}` to view your deployment.
 You can use `kubectl delete dynamoGraphDeployment -n ${NAMESPACE}` to delete the deployment.

-We provide a Custom Resource yaml file for many examples under the `deploy/` folder.
-Use [VLLM YAML](../../components/backends/vllm/deploy/agg.yaml) for an example.
+We provide a Custom Resource yaml file for many examples under the `components/backends//deploy/`folder.
+Consult the examples below for the CRs for your specific inference backend.
+[View SGLang k8s](../../components/backends/sglang/deploy/README.md)
+
+[View vLLM K8s](../../components/backends/vllm/deploy/README.md)
+
+[View TRTLLM k8s](../../components/backends/trtllm/deploy/README.md)

 **Note 1** Example Image

diff --git a/docs/guides/dynamo_deploy/README.md b/docs/guides/dynamo_deploy/README.md
index 44155e46ca..4285867bf9 100644
--- a/docs/guides/dynamo_deploy/README.md
+++ b/docs/guides/dynamo_deploy/README.md
@@ -17,16 +17,87 @@ limitations under the License.

 # Deploying Inference Graphs to Kubernetes

-We expect users to deploy their inference graphs using CRDs or helm charts.
+ We expect users to deploy their inference graphs using CRDs or helm charts.
+
+# 1. Install Dynamo Cloud.
+
+Prior to deploying an inference graph the user should deploy the Dynamo Cloud Platform. Reference the [Quickstart Guide](quickstart.md) for steps to install Dynamo Cloud with Helm.

-Prior to deploying an inference graph the user should deploy the Dynamo Cloud Platform. Dynamo Cloud acts as an orchestration layer between the end user and Kubernetes, handling the complexity of deploying your graphs for you. This is a one-time action, only necessary the first time you deploy a DynamoGraph.
+# 2. Deploy your inference graph.
+
+We provide a Custom Resource YAML file for many examples under the components/backends/{engine}/deploy folders. Consult the examples below for the CRs for a specific inference backend.
+
+[View SGLang K8s](../../components/backends/sglang/deploy/README.md)
+
+[View vLLM K8s](../../components/backends/vllm/deploy/README.md)
+
+[View TRT-LLM K8s](../../components/backends/trtllm/deploy/README.md)
+
+### Deploying a particular example
+
+```bash
+# Set your dynamo root directory
+cd
+export PROJECT_ROOT=$(pwd)
+export NAMESPACE= # the namespace you used to deploy Dynamo cloud to.
+```
+
+Deploying an example consists of the simple `kubectl apply -f ... -n ${NAMESPACE}` command. For example:
+
+```bash
+kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}
+```
+
+You can use `kubectl get dynamoGraphDeployment -n ${NAMESPACE}` to view your deployment.
+You can use `kubectl delete dynamoGraphDeployment -n ${NAMESPACE}` to delete the deployment.
+
+We provide a Custom Resource YAML file for many examples under the `deploy/` folder.
+Use [VLLM YAML](../../components/backends/vllm/deploy/agg.yaml) for an example.
+
+**Note 1** Example Image
+
+The examples use a prebuilt image from the `nvcr.io` registry.
+You can utilize public images from [Dynamo NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo) or build your own image and update the image location in your CR file prior to applying. Either way, you will need to overwrite the image in the example YAML.
+
+To build your own image:
+
+```bash
+./container/build.sh --framework
+```
+
+For example for the `sglang` run
+```bash
+./container/build.sh --framework sglang
+```
+
+To overwrite the image in the example:
+
+```bash
+extraPodSpec:
+  mainContainer:
+    image:
+```
+
+**Note 2**
+Setup port forward if needed when deploying to Kubernetes.
+
+List the services in your namespace:
+
+```bash
+kubectl get svc -n ${NAMESPACE}
+```
+Look for one that ends in `-frontend` and use it for port forward.

-# 1. Please follow [Installing Dynamo Cloud](./dynamo_cloud.md) for steps to install.
-For details about the Dynamo Cloud Platform, see the [Dynamo Operator Guide](dynamo_operator.md)
+```bash
+SERVICE_NAME=$(kubectl get svc -n ${NAMESPACE} -o name | grep frontend | sed 's|.*/||' | sed 's|-frontend||' | head -n1)
+kubectl port-forward svc/${SERVICE_NAME}-frontend 8080:8080 -n ${NAMESPACE}
+```

-# 2. Follow [Examples](../../examples/README.md) to see how you can deploy your Inference Graphs.
+Additional Resources:
+- [Port Forward Documentation](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/)
+- [Examples Deployment Guide](../../examples/README.md#deploying-a-particular-example)

 ## Manual Deployment with Helm Charts

From 31b8b8840cfc3105bde0c47e40ac63eaf48775fd Mon Sep 17 00:00:00 2001
From: Anish <80174047+athreesh@users.noreply.github.com>
Date: Fri, 1 Aug 2025 18:56:05 -0700
Subject: [PATCH 2/2] Update README.md

Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
---
 docs/examples/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/examples/README.md b/docs/examples/README.md
index 560360cd62..ec95678c59 100644
--- a/docs/examples/README.md
+++ b/docs/examples/README.md
@@ -43,11 +43,11 @@ You can use `kubectl delete dynamoGraphDeployment -n ${NAMESPACE

 We provide a Custom Resource yaml file for many examples under the `components/backends//deploy/`folder.
 Consult the examples below for the CRs for your specific inference backend.
-[View SGLang k8s](../../components/backends/sglang/deploy/README.md)
+[View SGLang k8s](/components/backends/sglang/deploy/README.md)

-[View vLLM K8s](../../components/backends/vllm/deploy/README.md)
+[View vLLM K8s](/components/backends/vllm/deploy/README.md)

-[View TRTLLM k8s](../../components/backends/trtllm/deploy/README.md)
+[View TRTLLM k8s](/components/backends/trtllm/deploy/README.md)

 **Note 1** Example Image