From e67749afdc3249d9bf023301dd3e1ce5ab277597 Mon Sep 17 00:00:00 2001
From: Florian Woerner
Date: Tue, 12 May 2026 10:41:32 +0200
Subject: [PATCH 1/2] Fix typo in llm-d documentation link

Signed-off-by: Florian Woerner
---
 docs/deployment/integrations/llm-d.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/deployment/integrations/llm-d.md b/docs/deployment/integrations/llm-d.md
index cccf1773c6be..276fd2435cb8 100644
--- a/docs/deployment/integrations/llm-d.md
+++ b/docs/deployment/integrations/llm-d.md
@@ -2,4 +2,4 @@
 
 vLLM can be deployed with [llm-d](https://github.com/llm-d/llm-d), a Kubernetes-native distributed inference serving stack providing well-lit paths for anyone to serve large generative AI models at scale. It helps achieve the fastest "time to state-of-the-art (SOTA) performance" for key OSS models across most hardware accelerators and infrastructure providers.
 
-You can use vLLM with llm-d directly by following [this guide](https://llm-d.ai/docs/guide) or via [KServe's LLMInferenceService](https://kserve.github.io/website/docs/model-serving/generative-inference/llmisvc/llmisvc-overview).
+You can use vLLM with llm-d directly by following [this guides](https://llm-d.ai/docs/guides) or via [KServe's LLMInferenceService](https://kserve.github.io/website/docs/model-serving/generative-inference/llmisvc/llmisvc-overview).
From d56695bdfdc1444a72b85c4887bc98ec2b197306 Mon Sep 17 00:00:00 2001
From: Florian Woerner
Date: Tue, 12 May 2026 11:31:47 +0200
Subject: [PATCH 2/2] Update docs/deployment/integrations/llm-d.md

Co-authored-by: Cyrus Leung
Signed-off-by: Florian Woerner
---
 docs/deployment/integrations/llm-d.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/deployment/integrations/llm-d.md b/docs/deployment/integrations/llm-d.md
index 276fd2435cb8..6060b98f6421 100644
--- a/docs/deployment/integrations/llm-d.md
+++ b/docs/deployment/integrations/llm-d.md
@@ -2,4 +2,4 @@
 
 vLLM can be deployed with [llm-d](https://github.com/llm-d/llm-d), a Kubernetes-native distributed inference serving stack providing well-lit paths for anyone to serve large generative AI models at scale. It helps achieve the fastest "time to state-of-the-art (SOTA) performance" for key OSS models across most hardware accelerators and infrastructure providers.
 
-You can use vLLM with llm-d directly by following [this guides](https://llm-d.ai/docs/guides) or via [KServe's LLMInferenceService](https://kserve.github.io/website/docs/model-serving/generative-inference/llmisvc/llmisvc-overview).
+You can use vLLM with llm-d directly by following [the official guides](https://llm-d.ai/docs/guides) or via [KServe's LLMInferenceService](https://kserve.github.io/website/docs/model-serving/generative-inference/llmisvc/llmisvc-overview).