34 changes: 15 additions & 19 deletions README.md
@@ -29,7 +29,21 @@ helm repo update

ModelService operates under the assumption that `llm-d-infra` has been installed in a Kubernetes cluster, which installs the required prerequisites and CRDs. Read the [`llm-d` Guides](https://github.com/llm-d/llm-d/blob/main/guides/README.md) for more information.

Note that in order to create HTTPRoute objects last, Helm hooks are used. As a consequence, these objects are not deleted when `helm delete` is executed. They should be manually deleted to avoid unexpected routing problems.
## Routing

Once a model is deployed, inference requests must be routed to it. The Kubernetes Gateway API Inference Extension (GAIE) Helm charts can be used for this. The charts are defined in the [GAIE repository](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/config/charts/). For example, to create an InferencePool, use the chart `oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool`.

### Relationships

When the GAIE [inferencepool chart](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/config/charts/inferencepool) is used together with the modelservice chart, the following relationships apply:

- The modelservice field `modelArtifact.routing.servicePort` should match the GAIE field `inferencePool.targetPortNumber` or be an entry in the list `inferencePool.targets` (depending on the apiVersion of InferencePool).
- The modelservice field `modelArtifacts.labels` should match the GAIE field `inferencePool.modelServers.matchLabels`.
Note that the label `llm-d.ai/role` will be added in addition to the labels specified in the `modelArtifacts.labels` field.
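
As an illustration of the label relationship, the fragments below sketch two values files that line up. This is a minimal sketch, not taken from either chart's defaults: the label value `my-model` and the port `8000` are hypothetical, and the label keys are borrowed from the labels this chart previously set in its helper template.

```yaml
# modelservice values (fragment)
modelArtifacts:
  labels:
    llm-d.ai/inferenceServing: "true"
    llm-d.ai/model: my-model   # hypothetical value
# The modelservice servicePort field described above would be set to 8000
# to match inferencePool.targetPortNumber below.
```

```yaml
# GAIE inferencepool chart values (fragment)
inferencePool:
  targetPortNumber: 8000       # hypothetical; must equal the modelservice service port
  modelServers:
    matchLabels:
      llm-d.ai/inferenceServing: "true"
      llm-d.ai/model: my-model # same labels as modelArtifacts.labels
```

Because `llm-d.ai/role` is added to the pods on top of these labels, the selector above still matches both prefill and decode pods.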

### HTTPRoute

In addition to deploying the GAIE chart, an `HTTPRoute` is typically required to connect the `Gateway` to the `InferencePool`. Creating an `HTTPRoute` is not part of either chart. Examples are provided in the [examples README](https://github.com/llm-d-incubation/llm-d-modelservice/blob/main/examples/README.md#httproute).
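
For orientation, a route of this kind might look like the sketch below. All resource names (`my-route`, `my-gateway`, `my-inference-pool`) are hypothetical, and the backend `group` is an assumption that depends on the InferencePool apiVersion installed in the cluster: `inference.networking.x-k8s.io` for the pre-v1 API, `inference.networking.k8s.io` for v1.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-route                    # hypothetical
spec:
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: my-gateway              # hypothetical: the inference gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - group: inference.networking.x-k8s.io   # assumption: pre-v1 InferencePool API group
          kind: InferencePool
          name: my-inference-pool   # hypothetical: the pool created by the GAIE chart
```

Since neither chart manages this object, it has to be applied and deleted separately, for example with `kubectl apply -f` and `kubectl delete -f`.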

## Examples

@@ -56,24 +70,6 @@ Below are the values you can set.
| `routing.proxy.targetPort` | The port the vLLM decode container listens on. <br>If proxy is present, it will forward request to this port. | string | N/A |
| `routing.proxy.debugLevel` | Debug level of the routing proxy | int | 5 |
| `routing.proxy.parentRefs[*].name` | The name of the inference gateway | string | N/A |
| `routing.inferencePool.create` | If true, creates an InferencePool object | bool | `true` |
| `routing.inferencePool.extensionRef` | Name of an EPP service to use instead of the default one created by this chart. | string | N/A |
| `routing.inferenceModel.create` | If true, creates an InferenceModel object | bool | `false` |
| `routing.httpRoute.create` | If true, creates an HTTPRoute object | bool | `true` |
| `routing.httpRoute.backendRefs` | Override for HTTPRoute.backendRefs | List | [] |
| `routing.httpRoute.matches` | Override for HTTPRoute.backendRefs[*].matches where backendRefs are created by this chart. | Dict | {} |
| `routing.epp.create` | If true, creates EPP objects | bool | `true` |
| `routing.epp.service.permissions` | Role to be bound to the epp service account in place of the default created by this chart. | string | N/A |
| `routing.epp.service.type` | Type of Service created for the Inference Scheduler (Endpoint Picker) deployment | string | ClusterIP |
| `routing.epp.service.port` | The port the Inference Scheduler listens on | int | 9002 |
| `routing.epp.service.targetPort` | The target port the Inference Scheduler listens on | int | 9002 |
| `routing.epp.service.appProtocol` | The app protocol the Inference Scheduler uses | int | 9002 |
| `routing.epp.image` | Image to be used for the epp container | string | `ghcr.io/llm-d/llm-d-inference-scheduler:0.0.4` |
| `routing.epp.replicas` | Number of replicas for the Inference Scheduler pod | int | 1 |
| `routing.epp.debugLevel` | Debug level used to start the Inference Scheduler pod | int | 4 |
| `routing.epp.disableReadinessProbe` | Disable readiness probe creation for the Inference Scheduler pod. <br>Set this to `true` if you want to debug on Kind. | bool | `false` |
| `routing.epp.disableLivenessProbe` | Disable liveness probe creation for the Inference Scheduler pod. <br>Set this to `true` if you want to debug on Kind. | bool | `false` |
| `routing.epp.env` | List of environment variables | List | [] |
| `decode.create` | If true, creates decode Deployment or LeaderWorkerSet | bool | `true` |
| `decode.annotations` | Annotations that should be added to the Deployment or LeaderWorkerSet | Dict | {} |
| `decode.tolerations` | Tolerations that should be added to the Deployment or LeaderWorkerSet | List | [] |
2 changes: 1 addition & 1 deletion charts/llm-d-modelservice/Chart.yaml
@@ -15,7 +15,7 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: "v0.2.16"
version: "v0.3.0"

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
51 changes: 1 addition & 50 deletions charts/llm-d-modelservice/templates/_helpers.tpl
@@ -60,8 +60,7 @@ app.kubernetes.io/managed-by: {{ .Release.Service }}

{{/* Create common labels shared by prefill and decode deployment/LWS */}}
{{- define "llm-d-modelservice.pdlabels" -}}
llm-d.ai/inferenceServing: "true"
llm-d.ai/model: {{ (include "llm-d-modelservice.fullname" .) -}}
{{ .Values.modelArtifacts.labels | toYaml }}
{{- end }}

{{/* Create labels for the prefill deployment/LWS */}}
@@ -212,54 +211,6 @@ resources:
{{ include "llm-d-modelservice.fullname" . }}
{{- end }}

{{/* EPP service account name */}}
{{- define "llm-d-modelservice.eppServiceAccountName" -}}
{{ include "llm-d-modelservice.eppName" . }}
{{- end }}

{{/* EPP service name */}}
{{- define "llm-d-modelservice.eppServiceName" -}}
{{ include "llm-d-modelservice.eppName" . }}
{{- end }}

{{/* EPP role name */}}
{{- define "llm-d-modelservice.eppRoleName" -}}
{{ include "llm-d-modelservice.eppName" . }}
{{- end }}

{{/* EPP rolebinding name */}}
{{- define "llm-d-modelservice.eppRoleBindingName" -}}
{{ include "llm-d-modelservice.eppName" . }}
{{- end }}

{{/* EPP Config name */}}
{{- define "llm-d-modelservice.eppConfigName" -}}
{{ include "llm-d-modelservice.eppName" . }}
{{- end }}

{{/* default inference pool name */}}
{{- define "llm-d-modelservice.inferencePoolName" -}}
{{- if .Values.routing.inferencePool.name -}}
{{- .Values.routing.inferencePool.name }}
{{- else -}}
{{ include "llm-d-modelservice.fullname" . }}
{{- end }}
{{- end }}

{{/* default inference model name */}}
{{- define "llm-d-modelservice.inferenceModelName" -}}
{{- if .Values.routing.inferenceModel.name -}}
{{- .Values.routing.inferenceModel.name }}
{{- else -}}
{{ include "llm-d-modelservice.fullname" . }}
{{- end -}}
{{- end }}

{{/* default http route name */}}
{{- define "llm-d-modelservice.httpRouteName" -}}
{{ include "llm-d-modelservice.fullname" . }}
{{- end }}

{{/*
Volumes for PD containers based on model artifact prefix
Context is .Values.modelArtifacts
90 changes: 0 additions & 90 deletions charts/llm-d-modelservice/templates/epp-deployment.yaml

This file was deleted.

108 changes: 0 additions & 108 deletions charts/llm-d-modelservice/templates/epp-plugin-configmap.yaml

This file was deleted.

44 changes: 0 additions & 44 deletions charts/llm-d-modelservice/templates/epp-role.yaml

This file was deleted.

17 changes: 0 additions & 17 deletions charts/llm-d-modelservice/templates/epp-rolebinding.yaml

This file was deleted.

8 changes: 0 additions & 8 deletions charts/llm-d-modelservice/templates/epp-sa.yaml

This file was deleted.
