feat(backend): Add support for importing models stored in the Modelcar format (sidecar) #11606

Merged
merged 1 commit into from
Feb 19, 2025

mprahl
Contributor

@mprahl mprahl commented Feb 8, 2025

Description of your changes:

This allows dsl.import to leverage Modelcar container images in an OCI repository. This works by having an init container prepull the image and then adding a sidecar container when the launcher container is running. The Modelcar container adds a symlink to its /models directory in an emptyDir volume that is accessible by the launcher container. Once the launcher is done running the user code, it stops the Modelcar containers.

This approach has the benefit of leveraging image pull secrets configured on the Kubernetes cluster rather than requiring separate credentials for importing the artifact. Additionally, no data is copied to the emptyDir volume, so the only storage cost is pulling the Modelcar container image onto the Kubernetes worker node.
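
To make the mechanism above concrete, here is a minimal Python sketch of the kind of pod spec additions involved. It is not the PR's implementation, which lives in the Go backend. The function name `add_modelcar`, the container and volume names, the `/oci/<index>` mount path, and both shell commands are assumptions for illustration. The `/proc/$$/root` symlink in particular assumes the pod shares a process namespace so the launcher can follow it.

```python
from typing import Any, Dict


def add_modelcar(pod_spec: Dict[str, Any], image: str, index: int) -> Dict[str, Any]:
    """Return a copy of pod_spec with hypothetical Modelcar containers added."""
    volume = f"oci-{index}"  # hypothetical emptyDir volume name
    mount = {"name": volume, "mountPath": f"/oci/{index}"}  # hypothetical path

    spec = dict(pod_spec)
    # Init container: starting any container from the image makes the kubelet
    # pull it, using the image pull secrets already configured on the cluster.
    spec["initContainers"] = list(pod_spec.get("initContainers", [])) + [{
        "name": f"oci-prepull-{index}",
        "image": image,
        "command": ["sh", "-c", "echo prepulled"],
    }]
    # Sidecar: symlink its own /models into the shared emptyDir, then idle
    # until the launcher stops it. No model data is copied into the volume.
    link_cmd = f"ln -s /proc/$$/root/models /oci/{index}/models && sleep infinity"
    spec["containers"] = list(pod_spec.get("containers", [])) + [{
        "name": f"oci-{index}",
        "image": image,
        "command": ["sh", "-c", link_cmd],
        "volumeMounts": [mount],
    }]
    spec["volumes"] = list(pod_spec.get("volumes", [])) + [
        {"name": volume, "emptyDir": {}}
    ]
    return spec
```

In this sketch the launcher container would also need to mount the same emptyDir volume to see the symlink; that wiring is omitted for brevity.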

Note: once Kubernetes has supported OCI images as volume mounts for several releases, we should consider replacing the init container with that approach.

This also adds a new environment variable, PIPELINE_RUN_AS_USER, which sets the runAsUser on all pods created by Argo Workflows.
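
As a rough illustration of how such a variable could be consumed, the sketch below turns PIPELINE_RUN_AS_USER into a pod-level security context. The function name and the validation rules are assumptions; the PR's actual handling is in the Go backend and may differ.

```python
import os
from typing import Dict, Mapping, Optional


def run_as_user_security_context(
    env: Mapping[str, str] = os.environ,
) -> Optional[Dict[str, int]]:
    """Build a securityContext from PIPELINE_RUN_AS_USER, or None if unset."""
    raw = env.get("PIPELINE_RUN_AS_USER", "").strip()
    if not raw:
        return None  # variable unset or empty: leave the pod's user untouched
    uid = int(raw)  # raises ValueError on non-numeric input
    if uid < 0:
        raise ValueError("PIPELINE_RUN_AS_USER must be a non-negative integer")
    return {"runAsUser": uid}
```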

Resolves:
#11584

Checklist:


Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@mprahl
Contributor Author

mprahl commented Feb 10, 2025

The last remaining step that I'm aware of is to have the launcher send SIGHUP to the sleep infinity process rather than to the Argo Exec process.
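
For readers unfamiliar with the idea, here is a hedged Python sketch of locating `sleep infinity` processes through procfs and signalling them. The real launcher is written in Go, so this is not its code. The helper names are hypothetical, and the sketch assumes the launcher can see the sidecar's processes (for example through a shared process namespace) and has permission to signal them.

```python
import os
import signal
from pathlib import Path
from typing import List


def is_sleep_infinity(cmdline: bytes) -> bool:
    """True if a raw /proc/<pid>/cmdline value is exactly `sleep infinity`."""
    args = [a for a in cmdline.split(b"\0") if a]
    return (
        len(args) == 2
        and os.path.basename(args[0]) == b"sleep"
        and args[1] == b"infinity"
    )


def find_sleep_infinity_pids(proc_root: str = "/proc") -> List[int]:
    """Scan a procfs tree for processes running `sleep infinity`."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            if is_sleep_infinity((entry / "cmdline").read_bytes()):
                pids.append(int(entry.name))
        except OSError:
            continue  # process exited or is not readable
    return pids


def stop_modelcars() -> None:
    """Send SIGHUP to every `sleep infinity` process that is visible."""
    for pid in find_sleep_infinity_pids():
        os.kill(pid, signal.SIGHUP)
```

Matching on the exact command line, rather than on the wrapping shell or the Argo Exec process, means only the idle sleep is signalled.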

@mprahl mprahl force-pushed the import-as-oci-sidecar branch 2 times, most recently from 651abd9 to 584edf2 Compare February 10, 2025 19:56
@mprahl mprahl marked this pull request as ready for review February 10, 2025 20:18
@mprahl mprahl changed the title WIP: feat(backend): Add support for importing models stored in the Modelcar format (sidecar) feat(backend): Add support for importing models stored in the Modelcar format (sidecar) Feb 10, 2025
@google-oss-prow google-oss-prow bot requested a review from HumairAK February 10, 2025 20:18
@mprahl mprahl force-pushed the import-as-oci-sidecar branch from 584edf2 to a6f4c94 Compare February 11, 2025 14:01
@mprahl mprahl force-pushed the import-as-oci-sidecar branch 5 times, most recently from a380cf3 to f94dd85 Compare February 12, 2025 18:46
@HumairAK
Collaborator

@mprahl can you rebase?

@mprahl mprahl force-pushed the import-as-oci-sidecar branch from 8a0b0ec to c60152c Compare February 19, 2025 01:19
@google-oss-prow google-oss-prow bot removed the lgtm label Feb 19, 2025
@mprahl
Contributor Author

mprahl commented Feb 19, 2025

> @mprahl can you rebase?

@HumairAK okay, rebased.

@mprahl
Contributor Author

mprahl commented Feb 19, 2025

> /lgtm This is really cool. A couple of side questions came to mind.
>
> 1. What happens if the model is not in the /models path? I know this is the default in KServe images, but can we always count on this? I know BentoML and Seldon also create OCI images for models. Is the path something we can parameterize?
> 2. We may need examples/docs, since it may not be immediately clear in which use cases this is valuable: when should I use an API like Hugging Face, when should I use S3, and when should I use OCI?

Good question on 1. For Modelcar, it's always in /models. It's part of the requirements. I couldn't find any clear documentation on the format used by BentoML and Seldon, so if there's a need later on to support those, we'll have to figure out what the standard is. Since KServe is part of Kubeflow, we should try to support their packaging options.

On 2, I definitely need to add documentation to the website with examples. If this PR gets merged, I'll try to get documentation in before the next KFP release.

Thanks for the review!

@@ -108,6 +114,14 @@ def convert_local_path_to_remote_path(path: str) -> str:
        return MINIO_REMOTE_PREFIX + path[len(_MINIO_LOCAL_MOUNT_PREFIX):]
    elif path.startswith(_S3_LOCAL_MOUNT_PREFIX):
        return S3_REMOTE_PREFIX + path[len(_S3_LOCAL_MOUNT_PREFIX):]
    elif path.startswith(_OCI_LOCAL_MOUNT_PREFIX):
        remotePath = OCI_REMOTE_PREFIX + path[len(_OCI_LOCAL_MOUNT_PREFIX
                                                  ):].replace('\\/', '/')
Collaborator

Suggested change
):].replace('\\/', '/')
):].replace('\\/', '/')
PEP 8: E124 closing bracket does not match visual indentation

Contributor Author

This was how yapf formatted it but I'll try something nicer.
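
One way to sidestep the awkward wrapped bracket is to name the suffix before building the URI. The sketch below covers only the lines visible in the diff excerpt and is not the code that was eventually merged. The constant values and the standalone function name `oci_local_path_to_remote_path` are assumptions for illustration.

```python
# Assumed values for illustration; the real constants live in the KFP SDK.
_OCI_LOCAL_MOUNT_PREFIX = "/oci/"
OCI_REMOTE_PREFIX = "oci://"


def oci_local_path_to_remote_path(path: str) -> str:
    """Map a local OCI mount path back to its oci:// URI."""
    if not path.startswith(_OCI_LOCAL_MOUNT_PREFIX):
        raise ValueError(f"not an OCI mount path: {path!r}")
    # Naming the suffix keeps each line short, so no bracket has to wrap.
    suffix = path[len(_OCI_LOCAL_MOUNT_PREFIX):]
    # Same unescaping as in the diff: backslash + '/' becomes a plain '/'.
    return OCI_REMOTE_PREFIX + suffix.replace("\\/", "/")
```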

@property
def path(self) -> str:
    if self.uri.startswith("oci://"):
        return self._get_path() + "/models"
Collaborator

@HumairAK HumairAK Feb 19, 2025

It's not immediately clear to me why we need this. I'm guessing it has something to do with how OCI is structured.

Optional: maybe we can drop a comment here informing future devs about the significance of this path? wdyt?

Contributor Author

Yeah, it's a hardcoded path that KServe expects for Modelcar containers. I'll add a note.
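
A sketch of what such a note might look like, using a hypothetical stand-in class rather than the SDK's real artifact type. The `Model` class shape, the `/tmp/artifacts` root, and the `_get_path` mapping are all assumptions; only the `oci://` check and the `/models` suffix come from the snippet under review.

```python
class Model:
    """Hypothetical stand-in for an artifact class; not the KFP SDK's own."""

    def __init__(self, uri: str, local_root: str = "/tmp/artifacts") -> None:
        self.uri = uri
        self._local_root = local_root  # assumed local mount root

    def _get_path(self) -> str:
        # Placeholder mapping from URI to a local directory (an assumption).
        name = self.uri.split("://", 1)[-1].replace("/", "_")
        return self._local_root + "/" + name

    @property
    def path(self) -> str:
        if self.uri.startswith("oci://"):
            # Modelcar images are required to keep the model under /models,
            # the hardcoded path KServe expects, so the suffix is appended
            # here instead of being configurable.
            return self._get_path() + "/models"
        return self._get_path()
```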

podSpec.InitContainers = append(
	podSpec.InitContainers,
	k8score.Container{
		Name: "oci-prepull-" + name,
Collaborator

I don't think you are guaranteed that name here follows the subdomain convention for RFC 1123. For example, you can have an artifact named "input_model" and this would fail. The same goes for other places where we are using this for resource values.

Contributor Author

Good catch. I'll sort by input artifact name and then use the index here instead. It'll be deterministic and always valid.
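
The sort-then-index idea can be sketched as follows. This is written in Python for illustration, whereas the code under review is Go. The function name and the validation regex are assumptions; only the "oci-prepull-" prefix comes from the diff.

```python
import re
from typing import Dict, Iterable

# Lowercase alphanumerics and '-', starting and ending with an alphanumeric.
_DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def prepull_container_names(artifact_names: Iterable[str]) -> Dict[str, str]:
    """Map each input artifact name to a deterministic, DNS-safe name."""
    names = {}
    # Sorting makes the index stable no matter how the inputs were ordered.
    for index, artifact in enumerate(sorted(set(artifact_names))):
        name = f"oci-prepull-{index}"
        if not _DNS_LABEL.match(name) or len(name) > 63:
            raise ValueError(f"invalid container name: {name}")
        names[artifact] = name
    return names
```

Because the generated name never contains the artifact name, inputs such as "input_model" cannot produce an invalid resource name.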

@HumairAK
Collaborator

A note for documentation: we'll probably want to somehow express that only input models are supported today. As soon as users try to output this OCI model, it will fail.

@mprahl mprahl force-pushed the import-as-oci-sidecar branch 2 times, most recently from 5ae4eaf to c8951fc Compare February 19, 2025 16:33
@mprahl mprahl requested review from HumairAK and dandawg February 19, 2025 16:38
@dandawg
Contributor

dandawg commented Feb 19, 2025

> A note for documentation: we'll probably want to somehow express that only input models are supported today. As soon as users try to output this OCI model, it will fail.

I can definitely see this being an issue, as data scientists expect to be able to pass artifacts (models) between components for different jobs. They can get around this if they need to by creating an Output[Model] and then copying the OCI model import to this artifact (inside their component), but it's a hack. I think we can still move ahead with this first step, and just document the OCI model limitations.
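
The copy step of that workaround might look roughly like the helper below. It uses only the standard library. The name `copy_model_dir` is hypothetical, and whether a plain directory copy is sufficient for a given model format is an assumption.

```python
import shutil
from pathlib import Path


def copy_model_dir(src: str, dst: str) -> int:
    """Copy a model directory into dst and return the number of files copied."""
    # symlinks=False (the default) copies the files that symlinks point to,
    # so the destination holds real data instead of links into a sidecar.
    shutil.copytree(src, dst, symlinks=False, dirs_exist_ok=True)
    return sum(1 for p in Path(dst).rglob("*") if p.is_file())
```

Inside a component, `src` would presumably be the imported model's `path` and `dst` the `path` of the Output[Model] artifact. Unlike the import itself, this does copy the model data, so it gives up the storage benefit described in the PR.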

@mprahl mprahl force-pushed the import-as-oci-sidecar branch from c8951fc to 91138b2 Compare February 19, 2025 17:23
@mprahl mprahl requested review from HumairAK and dandawg February 19, 2025 17:23
@mprahl mprahl force-pushed the import-as-oci-sidecar branch 2 times, most recently from c5d466f to c7b23e1 Compare February 19, 2025 18:05
This allows dsl.import to leverage Modelcar container images in an OCI
repository. This works by having an init container prepull the image and
then adding a sidecar container when the launcher container is running.
The Modelcar container adds a symlink to its /models directory in an
emptyDir volume that is accessible by the launcher container. Once the
launcher is done running the user code, it stops the Modelcar
containers.

This approach has the benefit of leveraging image pull secrets
configured on the Kubernetes cluster rather than require separate
credentials for importing the artifact. Additionally, no data is copied
to the emptyDir volume, so the storage cost is just pulling the Modelcar
container image on the Kubernetes worker node.

Note that once Kubernetes supports OCI images as volume mounts for
several releases, consider replacing the init container with that
approach.

This also adds a new environment variable of PIPELINE_RUN_AS_USER to
set the runAsUser on all Pods created by Argo Workflows.

Resolves:
kubeflow#11584

Signed-off-by: mprahl <[email protected]>
@mprahl mprahl force-pushed the import-as-oci-sidecar branch from c7b23e1 to 106f72f Compare February 19, 2025 18:29
@HumairAK
Collaborator

/lgtm
/approve


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dandawg, HumairAK


@google-oss-prow google-oss-prow bot merged commit cc1c435 into kubeflow:master Feb 19, 2025
59 checks passed