-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(backend): Add support for importing models stored in the Modelcar format (sidecar) #11606
Conversation
Skipping CI for Draft Pull Request. |
e081256
to
f6d7081
Compare
The last step that I'm aware of that needs to happen is to have the launcher SIGHUP the |
651abd9
to
584edf2
Compare
584edf2
to
a6f4c94
Compare
a380cf3
to
f94dd85
Compare
@mprahl can you rebase? |
8a0b0ec
to
c60152c
Compare
Good question on 1. For Modelcar, it's always in On 2, I definitely need to add documentation to the website with examples. If this PR gets merged, I'll try to get documentation in before the next KFP release. Thanks for the review! |
@@ -108,6 +114,14 @@ def convert_local_path_to_remote_path(path: str) -> str: | |||
return MINIO_REMOTE_PREFIX + path[len(_MINIO_LOCAL_MOUNT_PREFIX):] | |||
elif path.startswith(_S3_LOCAL_MOUNT_PREFIX): | |||
return S3_REMOTE_PREFIX + path[len(_S3_LOCAL_MOUNT_PREFIX):] | |||
elif path.startswith(_OCI_LOCAL_MOUNT_PREFIX): | |||
remotePath = OCI_REMOTE_PREFIX + path[len(_OCI_LOCAL_MOUNT_PREFIX | |||
):].replace('\\/', '/') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
):].replace('\\/', '/') | |
):].replace('\\/', '/') |
PEP 8: E124 closing bracket does not match visual indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was how yapf
formatted it but I'll try something nicer.
@property | ||
def path(self) -> str: | ||
if self.uri.startswith("oci://"): | ||
return self._get_path() + "/models" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not immediately clear to me why we need this, I'm guessing this has to do something with how oci is structured
optional: maybe we can drop a comment here informing future devs about the significance of this path? wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a hardcoded path that KServe expects for Modelcar containers. I'll add a note.
backend/src/v2/driver/driver.go
Outdated
podSpec.InitContainers = append( | ||
podSpec.InitContainers, | ||
k8score.Container{ | ||
Name: "oci-prepull-" + name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you are guaranteed that name
here follows the subdomain convention for RFC 1123, for example you can have an artifact named "input_model" and this would fail, same goes for other places where we are using this for resource values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I'll sort by input artifact name and then use the index here instead. It'll be deterministic and always valid.
a note for documentation, we'll probably want to somehow express that only input models are supported today, as soon as users try to output this oci model it will fail |
5ae4eaf
to
c8951fc
Compare
I can definitely see this being an issue, as data scientists expect to be able to pass artifacts (models) between components for different jobs. They can get around this if they need by creating an Output[Model] and then copying the oci-model import to this artifact (inside their component), but it's a hack. I think we can still move ahead with this first step, and just document the oci-model limitations. |
c8951fc
to
91138b2
Compare
c5d466f
to
c7b23e1
Compare
This allows dsl.import to leverage Modelcar container images in an OCI repository. This works by having an init container prepull the image and then adding a sidecar container when the launcher container is running. The Modelcar container adds a symlink to its /models directory in an emptyDir volume that is accessible by the launcher container. Once the launcher is done running the user code, it stops the Modelcar containers. This approach has the benefit of leveraging image pull secrets configured on the Kubernetes cluster rather than require separate credentials for importing the artifact. Additionally, no data is copied to the emptyDir volume, so the storage cost is just pulling the Modelcar container image on the Kubernetes worker node. Note that once Kubernetes supports OCI images as volume mounts for several releases, consider replacing the init container with that approach. This also adds a new environment variable of PIPELINE_RUN_AS_USER to set the runAsUser on all Pods created by Argo Workflows. Resolves: kubeflow#11584 Signed-off-by: mprahl <[email protected]>
c7b23e1
to
106f72f
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dandawg, HumairAK The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Description of your changes:
This allows dsl.import to leverage Modelcar container images in an OCI repository. This works by having an init container prepull the image and then adding a sidecar container when the launcher container is running. The Modelcar container adds a symlink to its /models directory in an emptyDir volume that is accessible by the launcher container. Once the launcher is done running the user code, it stops the Modelcar containers.
This approach has the benefit of leveraging image pull secrets configured on the Kubernetes cluster rather than require separate credentials for importing the artifact. Additionally, no data is copied to the emptyDir volume, so the storage cost is just pulling the Modelcar container image on the Kubernetes worker node.
Note that once Kubernetes supports OCI images as volume mounts for several releases, consider replacing the init container with that approach.
This also adds a new environment variable of PIPELINE_RUN_AS_USER to set the runAsUser on all pods created by Argo Workflows.
Resolves:
#11584
Checklist: