Is there any public Kubeflow Pipeline Registry? #11129
Replies: 5 comments 3 replies
-
As far as I know, there currently isn't one. At my company we maintain our own set of reusable components for internal users. In practice it's a library on top of kfp that enables reusability, among other things. I've talked with people at other companies, and they all seem to have built similar tools, including reusable components. Here is how this looks for the S3 example:

    # kfp.kubernetes is provided by the kfp-kubernetes package
    from kfp import dsl, kubernetes
    from kfp.dsl import Output, Dataset


    @dsl.component(
        base_image="python:3.9",
        packages_to_install=["s3fs==2024.6.0", "pandas==2.2.2"])
    def from_s3_comp(bucket: str, key: str, output: Output[Dataset]):
        """Fetches data from an S3 bucket and writes it to an output file.

        Args:
            bucket (str): The name of the S3 bucket.
            key (str): The key of the object in the S3 bucket to be fetched.
            output (Output[Dataset]): The output file where the fetched data will be written.
        """
        # pylint: disable=invalid-name
        # Imported inside the function because the component body runs in its own container.
        from s3fs import S3FileSystem

        # anon=False: authenticate with the AWS credentials found in the environment.
        s3 = S3FileSystem(anon=False)
        with s3.open(f"{bucket}/{key}", mode="rb") as f:
            content = f.read()
        with open(output.path, "wb") as f:
            f.write(content)


    def from_s3(
        bucket: str,
        key: str,
    ):
        comp = from_s3_comp(bucket=bucket, key=key)
        # Map the keys of the 'awscreds' secret to environment variables on the task.
        kubernetes.use_secret_as_env(comp,
                                     secret_name='awscreds',
                                     secret_key_to_env={'AWS_ACCESS_KEY_ID': 'AWS_ACCESS_KEY_ID',
                                                        'AWS_SECRET_ACCESS_KEY': 'AWS_SECRET_ACCESS_KEY',
                                                        'AWS_REGION': 'AWS_REGION'})
        return comp

from_s3_comp creates a component with the basic functionality (that part could live in a component repository), while from_s3 maps the keys of the awscreds secret to environment variables so that from_s3_comp can authenticate. Now imagine that different users want to add pod labels, pod annotations, or a retry strategy, use different secrets, and so on.
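A minimal sketch of how those per-user settings could be made configurable rather than hard-coded. TaskOptions and apply_options are hypothetical names, not part of kfp. The k8s argument is meant to be the kfp.kubernetes module, passed in so the helper can be exercised without a cluster, and the sketch assumes a kfp-kubernetes version that provides add_pod_label and add_pod_annotation alongside use_secret_as_env.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TaskOptions:
    """Hypothetical bundle of per-organisation settings for one task."""
    secret_name: str = "awscreds"
    # Maps secret key -> environment variable name, as in the example above.
    secret_key_to_env: Dict[str, str] = field(default_factory=lambda: {
        "AWS_ACCESS_KEY_ID": "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY": "AWS_SECRET_ACCESS_KEY",
        "AWS_REGION": "AWS_REGION",
    })
    pod_labels: Dict[str, str] = field(default_factory=dict)
    pod_annotations: Dict[str, str] = field(default_factory=dict)
    num_retries: int = 0


def apply_options(task: Any, options: TaskOptions, k8s: Any) -> Any:
    """Apply the options to a pipeline task.

    `k8s` is assumed to be the `kfp.kubernetes` module (or an object with
    the same functions); `task` is what calling a component returns.
    """
    k8s.use_secret_as_env(
        task,
        secret_name=options.secret_name,
        secret_key_to_env=options.secret_key_to_env,
    )
    for key, value in options.pod_labels.items():
        k8s.add_pod_label(task, label_key=key, label_value=value)
    for key, value in options.pod_annotations.items():
        k8s.add_pod_annotation(task, annotation_key=key, annotation_value=value)
    if options.num_retries > 0:
        # set_retry is a method on the task itself, not on the kubernetes module.
        task.set_retry(num_retries=options.num_retries)
    return task
```

Inside a pipeline this would be called as apply_options(from_s3_comp(bucket=bucket, key=key), TaskOptions(pod_labels={"team": "data"}, num_retries=2), kubernetes). Keeping the settings in one object means the base component stays shareable while each organisation supplies its own options.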
-
@milosjava oh yes, that is exactly what I meant. A shared library of reusable components not only cuts down on tedious work, but also helps people with little K8s knowledge get used to Kubeflow more easily.
-
@haiminh2001 yes, creating a library will be a challenge because of all the use cases and the specific settings every organisation has. But we can start with a repo of various components (like the one I shared), at least as examples that can be quickly adjusted for specific cases in other organisations. I hope that would help Kubeflow adoption in various companies. I have a repo where I planned to collect useful commands and Kubeflow-related code for future reference, so I guess I can use that for now. @rimolive do you know of any repo or location with Kubeflow components that can be reused?
-
I'm not very familiar with it, but isn't that the intention of https://github.com/kubeflow/pipelines/tree/master/components?
-
@gregsheremeta @milosjava I think the components directory in the Kubeflow Pipelines GitHub repository is not very active, though. The components there only serve as examples now, and contributing to a GitHub repository is not so simple. A public registry should be easy to contribute to. Therefore, I think a Python SDK library is perhaps not the way to go after all. The idea I had in mind when I opened this discussion is something like a Helm repository.
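To make the Helm comparison concrete: a Helm repository is essentially an index that maps chart names and versions to download URLs. A rough sketch of the same idea for pipeline components follows. Everything here is hypothetical (ComponentEntry, resolve, the index contents, and the example.com URLs); no such registry exists today as far as this thread is aware.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ComponentEntry:
    """One row of a hypothetical, Helm-style component index."""
    name: str
    version: str  # dotted numeric version, e.g. "0.2.0"
    url: str      # where the component definition (YAML) could be fetched


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "0.10.0" into (0, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(index: List[ComponentEntry], name: str,
            version: Optional[str] = None) -> ComponentEntry:
    """Return the requested version of a component, or the newest one."""
    candidates = [entry for entry in index if entry.name == name]
    if version is not None:
        candidates = [entry for entry in candidates if entry.version == version]
    if not candidates:
        raise KeyError(f"no component {name!r} with version {version!r}")
    return max(candidates, key=lambda entry: parse_version(entry.version))


# Placeholder index; the URLs are illustrative only.
INDEX = [
    ComponentEntry("from-s3", "0.1.0", "https://example.com/from-s3/0.1.0/component.yaml"),
    ComponentEntry("from-s3", "0.10.0", "https://example.com/from-s3/0.10.0/component.yaml"),
    ComponentEntry("from-s3", "0.2.0", "https://example.com/from-s3/0.2.0/component.yaml"),
]

latest = resolve(INDEX, "from-s3")
print(latest.version, latest.url)
```

The resolved URL could then be handed to kfp's components.load_component_from_url, so contributing a component would only require publishing a YAML file and adding an index entry.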
-
Hi, I am quite new to Kubeflow Pipelines. I came across the concept of a pipeline registry, and I think such a thing would be very beneficial. But I cannot find any public registry to download pipelines from other than Vertex AI's. I would expect components for common tasks, like downloading data from S3 storage, to be available in the same way the BigQuery ones are.
My guess about the absence of such a registry is that perhaps the community is not large enough, or that Kubeflow Pipelines itself is not mature enough?