Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate Fluid with Kubeflow Pipline #3694

Merged
merged 5 commits into from
Jan 21, 2024

Conversation

zhang-x-z
Copy link
Collaborator

@zhang-x-z zhang-x-z commented Jan 16, 2024

Ⅰ. Describe what this PR does

This PR provides a demo which demonstrates how to integrate Fluid with Kubeflow Pipline (KFP).

We build some Fluid KFP components to

  • Create Dataset
  • Create AlluxioRuntime
  • Preheat Dataset
  • Cleanup Dataset and AlluxioRuntime
  • Cleanup DataLoad operation

We use these components to load and preheat the dataset FashionMNIST which is used to train and test a simple CNN. Users can directly use the YAML files we already compiled or modify the code we provide and re-generate the YAML files.

Ⅱ. Does this pull request fix one issue?

None

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

We have run this demo in our KFP environment.
image
image

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

wang-mask and others added 5 commits January 15, 2024 06:24
Signed-off-by: wang-mask <[email protected]>
Signed-off-by: wang-mask <[email protected]>
Signed-off-by: ZhangXiaozheng <[email protected]>
Copy link

sonarcloud bot commented Jan 16, 2024

Quality Gate Passed Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

2 New issues
1 Security Hotspot
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

from kfp import dsl, compiler

# Create a Fluid dataset which contains data in S3.
@dsl.component(packages_to_install=['git+https://github.com/fluid-cloudnative/fluid-client-python.git'])
Copy link
Collaborator Author

@zhang-x-z zhang-x-z Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we should avoid using package_to_install in production environment (more details). But for now we only have image which install Fluid Python SDK in our image repo. I think this image should be maintained by Fluid community, so this maybe need to update when we have a official base image.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. We can handle this in next PR. And currently we have optimized the fluid sdk, please check this doc for reference. https://github.com/fluid-cloudnative/fluid-client-python/blob/master/examples/ml_pipeline/pipeline.ipynb

Copy link

codecov bot commented Jan 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (877c607) 64.46% compared to head (b2b20e4) 64.47%.
Report is 22 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3694   +/-   ##
=======================================
  Coverage   64.46%   64.47%           
=======================================
  Files         471      471           
  Lines       28153    28153           
=======================================
+ Hits        18149    18151    +2     
+ Misses       7848     7847    -1     
+ Partials     2156     2155    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

- datasets
- datasets/status
- alluxioruntimes
- alluxioruntimes/status
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need the access of dataload? And I think the verbs can be restricted as create/delete/get.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do need dataload. I will change the verbs and re-test it.

@@ -0,0 +1,27 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not clusterRole. It should be Role in specified namespace.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. It will be more safe in that case. For standalone KFP deployment, maybe we can only set this namespace to which KFP was deployed. More details about the namespace in Kubeflow.

@cheyang
Copy link
Collaborator

cheyang commented Jan 21, 2024

@zhang-x-z I can merge this PR first. Please create another PR to continue enhancing according my comments. Thanks.

Copy link
Collaborator

@cheyang cheyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Copy link

fluid-e2e-bot bot commented Jan 21, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheyang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fluid-e2e-bot fluid-e2e-bot bot merged commit d98b4ca into fluid-cloudnative:master Jan 21, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants