Add Super pipeline for code transforms. #172
@revit13, we'll need a document that explains it.
001b23b to b815335
Signed-off-by: Revital Sur <[email protected]>
kfp/doc/multi_transform_pipeline.md
Outdated
**Note** An example super pipeline that combines several transforms, `doc_id`, `ededup`, and `fdedup`, can be found in [superworkflow_dedups_sample_wf.py](../superworkflows/v1/superworkflow_dedups_sample_wf.py).
The sections that follow display two super pipelines as examples:

1) [dedups super pipeline](#De-dups-super-pipeline)
I think the link will not work. It is better to use explicit anchor names, for example:
### Dedups super pipeline <a name = "dedups"></a>
so the link can be [dedups super pipeline](#dedups).
Fixed. Thanks
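The concern about the link can be illustrated with a rough sketch of how GitHub-style renderers derive an anchor from a heading. This is an approximation written for illustration only: the function name `heading_slug` is hypothetical, and the exact rules GitHub applies may differ. It assumes the heading is lowercased, punctuation other than hyphens and underscores is dropped, and spaces become hyphens.

```python
import re


def heading_slug(heading: str) -> str:
    """Approximate a GitHub-style anchor for a markdown heading.

    Assumption for illustration; this is not GitHub's actual implementation.
    """
    text = heading.strip().lower()
    # Drop everything except word characters, spaces, and hyphens.
    text = re.sub(r"[^\w\- ]", "", text)
    # Each space becomes a hyphen.
    return text.replace(" ", "-")


link_target = "De-dups-super-pipeline"  # fragment used in the reviewed diff
generated = heading_slug("Dedups super pipeline")
print(generated)                 # dedups-super-pipeline
print(generated == link_target)  # False
```

An explicit `<a name = "dedups"></a>` anchor, as suggested in the review, avoids depending on the generated slug at all.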
kfp/doc/multi_transform_pipeline.md
Outdated
### Programming languages Super pipeline

This pipeline combines several programming-languages transforms: `ededup`, `doc_id`, `fdedup`, `proglang_select`, `code_quality`, `malware` and `tokenization`. It can be found in [superworkflow_code_wf.py](../superworkflows/ray/kfp_v1/superworkflow_code_sample_wf.py).
This pipeline combines transforms for programming languages data preprocessing:
Fixed. Thanks
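As a rough mental model of what a super pipeline does, the following sketch chains the transforms named in the reviewed document so that each step consumes the previous step's output folder. It does not use the Kubeflow Pipelines SDK or the repository's code: `make_step` and `run_super_pipeline` are hypothetical helpers, the folder naming is a placeholder, and running the transforms in the listed order is an assumption.

```python
from typing import Callable, List, Tuple

# Transform names as listed in the reviewed document; executing them in this
# order is an assumption made for illustration.
TRANSFORMS = [
    "ededup",
    "doc_id",
    "fdedup",
    "proglang_select",
    "code_quality",
    "malware",
    "tokenization",
]


def make_step(name: str) -> Callable[[str], str]:
    """Return a hypothetical step that maps an input folder to an output folder."""

    def step(input_folder: str) -> str:
        # Placeholder only: no data is read or written here.
        return f"{input_folder}/{name}"

    return step


def run_super_pipeline(input_folder: str, names: List[str]) -> Tuple[str, List[str]]:
    """Run the steps in sequence, feeding each step's output to the next one."""
    executed: List[str] = []
    current = input_folder
    for name in names:
        current = make_step(name)(current)
        executed.append(name)
    return current, executed


final_folder, executed = run_super_pipeline("input", TRANSFORMS)
print(executed)
print(final_folder)
```

In the actual super pipeline each step would be a separately deployed pipeline rather than a local function; the sketch only shows the sequencing idea.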
# Pipeline to invoke execution on remote resource
@dsl.pipeline(
    name="sample-super-kubeflow-pipeline",
Maybe change the pipeline name and description to match the specific use case.
Fixed. Thanks
Signed-off-by: Revital Sur <[email protected]>
LGTM
/Closes #173
Why are these changes needed?
This PR implements a pipeline for the notebook
The ingest 2 parquet phase is not part of the super pipeline; its output was generated manually by running the notebook phase.
Please note that the transform images are not synced with the latest code in the repo. To run the pipelines, please run:
cd transforms && make image && load-image
to upload the latest versions to the kind cluster.
Related issue number (if any).
#173