diff --git a/README.md b/README.md
index 3ee87bf47..4f681e3b9 100644
--- a/README.md
+++ b/README.md
@@ -123,8 +123,8 @@ The common part of are:
 -Docker/Podman
 Two important development tools will also be installed using the steps below:
--[pre-commit](https://pre-commit.com/)
--[twine](https://twine.readthedocs.io/en/stable/)
+- [pre-commit](https://pre-commit.com/)
+- [twine](https://twine.readthedocs.io/en/stable/)
 #### Installation Steps
 ```shell
@@ -151,24 +151,15 @@ that can run on your machine. This implementation can also be extended to connec
 ### Automate a Pipeline
 The data preprocessing can be automated by running transformers as a Kubeflow pipeline (KFP).
-See this simple transform pipeline [tutorial](kfp/doc/simple_transform_pipeline.md). See [multi-steps pipeline](kfp/doc/multi_transform_pipeline.md)
-if you want to combine several data transformation steps.
+The project facilitates the creation of a local [Kind cluster](https://kind.sigs.k8s.io/) with all the required
+software and test data, or deployment of required software on an existing cluster.
+See [Set up a Kubernetes clusters for KFP execution](kfp/doc/setup.md)
-The project facilitates the creation of a local [Kind cluster](https://kind.sigs.k8s.io/) with all the required software and test data.
-To work with the Kind cluster and KFP, you need to install several required software packages. Please refer to
-[prerequisite software](./kind/README.md#preinstalled-software) for more details.
+A simple transform pipeline [tutorial](kfp/doc/simple_transform_pipeline.md) explains the pipeline creation and execution.
+In addition, if you want to combine severat transformers in a single pipeline, you can look at [multi-steps pipeline](kfp/doc/multi_transform_pipeline.md)
-When you have all those packages installed, you can execute the following setup command,
-
-```bash
-make setup
-```
-from this main package directory or from the `kind` directory.
-
-When you finish working with the cluster, you can destroy it by running,
-```bash
-make clean
-```
+When you finish working with the cluster, and want to clean up or destroy it. See the
+[clean up the cluster](../kfp/doc/setup.md#cleanup)
 ### How to Navigate and Use the Repository
 See the documentation on [repository structure and its use](doc/repo.md).
diff --git a/kfp/README.md b/kfp/README.md
index bac282c9f..800038f9c 100644
--- a/kfp/README.md
+++ b/kfp/README.md
@@ -1,4 +1,21 @@
-# Automation with Kubeflow Pipelines
+# Automation with Kubeflow Pipelines
+
+## Map betweens transforms and KFP pipelines
+
+| Transform | KFP pipeline |
+|-------------------------------------|:----------------------------------------------------------------------------------:|
+| code/malware | [malware_wf.py](../transforms/code/malware/kfp_ray/v1/malware_wf.py) |
+| code/code_quality | [code_quality_wf.py](../transforms/code/code_quality/kfp_ray/v1/code_quality_wf.py) |
+| code/programming language_annotator | [proglang_select_wf.py](../transforms/code/proglang_select/kfp_ray/v1/proglang_select_wf.py) |
+| universal/doc_id | [doc_id_wf.py](../transforms/universal/doc_id/kfp_ray/v1/doc_id_wf.py) |
+| universal/ededup | [ededup_wf.py](../transforms/universal/ededup/kfp_ray/v1/ededup_wf.py) |
+| universal/fdedup | [fdedup_wf.py](../transforms/universal/fdedup/kfp_ray/v1/fdedup_wf.py) |
+| universal/filtering | [filter_wf.py](../transforms/universal/filter/kfp_ray/v1/filter_wf.py) |
+| universal/noop | [noop_wf.py](../transforms/universal/noop/kfp_ray/v1/noop_wf.py) |
+| universal/tokenization | [tokenization_wf.py](../transforms/universal/tokenization/kfp_ray/v1/tokenization_wf.py) |
+
+
+## Set up and working steps
 - [Set up a Kubernetes clusters for KFP execution](./doc/setup.md)
 - [Simple Transform pipeline tutorial](./doc/simple_transform_pipeline.md)
diff --git a/kfp/Readme.md b/kfp/Readme.md
deleted file mode 100644
index e8c4aa473..000000000
--- a/kfp/Readme.md
+++ /dev/null
@@ -1,16 +0,0 @@
-Map betweens transforms and KFP pipelines
-
-| Transform | KFP pipeline |
-|-------------------------------------|:----------------------------------------------------------------------------------:|
-| code/malware | [malware_wf.py](../transforms/code/malware/kfp_ray/v1/malware_wf.py) |
-| code/code_quality | [code_quality_wf.py](../transforms/code/code_quality/kfp_ray/v1/code_quality_wf.py) |
-| code/programming language_annotator | [proglang_select_wf.py](../transforms/code/proglang_select/kfp_ray/v1/proglang_select_wf.py) |
-| universal/doc_id | [doc_id_wf.py](../transforms/universal/doc_id/kfp_ray/v1/doc_id_wf.py) |
-| universal/ededup | [ededup_wf.py](../transforms/universal/ededup/kfp_ray/v1/ededup_wf.py) |
-| universal/fdedup | [fdedup_wf.py](../transforms/universal/fdedup/kfp_ray/v1/fdedup_wf.py) |
-| universal/filtering | [filter_wf.py](../transforms/universal/filter/kfp_ray/v1/filter_wf.py) |
-| universal/noop | [noop_wf.py](../transforms/universal/noop/kfp_ray/v1/noop_wf.py) |
-| universal/tokenization | [tokenization_wf.py](../transforms/universal/tokenization/kfp_ray/v1/tokenization_wf.py) |
-
-
-For more information you can find [here](./doc/simple_transform_pipeline.md) a toturial that shows how to build, compile, and execute a KFP pipeline for a simple transfotm.
diff --git a/kfp/doc/setup.md b/kfp/doc/setup.md
index 58f0e3f9d..cd7b48763 100644
--- a/kfp/doc/setup.md
+++ b/kfp/doc/setup.md
@@ -7,7 +7,7 @@
 - [An existing cluster](#existing_cluster)
 - [Installation steps](#installation)
 - [Installation on an existing Kubernetes cluster](#installation_existing)
-- [Clean up the cluster](#cleanup")
+- [Clean up the cluster](#cleanup)
 The project provides instructions and deployment automation to run all components in an all-inclusive fashion on a single machine using a [Kind cluster](https://kind.sigs.k8s.io/) and a local data storage ([MinIO](https://min.io/)).