Merge pull request #191 from roytman/dev2
Dev2
roytman authored May 28, 2024
2 parents 3968135 + 76b3d86 commit 22d65fb
Showing 4 changed files with 28 additions and 36 deletions.
27 changes: 9 additions & 18 deletions README.md
@@ -123,8 +123,8 @@ The common part of are:
-Docker/Podman

Two important development tools will also be installed using the steps below:
-[pre-commit](https://pre-commit.com/)
-[twine](https://twine.readthedocs.io/en/stable/)
- [pre-commit](https://pre-commit.com/)
- [twine](https://twine.readthedocs.io/en/stable/)

#### Installation Steps
```shell
@@ -151,24 +151,15 @@ that can run on your machine. This implementation can also be extended to connec

### Automate a Pipeline
The data preprocessing can be automated by running transformers as a Kubeflow pipeline (KFP).
See this simple transform pipeline [tutorial](kfp/doc/simple_transform_pipeline.md). See [multi-steps pipeline](kfp/doc/multi_transform_pipeline.md)
if you want to combine several data transformation steps.
The project facilitates the creation of a local [Kind cluster](https://kind.sigs.k8s.io/) with all the required
software and test data, or deployment of required software on an existing cluster.
See [Set up a Kubernetes cluster for KFP execution](kfp/doc/setup.md).

The project facilitates the creation of a local [Kind cluster](https://kind.sigs.k8s.io/) with all the required software and test data.
To work with the Kind cluster and KFP, you need to install several required software packages. Please refer to
[prerequisite software](./kind/README.md#preinstalled-software) for more details.
A simple transform pipeline [tutorial](kfp/doc/simple_transform_pipeline.md) explains the pipeline creation and execution.
In addition, if you want to combine several transformers in a single pipeline, you can look at the [multi-steps pipeline](kfp/doc/multi_transform_pipeline.md).

When you have all those packages installed, you can execute the following setup command,

```bash
make setup
```
from this main package directory or from the `kind` directory.

When you finish working with the cluster, you can destroy it by running,
```bash
make clean
```
When you finish working with the cluster and want to clean up or destroy it, see
[clean up the cluster](../kfp/doc/setup.md#cleanup).

### How to Navigate and Use the Repository
See the documentation on [repository structure and its use](doc/repo.md).
19 changes: 18 additions & 1 deletion kfp/README.md
@@ -1,4 +1,21 @@
# Automation with Kubeflow Pipelines
# Automation with Kubeflow Pipelines

## Map between transforms and KFP pipelines

| Transform | KFP pipeline |
|-------------------------------------|:----------------------------------------------------------------------------------:|
| code/malware | [malware_wf.py](../transforms/code/malware/kfp_ray/v1/malware_wf.py) |
| code/code_quality | [code_quality_wf.py](../transforms/code/code_quality/kfp_ray/v1/code_quality_wf.py) |
| code/programming language_annotator | [proglang_select_wf.py](../transforms/code/proglang_select/kfp_ray/v1/proglang_select_wf.py) |
| universal/doc_id | [doc_id_wf.py](../transforms/universal/doc_id/kfp_ray/v1/doc_id_wf.py) |
| universal/ededup | [ededup_wf.py](../transforms/universal/ededup/kfp_ray/v1/ededup_wf.py) |
| universal/fdedup | [fdedup_wf.py](../transforms/universal/fdedup/kfp_ray/v1/fdedup_wf.py) |
| universal/filtering | [filter_wf.py](../transforms/universal/filter/kfp_ray/v1/filter_wf.py) |
| universal/noop | [noop_wf.py](../transforms/universal/noop/kfp_ray/v1/noop_wf.py) |
| universal/tokenization | [tokenization_wf.py](../transforms/universal/tokenization/kfp_ray/v1/tokenization_wf.py) |


## Set up and working steps

- [Set up a Kubernetes cluster for KFP execution](./doc/setup.md)
- [Simple Transform pipeline tutorial](./doc/simple_transform_pipeline.md)
16 changes: 0 additions & 16 deletions kfp/Readme.md

This file was deleted.

2 changes: 1 addition & 1 deletion kfp/doc/setup.md
@@ -7,7 +7,7 @@
- [An existing cluster](#existing_cluster)
- [Installation steps](#installation)
- [Installation on an existing Kubernetes cluster](#installation_existing)
- [Clean up the cluster](#cleanup")
- [Clean up the cluster](#cleanup)

The project provides instructions and deployment automation to run all components in an all-inclusive fashion on a
single machine using a [Kind cluster](https://kind.sigs.k8s.io/) and a local data storage ([MinIO](https://min.io/)).
