| 1 | +# Template to create a Codabench bundle for an ML competition in Python |
| 2 | + |
| 3 | +The code in this repository is a template to create a Codabench bundle for a machine learning competition in Python. |
| 4 | +It sets up a dummy classification task, evaluated with the accuracy metric on a public and a private test set. |
| 5 | + |
| 6 | +## Structure of the bundle |
| 7 | + |
| 8 | +- `ingestion_program/`: contains the ingestion program that will be run on participants' submissions. It is responsible for loading the code from the submission, passing the training data to train the model, and generating predictions on the test datasets. |
| 9 | +- `scoring_program/`: contains the scoring program that will be run to evaluate the predictions generated by the ingestion program. It loads the predictions and the ground truth labels, computes the evaluation metric (accuracy in this case), and outputs the score. |
| 10 | +- `solution/`: contains a sample solution submission that participants can use as a reference. Here, this is a simple Random Forest classifier. |
| 11 | +- `*_phase/`: contains the data for a given phase, including input data and reference labels. Running `setup_data.py` will generate dummy data for a development phase. |
| 12 | +- `competition.yaml`: configuration file for the Codabench competition, specifying phases, tasks, and evaluation metrics. |
| 13 | +- `pages/`: contains Markdown files that will be rendered as web pages in the Codabench competition. |
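
The ingestion and scoring steps described above can be sketched in a few lines of standard-library Python. This is only an illustration, not the code shipped in `ingestion_program/` or `scoring_program/`: the file name `submission.py`, the entry point `get_model()`, and the helper names `load_submission`, `run_ingestion`, and `accuracy` are all assumptions made for the example, and the submitted model is assumed to expose scikit-learn style `fit` and `predict` methods.

```python
import importlib.util
from pathlib import Path


def load_submission(submission_dir, filename="submission.py"):
    """Import the participant's code from a file path.

    The file name is a hypothetical convention chosen for this sketch.
    """
    path = Path(submission_dir) / filename
    spec = importlib.util.spec_from_file_location("submission", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def run_ingestion(submission_dir, X_train, y_train, X_test):
    """Train the submitted model and return its predictions on X_test."""
    module = load_submission(submission_dir)
    model = module.get_model()  # hypothetical entry point of a submission
    model.fit(X_train, y_train)
    return list(model.predict(X_test))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty set of labels")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

Keeping the two steps separate mirrors the bundle layout: the ingestion step only sees the input data and the submission, while the scoring step only sees the predictions and the reference labels.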
| 14 | + |
| 15 | +## Extra scripts in this repository |
| 16 | + |
| 17 | +- `setup_data.py`: script to generate dummy data for the competition. This should be changed to load and preprocess real data for a given competition. |
| 18 | +- `create_bundle.py`: script to create the Codabench bundle archive from the repository structure. |
| 19 | +- `Dockerfile`: used to build the Docker image that runs the ingestion and scoring programs. |
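
To give an idea of what a bundling script does, here is a minimal sketch that zips a list of files and directories while preserving their relative paths. It is not the repository's `create_bundle.py`, whose exact behavior may differ; the function name `make_bundle` and its arguments are hypothetical and only illustrate the idea.

```python
import zipfile
from pathlib import Path


def make_bundle(root, members, archive_path):
    """Zip the given files and directories, keeping paths relative to root."""
    root = Path(root)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in members:
            path = root / name
            if path.is_file():
                zf.write(path, arcname=name)
            elif path.is_dir():
                # Add every file below the directory, in a stable order.
                for file in sorted(path.rglob("*")):
                    if file.is_file():
                        zf.write(file, arcname=file.relative_to(root).as_posix())
            else:
                raise FileNotFoundError(f"missing bundle member: {path}")
    return Path(archive_path)
```

A call such as `make_bundle(".", ["competition.yaml", "pages", "ingestion_program", "scoring_program"], "bundle.zip")` would then collect the pieces listed in the structure section; the exact member list a given competition needs is an assumption here.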
| 20 | + |
| 21 | +## Instructions to create the Codabench bundle |
| 22 | + |
| 23 | +Make sure that the `setup_data.py` script has been run to generate the data for the competition. |
| 24 | + |
| 25 | +Then, run the `create_bundle.py` script to create the Codabench bundle archive: |
| 26 | + |
| 27 | +```bash |
| 28 | +python create_bundle.py |
| 29 | +``` |
| 30 | +You can then upload the generated `bundle.zip` file to Codabench on this [page](https://www.codabench.org/competitions/upload/) to create the competition. |
| 31 | + |
| 32 | + |
| 33 | +## Instructions to test the bundle locally |
| 34 | + |
| 35 | + |
| 36 | +To test the ingestion program, run: |
| 37 | + |
| 38 | +```bash |
| 39 | +python ingestion_program/ingestion.py --data-dir dev_phase/input_data/ --output-dir ingestion_res/ --submission-dir solution/ |
| 40 | +``` |
| 41 | + |
| 42 | +To test the scoring program, run: |
| 43 | +```bash |
| 44 | +python scoring_program/scoring.py --reference-dir dev_phase/reference_data/ --output-dir scoring_res --prediction-dir ingestion_res/ |
| 45 | +``` |
| 46 | + |
| 47 | + |
| 48 | +### Setting up and testing the Docker image |
| 49 | + |
| 50 | +You can build the Docker image locally from the `Dockerfile` with: |
| 51 | + |
| 52 | +```bash |
| 53 | +docker build -t docker-image . |
| 54 | +``` |
| 55 | + |
| 56 | +To test the Docker image locally, run the ingestion program and then the scoring program inside it: |
| 57 | + |
| 58 | +```bash |
| 59 | +docker run --rm -it -u root \ |
| 60 | + -v "./ingestion_program":"/app/ingestion_program" \ |
| 61 | + -v "./dev_phase/input_data":/app/input_data \ |
| 62 | + -v "./ingestion_res":/app/output \ |
| 63 | + -v "./solution":/app/ingested_program \ |
| 64 | + --name ingestion docker-image \ |
| 65 | + python /app/ingestion_program/ingestion.py |
| 66 | + |
| 67 | +docker run --rm -it -u root \ |
| 68 | + -v "./scoring_program":"/app/scoring_program" \ |
| 69 | + -v "./dev_phase/reference_data":/app/input/ref \ |
| 70 | + -v "./ingestion_res":/app/input/res \ |
| 71 | + -v "./scoring_res":/app/output \ |
| 72 | + --name scoring docker-image \ |
| 73 | + python /app/scoring_program/scoring.py |
| 74 | +``` |