CI add Dockerfile and CI

tomMoral · tomMoral · commit 6722a774bdd7 · 2025-11-14T12:16:43.000+01:00
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -0,0 +1,64 @@
+# Test that the docker image builds correctly
+# and that running the ingestion and scoring programs works
+# Docker build layers are cached between runs.
+name: Test Docker Image
+
+on:
+    push:
+        branches: [ main ]
+    pull_request:
+
+jobs:
+    test:
+        runs-on: ubuntu-latest
+        env:
+            DOCKER_IMAGE_NAME: docker-codabench-test
+
+        steps:
+            - name: Checkout repository
+              uses: actions/checkout@v3
+
+            - name: Set up Docker Buildx
+              uses: docker/setup-buildx-action@v2
+
+            - name: Prepare buildx cache dir
+              run: mkdir -p /tmp/.buildx-cache
+
+            - name: Cache Docker buildx layers
+              uses: actions/cache@v3
+              with:
+                path: /tmp/.buildx-cache
+                key: ${{ runner.os }}-buildx-${{ hashFiles('**/Dockerfile') }}-${{ github.ref }}
+                restore-keys: |
+                    ${{ runner.os }}-buildx-${{ hashFiles('**/Dockerfile') }}-
+
+            - name: Build Docker image
+              run: |
+                docker buildx build --progress=plain --load \
+                    --cache-from=type=local,src=/tmp/.buildx-cache \
+                    --cache-to=type=local,dest=/tmp/.buildx-cache,mode=max \
+                    -t ${{ env.DOCKER_IMAGE_NAME }} .
+
+            - name: Prepare input/output directories (per README)
+              run: |
+                set -e
+                python setup_data.py
+
+            - name: Test ingestion program
+              run: |
+                docker run --rm -u root \
+                    -v "./ingestion_program":"/app/ingestion_program" \
+                    -v "./dev_phase/input_data":/app/input_data \
+                    -v "./ingestion_res":/app/output \
+                    -v "./solution":/app/ingested_program \
+                    --name ingestion ${{ env.DOCKER_IMAGE_NAME }} \
+                        python /app/ingestion_program/ingestion.py
+
+            - name: Test scoring program
+              run: |
+                docker run --rm -u root \
+                    -v "./scoring_program":"/app/scoring_program" \
+                    -v "./dev_phase/reference_data":/app/input/ref \
+                    -v "./ingestion_res":/app/input/res \
+                    -v "./scoring_res":/app/output \
+                    --name scoring ${{ env.DOCKER_IMAGE_NAME }} python /app/scoring_program/scoring.py
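
Reviewer note: the ingestion and scoring `docker run` steps share one shape (bind mounts, container name, image, command). Below is a minimal Python sketch of how such an argument list could be assembled. The helper `docker_run_args` and its parameters are hypothetical and not part of this commit. It resolves host directories to absolute paths and adds `-it` only when a terminal is attached, since CI runners typically provide no TTY.

```python
import sys
from pathlib import Path


def docker_run_args(image, name, mounts, command, interactive=None):
    """Build the argv list for a `docker run` call (hypothetical helper).

    mounts maps host directories to container paths.
    interactive=None means: add -it only when stdin is a terminal.
    """
    if interactive is None:
        interactive = sys.stdin.isatty()
    args = ["docker", "run", "--rm", "-u", "root"]
    if interactive:
        args.append("-it")
    for host_dir, container_dir in mounts.items():
        # Absolute host paths avoid relying on relative-path support in -v.
        args += ["-v", f"{Path(host_dir).resolve()}:{container_dir}"]
    args += ["--name", name, image]
    args += list(command)
    return args


# Same mounts as the ingestion step of the workflow.
ingestion_args = docker_run_args(
    image="docker-codabench-test",
    name="ingestion",
    mounts={
        "./ingestion_program": "/app/ingestion_program",
        "./dev_phase/input_data": "/app/input_data",
        "./ingestion_res": "/app/output",
        "./solution": "/app/ingested_program",
    },
    command=["python", "/app/ingestion_program/ingestion.py"],
    interactive=False,  # as in CI, where no TTY is available
)
print(" ".join(ingestion_args))
```

The resulting list can be passed to `subprocess.run(ingestion_args, check=True)` so that a non-zero exit code from the container fails the calling script.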
diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,20 @@
+# Step 1: Start from an official Docker image with the desired base environment
+# Good starting points are the official codalab images or
+# pytorch images with CUDA support:
+#    - Codalab: codalab/codalab-legacy:py39
+#    - Codalab GPU: codalab/codalab-legacy:gpu310
+#    - Pytorch: pytorch/pytorch:2.8.0-cuda12.6-cudnn9-runtime
+FROM codalab/codalab-legacy:py39
+
+# Set environment variables to prevent interactive prompts
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Step 2: Install system-level dependencies (if any), e.g., git, wget,
+# or libgl1 for OpenCV. This template needs none and only upgrades pip.
+RUN pip install -U pip
+
+# Step 3: Copy and pre-install all Python dependencies
+# This 'requirements.txt' file should list pandas, scikit-learn, timm, etc.
+# Place it in the same directory as this Dockerfile.
+COPY requirements.txt /tmp/requirements.txt
+RUN pip install --no-cache-dir -r /tmp/requirements.txt
diff --git a/README.md b/README.md
@@ -0,0 +1,74 @@
+# Template to create a Codabench bundle for an ML competition in Python
+
+The code in this repository is a template to create a Codabench bundle for a machine learning competition in Python.
+It sets up a dummy classification task, evaluated with the accuracy metric on a public and a private test set.
+
+## Structure of the bundle
+
+- `ingestion_program/`: contains the ingestion program that will be run on participants' submissions. It is responsible for loading the code from the submission, passing the training data to train the model, and generating predictions on the test datasets.
+- `scoring_program/`: contains the scoring program that will be run to evaluate the predictions generated by the ingestion program. It loads the predictions and the ground-truth labels, computes the evaluation metric (accuracy in this case), and outputs the score.
+- `solution/`: contains a sample solution submission that participants can use as a reference. Here, this is a simple Random Forest classifier.
+- `*_phase/`: contains the data for a given phase, including input data and reference labels. Running `setup_data.py` will generate dummy data for a development phase.
+- `competition.yaml`: configuration file for the Codabench competition, specifying phases, tasks, and evaluation metrics.
+- `pages/`: contains markdown files that will be rendered as web pages in the Codabench competition.
+
+## Extra scripts in this repository
+
+- `setup_data.py`: script to generate dummy data for the competition. This should be changed to load and preprocess real data for a given competition.
+- `create_bundle.py`: script to create the Codabench bundle archive from the repository structure.
+- `Dockerfile`: Dockerfile to build the Docker image that will be used to run the ingestion and scoring programs.
+
+## Instructions to create the Codabench bundle
+
+Make sure that the `setup_data.py` script has been run to generate the data for the competition.
+
+Then, run the `create_bundle.py` script to create the Codabench bundle archive:
+
+```bash
+python create_bundle.py
+```
+You can then upload the generated `bundle.zip` file on this [page](https://www.codabench.org/competitions/upload/) to create the competition on Codabench.
+
+
+## Instructions to test the bundle locally
+
+
+To test the ingestion program, run:
+
+```bash
+python ingestion_program/ingestion.py --data-dir dev_phase/input_data/ --output-dir ingestion_res/ --submission-dir solution/
+```
+
+To test the scoring program, run:
+```bash
+python scoring_program/scoring.py --reference-dir dev_phase/reference_data/ --output-dir scoring_res/ --prediction-dir ingestion_res/
+```
+
+
+### Setting up and testing the Docker image
+
+You can build the Docker image locally from the `Dockerfile` with:
+
+```bash
+docker build -t docker-image .
+```
+
+To test the Docker image locally, run the ingestion program and then the scoring program:
+
+```bash
+docker run --rm -it -u root \
+    -v "./ingestion_program":"/app/ingestion_program" \
+    -v "./dev_phase/input_data":/app/input_data \
+    -v "./ingestion_res":/app/output \
+    -v "./solution":/app/ingested_program \
+    --name ingestion docker-image \
+        python /app/ingestion_program/ingestion.py
+
+docker run --rm -it -u root \
+    -v "./scoring_program":"/app/scoring_program" \
+    -v "./dev_phase/reference_data":/app/input/ref \
+    -v "./ingestion_res":/app/input/res \
+    -v "./scoring_res":/app/output \
+    --name scoring docker-image \
+        python /app/scoring_program/scoring.py
+```
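
Reviewer note: the README describes the scoring program as loading the predictions and the ground-truth labels, computing accuracy, and outputting the score. A minimal sketch of that logic follows. The function names, the `scores.json` file name, and the `accuracy` key are assumptions for illustration; the actual `scoring_program/scoring.py` is not shown in this diff.

```python
import json
import tempfile
from pathlib import Path


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the reference label."""
    if len(y_true) != len(y_pred):
        raise ValueError("prediction and reference lengths differ")
    if not y_true:
        raise ValueError("no labels to score")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def write_scores(output_dir, scores):
    """Write the scores as JSON (the file name is an assumption)."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / "scores.json"
    path.write_text(json.dumps(scores))
    return path


# Toy labels standing in for the reference data and the ingestion output.
reference = [0, 1, 1, 0]
predictions = [0, 1, 0, 0]
score = accuracy(reference, predictions)  # 3 of 4 labels match

with tempfile.TemporaryDirectory() as tmp:
    written = json.loads(write_scores(tmp, {"accuracy": score}).read_text())
```

Raising on a length mismatch, rather than scoring the overlapping prefix, is a design choice that makes truncated prediction files visible instead of silently inflating or deflating the score.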
diff --git a/logo.png b/logo.png