DOC improve README with bundle structure details

tomMoral · web-flow · commit 5b87880ca4f7 · 2025-11-16T11:32:49.000+01:00
diff --git a/README.md b/README.md
@@ -6,7 +6,16 @@ It sets up a dummy classification task, evaluated with accuracy metric on a publ
 ## Structure of the bundle
 
 - `ingestion_program/`: contains the ingestion program that will be run on participant's submissions. It is responsible for loading the code from the submission, passing the training data to train the model, and generating predictions on the test datasets.
+  It contains:
+  * `metadata.yaml`: A file telling `codabench` how to run the ingestion program. If the ingestion program is a single script, `ingestion.py`, there is no need to edit it.
+  * `ingestion.py`: A script to run the ingestion. The role of this script is to load the submission code and produce predictions that can be evaluated with the `scoring_program`.
+    In our example, the submission defines a `train_model` function that is called with the training data and returns a model whose `predict` method we can call.
+    The predictions are then stored as a CSV file, to be loaded by the `scoring_program`.
 - `scoring_program/`: contains the scoring program that will be run to evaluate the predictions generated by the ingestion program. It loads the predictions and the ground truth labels, computes the evaluation metric (accuracy in this case), and outputs the score.
+  It contains:
+  * `metadata.yaml`: A file telling `codabench` how to run the scoring program. If the scoring program is a single script, `scoring.py`, there is no need to edit it.
+  * `scoring.py`: A script to run the scoring. This script loads the predictions dumped by the ingestion program and produces a single JSON file containing the scores associated with the submission.
+    In our example, we compute `accuracy` on two test sets (public and private) as well as runtime.
 - `solution/`: contains a sample solution submission that participants can use as a reference. Here, this is a simple Random Forest classifier.
 - `*_phase/`: contains the data for a given phase, including input data and reference labels. Running `setup_data.py` will generate dummy data for a development phase.
 - `competition.yaml`: configuration file for the codabench competition, specifying phases, tasks, and evaluation metrics.
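
Note on the flow described in this diff: the hand-off between `ingestion.py` and `scoring.py` can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the bundle's actual code. Only `train_model` and `predict` come from the README text. The `MajorityClassifier` class, the `run_ingestion` and `run_scoring` helpers, the `y_pred` column name, and the file names are hypothetical. The sketch also scores a single test set, whereas the example bundle reports accuracy on a public and a private set.

```python
import csv
import json
import tempfile
import time
from collections import Counter
from pathlib import Path


class MajorityClassifier:
    """Hypothetical toy model standing in for a participant's model."""

    def __init__(self, label):
        self.label = label

    def predict(self, X):
        # Predict the majority training label for every sample.
        return [self.label for _ in X]


def train_model(X_train, y_train):
    # Submission side: return an object exposing a `predict` method.
    majority_label = Counter(y_train).most_common(1)[0][0]
    return MajorityClassifier(majority_label)


def run_ingestion(X_train, y_train, X_test, pred_file):
    # Ingestion side (assumed layout): train, predict, dump predictions as CSV.
    start = time.perf_counter()
    model = train_model(X_train, y_train)
    y_pred = model.predict(X_test)
    runtime = time.perf_counter() - start
    with open(pred_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["y_pred"])
        writer.writerows([p] for p in y_pred)
    return runtime


def run_scoring(pred_file, y_true, runtime, score_file):
    # Scoring side (assumed layout): load predictions, write one JSON file of scores.
    with open(pred_file, newline="") as f:
        y_pred = [int(row["y_pred"]) for row in csv.DictReader(f)]
    accuracy = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)
    scores = {"accuracy": accuracy, "runtime": runtime}
    Path(score_file).write_text(json.dumps(scores))
    return scores


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pred_file = Path(tmp) / "predictions.csv"
        score_file = Path(tmp) / "scores.json"
        runtime = run_ingestion(
            [[0], [1], [2], [3]], [0, 1, 1, 1], [[4], [5], [6]], pred_file
        )
        print(run_scoring(pred_file, [1, 0, 1], runtime, score_file))
```

Keeping the CSV file as the only interface between the two programs means the scoring step never imports participant code, which matches the separation the README describes.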