Proposed changes to framework #106

Open
maumueller opened this issue Apr 17, 2023 · 1 comment

@maumueller (Collaborator)

The current workflow to run algorithm X on dataset Y is something like this:

  1. python install.py builds the docker container
  2. python create_dataset.py sets up the datasets
  3. python run.py --dataset Y --algorithm X mounts data/, results/, benchmark/ into the container for X
    • it takes care of parsing the definitions file and checking already-present runs to figure out which runs still need to be carried out.
    • py-docker is used to spawn the container from within the Python process (a rough sketch of this follows the list)
    • results are written to results/
  4. python plot.py / data_export.py / ... to evaluate the results
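
For reference, the py-docker spawn in step 3 amounts to something like the following minimal sketch; the image tag, mount paths, and command are placeholders for illustration, not the actual run.py code:

```python
# Minimal sketch of how run.py spawns a container via the Docker SDK for Python.
# Image tag, mount paths, and the command are placeholders, not the real code.
import os
import docker

client = docker.from_env()
logs = client.containers.run(
    image="billion-scale-benchmark-X",                # built by install.py (placeholder tag)
    command=["--dataset", "Y", "--algorithm", "X"],
    volumes={
        os.path.abspath("data"): {"bind": "/home/app/data", "mode": "ro"},
        os.path.abspath("results"): {"bind": "/home/app/results", "mode": "rw"},
        os.path.abspath("benchmark"): {"bind": "/home/app/benchmark", "mode": "ro"},
    },
    remove=True,
)
print(logs.decode())
```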

Given @harsha-simhadri's and @sourcesync's frustrations and some directions discussed in other meetings, I think we should relax step 3 a bit and allow more flexibility in the container setup. One direction could look like this:

  1. python install.py builds the docker container; participants are expected to override the entry point to point to their own implementation (file algorithms/X/Dockerfile)
  2. python create_dataset.py sets up datasets
  3. A Python/shell script contains the logic to run the container for X (in algorithms/X/run.{py,sh})
    • as arguments, we provide task, dataset, where the results should be written, and some additional parameters
    • we mount data/, results/, and the config file that is used by the implementation (algorithms/X/config.yaml, maybe task specific)
    • The following is done by the implementation in the container:
      a. file I/O in the container, loading/building index
      b. running the experiment and providing timings
      c. writing results in a standard format (as before, results/Y/X/run_identifier.hdf5); a sketch follows this list
  4. python plot.py / data_export.py / ... to evaluate the results
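
To make step 3c concrete, here is a minimal sketch of what writing results in the standard HDF5 layout could look like from inside the container; the dataset and attribute names (neighbors, distances, build_time, search_times) are assumptions for illustration, not a fixed spec:

```python
# Sketch of step 3c: the in-container implementation writes its results as
# results/Y/X/run_identifier.hdf5. Dataset/attribute names are illustrative only.
import os
import h5py

def write_results(path, neighbors, distances, build_time, search_times, params):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with h5py.File(path, "w") as f:
        f.attrs["build_time"] = build_time                    # seconds to load/build the index
        f.attrs["params"] = str(params)                       # run parameters as a string
        f.create_dataset("neighbors", data=neighbors)         # (num_queries, k) neighbor ids
        f.create_dataset("distances", data=distances)         # (num_queries, k) distances
        f.create_dataset("search_times", data=search_times)   # per-batch query timings

# e.g. write_results("results/Y/X/run_identifier.hdf5", ids, dists, 42.0, times, params)
```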

We provide a default run script for inspiration, which would be pretty close to the current setup. Putting all the logic into the container could mean a lot of code duplication, but isolated containers will allow for much easier orchestration.
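
For illustration, such a default algorithms/X/run.py could look roughly like this; the argument names, image tag, and container-side paths are assumptions rather than a finished interface:

```python
# Sketch of a default algorithms/X/run.py: mounts data/, results/, and the
# algorithm's config file, then hands control to the container's entry point.
# Argument names, image tag, and container paths are placeholders.
import argparse
import os
import docker

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--task", required=True)
    parser.add_argument("--dataset", required=True)
    parser.add_argument("--output", default="results")
    args = parser.parse_args()

    client = docker.from_env()
    logs = client.containers.run(
        image="algorithm-x-image",                    # built from algorithms/X/Dockerfile (placeholder tag)
        command=["--task", args.task, "--dataset", args.dataset],
        volumes={
            os.path.abspath("data"): {"bind": "/home/app/data", "mode": "ro"},
            os.path.abspath(args.output): {"bind": "/home/app/results", "mode": "rw"},
            os.path.abspath("algorithms/X/config.yaml"): {"bind": "/home/app/config.yaml", "mode": "ro"},
        },
        remove=True,
    )
    print(logs.decode())

if __name__ == "__main__":
    main()
```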

I can provide a proof-of-concept if this sounds promising.

@harsha-simhadri (Owner)

Martin, this sounds reasonable.

We also need to think about specialized runners for the tasks we have in mind.
