
Add storage import and export features to command line and web API #1082

Merged
merged 79 commits into Epistimio:develop from import-export-cli-and-webapi on Aug 14, 2023

Conversation

notoraptor
Collaborator

Description

Hi @bouthilx! This is a pull request to add export (dump) and import (load) features to Oríon, available via new command lines and new web API entries. The new features use storage objects instead of the underlying databases.

This should fix #953

NB: important remarks:

  • I had to add a few modifications to BaseStorageProtocol
  • Overall, import/export may be slower when using storage instead of the database to import/export the whole data set. That's because we must make a storage call for each piece of data (benchmark or experiment) we manage, hence one database call per manipulation, whereas with the database object we could group queries and read/write many records at once (see the sketch below).
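A minimal sketch of the difference (the function and collection names here are illustrative, not Oríon's actual API):

```python
# Hypothetical comparison of per-item storage calls vs. one batched query.

def export_via_storage(storage, experiment_names):
    # One storage call, hence one database round trip, per experiment.
    return [storage.fetch_experiment(name) for name in experiment_names]

def export_via_database(database, experiment_names):
    # A single query fetches all matching experiments at once.
    return database.read("experiments", {"name": {"$in": experiment_names}})
```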

Changes

  • Add a new module storage_backup that provides two functions: dump_database for export and load_database for import.
  • Add new command lines using these functions: orion db dump and orion db load.
  • Add new web API entries: /dump for export, /load for import, and /import-status to monitor imports (a usage sketch follows below).
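
A rough usage sketch of the new endpoints (illustrative only; the server address, route parameters, and response field names are assumptions):

```python
import requests

BASE = "http://localhost:8000"  # hypothetical address of a running Oríon web API

# Export: download a dump of the storage.
response = requests.get(f"{BASE}/dump")
with open("dump.pkl", "wb") as f:
    f.write(response.content)

# Import: POST the dump file, then poll /import-status to monitor progress.
with open("dump.pkl", "rb") as f:
    task = requests.post(f"{BASE}/load", files={"file": f}).json()
# The "task" field name is an assumption for this sketch.
print(requests.get(f"{BASE}/import-status/{task['task']}").json())
```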

Checklist

Tests

  • I added corresponding tests for bug fixes and new features. If possible, the tests fail without the changes
  • All new and existing tests are passing ($ tox -e py38; replace 38 with your Python version if necessary)

Documentation

  • I have updated the relevant documentation related to my changes

Quality

  • I have read the CONTRIBUTING doc
  • My commit messages follow this format
  • My code follows the style guidelines ($ tox -e lint)

docs/src/user/storage.rst (review thread; outdated, resolved)
@bouthilx
Member

The *.pickled files are PickledDB databases, right? They should not be included in the repo; rather, they should be generated automatically by the test fixtures.
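
For reference, such a file could be generated on the fly by a fixture along these lines (a sketch assuming the PickledDB import path, host argument, and write() signature shown; the sample document is made up):

```python
import pytest

from orion.core.io.database.pickleddb import PickledDB

@pytest.fixture
def pickled_db(tmp_path):
    """Build a PickledDB file per test instead of committing *.pickled files."""
    db = PickledDB(host=str(tmp_path / "test.pkl"))
    # Illustrative content only; real fixtures would insert full experiment documents.
    db.write("experiments", {"name": "test_single_exp", "version": 1})
    return db
```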

notoraptor added 16 commits May 12, 2023 08:33
Add a lock to ImportTask to prevent concurrent execution when updating task info.

Encapsulate ImportTask to make sure lock is used when necessary.
Try to fix failing CI tests. Tests seem to pass for Python 3.8 but not for Python 3.7.

- In Python 3.8, we can clear previous logging handlers before setting a new stream simply by using the new `force` argument of logging.basicConfig
- In Python 3.7, we must clear previous handlers manually before setting a new stream.
- Now that storage_resource logging depends on the root logger, we must configure client logging in unit tests (using caplog) to ensure the expected logging messages are printed in the import sub-process.
- Using caplog, we can rewrite the logging unit test with a simulated client instead of launching a real sub-process server.
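
The version-dependent reset amounts to something like this (a sketch; `new_stream` stands for whatever stream the sub-process should log to):

```python
import logging
import sys

def reset_root_logging(new_stream, level=logging.INFO):
    if sys.version_info >= (3, 8):
        # Python 3.8+: force=True removes existing handlers automatically.
        logging.basicConfig(stream=new_stream, level=level, force=True)
    else:
        # Python 3.7: remove existing handlers manually first.
        root = logging.getLogger()
        for handler in list(root.handlers):
            root.removeHandler(handler)
        logging.basicConfig(stream=new_stream, level=level)
```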
Clean dumped file if an error occurs when dumping, **except** if the file already existed and *no* overwrite was specified.
Clean dumped files only if no error occurs in storage_resource (files should have been deleted if an error occurred)
notoraptor force-pushed the import-export-cli-and-webapi branch from a8406e7 to e6b23fc on May 12, 2023 13:46
Work on temporary file when dumping and move it to output file only if no error occurred

- Move function _gen_host_file() from orion/serving/storage_resource to orion/core/utils and rename to generate_temporary_file().
three_experiments_branch_same_name_trials_benchmarks, capsys, testing_helpers
):
"""Test dump with overwrite argument"""
with tempfile.TemporaryDirectory() as tmp_dir:
Member:

You can avoid this line by using the fixture tmp_path: https://docs.pytest.org/en/7.1.x/how-to/tmp_path.html
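
For illustration, the suggested fixture makes the test look roughly like this (a sketch; the test name and body are illustrative):

```python
def test_dump_overwrite(tmp_path, capsys, testing_helpers):
    """Test dump with overwrite argument."""
    # pytest provides tmp_path as a fresh per-test directory and cleans it up,
    # so no explicit tempfile.TemporaryDirectory() context is needed.
    output = tmp_path / "dump.pkl"
    ...
```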

Collaborator (Author):

Done (here and for all comments below): 91a10d9

three_experiments_branch_same_name_trials_benchmarks, capsys, testing_helpers
):
"""Test dump to a specified output file"""
with tempfile.TemporaryDirectory() as tmp_dir:
Member:

Same here

):
"""Test how dumped file is cleaned if dump fails."""

with tempfile.TemporaryDirectory() as tmp_dir:
Member:

idem

three_experiments_branch_same_name_trials_benchmarks, capsys
):
"""Test dump unknown experiment"""
with tempfile.TemporaryDirectory() as tmp_dir:
Member:

idem

algo_state,
):
"""Test dump experiment test_single_exp"""
with tempfile.TemporaryDirectory() as tmp_dir:
Member:

idem

@Delaunay Delaunay merged commit f485f62 into Epistimio:develop Aug 14, 2023
@notoraptor notoraptor deleted the import-export-cli-and-webapi branch August 15, 2023 14:12
NeilGirdhar pushed a commit to NeilGirdhar/orion that referenced this pull request Oct 22, 2023
Add storage import and export features to command line and web API (Epistimio#1082)

* Add files for scripts `orion db dump` and `orion db load`

* Write orion db dump

* Write orion db load for parameter --exp

* Simplify code to get an experiment from its name

* Wrap destination database into a storage.
Move dump logic into a function.

* dump:
- add and use CLI basic arguments
- use both version and name to find experiment to dump

* Rewrite load

* Hardcode collection list in dump

* Raise runtime error if resolve is 'bump' for benchmarks.

* - Add tests for orion db dump
- Raise DatabaseError when an error occurs in the script
- If many experiments are found for a given name, pick the experiment with the lowest version

* - Add tests for orion db load
- Raise DatabaseError when errors occur in the script
- If many experiments are found for a given name, pick the experiment with the lowest version

* Reformat code

* Move function dump_database() into new module orion.core.worker.storage_backup

* Add entry /dump to Web API.

* [web api] Add download suffix to dumped file

* Use one module for both import/export web API endpoints.

* [web api] Receive a POST request to import data

* Add function load_database into module storage_backup and move import logic into this function.

* Rename param `experiment` to `name` for function dump_database() so it has the same param names as load_database()

* Check conflicts before making import.

* [Web API]

Make load entry launch a process to manage import

Capture import progress messages printed by log

Add entry import-status to get import progress messages and status

* [web api] Allow following import progress using a callback in the backend

* Add documentation for web API.

* Add documentation for command line.

* Add tests for web API /dump, /load and /import-status

* Fix tests.

* Move main functions to top of module storage_backup

* For dump from pickledb, use the db object directly instead of locking the database first (the database is always locked before any read/write/remove operation)

* Use storage instead of database for export.
- NB: With this new interface, dumping the whole database is slower, because we must dump experiments one by one, making many more database calls.

Add heartbeat field to LockedAlgorithmState

Update BaseStorageProtocol interface:
- Allow setting the initial algo state in create_experiment()
- Allow setting the algo state in initialize_algorithm_lock(), and rename it to write_algorithm_lock()

Update test_db_dump
- With the new dumping interface, only algos related to available experiments are dumped; dumped data no longer contains algos related to unknown experiments
- Check dumped algo state for an experiment

* Update doc in storage_resource
Update test for storage_resource

* For load (from pickledb file), use the db object directly instead of locking the database first (the database is always locked before any read/write/remove operation)

* Add benchmarks to tested databases for dump/load.

* Use storage instead of database to import.
BaseStorageProtocol:
- add new function delete_benchmark() and implement it in child class Legacy.

* Check progress callback messages in test_db_load:test_load_overwrite()

* Fix pylint

* Try to set logging level in running import task

* Update docs/src/user/storage.rst

Co-authored-by: Xavier Bouthillier <[email protected]>

* Allow not specifying a resolve strategy. If none is specified, an exception is raised as soon as a conflict is detected (a sketch follows below).
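
A minimal sketch of that behaviour (names are illustrative, not Oríon's actual API; ignore/overwrite/bump are the strategies discussed in this PR):

```python
def check_conflict(dst_names, experiment, resolve=None):
    # Decide what to do with an experiment that may already exist in dst.
    if experiment["name"] not in dst_names:
        return "insert"
    if resolve is None:
        raise RuntimeError(
            f"Conflict for experiment {experiment['name']!r}: no resolve "
            "strategy specified (expected ignore, overwrite, or bump)."
        )
    return resolve
```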

* Log the storage object instead of `storage._db` in the `dump` and `load` command lines.

* Storage export: add an option `--force` to explicitly overwrite the dumped file if it already exists.

* Import/export: if not specified, get the latest instead of the oldest version of a specific experiment

* Use NamedTemporaryFile to generate temporary file in storage_resource.

* Rewrite docstring for class ImportTask in Numpy style.

* Fix a function name

* Rename test fixture used to test storage export.

* Add comment in test_dump_unknown_experiment() to explain why output file is created in any case.

* Remove unused logger in module storage_resource.

* Write generic checking functions for dump unit tests that also verify the expected number of child experiments or trials.

Discovered a corner case: imported/exported experiments keep old `refers` links.
Currently, refers links are removed if only one specified experiment is dumped.

* Update TODO

* Regenerate experiment parent links in dst according to src when dumping, using correct dst IDs.

* Regenerate experiment parent links and experiment-to-trial links in dst according to src when loading, using correct dst IDs.
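
The link regeneration above boils down to remapping source IDs to destination IDs, roughly (an illustrative sketch, not Oríon's actual code):

```python
def copy_with_links(src_experiments, create_in_dst, update_in_dst):
    # First pass: copy experiments; the destination assigns fresh IDs.
    id_map = {}  # source experiment ID -> destination experiment ID
    for exp in src_experiments:
        id_map[exp["_id"]] = create_in_dst(exp)
    # Second pass: rewrite parent links using the destination IDs.
    for exp in src_experiments:
        parent = exp.get("refers", {}).get("parent_id")
        if parent is not None:
            update_in_dst(id_map[exp["_id"]], parent_id=id_map[parent])
```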

* Factorize tests in test_db_load and do not use pre-computed PKL files anymore (use pkl_* fixtures instead)

* Refactor code for test_storage_resource.

* Use test_helpers for test_db_dump

* Remove useless tests/functional/commands/__init__.py
Add a test to check default storage content (we have more algos than expected)

* Check new data are ignored or indeed written when importing with ignore or overwrite resolution.

* Set and check trial links.

* Add tests/functional/conftest

* Regenerate links to root experiments in imported/exported experiments.

* Add common function to write experiment in a destination storage, to use for both dump and load features.

* Add a lock to ImportTask to prevent concurrent execution when updating task info.
Encapsulate ImportTask to make sure lock is used when necessary.
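
The locking pattern described here is essentially (a sketch; field names are illustrative, not Oríon's actual class):

```python
import threading

class ImportTask:
    """Sketch of lock-guarded task-info updates."""

    def __init__(self):
        self._lock = threading.Lock()
        self.messages = []
        self.finished = False

    def add_message(self, message):
        with self._lock:  # serialize concurrent updates to task info
            self.messages.append(message)

    def finish(self):
        with self._lock:
            self.finished = True
```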

* Use Orion TreeNode to manage experiment and trial links.

* Try to fix failing CI tests. Tests seem to pass for Python 3.8 but not for Python 3.7.
- In Python 3.8, we can clear previous logging handlers before setting a new stream simply by using the new `force` argument of logging.basicConfig
- In Python 3.7, we must clear previous handlers manually before setting a new stream.

* Correctly update algorithm when setting deterministic experiment ID in tests/functional/commands/conftest

* Remove irrelevant calls to `logger.setLevel`

* Remove irrelevant calls to `logger.setLevel` that were added in this PR.

* [web api] make sure to transfer main root log level into import sub-process

* [test_storage_resource]
- Now that storage_resource logging depends on the root logger, we must configure client logging in unit tests (using caplog) to ensure the expected logging messages are printed in the import sub-process.
- Using caplog, we can rewrite the logging unit test with a simulated client instead of launching a real sub-process server.

* Clean dumped file if an error occurs when dumping, **except** if the file already existed and *no* overwrite was specified.

* Remove useless pylint comment in `orion.core.cli.db.load`

* Move module `orion.core.worker.storage_backup` into `orion.storage.backup`

* Remove fixed TODO

* Remove fixed TODO in test_db_load

* Clean dumped files only if no error occurs in storage_resource (files should have been deleted if an error occurred)

* Test dump using a temporary output path for most unit tests except `test_dump_default`

* - Work on a temporary file when dumping and move it to the output file only if no error occurred
- Move function _gen_host_file() from orion/serving/storage_resource to orion/core/utils and rename it to generate_temporary_file().
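
That flow amounts to the usual write-then-move pattern (a sketch; helper names are illustrative):

```python
import os
import shutil
import tempfile

def safe_dump(write_dump, output_path):
    # Dump into a temporary file first; publish it only on success.
    fd, tmp_path = tempfile.mkstemp(suffix=".pkl")
    os.close(fd)
    try:
        write_dump(tmp_path)
    except Exception:
        os.remove(tmp_path)  # never leave a partial dump behind
        raise
    shutil.move(tmp_path, output_path)
```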

* Use pytest fixture `tmp_path` instead of manually-created temp dir to test db dump.

---------

Co-authored-by: Xavier Bouthillier <[email protected]>
Co-authored-by: Setepenre <[email protected]>
NeilGirdhar pushed a commit to NeilGirdhar/orion that referenced this pull request Nov 16, 2023
Development

Successfully merging this pull request may close these issues.

Support storage checkpointing and restore
3 participants