Add storage import and export features to command line and web API #1082
Conversation
Force-pushed from a8406e7 to e6b23fc.
    three_experiments_branch_same_name_trials_benchmarks, capsys, testing_helpers
):
    """Test dump with overwrite argument"""
    with tempfile.TemporaryDirectory() as tmp_dir:
You can avoid this line by using the fixture `tmp_path`: https://docs.pytest.org/en/7.1.x/how-to/tmp_path.html
Done (here and for all comments below): 91a10d9
    three_experiments_branch_same_name_trials_benchmarks, capsys, testing_helpers
):
    """Test dump to a specified output file"""
    with tempfile.TemporaryDirectory() as tmp_dir:
Same here
):
    """Test how dumped file is cleaned if dump fails."""

    with tempfile.TemporaryDirectory() as tmp_dir:
idem
    three_experiments_branch_same_name_trials_benchmarks, capsys
):
    """Test dump unknown experiment"""
    with tempfile.TemporaryDirectory() as tmp_dir:
idem
    algo_state,
):
    """Test dump experiment test_single_exp"""
    with tempfile.TemporaryDirectory() as tmp_dir:
idem
…pistimio#1082)

* Add files for scripts `orion db dump` and `orion db load`
* Write orion db dump
* Write orion db load for parameter --exp
* Simplify code to get an experiment from its name
* Wrap destination database into a storage; move dump logic into a function
* dump: add and use basic CLI arguments; use both version and name to find the experiment to dump
* Rewrite load
* Hardcode collection list in dump
* Raise a runtime error if resolve is 'bump' for benchmarks
* Add tests for orion db dump: raise DatabaseError when an error occurs in the script; if many experiments are found for a given name, pick the experiment with the lowest version
* Add tests for orion db load: raise DatabaseError when errors occur in the script; if many experiments are found for a given name, pick the experiment with the lowest version
* Reformat code
* Move function dump_database() into new module orion.core.worker.storage_backup
* Add entry /dump to the web API
* [web api] Add download suffix to dumped file
* Use one module for both import/export web API endpoints
* [web api] Receive a POST request to import data
* Add function load_database to module storage_backup and move import logic into this function
* Rename param `experiment` to `name` for function dump_database() so it has the same param names as load_database()
* Check conflicts before making an import
* [web api] Make the load entry launch a process to manage import; capture import progress messages printed by the log; add entry import-status to get import progress messages and status
* [web api] Allow following import progress using a callback in the backend
* Add documentation for the web API
* Add documentation for the command line
* Add tests for web API /dump, /load and /import-status
* Fix tests
* Move main functions to the top of module storage_backup
* For dump from pickledb, use the db object directly instead of locking the database first (the database is always locked before any read/write/remove operation)
* Use storage instead of database for export. NB: with this new interface, dumping the whole database is slower, because we must dump experiments one by one, making many more database calls. Add heartbeat field to LockedAlgorithmState. Update the BaseStorageProtocol interface: allow setting the initial algo state in create_experiment(); allow setting the algo state in initialize_algorithm_lock(), and rename it to write_algorithm_lock(). Update test_db_dump: with the new dumping interface, only algos related to available experiments are dumped, so dumped data no longer contains algos related to unknown experiments; check the dumped algo state for an experiment
* Update doc in storage_resource; update test for storage_resource
* For load (from a pickledb file), use the db object directly instead of locking the database first (the database is always locked before any read/write/remove operation)
* Add benchmarks to the databases tested for dump/load
* Use storage instead of database to import. BaseStorageProtocol: add new function delete_benchmark() and implement it in child class Legacy
* Check progress callback messages in test_db_load:test_load_overwrite()
* Fix pylint
* Try to set logging level in the running import task
* Update docs/src/user/storage.rst (Co-authored-by: Xavier Bouthillier <[email protected]>)
* Allow not specifying a resolve strategy; if not specified, an exception is raised as soon as a conflict is detected
* Log the storage instead of `storage._db` in the `dump` and `load` command lines
* Storage export: add an option `--force` to explicitly overwrite the dumped file if it already exists
* Import/export: if not specified, get the latest instead of the oldest version of a specific experiment
* Use NamedTemporaryFile to generate a temporary file in storage_resource
* Rewrite docstring for class ImportTask in NumPy style
* Fix a function name
* Rename the test fixture used to test storage export
* Add a comment in test_dump_unknown_experiment() to explain why the output file is created in any case
* Remove unused logger in module storage_resource
* Write generic checking functions for dump unit tests that also verify the expected number of child experiments or trials. This uncovered a corner case: imported/exported experiments keep old `refers` links. Currently, refers links are removed if only 1 specified experiment is dumped
* Update TODO
* Regenerate experiment parent links in dst according to src when dumping, using correct dst IDs
* Regenerate experiment parent links and experiment-to-trial links in dst according to src when loading, using correct dst IDs
* Factorize tests in test_db_load and stop using pre-computed PKL files (use pkl_* fixtures instead)
* Refactor code for test_storage_resource
* Use test_helpers for test_db_dump
* Remove useless tests/functional/commands/__init__.py; add a test to check default storage content (we have more algos than expected)
* Check that new data are ignored or indeed written when importing with the ignore or overwrite resolution
* Set and check trial links
* Add tests/functional/conftest
* Regenerate links to root experiments in imported/exported experiments
* Add a common function to write an experiment into a destination storage, used for both dump and load features
* Add a lock to ImportTask to prevent concurrent execution when updating task info; encapsulate ImportTask to make sure the lock is used when necessary
* Use Orion TreeNode to manage experiment and trial links
* Try to fix failing CI tests. Tests seem to pass on Python 3.8 but not on Python 3.7: in Python 3.8 we can clear previous logging handlers before setting a new stream just by using the new `force` argument of logging.basicConfig; in Python 3.7 we must clear previous handlers manually before setting the new stream
* Correctly update the algorithm when setting a deterministic experiment ID in tests/functional/commands/conftest
* Remove irrelevant calls to `logger.setLevel`
* Remove irrelevant calls to `logger.setLevel` that were added in this PR
* [web api] Make sure to transfer the main root log level into the import sub-process
* [test_storage_resource] Now that storage_resource logging depends on root logging, we must set client logging in unit tests (using caplog) to be sure the expected logging messages are printed in the import sub-process. Using caplog, we can rewrite the logging unit test with a simulated client instead of launching a real sub-process server
* Clean the dumped file if an error occurs while dumping, **except** if the file already existed and *no* overwrite was specified
* Remove useless pylint comment in `orion.core.cli.db.load`
* Move module `orion.core.worker.storage_backup` into `orion.storage.backup`
* Remove fixed TODO
* Remove fixed TODO in test_db_load
* Clean dumped files in storage_resource only if no error occurs (files should have been deleted if an error occurred)
* Test dump using a temporary output path for most unit tests except `test_dump_default`
* Work on a temporary file when dumping and move it to the output file only if no error occurred; move function _gen_host_file() from orion/serving/storage_resource to orion/core/utils and rename it to generate_temporary_file()
* Use pytest fixture `tmp_path` instead of a manually-created temp dir to test db dump

---------
Co-authored-by: Xavier Bouthillier <[email protected]>
Co-authored-by: Setepenre <[email protected]>
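The Python 3.7 vs 3.8 difference around `logging.basicConfig` mentioned in the commit log can be sketched as follows. The helper name is illustrative, not the PR's actual code:

```python
import logging
import sys


def reset_logging_stream(stream, level=logging.INFO):
    """Point root logging at a new stream, clearing any previous handlers."""
    if sys.version_info >= (3, 8):
        # Python 3.8+: basicConfig(force=True) removes and closes existing
        # root handlers before installing the new one.
        logging.basicConfig(stream=stream, level=level, force=True)
    else:
        # Python 3.7: basicConfig is a no-op if handlers already exist,
        # so clear them manually first, then configure.
        root = logging.getLogger()
        for handler in list(root.handlers):
            root.removeHandler(handler)
            handler.close()
        logging.basicConfig(stream=stream, level=level)
```

Without this, a second call to `logging.basicConfig` on Python 3.7 silently keeps the old stream, which is why the import sub-process logs never reached the expected destination in CI.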
Description
Hi @bouthilx! This is a pull request to add export (dump) and import (load) features to Oríon, currently available via new command lines and new web API entries. The new features use storage objects instead of the underlying databases.
This should fix #953
NB (important remarks): this PR also updates the `BaseStorageProtocol` interface.

Changes

- New module `storage_backup` that provides 2 functions: `dump_database` for export, and `load_database` for import.
- New command lines `orion db dump` and `orion db load`.
- New web API entries: `/dump` for export, `/load` for import, and `/import-status` to monitor import.

Checklist

Tests

- `$ tox -e py38` (replace `38` by your Python version if necessary)

Documentation

Quality

- `$ tox -e lint`
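A rough client-side sketch of the three web API entries listed above. The endpoint names come from this PR, but the base URL, the upload body format, and the JSON response shapes are assumptions for illustration only (the real server defines its own upload and response schema):

```python
import json
import urllib.request


def export_storage(base_url):
    """GET /dump and return the dumped storage file content (bytes)."""
    with urllib.request.urlopen(f"{base_url}/dump") as resp:
        return resp.read()


def start_import(base_url, dump_bytes):
    """POST dump file bytes to /load to launch an import task (assumed
    raw-bytes upload); return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/load",
        data=dump_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())


def import_status(base_url):
    """GET /import-status and return the parsed JSON progress/status."""
    with urllib.request.urlopen(f"{base_url}/import-status") as resp:
        return json.loads(resp.read().decode())
```

A client would typically call `start_import` once, then poll `import_status` until the task reports completion, mirroring the progress-message capture described in the commit log.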