Improve ExperimentData handling of jobs and analysis#599
Conversation
nkanazawa1989
left a comment
There was a problem hiding this comment.
Halfway through the review. Feel free to start update. I'll review rest of changes tomorrow.
nkanazawa1989
left a comment
There was a problem hiding this comment.
Thanks Chris this PR looks awesome. The logic looks really clean and easy to read. Do you think you can write unittest for #573 before closing it (currently the test for status is too obvious since it just checks status that is manually overridden)? I think this test is bit tough since the event is hard to reproduce.
| LOG.warning("Experiment cannot be saved because backend is missing.") | ||
| return | ||
|
|
||
| with self._job_futures.lock: |
There was a problem hiding this comment.
This is removed because now _wait_for_analysis method will wait for job?
| @@ -862,8 +909,8 @@ def save(self) -> None: | |||
|
|
|||
| if self.verbose: | |||
There was a problem hiding this comment.
also analysis status should be done here (i.e. no error).
|
|
||
| If the experiment consists of multiple jobs, the returned status is mapped | ||
| in the following order: | ||
| with self._analysis_futures.lock and self._job_futures.lock: |
There was a problem hiding this comment.
We don't need to shutdown executor when not_done_* is empty?
There was a problem hiding this comment.
The executor doesn't need to be shutdown, it can sit around forever waiting to make more futures if needed.
There was a problem hiding this comment.
Then we expect the assigned resources in the analysis executor of each experiment data instance will be eliminated by garbage collection (we can come back to this later when we start to get memory issue as the size of parallel experiment scales)?
e6a8f66 to
1f1510f
Compare
nkanazawa1989
left a comment
There was a problem hiding this comment.
Minor comments for new commits. Previous updates are really nice.
| ) | ||
| # Add future for cancelling jobs that timeout | ||
| if timeout_ids: | ||
| self._job_executor.submit(self._timeout_running_jobs, timeout_ids, timeout) |
There was a problem hiding this comment.
Nice. I like this logic 💯
|
|
||
| Args: | ||
| jobs: The Job or list of Jobs to add result data from. | ||
| timeout: Optional, time to wait for jobs to finish before |
There was a problem hiding this comment.
Please add time unit for the value.
| if waited.not_done: | ||
| LOG.debug("Cancelling running jobs that exceeded add_jobs timeout.") | ||
| done_ids = {fut.result()[0] for fut in waited.done} | ||
| notdone_ids = [jid for jid in job_ids if jid not in done_ids] |
There was a problem hiding this comment.
just minor comment; why we cannot simply write self.cancel_jobs([fut.result[0] for fut in waited.not_done])? another comment; do you think this can be simplified with as_complete function?
There was a problem hiding this comment.
Not done futures won't have a result, the job id is only returned when the future finishes. This is why it can be extracted only from the done futures and the reason for the above work around to figure out the not-done job ids.
| self._add_job_data(job) | ||
| else: | ||
| # Add job results asynchronously | ||
| self._add_job_future(job) |
There was a problem hiding this comment.
We don't need to retrieve with timeout and call add_job here?
There was a problem hiding this comment.
Changing to calling add_jobs would raise a bunch of warnings about existing job ids at the moment. It could be good to think about changing later so that backend+service can be extracted from job for expdata saved/loaded to JSON, since the backend/provider cant be serialized/deserialized, but that might be better to treat in the serialization PR.
|
|
||
| If the experiment consists of multiple jobs, the returned status is mapped | ||
| in the following order: | ||
| with self._analysis_futures.lock and self._job_futures.lock: |
There was a problem hiding this comment.
Then we expect the assigned resources in the analysis executor of each experiment data instance will be eliminated by garbage collection (we can come back to this later when we start to get memory issue as the size of parallel experiment scales)?
| pending analysis callbacks. Note that analysis callbacks that have already | ||
| started running cannot be cancelled. | ||
| - | | ||
| Adds :meth:`.ExperimentData.cancel` to cancel both jobs and analysis. |
There was a problem hiding this comment.
also need description for timeout arg in run method and and add_job method.
| The ``timeout`` kwarg of :meth:`.ExperimentData.add_data` has been deprecated. | ||
| To timeout waiting for job results use the ``timeout`` kwarg with | ||
| :meth:`.ExperimentData.block_for_results` or | ||
| :meth:`.ExperimentData.analysis_results` instead. |
There was a problem hiding this comment.
Also need deprecation of adding job to add_data method.
* Add `add_jobs` method to ExperimentData * Add timeout functionality to add_jobs and Experiment.run that allows an experiment to set a timeout before cancelling all non-finished jobs * Adds status enum as ExperimentData class variable for easier access for value comparisons
1f1510f to
1563dd2
Compare
* This PR improves ExperimentData handling of qiskit job data and analysis callbacks to make querying job and analysis statuses and cancelling jobs or analysis processes more robust. * Changes `ExperimentData.status` to return `ExperimentStatus enum value * Adds `ExperimentData.job_status` method and `JobStatus` enum return type * Adds` ExperimentData.analysis_status` method and `AnalysisStatus enum return type. * Adds `ExperimentData.cancel_jobs`, `ExperimentData.cancel_analysis` and `ExperimentData.cancel` methods for cancelling running jobs, queued analysis, or both. * Adds `ExperimentData.add_jobs` method for adding job data. Deprecates adding job data via `add_data` method.
1563dd2 to
06d96ab
Compare
nkanazawa1989
left a comment
There was a problem hiding this comment.
LGTM, This PR allows us to handle callback and status more safely. Thanks Chris!
…y#599) This PR improves ExperimentData handling of qiskit job data and analysis callbacks to make querying job and analysis statuses and cancelling jobs or analysis processes more robust. It also includes some API changes to ExperimentData: * Changes `ExperimentData.status` to return `ExperimentStatus enum value * Adds `ExperimentData.job_status` method and `JobStatus` enum return type * Adds` ExperimentData.analysis_status` method and `AnalysisStatus enum return type. * Adds `ExperimentData.cancel_jobs`, `ExperimentData.cancel_analysis` and `ExperimentData.cancel` methods for cancelling running jobs, queued analysis, or both. * Adds `ExperimentData.add_jobs` method for adding job data. Deprecates adding job data via `add_data` method.
Summary
Depends on #596
Fixes #573, #592
This reworks how analysis callbacks are run to use separate Futures for each callback and improves handling of futures for jobs and callbacks in ExperimentData.
Details and comments
This also adds some convenience methods to
ExperimentData:jobs: returns a list of all jobs added to the experiment. This can be used to monitor specific jobs for example.job_statusreturns the status of all job execution. This is similar to existing status, but will return DONE if all jobs are finished regardless of current callback status.callback_statusreturns the status of callback execution. This will be QUEUED if callbacks are still waiting for jobs to finish running, RUNNING if at least one callback is still running, CANCELLED or ERROR if any callbacks were cancelled or had an error, or DONE if all callbacks have finished successfully.cancel_callbackscancels any queued callback futures that have not started running yet. Since callbacks run on a single-worker thread pool executor only the currently running callback future cannot be cancelled.cancelcancel all jobs and callbacks.callback_errorsreturns string of any errors encountered during callback executionOther methods are updated:
status: Returns "EMPTY" instead of "DONE" if no jobs, data, or callbacks are contained in the experiment data.errorshas improved reporting for job and callback errors encountered.