Improve ExperimentData handling of jobs and analysis by chriseclectic · Pull Request #599 · qiskit-community/qiskit-experiments

chriseclectic · 2022-01-07T20:57:11Z

Summary

Depends on #596
Fixes #573, #592

This reworks how analysis callbacks are run to use separate Futures for each callback and improves handling of futures for jobs and callbacks in ExperimentData.
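As a rough illustration of the "separate Future for each callback" idea, the sketch below uses only Python's standard `concurrent.futures` module. It is not the code from this PR: `fake_job`, `run_callback`, and the two executors are hypothetical stand-ins, and the assumption is simply that each callback first waits on the job futures and then runs on a single-worker executor.

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical executors: jobs may finish in parallel, callbacks run one at a time.
job_executor = ThreadPoolExecutor()
analysis_executor = ThreadPoolExecutor(max_workers=1)


def fake_job(value):
    """Stand-in for retrieving a job result."""
    return value * 2


job_futures = [job_executor.submit(fake_job, v) for v in (1, 2, 3)]

log = []


def run_callback(name):
    """Block until all job futures are done, then do the 'analysis'."""
    wait(job_futures)
    total = sum(fut.result() for fut in job_futures)
    log.append((name, total))
    return name


# One future per callback; the single worker runs them in submission order.
callback_futures = [analysis_executor.submit(run_callback, n) for n in ("fit", "plot")]
wait(callback_futures)
```

Because every callback has its own future, each one can be inspected or cancelled individually instead of being treated as one opaque batch.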

Details and comments

This also adds some convenience methods to ExperimentData:

jobs: returns a list of all jobs added to the experiment. This can be used, for example, to monitor specific jobs.
job_status: returns the combined status of job execution. This is similar to the existing status, but returns DONE once all jobs have finished, regardless of the current callback status.
callback_status: returns the status of callback execution. This is QUEUED if callbacks are still waiting for jobs to finish, RUNNING if at least one callback is still running, CANCELLED or ERROR if any callback was cancelled or raised an error, and DONE if all callbacks have finished successfully.
cancel_callbacks: cancels any queued callback futures that have not started running yet. Because callbacks run on a single-worker thread pool executor, the only callback future that cannot be cancelled is the one currently running.
cancel: cancels all jobs and callbacks.
callback_errors: returns a string describing any errors encountered during callback execution.
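The status and cancellation behaviour described in the list above can be sketched with plain `concurrent.futures` futures. This is a simplified illustration, not the PR's implementation: the helper `callback_status`, the order in which the states are checked, and the use of "QUEUED" for any future that has not started yet are all assumptions made for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def callback_status(futures):
    """Summarise futures as one status string (the priority order is an assumption)."""
    if any(f.running() for f in futures):
        return "RUNNING"
    if any(f.cancelled() for f in futures):
        return "CANCELLED"
    # No future is cancelled past this point, so exception() cannot raise CancelledError.
    if any(f.done() and f.exception() is not None for f in futures):
        return "ERROR"
    if any(not f.done() for f in futures):
        return "QUEUED"
    return "DONE"


executor = ThreadPoolExecutor(max_workers=1)  # single worker, as in the description
started, release = threading.Event(), threading.Event()


def slow_callback():
    started.set()   # signal that this callback is now running
    release.wait()  # keep the worker busy until released
    return "first"


futures = [executor.submit(slow_callback)]
futures += [executor.submit(lambda: "later") for _ in range(2)]

started.wait()
status_while_running = callback_status(futures)

# cancel() succeeds only for futures still waiting in the executor queue.
cancelled = [f.cancel() for f in futures]

release.set()
wait(futures)
final_status = callback_status(futures)
```

The running callback ignores the cancel request and completes normally, while the two queued callbacks never start, which matches the behaviour described for `cancel_callbacks`.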

Other methods are updated:

status: returns "EMPTY" instead of "DONE" if the experiment data contains no jobs, data, or callbacks.
errors: has improved reporting of any job and callback errors encountered.

nkanazawa1989

Halfway through the review. Feel free to start updating. I'll review the rest of the changes tomorrow.

nkanazawa1989

Thanks Chris, this PR looks awesome. The logic looks really clean and easy to read. Do you think you can write a unit test for #573 before closing it? (Currently the test for status is too trivial, since it just checks a status that is manually overridden.) I think this test is a bit tough, since the event is hard to reproduce.

nkanazawa1989 · 2022-01-13T08:06:41Z

            LOG.warning("Experiment cannot be saved because backend is missing.")
            return

-        with self._job_futures.lock:


Is this removed because the _wait_for_analysis method now waits for the jobs?

nkanazawa1989 · 2022-01-13T08:07:35Z

@@ -862,8 +909,8 @@ def save(self) -> None:

        if self.verbose:


Also, the analysis status should be DONE here (i.e. no error).

nkanazawa1989 · 2022-01-13T08:24:14Z


-        If the experiment consists of multiple jobs, the returned status is mapped
-        in the following order:
+        with self._analysis_futures.lock and self._job_futures.lock:


Don't we need to shut down the executor when not_done_* is empty?

The executor doesn't need to be shut down; it can sit around indefinitely, waiting to create more futures if needed.

So we expect the resources held by the analysis executor of each experiment data instance to be released by garbage collection? (We can come back to this later if we start to see memory issues as the number of parallel experiments scales.)

nkanazawa1989

Minor comments on the new commits. The previous updates are really nice.

nkanazawa1989 · 2022-01-15T01:03:14Z

-            )
+        # Add future for cancelling jobs that timeout
+        if timeout_ids:
+            self._job_executor.submit(self._timeout_running_jobs, timeout_ids, timeout)


Nice. I like this logic 💯

nkanazawa1989 · 2022-01-15T01:16:55Z

+
+        Args:
+            jobs: The Job or list of Jobs to add result data from.
+            timeout: Optional, time to wait for jobs to finish before


Please add the time unit for this value.

nkanazawa1989 · 2022-01-15T01:34:57Z

+        if waited.not_done:
+            LOG.debug("Cancelling running jobs that exceeded add_jobs timeout.")
+            done_ids = {fut.result()[0] for fut in waited.done}
+            notdone_ids = [jid for jid in job_ids if jid not in done_ids]


Just a minor comment: why can't we simply write self.cancel_jobs([fut.result()[0] for fut in waited.not_done])? Another comment: do you think this could be simplified with the as_completed function?

Not-done futures won't have a result; the job ID is only returned when the future finishes. That is why it can be extracted only from the done futures, and why the workaround above is needed to figure out the not-done job IDs.
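The point in the reply above can be reproduced with the standard library alone. The sketch below is a minimal illustration, not the PR's code: `fetch_result`, the job ID strings, and the `release` event are hypothetical, and the assumption is that each future returns a `(job_id, result)` tuple only once it finishes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait

release = threading.Event()


def fetch_result(job_id, blocked):
    """Stand-in for waiting on a job; returns its ID only on completion."""
    if blocked:
        release.wait()  # simulate a job that is still running at the timeout
    return job_id, f"result-{job_id}"


job_ids = ["job-a", "job-b", "job-c"]
executor = ThreadPoolExecutor(max_workers=3)
futures = [executor.submit(fetch_result, jid, jid == "job-c") for jid in job_ids]

# wait() splits the futures into those that finished within the timeout and the rest.
waited = wait(futures, timeout=0.5)

# Only finished futures can report their job ID ...
done_ids = {fut.result()[0] for fut in waited.done}
# ... so the unfinished IDs are recovered by elimination.
notdone_ids = [jid for jid in job_ids if jid not in done_ids]

release.set()  # let the blocked worker exit cleanly
```

Calling `fut.result()` on a future in `waited.not_done` would block until that job finished, which defeats the purpose of the timeout. An alternative, if it suited the surrounding code, would be to keep a dict that maps each future to its job ID at submission time.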

nkanazawa1989 · 2022-01-15T01:42:57Z

+                self._add_job_data(job)
+            else:
+                # Add job results asynchronously
+                self._add_job_future(job)


Don't we need to retrieve with a timeout and call add_jobs here?

Switching to add_jobs here would currently raise a bunch of warnings about existing job IDs. It could be worth changing later so that the backend and service can be extracted from the job for experiment data saved to or loaded from JSON, since the backend/provider can't be serialized or deserialized, but that might be better handled in the serialization PR.

nkanazawa1989 · 2022-01-15T01:53:56Z


-        If the experiment consists of multiple jobs, the returned status is mapped
-        in the following order:
+        with self._analysis_futures.lock and self._job_futures.lock:


So we expect the resources held by the analysis executor of each experiment data instance to be released by garbage collection? (We can come back to this later if we start to see memory issues as the number of parallel experiments scales.)

nkanazawa1989 · 2022-01-15T01:59:56Z

+    pending analysis callbacks. Note that analysis callbacks that have already
+    started running cannot be cancelled.
+  - |
+    Adds :meth:`.ExperimentData.cancel` to cancel both jobs and analysis.


We also need a description of the timeout arg in the run method and the add_jobs method.

nkanazawa1989 · 2022-01-15T02:00:39Z

+    The ``timeout`` kwarg of :meth:`.ExperimentData.add_data` has been deprecated.
+    To timeout waiting for job results use the ``timeout`` kwarg with
+    :meth:`.ExperimentData.block_for_results` or
+    :meth:`.ExperimentData.analysis_results` instead.


We also need to deprecate adding jobs via the add_data method.

* Add `add_jobs` method to ExperimentData
* Add timeout functionality to add_jobs and Experiment.run that allows an experiment to set a timeout before cancelling all non-finished jobs
* Adds status enum as ExperimentData class variable for easier access for value comparisons
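To show why exposing the status enum on the class helps with value comparisons, here is a small hypothetical sketch. The member names are taken from the status strings mentioned in the PR description, but the class `ExperimentDataSketch`, the attribute name `Status`, and the constructor argument are assumptions made for illustration and are not the library's actual API.

```python
import enum


class ExperimentStatus(enum.Enum):
    """Hypothetical status enum; member names follow the PR description."""

    EMPTY = "EMPTY"
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    CANCELLED = "CANCELLED"
    ERROR = "ERROR"
    DONE = "DONE"


class ExperimentDataSketch:
    """Toy container that exposes the enum as a class variable."""

    Status = ExperimentStatus  # assumed attribute name, for illustration only

    def __init__(self, status=ExperimentStatus.EMPTY):
        self._status = status

    def status(self):
        return self._status


data = ExperimentDataSketch()
finished = ExperimentDataSketch(ExperimentDataSketch.Status.DONE)

# Comparisons go through the class attribute, so no separate import is needed.
is_empty = data.status() is data.Status.EMPTY
is_done = finished.status() is finished.Status.DONE
```

Comparing enum members rather than raw strings means a misspelled status name fails immediately with an `AttributeError` instead of silently evaluating to `False`.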

* This PR improves ExperimentData handling of qiskit job data and analysis callbacks to make querying job and analysis statuses and cancelling jobs or analysis processes more robust.
* Changes `ExperimentData.status` to return `ExperimentStatus` enum value
* Adds `ExperimentData.job_status` method and `JobStatus` enum return type
* Adds `ExperimentData.analysis_status` method and `AnalysisStatus` enum return type.
* Adds `ExperimentData.cancel_jobs`, `ExperimentData.cancel_analysis` and `ExperimentData.cancel` methods for cancelling running jobs, queued analysis, or both.
* Adds `ExperimentData.add_jobs` method for adding job data. Deprecates adding job data via `add_data` method.

nkanazawa1989

LGTM. This PR allows us to handle callbacks and status more safely. Thanks, Chris!

…y#599) This PR improves ExperimentData handling of qiskit job data and analysis callbacks to make querying job and analysis statuses and cancelling jobs or analysis processes more robust. It also includes some API changes to ExperimentData:
* Changes `ExperimentData.status` to return `ExperimentStatus` enum value
* Adds `ExperimentData.job_status` method and `JobStatus` enum return type
* Adds `ExperimentData.analysis_status` method and `AnalysisStatus` enum return type.
* Adds `ExperimentData.cancel_jobs`, `ExperimentData.cancel_analysis` and `ExperimentData.cancel` methods for cancelling running jobs, queued analysis, or both.
* Adds `ExperimentData.add_jobs` method for adding job data. Deprecates adding job data via `add_data` method.

chriseclectic added the "Changelog: New Feature" label Jan 7, 2022

chriseclectic changed the title ~~Improve ExperimentData handling of job futures~~ Improve ExperimentData handling of callbacks and futures Jan 10, 2022

chriseclectic requested a review from nkanazawa1989 January 10, 2022 16:24

chriseclectic added the "Changelog: Bugfix" label Jan 10, 2022

This was referenced Jan 10, 2022

Improve handling of analysis callbacks #598

Closed

Make ExperimentData serializable #604

Merged

nkanazawa1989 suggested changes Jan 12, 2022

nkanazawa1989 reviewed Jan 13, 2022

chriseclectic force-pushed the job-futures branch from e6a8f66 to 1f1510f Compare January 13, 2022 19:53

chriseclectic changed the title ~~Improve ExperimentData handling of callbacks and futures~~ Improve ExperimentData handling of jobs and analysis Jan 14, 2022

nkanazawa1989 reviewed Jan 15, 2022

chriseclectic added 6 commits January 19, 2022 13:41

Improve analysis callbacks

b03f9a4

Improve handling of job data futures

61adba2

Improve futures logging and waiting

414c91e

Rename callback methods to use analysis naming

4873374

Add ExperimentStatus and AnalysisStatus enums

cbe7bcb

chriseclectic force-pushed the job-futures branch from 1f1510f to 1563dd2 Compare January 19, 2022 19:41

chriseclectic force-pushed the job-futures branch from 1563dd2 to 06d96ab Compare January 19, 2022 19:47

chriseclectic added the "Changelog: API Change" and "Changelog: Deprecation" labels and removed the "Changelog: Bugfix" label Jan 19, 2022

nkanazawa1989 approved these changes Jan 20, 2022

chriseclectic merged commit f229944 into qiskit-community:main Jan 20, 2022

nkanazawa1989 mentioned this pull request Jan 21, 2022

Update unittest with run success check #568

Merged

chriseclectic mentioned this pull request Feb 8, 2022

Automatic experiment monitor #521

Open

chriseclectic deleted the job-futures branch March 3, 2022 22:44

yaelbh mentioned this pull request Aug 4, 2022

Analysis runs in spite of job failure #866

Closed
