Experiment data and analysis result classes by jyu00 · Pull Request #113 · qiskit-community/qiskit-experiments

jyu00 · 2021-06-17T00:35:17Z

Summary

This is a replacement of Qiskit/qiskit#5499. Most of the code remains the same, but I renamed ExperimentData to StoredData since qiskit-experiments has its own ExperimentData class. The attributes in StoredData continue to reference "experiment" (e.g. experiment_type), since the DB service is still called experiment service.

Details and comments

yaelbh · 2021-06-17T14:17:57Z

What happened to Quality ?

yaelbh · 2021-06-17T14:19:52Z

In stored_data.utils.py we have:

def qiskit_version():
    """Return the Qiskit version."""
    try:
        return pkg_resources.get_distribution("qiskit").version
    except Exception:  # pylint: disable=broad-except
        return {"qiskit-terra": terra_version}

There are tests in test_stored_data.py that look for qiskit-terra; these tests fail in environments where the exception doesn't occur.

jyu00 · 2021-06-17T20:29:58Z

What happened to Quality ?

I moved it to qiskit-ibmq-provider because those values are ibmq specific. Although a lot of things here are also ibmq specific, so I can duplicate the code here if you'd like.

jyu00 · 2021-06-17T20:30:33Z

There are tests in test_stored_data.py that look for qiskit-terra; these tests fail in environments where the exception doesn't occur.

Good catch! It's fixed in e3a2d31

nkanazawa1989

The future-based execution management is really cool. I think this will improve the usability of experiment.

Sorry about my review is mainly questions. I'm still not clear the relationship (or purpose) among analysis result, experiment (stored) data, and service. All of them seem to have quite similar interface. And stored data sounds bit strange to me because it has function to store the data in the database.

Depending on our standpoint (from remote db vs from local package) the method name would be different. For example, ExperimentServiceV1.create_experiment seems to create an entry in the database thus experiment should be completed. However if we call this from local software it sounds like we are going to start new experiment. Perhaps those can be renamed more specifically, such as create_experiment_entry.

nkanazawa1989 · 2021-06-24T09:54:04Z

+
+    def __init__(
+        self,
+        result_data: Dict,


Do you plan to define a template for this in another PR?

The database accepts a free-form dictionary. However, I have noticed there are common keywords used in qiskit-experiments (like success), so maybe it's better to define a template specific to qiskit-experiments?

I guess so. A dict subclass is also good for type annotation.

nkanazawa1989 · 2021-06-24T09:55:00Z

+
+        Args:
+            result_data: Analysis result data.
+            result_type: Analysis result type.


Is this really a result type rather than type of analysis?

The name also came from results db, which only has the notion of "analysis result" and not "analysis". Currently all outputs of an Analysis class have the same type, so we can say it's the analysis type. Although last time we talked about "minimal analysis result" - not sure how that changes things.

I see. So this is not the field we can set arbitrary value?

No, result_data is the one that allows arbitrary value

nkanazawa1989 · 2021-06-24T16:54:10Z

+        if result.job_id not in self._jobs:
+            self._jobs[result.job_id] = None
+        for i in range(len(result.results)):
+            data = result.data(i)


I don't think raw data is necessary if we store memory data. @eggerdj

My feel here is that we should not be throwing away any information when adding to the StoredData from the Result. If any changes to the data format are made I would assume that the user could at least control them.

nkanazawa1989 · 2021-06-24T16:56:38Z

+                else:
+                    fig_name = (
+                        f"figure_{self.experiment_id[:8]}_"
+                        f"{datetime.now().isoformat()}_{len(self._figures)}"


I don't think we can retrieve the figure from the service with this name.

I added the timestamp to a) keep the name unique and b) I thought it'd be nice to know when the figure was created. If you're worry about not being able to remember the name, figure_names gives back all the names.

Sounds like we need to retrieve all figures anyways because we cannot guess what is drawn from the timestamp. Can you add some mechanism to pass image tag as I wrote in another comment?

Analysis can give the figure a more meaningful name instead of using the default timestamp, if the concern is about finding the right figure. As I mentioned in my other comment, the db doesn't support tags for figures right now. Another alternative is to implement tags support locally in this class. For example, since metadata allows arbitrary value, I can keep track of the tags assigned to each figure there. That would be a separate PR though since it's a bit of work.

Co-authored-by: Naoki Kanazawa <nkanazawa1989@gmail.com>

jyu00 · 2021-06-28T22:25:22Z

Thanks @nkanazawa1989 for reviewing this! After reading your comments, I decided to make some name changes and hope that'd clear things up a bit.

I renamed the subpackage to database_service, which contains DatabaseService, DbExperimentData, and DbAnalysisResult. Hopefully that emphasize the fact that these classes defines both the format and the interface for data to be stored in a database. This also allows qiskit-experiments to have its own ExperimentData and AnalysisResult.

Let me know if the new names are even more confusing 😅

For example, ExperimentServiceV1.create_experiment seems to create an entry in the database thus experiment should be completed. ... Perhaps those can be renamed more specifically, such as create_experiment_entry.

I can make that change, but it requires changes on the qiskit-ibmq-provider side as well, so I'll do that in the next few days.

nkanazawa1989 · 2021-06-30T04:49:50Z

Thanks Jessie! I replied to some comments but almost all issues I found are addressed. I do like new module names! :)

Just one more point: Can you describe the relationship between service, analysis result, and experiment data in the module docstring? This would help third party providers to use this framework. Especially, relationship between experimental data and analysis is very IBM-ish (sometime we update a part of analysis result, e.g. quality, without retrieving full experiment -- this is expected use case but based on our experience).

eliarbel · 2021-06-30T08:38:44Z

+            "quality": "Good",
+            "verified": False,
+        }
+        result = DbAnalysisResult(result_data={"foo": "bar"}, tags=["tag1", "tag2"], **attrs)


Didn't you mean to use DbAnalaysisResultV1 here instead ?

DbAnalysisResultV1 is imported as DbAnalysisResult. The idea is when we have V2 we only need to change the import statement and not all the reference of V1 to V2 (assuming interface remains compatible).

eliarbel

Overall looks good to me, especially for the stage we at now, needing it to be merged and serve as a basis for PR #115 and then testing (which I think should and would be extensive). I'm still missing to see how ExperimentService connects with IBMExperimentService in provider, but that's probably a hole in my understanding of the bigger picture.

eliarbel · 2021-06-30T08:45:28Z

+
+        return result_key
+
+    def analysis_result(


should probably named analysis_results as it may return multiple analysis results

Fixed in e916c5f.

eliarbel · 2021-06-30T08:56:55Z

+        figure_names: Optional[Union[List[str], str]] = None,
+        overwrite: bool = False,
+        save_figure: Optional[bool] = None,
+        service: Optional["DatabaseServiceV1"] = None,


@jyu00 can you elaborate on the use-case that justifies having a service parameter here and in the other add_* and save methods ? What would be the reason not to use the default experiment service ? If using a different experiment service is needed, wouldn't the service setter suffice ?

This came from when we planned on having a local database. The idea was that one may want to save the experiment in the cloud DB, except when there's a large amount of data, or temporary data, which they only want to save locally. For example, I create an experiment with auto-save turned on (in case I forget to save). Then I run different kinds of analysis and save the results in the local db. In the end I compare all these results and save the "good" one in the cloud (e.g. to share with others).

Since we don't have a local database, and it's unlikely that anyone would interact with the ExperimentData interface directly, we can also remove it to keep the interface simple.

Thanks for the info. Yes, I think we should clean this param until it'll be needed (it's in multiple methods)

jyu00 · 2021-06-30T20:58:40Z

Just one more point: Can you describe the relationship between service, analysis result, and experiment data in the module docstring?

Added in e916c5f.

mriedem

Looks good, I left several nits and questions inline. There are plenty of things that could be done in a follow up for additional functionality, but maybe some of those things are dependent on the actual IBM Quantum service implementation since this just seems to be the base class framework?

The only thing that I'd probably -1 on is the global vs public typo but that could also be fixed in a small follow up since I'm sure you're eager to get this moving.

mriedem · 2021-07-01T18:32:23Z

+:class:`DbExperimentDataV1` is the main class that defines the structure of
+experiment data, which consists of the following::
+
+    * Results from circuit execution, which is called ``data`` in this class.


The formatting here in the rendered docs looks a bit weird to me, too far indented maybe?

I got the docs from the build output artifact.

Fixed in 2e01e94

mriedem · 2021-07-01T18:33:39Z

+example, to retrieve a saved analysis result.
+
+Currently only IBM Quantum provides this database service, but we plan on having
+a local database service in the near future.


Suggested change

a local database service in the near future.

a local database service in the future.

nit but I'd avoid saying "near" in these docs since time and priorities can be funny when docs are originally written, "near" turns into like a year from now. 😄

Ha, I specifically said "near future" to make us look better lol. But I guess we should avoid any commitment in an open source project. Updated in 2e01e94.

mriedem · 2021-07-01T18:36:08Z

+
+    An experiment database service allows you to store experiment data and metadata
+    in a database. An experiment can have one or more jobs, analysis results,
+    and figures.


There is now also a generic "files" (aka artifacts) interface for uploading files associated with an experiment, e.g. configuration files, input parameter files and other post-experiment metadata collection. Support for that in Qiskit can always be added later though.

It's not implemented in qiskit-ibmq-provider. When I asked, the response suggested it was for internal use only?

It's not implemented in qiskit-ibmq-provider.

It's relatively new and I wasn't sure if qiskit-ibmq-provider is frozen for the experiment service while this repo is being developed.

When I asked, the response suggested it was for internal use only?

Blake has use cases for it being used in Qiskit publicly as well but I don't know all the details.

Sounds like I'll need to follow up with Blake :)

mriedem · 2021-07-01T18:38:26Z

+    ) -> str:
+        """Create a new experiment in the database.
+
+        Args:


There is also the visibility parameter which will default to "private" for Qiskit experiments but options include: private, project, group, hub, public.

Those control who, besides the owner of the experiment (whoever created it), can view/list the experiment.

It's called share level in the IQX UI and is a similar concept to jobs but not exactly the same implementation.

I'm trying to avoid IBMQ-specific fields, especially those that cannot be implemented by a local DB. It is implemented on the ibmq-provider side.

It is implemented on the ibmq-provider side.

So does qiskit-experiments (this PR I guess) define the interface for working with experiments in Qiskit and then you have different services that implement that interface, and for IBM Quantum that would be the code in qiskit-ibmq-provider? It wasn't clear to me how that's going to work (and I didn't read the large design doc 😄 ).

I was under the impression that the experiment service code in qiskit-ibmq-provider was essentially going to be frozen and be transitioned to this repo.

qiskit-experiments defines the code to run the actual experiments. It can use IBM Quantum to run circuits and save results, but it doesn't have to. Qiskit is designed to work with different providers (IBM, Honeywell, Aer, etc), and there is no dependency on IBM Quantum.

This PR essentially defines how it would interact with an "experiment database service", if one is available. IBM Quantum is just the only one providing this service right now.

mriedem · 2021-07-01T18:39:25Z

+    ) -> None:
+        """Update an existing experiment.
+
+        Args:


You can also update visibility if you own the experiment, e.g. to make it public or share at some level in the provider.

mriedem · 2021-07-01T19:09:42Z

+from qiskit_experiments.database_service.exceptions import DbExperimentDataError
+
+
+class TestDbAnalysisResult(QiskitTestCase):


If these are all just unit tests you can ignore, but if at some point there are integration tests that are using the real service then it'd be good to tag experiments with something that lets us know they are from Qiskit CI testing for cleanup on the server side in case test case cleanup fails for some reason.

These are all just unit tests. qiskit-experiments does not have a dependency on qiskit-ibmq-provider. There will be new integration tests in qiskit-ibmq-provider.

mriedem · 2021-07-01T19:10:15Z

+        values = {
+            "result_data": {"foo": "bar"},
+            "result_type": "some_type",
+            "device_components": ["Q1", "Q1"],


Did you mean to have Q1 in here twice?

Fixed in 2e01e94

mriedem · 2021-07-01T19:10:41Z

+        """Test stored data attributes."""
+        attrs = {
+            "job_ids": ["job1"],
+            "share_level": "global",


Suggested change

"share_level": "global",

"share_level": "public",

mriedem · 2021-07-01T19:11:15Z

+        }
+        exp_data = DbExperimentData(
+            backend=self.backend,
+            experiment_type="qiskit_test",


OK this is probably sufficient to let us know if these were ever part of an integration test that we could cleanup orphans on the real service.

mriedem · 2021-07-01T19:12:42Z

+
+        service = self._set_mock_service()
+        exp_data = DbExperimentData(backend=self.backend, experiment_type="qiskit_test")
+        exp_data.add_figures(figure, save_figure=True)


Does this actually write an svg file to disk? If so it'd be good to have a self.addCleanup call to remove the generated file if any (for local testing anyway).

It doesn't. It uses a mock which does nothing other than letting me verify the proper methods are called.

jyu00

@mriedem Thanks for reviewing this. A lot of your comments in database_service.py are implemented in qiskit-ibmq-provider (see this PR). The database service interface defined here is supposed to be generic, and I try to avoid anything that's IBMQ only (not completely successful, but I try), especially things that cannot be implemented in a local database.

jyu00 · 2021-07-01T20:23:37Z

+
+    An experiment database service allows you to store experiment data and metadata
+    in a database. An experiment can have one or more jobs, analysis results,
+    and figures.


It's not implemented in qiskit-ibmq-provider. When I asked, the response suggested it was for internal use only?

jyu00 · 2021-07-01T20:27:15Z

+    ) -> str:
+        """Create a new experiment in the database.
+
+        Args:


I'm trying to avoid IBMQ-specific fields, especially those that cannot be implemented by a local DB. It is implemented on the ibmq-provider side.

jyu00 · 2021-07-01T20:30:02Z

+            A dictionary containing the retrieved experiment data.
+
+        Raises:
+            ExperimentEntryNotFound: If the experiment does not exist.


I don't think it's wrong to raise ExperimentEntryNotFound for an experiment you don't have access to. Because as far as you're concerned, the experiment doesn't exist.

jyu00 · 2021-07-01T20:37:38Z

+            device_components: Target device components, such as qubits.
+            tags: Tags to be associated with the analysis result.
+            quality: Quality of this analysis.
+            verified: Whether the result quality has been verified.


That how results DB users use it today. But there is no requirement that says a human has to verify something to set this flag - again trying to keep things generic.

jyu00 · 2021-07-01T20:46:12Z

+class DbAnalysisResult:
+    """Base common type for all versioned AnalysisResult abstract classes.
+
+    Note this class should not be inherited from directly, it is intended


Because the comment is telling people not to do MyCustomClass(DbAnalysisResult), and nothing in abc would prevent that.

jyu00 · 2021-07-01T21:20:26Z

+        """Add the experiment figure.
+
+        Args:
+            figures (str or bytes or pyplot.Figure or list): Names of the figure


Fixed in 2e01e94

jyu00 · 2021-07-01T21:20:35Z

+        """
+        if isinstance(result_key, int):
+            result_key = self._analysis_results.keys()[result_key]
+        elif result_key not in self._analysis_results:


Fixed in 2e01e94

jyu00 · 2021-07-01T21:22:09Z

+        """Return the class name and version."""
+        return self._source
+
+    def __str__(self):


It was intentional since I didn't want the output to be a page long lol. Having said that, backend and tags seem important enough to be included, so I added them in 2e01e94.

jyu00 · 2021-07-01T21:22:34Z

+        values = {
+            "result_data": {"foo": "bar"},
+            "result_type": "some_type",
+            "device_components": ["Q1", "Q1"],


Fixed in 2e01e94

jyu00 · 2021-07-01T21:26:06Z

+:class:`DbExperimentDataV1` is the main class that defines the structure of
+experiment data, which consists of the following::
+
+    * Results from circuit execution, which is called ``data`` in this class.


Fixed in 2e01e94

jyu00 · 2021-07-01T21:40:36Z

@nkanazawa1989 @eliarbel @yaelbh I think this is ready to be merged. A few things not yet addressed in this PR:

rename some of the db service methods (e.g. create_experiment to create_experiment_entry)
local support for tagging a figure
adding value, variance, and chisq as separate fields in DbAnalysisResult.

1 & 2 will take a bit of effort, and 3 I need to think a bit more about how it integrates with the analysis classes.

nkanazawa1989

Thanks Jessie! This LGTM :) Looking forward to using experiment service!

yaelbh · 2021-07-06T12:38:21Z

I'll update #115

* move from terra * run black * convert service exception to log * fix lint * log post processing failure * remove data index * fix test can package typo * remove experiment_class and result_class * fix lint * doc update Co-authored-by: Naoki Kanazawa <nkanazawa1989@gmail.com> * rename classes * review comments * fix lint * review comments * fix lint * add db service to doc * fix doc * remove extra service keyword * fix type hint * review comments Co-authored-by: Yael Ben-Haim <yaelbh@il.ibm.com> Co-authored-by: Naoki Kanazawa <nkanazawa1989@gmail.com>

jyu00 added 4 commits June 16, 2021 20:29

move from terra

d3af475

run black

7506759

convert service exception to log

4503267

fix lint

5e97146

This was referenced Jun 17, 2021

[WIP] Work with Terra PR 5499 #66

Closed

Integration with result database #115

Merged

jyu00 added 3 commits June 17, 2021 14:41

log post processing failure

6bd5c64

remove data index

1780a37

fix test can package typo

e3a2d31

Merge branch 'main' into results-db

b727010

yaelbh mentioned this pull request Jun 20, 2021

Spectroscopy and calibrations integration V3 #88

Merged

jyu00 and others added 8 commits June 23, 2021 08:54

Merge remote-tracking branch 'upstream/main' into results-db

d2482ac

Merge remote-tracking branch 'origin/results-db' into results-db

8139169

remove experiment_class and result_class

1fd9dd5

fix lint

404d7be

Merge branch 'main' into results-db

aa507c8

Merge branch 'main' into results-db

248f5aa

Merge branch 'main' into results-db

b367355

Merge branch 'main' into results-db

c09d010

nkanazawa1989 suggested changes Jun 24, 2021

View reviewed changes

yaelbh and others added 6 commits June 27, 2021 10:34

Merge branch 'main' into results-db

87f2355

doc update

f09667c

Co-authored-by: Naoki Kanazawa <nkanazawa1989@gmail.com>

rename classes

c98e2c6

review comments

9070a89

Merge remote-tracking branch 'origin/results-db' into results-db

25c5156

fix lint

5205ee6

Merge branch 'main' into results-db

11d392a

coruscating added this to the Release 0.1 milestone Jun 29, 2021

eliarbel reviewed Jun 30, 2021

View reviewed changes

eliarbel approved these changes Jun 30, 2021

View reviewed changes

jyu00 added 4 commits June 30, 2021 16:47

review comments

e916c5f

Merge remote-tracking branch 'origin/results-db' into results-db

156bf4a

Merge remote-tracking branch 'upstream/main' into results-db

e2980ad

fix lint

7169edd

jyu00 added 4 commits June 30, 2021 17:42

add db service to doc

111c7ad

fix doc

ede3a31

remove extra service keyword

6653396

fix type hint

c433999

mriedem reviewed Jul 1, 2021

View reviewed changes

review comments

2e01e94

jyu00 commented Jul 1, 2021

View reviewed changes

Merge branch 'main' into results-db

6f564d8

nkanazawa1989 approved these changes Jul 5, 2021

View reviewed changes

Merge branch 'main' into results-db

d086374

yaelbh merged commit 929eadb into qiskit-community:main Jul 6, 2021

nkanazawa1989 mentioned this pull request Jul 23, 2021

Discussion of experiment and analysis data #230

Closed

	a local database service in the near future.
	a local database service in the future.

		from qiskit_experiments.database_service.exceptions import DbExperimentDataError


		class TestDbAnalysisResult(QiskitTestCase):

Conversation

jyu00 commented Jun 17, 2021

Summary

Details and comments

Uh oh!

yaelbh commented Jun 17, 2021

Uh oh!

yaelbh commented Jun 17, 2021

Uh oh!

jyu00 commented Jun 17, 2021

Uh oh!

jyu00 commented Jun 17, 2021

Uh oh!

nkanazawa1989 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

jyu00 commented Jun 28, 2021

Uh oh!

nkanazawa1989 commented Jun 30, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eliarbel left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jyu00 commented Jun 30, 2021

Uh oh!

mriedem left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

nkanazawa1989 left a comment •

edited

Loading

jyu00 Jul 1, 2021 •

edited

Loading

mriedem Jul 2, 2021 •

edited

Loading