-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
copied (and updated) the publish-a-dataset-howto.md and uploading-dat…
…a-issue template file from the deprecated coco repository before deletion there
- Loading branch information
Showing
3 changed files
with
187 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
--- | ||
name: Generic Issue | ||
about: Create an issue to help us improve | ||
title: '' | ||
labels: '' | ||
assignees: '' | ||
|
||
--- |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
--- | ||
name: Submit a COCO data set | ||
about: Describes how to submit a COCO data set to the official archive. | ||
title: "[DATA SUBMISSION] please put your title here" | ||
labels: 'data submission' | ||
assignees: brockho | ||
|
||
--- | ||
|
||
Hello COCO user! | ||
|
||
Here is the right place to submit one or several COCO data sets to the official COCO archive (https://numbbo.it/data-archive/). Please provide the information requested below and submit the issue. Do not hesitate to read about the details in our [How to publish a dataset documentation](https://github.com/numbbo/coco/blob/development/howtos/publish-a-dataset-howto.md). | ||
|
||
#### Reference | ||
[Put the full reference (for citations) and a link to the pdf here.] | ||
|
||
#### Description of the Algorithm(s) | ||
[Please provide a short description for each algorithm in the data.] | ||
|
||
#### Link to Data | ||
[Please prepare and upload your COCO data (see [how to](https://github.com/numbbo/coco/blob/development/howtos/publish-a-dataset-howto.md)) and provide the link here. If you want to submit more than one data set at the same time, please indicate which file corresponds to which algorithm.] | ||
|
||
#### Optional: Source Code of Experiment | ||
[We are happy to also share any source code of the algorithm and the COCO experiment, you were running. Please provide, if you wish, any code, links, etc. here] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,155 @@ | ||
# How to Publish Your Data Within COCO | ||
|
||
There are two ways to make data easily accessible to the community: | ||
|
||
- propose inclusion of your data into [`cocopp.archives`](https://numbbo.it/data-archive), or | ||
- host your own COCO archive with your data. | ||
|
||
In both cases, first, the data need to be prepared. For this, for each | ||
dataset (that is, each benchmarked algorithm variant): | ||
|
||
1. <details><summary><b>Zip the data folder. | ||
</b> (click to view)</summary> | ||
A data zipfile contains a single folder under which all data from a | ||
single full experiment was collected. The folder can contain subfolders | ||
(or subsub...folders), for example of data from different (sub)batches of | ||
the complete experiment. Valid formats are | ||
<tt>.gzip</tt> or <tt>.tgz</tt> or <tt>.zip</tt> | ||
</details> | ||
|
||
1. <details><summary><b>Rename the zip file. | ||
</b> (click to view)</summary> | ||
The name of the zipfile defines the name of the data set. | ||
The name should represent the benchmarked algorithm and may contain | ||
authors names (but rather not the name of the test suite). | ||
The name can have any length, but the first ten-or-so characters should | ||
be a meaningful algorithm "abbreviation". | ||
|
||
## Propose Inclusion to the COCO Data Archive | ||
|
||
This option is available if one or several datasets were used in a publication | ||
or in a preprint available for example on [arXiv](https://arxiv.org) or | ||
[HAL](https://hal.archives-ouvertes.fr). | ||
For this: | ||
|
||
3. Upload the above data zipfile(s) to a file sharing site or to an accessible URL. | ||
4. Ask for the inclusion into [`cocopp.archives`](https://numbbo.it/data-archive). | ||
For this, open an [issue at the data-archive Gitlab repository of COCO](https://github.com/numbbo/data-archive/issues) | ||
(you need to have a Github account) with | ||
|
||
- the publication reference and a link to the paper | ||
- a very short description of each dataset including the name of | ||
- the algorithm | ||
- the test suite | ||
- the zip file | ||
- a link to the dataset zip file(s) | ||
- (optional) a link to the source code to reproduce the dataset | ||
|
||
## Host an Archive | ||
|
||
Hosting an archive means putting one or several data zipfiles with an added | ||
"archive definition text file" online in a dedicated folder that can be | ||
accessed under an URL, like http://cma-es.github.io/lq-cma/data-archives/lq-gecco2019. | ||
For example, any folder under a personal homepage root will do. | ||
|
||
For this: | ||
|
||
3. <details><summary><b>Move the above data zipfile(s) into a clean folder</b>, | ||
possibly with subfolders (click to see more).</summary> | ||
The folder name is only used as part of the URL and can be changed after | ||
creating the archive. If desired, subfolders can be created that become part | ||
of the names of the datasets under this subfolder. These can not be changed | ||
without repeating the following creation procedure:</details> | ||
|
||
1. <details><summary><b>Create the archive</b> | ||
(two lines of Python code, click to see more).</summary> | ||
Assume the data zipfiles are in the folder <tt>elisa_2020</tt> or its | ||
subfolders and <tt>cocopp</tt> is installed (<tt>pip install cocopp</tt>). | ||
In a Python shell, it suffices to type: | ||
|
||
```python | ||
import cocopp | ||
cocopp.archiving.create('elisa_2020') | ||
``` | ||
|
||
thereby "creating" the archive locally by adding an archive | ||
definition file to the folder <tt>elisa_2020</tt>. | ||
Archives can contain other archives as subfolders or, | ||
the other way around, additional subarchives can be | ||
created in any archive subfolder. This is how | ||
https://numbbo.it/data-archive/ is organized. | ||
<details><summary>Alternative code (from a system shell, click to expand)</summary> | ||
<tt>python -c "import cocopp; cocopp.archiving.create('elisa_2020')"</tt> | ||
</details> | ||
</details> | ||
|
||
1. **Upload the archive folder** and its content to where it can be accessed | ||
via an URL. The archive is now accessible with `cocopp.archiving.get('URL')` | ||
(see below example). | ||
|
||
1. **Open an** [**issue** at the Github repository of COCO](https://github.com/numbbo/data-archive/issues) | ||
(you need to have a Github account) signalling the URL of the archive with | ||
a short description of the dataset(s) in the archive. | ||
|
||
### Example of an resulting archive | ||
|
||
For example, the `bbob-mixint` archive at | ||
https://github.com/numbbo/data-archive/tree/gh-pages/data-archive/bbob-mixint | ||
contains four datasets and the folder structure looks like | ||
<font size="1"> | ||
|
||
``` | ||
bbob-mixint/ | ||
|-- 2019-gecco-benchmark/ | ||
| |-- CMA-ES-pycma.tgz | ||
| |-- DE-scipy.tgz | ||
| |-- RANDOMSEARCH.tgz | ||
| `-- TPE-hyperopt.tgz | ||
|-- coco_archive_definition.txt | ||
``` | ||
</font> | ||
The corresponding `coco_archive_definition.txt` file looks like | ||
<details ><summary>(click to view)</summary><font size="1"> | ||
```python | ||
[ | ||
('2019-gecco-benchmark/CMA-ES-pycma.tgz', | ||
'0d8e7f2c77f4e43176bc9424ee8f9a0bfe8e7f66fabc95b15ea7a56ad8b1d667', | ||
38514), | ||
('2019-gecco-benchmark/DE-scipy.tgz', | ||
'494483b1bce9185f8977ce9abf6f6eac3a660efd6fa09321e305dfb79296cd18', | ||
35401), | ||
('2019-gecco-benchmark/RANDOMSEARCH.tgz', | ||
'14b237093fd1f393871c578b6b28b6f9a6c3d8dc8921e3bdb024b3cc7cdd287d', | ||
26006), | ||
('2019-gecco-benchmark/TPE-hyperopt.tgz', | ||
'34fede46a00c8adef4c388565c3b759c07a7d7d83366e115632b407764e64bf6', | ||
19633)] | ||
``` | ||
|
||
</font> | ||
with hashcodes and filesizes as additional entries. | ||
</details> | ||
|
||
### Example for using an archive | ||
|
||
```python | ||
import cocopp | ||
|
||
url = 'http://cma-es.github.io/lq-cma/data-archives/lq-gecco2019' | ||
arch = cocopp.archiving.get(url) | ||
print(arch) # `arch` "is" a `list` of relative filenames | ||
['CMA-ES__2019-gecco-surr.tgz', | ||
'SLSQP+CMA_2019-gecco-surr.tgz', | ||
'SLSQP-11_2019-gecco-surr.tgz', | ||
'lq-CMA-ES_2019-gecco-surr.tgz'] | ||
|
||
# compare local result with data from lq-cma archive | ||
# and from the cocopp.archives.bbob archive | ||
cocopp.main([# 'exdata/my_local_results', # in case | ||
arch.get('SLSQP-11'), # downloads if necessary | ||
cocopp.archives.bbob.get_first('2010/IPOP-CMA'), | ||
arch.get('CMA-ES__2019')]) | ||
``` |