Discussion: long-term data storage #2222

brockho opened this issue Oct 24, 2023 · 0 comments
During our first COCO sprint, we discussed the idea of a general long-term storage format for benchmarking data (an idea put forward by the IOH Profiler people together with @olafmersmann). A first suggestion from our side is the following.

For each experiment (with a concrete experiment id or timestamp or ...), we store its metadata (things that stay constant over the entire experiment) in a metadata table like this:

| exp. id | timestamp | key | value |
|---------|-----------|-----|-------|
| 1 | | "author" | "Mr. Coco" |
| 1 | | "doi" | https://doi.org/876.123 |
| ... | ... | ... | ... |
| 2 | | "author" | "H. Simpson" |
| 2 | | "cma.opts.sigma0" | 1.5 |
| 2 | | "suite" | "bbob-biobj-ext" |
| 2 | | "experimental setup" | "bbob-biobj" |
| 3 | | "link to code" | |
| 3 | | "url" | |
| 3 | | "publication" | "GECCO 2023 Proceedings" |
| 3 | | "algo implementation" | "pycma, version 3.1" |
| 3 | | "software" | COCO |
| 3 | | "suite" | bbob-constrained |
| 3 | | "experimental setup" | bbob-constrained |
| 3 | | "algname" | "COBYLA" |

To be discussed: which entries are mandatory and which are optional.
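For illustration, such a long-format key-value metadata table could be read and written as plain CSV. The column names (`exp_id`, `timestamp`, `key`, `value`) and the helper functions below are hypothetical choices, not part of the proposal; the long format has the advantage that adding a new metadata key never requires a schema change:

```python
import csv
import io

# One row per (experiment, key) pair, mirroring the table above.
metadata_rows = [
    {"exp_id": 1, "timestamp": "", "key": "author", "value": "Mr. Coco"},
    {"exp_id": 1, "timestamp": "", "key": "doi", "value": "https://doi.org/876.123"},
    {"exp_id": 3, "timestamp": "", "key": "algname", "value": "COBYLA"},
]

def write_metadata(rows, fileobj):
    """Write the metadata table as CSV (column names are an assumption)."""
    writer = csv.DictWriter(fileobj, fieldnames=["exp_id", "timestamp", "key", "value"])
    writer.writeheader()
    writer.writerows(rows)

def metadata_for(rows, exp_id):
    """Collect all key/value pairs belonging to one experiment."""
    return {r["key"]: r["value"] for r in rows if r["exp_id"] == exp_id}

buf = io.StringIO()
write_metadata(metadata_rows, buf)
print(metadata_for(metadata_rows, 1)["author"])  # -> Mr. Coco
```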

For each experiment, we can then store the single evaluations (or a subset thereof) in one big table like this:

| timestamp | function id | dim | instance | #funevals | indicator value | #g-calls | #non-feasible points evaluated so far | target reached | f-values | g-values | x-values | experimental data |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 2 | 12 | 123 | 1.20E+03 | nan | nan | 1.30E+03 | nan | nan | [1,23] | {"param1": 12, "param2": 44} |
| 1 | 1 | 2 | 12 | 124 | 1.12E+03 | nan | nan | 1.25E+03 | nan | nan | [1.1,19.7] | {"param1": 10, "param2": 44.9} |
| 4 | 1 | 2 | 12 | 123 | 1.42E+03 | nan | nan | 2.00E+03 | nan | nan | [0.7,23.62] | {"name": "exp2", "doi": "https://doi.org/10.287.22"} |
| 2 | 1 | 3 | 12 | 123 | 1.23E+03 | nan | nan | 4.00E+03 | [1.2, 3.234] | nan | [232,2376,21] | |
| 3 | 1 | 5 | 11 | 234 | 1.29E+02 | 54 | 288 | 1.30E+02 | [1.1, 0,85] | [7654, 8987, 3123] | [43,123,22,23,1] | |
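As a sketch of one row's semantics, the record below mirrors the columns above as a Python dataclass. The field names and types are illustrative assumptions only, with `None` standing in for nan in the optional columns:

```python
import json
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EvaluationRecord:
    """One row of the evaluation table; names are illustrative, not a spec."""
    timestamp: int
    function_id: int
    dim: int
    instance: int
    funevals: int                 # "effort spent", must never decrease within a run
    indicator_value: float        # mandatory, like all fields above
    g_calls: Optional[int] = None               # columns from here on are optional
    non_feasible_so_far: Optional[int] = None
    target_reached: Optional[float] = None
    f_values: Optional[list] = None
    g_values: Optional[list] = None
    x_values: Optional[list] = None
    experimental_data: dict = field(default_factory=dict)  # free-form JSON blob

# First row of the example table above.
rec = EvaluationRecord(
    timestamp=1, function_id=1, dim=2, instance=12,
    funevals=123, indicator_value=1.20e3,
    target_reached=1.30e3, x_values=[1, 23],
    experimental_data={"param1": 12, "param2": 44},
)
print(json.dumps(rec.experimental_data))
```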

Our ideas behind all this are:

  • All columns up to (and including) indicator value appear to be mandatory (at least for most experiments, and certainly for all COCO data produced so far).
  • All other columns are optional and may differ between experiment tables (which are then no longer compatible with each other).
  • The #funevals column is really an "effort spent" column and must be monotonically increasing; for constrained problems, for example, it holds the combined number of f- and g-evaluations. In some cases it might even contain vectors, such as the number of calls to each individual objective function when the objectives can be evaluated independently (and, for example, take different amounts of time to evaluate).
  • The indicator value column contains the quantity to be optimized: the best f-value so far in the unconstrained, single-objective case, a quality indicator in the multiobjective case, the Lagrangian in the constrained case, etc.
  • The target reached column seems a nice-to-have in the COCO context, even if we do not write these data ourselves right now (it should be easy to reconstruct, because the targets are fixed in our case).
  • Entries in the same experiment table should, in principle, be comparable with each other.
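For illustration, the monotonicity requirement on #funevals could be validated per run with a small check like this. The grouping key (function id, dim, instance) is an assumed notion of "one run" and not part of the proposal:

```python
from itertools import groupby

def funevals_monotone(rows):
    """Return True if #funevals never decreases within any
    (function id, dim, instance) group; the grouping key is a
    hypothetical choice of what constitutes one run."""
    key = lambda r: (r["function_id"], r["dim"], r["instance"])
    # sorted() is stable, so the original row order survives within each group
    for _, group in groupby(sorted(rows, key=key), key=key):
        evals = [r["funevals"] for r in group]
        if any(b < a for a, b in zip(evals, evals[1:])):
            return False
    return True

rows = [
    {"function_id": 1, "dim": 2, "instance": 12, "funevals": 123},
    {"function_id": 1, "dim": 2, "instance": 12, "funevals": 124},
]
print(funevals_monotone(rows))  # -> True
```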

Note that this is a first draft that will hopefully be extended here.
