
Possible bias in significance testing from unbalanced instances #1977

Open
nikohansen opened this issue Jun 4, 2020 · 2 comments

@nikohansen
Contributor

We may want to encourage experimenters to repeat trials on instances depending on the observed success; see also #1117 and #1978.

If some instances are repeated much more frequently than others, the significance test will mainly test the difference on those instances, which seems rather undesirable.

We cannot easily stratify the significance test, because the two data sets need not have been run on the same instances. We could, however, take the median over the trials of each instance to conduct the test, or use another representative subsample. A subsample choice that also works when experiments have been done on a single instance seems preferable (even if such experiments should not be considered a valid data set).
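
A minimal sketch of the per-instance-median idea, only to illustrate the direction (the function names and the data layout, a mapping instance id -> list of measured runtimes per trial, are assumptions, not the cocopp data structures):

```python
import numpy as np
from scipy.stats import ranksums  # two-sided Wilcoxon rank-sum test

def per_instance_medians(runtimes_by_instance):
    """Collapse the repeated trials of each instance to a single
    representative value (the median), so that heavily repeated
    instances do not dominate the subsequent test."""
    return [np.median(trials) for trials in runtimes_by_instance.values()]

def median_stratified_test(runtimes_a, runtimes_b):
    """Rank-sum test on per-instance medians of two algorithms;
    the two instance sets do not need to coincide."""
    return ranksums(per_instance_medians(runtimes_a),
                    per_instance_medians(runtimes_b))
```

The same scheme works with any other per-instance representative in place of the median.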

@nikohansen
Contributor Author

Currently, toolsstats.significancetest starts with

    balance_instances_saved, genericsettings.balance_instances = genericsettings.balance_instances, False

and seems not to correct for different numbers of repetitions across instances.
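
One conceivable correction, sketched below purely as an illustration (this is not what cocopp currently does, and all names are made up): subsample every instance down to the same number of trials before feeding the data to the test.

```python
import numpy as np

def balance_instances_for_test(runtimes_by_instance, rng=None):
    """Return a flat list of runtimes in which every instance
    contributes the same number of trials (the minimum count
    observed), by subsampling the more frequently repeated ones."""
    rng = rng or np.random.default_rng(12345)
    n_min = min(len(trials) for trials in runtimes_by_instance.values())
    balanced = []
    for trials in runtimes_by_instance.values():
        balanced.extend(rng.choice(trials, size=n_min, replace=False).tolist())
    return balanced
```

The balanced samples of two algorithms could then be passed to the same rank-sum test as in the sketch above.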

@nikohansen
Contributor Author

A somewhat related issue: when within-trial independent restarts are superseded by experimental repetitions, the $p$-values as currently calculated will often consider only a comparatively small budget and hence only reflect relatively easy targets/problems. Should $p$-values also be calculated with simulated restarts?
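
For reference, a rough sketch of the simulated-restart bootstrap that could underlie such a $p$-value (this mirrors the usual runtime-simulation idea in COCO, but the function below is an illustration with assumed names, not the cocopp implementation):

```python
import numpy as np

def simulated_restart_runtimes(evals_success, evals_failure,
                               n_samples=100, rng=None):
    """Draw `n_samples` runtimes of simulated restarts: trials are
    resampled uniformly with replacement; the budgets of unsuccessful
    trials are accumulated until a successful trial is drawn, whose
    evaluation count terminates the simulated run.

    `evals_success`: evaluation counts of trials that hit the target.
    `evals_failure`: evaluation counts (budgets) of unsuccessful trials.
    """
    if not len(evals_success):
        return []  # the target was never hit, no finite runtime exists
    rng = rng or np.random.default_rng(12345)
    n_succ = len(evals_success)
    n_all = n_succ + len(evals_failure)
    runtimes = []
    for _ in range(n_samples):
        total = 0
        while True:
            i = int(rng.integers(n_all))
            if i < n_succ:
                total += evals_success[i]
                break
            total += evals_failure[i - n_succ]
        runtimes.append(total)
    return runtimes
```

A rank-sum test on such samples from two algorithms would reflect harder targets as well, at the price that the bootstrapped runtimes are not independent observations, so the resulting $p$-value would need to be interpreted with care.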
