
Possible bias in significance testing from unbalanced instances #1977

Open
nikohansen opened this issue Jun 4, 2020 · 2 comments

@nikohansen
Contributor

We may want to encourage experimenters to repeat trials on instances depending on the observed success; see also #1117 and #1978.

If some instances are repeated much more frequently than others, the significance test will mainly test the difference on those instances, which seems rather undesirable.

We cannot easily stratify the significance test, because the two data sets need not have been run on the same instances. We could, however, take the median over the trials of each instance to conduct the test, or use another representative subsample. A subsample choice that also works when experiments have been done on a single instance seems preferable (even if such experiments should not be considered a valid data set).
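
A minimal sketch of the per-instance-median idea, only to illustrate the direction (the function names and the data layout, a mapping instance id -> list of measured runtimes per trial, are assumptions, not the cocopp data structures):

```python
import numpy as np
from scipy.stats import ranksums  # two-sided Wilcoxon rank-sum test

def per_instance_medians(runtimes_by_instance):
    """Collapse the repeated trials of each instance to a single
    representative value (the median), so that heavily repeated
    instances do not dominate the subsequent test."""
    return [np.median(trials) for trials in runtimes_by_instance.values()]

def median_stratified_test(runtimes_a, runtimes_b):
    """Rank-sum test on per-instance medians of two algorithms;
    the two instance sets do not need to coincide."""
    return ranksums(per_instance_medians(runtimes_a),
                    per_instance_medians(runtimes_b))
```

The same scheme works with any other per-instance representative in place of the median.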

@nikohansen
Contributor Author

Currently, toolsstats.significancetest starts with

    balance_instances_saved, genericsettings.balance_instances = genericsettings.balance_instances, False

and seems not to correct for different numbers of repetitions across instances.
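
One conceivable correction, sketched below purely as an illustration (this is not what cocopp currently does, and all names are made up): subsample every instance down to the same number of trials before feeding the data to the test.

```python
import numpy as np

def balance_instances_for_test(runtimes_by_instance, rng=None):
    """Return a flat list of runtimes in which every instance
    contributes the same number of trials (the minimum count
    observed), by subsampling the more frequently repeated ones."""
    rng = rng or np.random.default_rng(12345)
    n_min = min(len(trials) for trials in runtimes_by_instance.values())
    balanced = []
    for trials in runtimes_by_instance.values():
        balanced.extend(rng.choice(trials, size=n_min, replace=False).tolist())
    return balanced
```

The balanced samples of two algorithms could then be passed to the same rank-sum test as in the sketch above.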

@nikohansen
Contributor Author

A somewhat related issue: when within-trial independent restarts are superseded by experimental repetitions, the $p$-values as currently calculated will often consider only a comparatively small budget and hence only reflect relatively easy targets/problems. Should $p$-values also be calculated with simulated restarts?
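
For reference, a rough sketch of the simulated-restart bootstrap that could underlie such a $p$-value (this mirrors the usual runtime-simulation idea in COCO, but the function below is an illustration with assumed names, not the cocopp implementation):

```python
import numpy as np

def simulated_restart_runtimes(evals_success, evals_failure,
                               n_samples=100, rng=None):
    """Draw `n_samples` runtimes of simulated restarts: trials are
    resampled uniformly with replacement; the budgets of unsuccessful
    trials are accumulated until a successful trial is drawn, whose
    evaluation count terminates the simulated run.

    `evals_success`: evaluation counts of trials that hit the target.
    `evals_failure`: evaluation counts (budgets) of unsuccessful trials.
    """
    if not len(evals_success):
        return []  # the target was never hit, no finite runtime exists
    rng = rng or np.random.default_rng(12345)
    n_succ = len(evals_success)
    n_all = n_succ + len(evals_failure)
    runtimes = []
    for _ in range(n_samples):
        total = 0
        while True:
            i = int(rng.integers(n_all))
            if i < n_succ:
                total += evals_success[i]
                break
            total += evals_failure[i - n_succ]
        runtimes.append(total)
    return runtimes
```

A rank-sum test on such samples from two algorithms would reflect harder targets as well, at the price that the bootstrapped runtimes are not independent observations, so the resulting $p$-value would need to be interpreted with care.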
