Estimating mutation score by sampling #490

tomato42 · 2019-11-02T00:45:48Z

I was thinking about issue #484, and I think there's a bit of a problem with it: what if the test coverage changes but the code does not? How can we tell whether the changes to test coverage are OK when no application code was changed?

Maybe cosmic-ray could execute a random selection of the mutants and calculate a confidence interval for the result?

The formula I found uses:
p for the fraction of mutants that survived
q for the fraction of mutants that were killed (i.e. 1 - p)
n for the number of mutants tested
N for the total number of mutants
z for the z-score (scaling factor for the given confidence level: 1.65 for 90%, 1.96 for 95%, 2.58 for 99%)

So if I have 8000 mutants, randomly test 40 of them, find that 20% of those survived, and want a 95% confidence interval for that 20%, I calculate:
sqrt((p * q)/n) * z * (1 - sqrt(n/N)) = sqrt(0.2 * 0.8 / 40) * 1.96 * (1 - sqrt(40/8000)) ≈ 0.115

So by testing 40 mutants, I know that the real survival rate for this test suite is 20% ± 11.5% (95% confidence).
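
As a rough sketch of the calculation above, here is the same formula in Python. The function name `margin_of_error` and its parameters are made up for illustration and are not part of cosmic-ray; the code simply transcribes the formula as written in this post.

```python
import math


def margin_of_error(p, n, N, z=1.96):
    """Margin of error for a sampled survival rate.

    Hypothetical helper (not a cosmic-ray API). It transcribes the formula
    from the post: sqrt(p*q/n) * z * (1 - sqrt(n/N)).

    p: fraction of sampled mutants that survived
    n: number of mutants tested
    N: total number of mutants
    z: z-score for the desired confidence level (1.96 for 95%)
    """
    q = 1.0 - p  # fraction of sampled mutants that were killed
    return math.sqrt(p * q / n) * z * (1.0 - math.sqrt(n / N))


# The worked example from the post: 8000 mutants, 40 tested, 20% survived.
margin = margin_of_error(p=0.2, n=40, N=8000)
print(f"survival rate: 20% ± {margin * 100:.1f}%")
```

Note that the last factor plays the role of a finite-population correction; textbook presentations more commonly write it as sqrt((N - n)/(N - 1)), so treat the numbers here as an approximation under the formula quoted in this post.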

The nice thing is that if the execution selected the mutants at random, the estimation could simply be a switch to cr-report that bases it on total jobs vs. completed jobs and the already calculated survival rate.

abingham · 2019-11-02T09:20:10Z

You're absolutely right that using coverage to drive CR is a heuristic; it generally won't tell us exactly what tests need to be run since logical paths through the code might alter the relationships between tests and coverage. I'm not confident enough in my statistics to assess if what you're proposing is mathematically sound, but I certainly like the approach in principle. Assuming the mathematics is (or could be made) correct, I think it's worth pursuing. I think you could very easily write an interceptor that a) identifies the random mutations to run and b) marks all others as skipped. You'd then need a specialized (but probably quite simple) reporting tool to interpret the results.
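
A minimal sketch of the selection step described above, assuming the jobs can be treated as a plain list of IDs. The function `select_sample` and the `"pending"`/`"skipped"` labels are hypothetical and do not reflect cosmic-ray's actual interceptor interface or work-item model.

```python
import random


def select_sample(job_ids, sample_size, seed=None):
    """Pick a uniform random sample of jobs to run; mark the rest as skipped.

    Hypothetical helper: returns a dict mapping each job ID to either
    "pending" (selected for execution) or "skipped".
    """
    job_ids = list(job_ids)
    rng = random.Random(seed)  # a seed makes the selection reproducible
    chosen = set(rng.sample(job_ids, sample_size))
    return {
        job_id: ("pending" if job_id in chosen else "skipped")
        for job_id in job_ids
    }


# Example: choose 40 of 8000 jobs to run.
plan = select_sample(range(8000), 40, seed=0)
to_run = [job_id for job_id, status in plan.items() if status == "pending"]
print(f"{len(to_run)} jobs selected, {len(plan) - len(to_run)} skipped")
```

A reporting tool would then only need the outcomes of the selected jobs plus the total job count.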

tomato42 · 2019-11-02T09:38:36Z

I updated the question later: if cosmic-ray exec executed the jobs in random order, then cr-report could simply take the results from the DB and calculate the confidence interval (probably with a switch).

So it wouldn't be a new interceptor, but rather the ability to estimate results from a partial run (like in CI, where you can run the tests for 20-30 minutes and make do with what you've got).
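
To illustrate the partial-run idea, here is a sketch under stated assumptions: the job outcomes are simulated as a list of booleans, and `estimate_survival` is a hypothetical helper, not the actual cr-report code or the implementation in the PR. It applies the formula from the first post to whatever jobs have completed.

```python
import math
import random


def estimate_survival(outcomes, total_jobs, z=1.96):
    """Estimate the survival rate and its margin of error from a partial run.

    outcomes: booleans for the completed jobs (True = mutant survived)
    total_jobs: number of jobs in the whole session
    Hypothetical helper using the formula from the first post.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("no completed jobs to estimate from")
    p = sum(outcomes) / n
    margin = math.sqrt(p * (1 - p) / n) * z * (1 - math.sqrt(n / total_jobs))
    return p, margin


# Simulated session: 8000 jobs, of which 1600 (20%) would survive.
rng = random.Random(42)
all_jobs = [i < 1600 for i in range(8000)]
rng.shuffle(all_jobs)  # random execution order makes any prefix a random sample

completed = all_jobs[:400]  # e.g. what finished before the CI time budget ran out
rate, margin = estimate_survival(completed, total_jobs=len(all_jobs))
print(f"survival rate: {rate * 100:.1f}% ± {margin * 100:.1f}% (95% confidence)")
```

The shuffle is the important part: the estimate is only representative if the completed jobs are a random subset of all jobs.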

tomato42 · 2019-11-02T13:06:09Z

I've proposed a PR to implement it.

Here's one lecture that goes into error estimation: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Confidence_Intervals/BS704_Confidence_Intervals_print.html

for the estimation of the survival rate to be representative, the sample must be random, so execute the tasks in random order; see sixty-north#490 and sixty-north#491

tomato42 mentioned this issue Nov 2, 2019

Estimate mutation score by sampling #491

Merged

abingham closed this as completed in #491 Nov 4, 2019

tomato42 mentioned this issue Dec 29, 2023

process work items in random order #540

Merged

abingham pushed a commit that referenced this issue Jan 11, 2024

process work items in random order (#540)

9713e21

for the estimation of the survival rate to be representative, the sample must be random, so execute the tasks in random order; see #490 and #491
