Estimating mutation score by sampling #490
You're absolutely right that using coverage to drive CR is a heuristic. It generally won't tell us exactly which tests need to be run, since logical paths through the code might alter the relationships between tests and coverage.
I'm not confident enough in my statistics to assess whether what you're proposing is mathematically sound, but I certainly like the approach in principle. Assuming the mathematics is (or could be made) correct, I think it's worth pursuing. You could quite easily write an interceptor that (a) randomly selects the mutations to run and (b) marks all the others as skipped. You'd then need a specialized (but probably quite simple) reporting tool to interpret the results.
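The selection step could look roughly like the following Python sketch. It does not use Cosmic Ray's actual interceptor API; the job IDs, the "run"/"skipped" labels and the function select_sample are hypothetical stand-ins.

```python
import random
from typing import Dict, List, Optional


def select_sample(job_ids: List[str], n: int, seed: Optional[int] = None) -> Dict[str, str]:
    """Hypothetical helper: map each job ID to "run" or "skipped".

    random.sample draws n distinct jobs uniformly without replacement,
    i.e. a simple random sample of the mutants.
    """
    rng = random.Random(seed)  # seeded so a CI run can be reproduced
    chosen = set(rng.sample(job_ids, n))
    return {job: ("run" if job in chosen else "skipped") for job in job_ids}


# Usage with made-up job IDs: 8000 mutants, 40 of them selected to run.
jobs = [f"job-{i}" for i in range(8000)]
plan = select_sample(jobs, 40, seed=1)
print(sum(1 for status in plan.values() if status == "run"))  # 40
```

Because the sample is drawn without replacement, every subset of size n is equally likely, which is the assumption behind the confidence-interval formula in the issue.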
(In reply to the original issue from Hubert Kario, Sat, Nov 2, 2019 at 1:45 AM.)
I've since updated the question: this wouldn't need a new interceptor, but rather the ability to estimate results from a partial run (as in CI, where you can run the tests for 20-30 minutes and make do with what you get).
I've proposed a PR to implement it.
Here's one lecture that goes into error estimation: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Confidence_Intervals/BS704_Confidence_Intervals_print.html
For the estimate of the survival rate to be representative, the sample must be random, so execute the tasks in random order; see sixty-north#490 and sixty-north#491.
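A sketch of random-order execution under a time budget, as in the CI scenario above. run_in_random_order and the run_job callback (assumed to return True when a mutant survives) are hypothetical, not part of Cosmic Ray.

```python
import random
import time
from typing import Callable, List, Optional, Tuple


def run_in_random_order(
    job_ids: List[str],
    run_job: Callable[[str], bool],
    budget_seconds: float,
    seed: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical runner: execute jobs in shuffled order until the budget runs out.

    Returns (completed, survived). The completed jobs are the first k jobs
    of a uniform random permutation of all jobs.
    """
    order = list(job_ids)
    random.Random(seed).shuffle(order)  # uniform random permutation
    deadline = time.monotonic() + budget_seconds
    completed = survived = 0
    for job in order:
        if time.monotonic() >= deadline:
            break  # budget used up; stop with a partial (random) sample
        completed += 1
        if run_job(job):
            survived += 1
    return completed, survived


# Usage with a fake runner: jobs whose ID ends in "0" "survive" (10 of 100).
jobs = [f"job-{i}" for i in range(100)]


def survives(job: str) -> bool:
    return job.endswith("0")


print(run_in_random_order(jobs, survives, budget_seconds=60.0, seed=0))  # (100, 10)
```

Stopping on a deadline rather than after a fixed count suits CI, but if surviving mutants tend to take longer to run than killed ones, the completed prefix may slightly under-represent them.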
I was thinking about issue #484, and I think there is a bit of a problem with it: what if the test coverage changes but the code does not? How can we check whether changes to the test coverage are OK when no application code was changed?
Maybe cosmic-ray could execute a random selection of the mutation tests and calculate a confidence interval for the result?
The formula I found uses:
p: fraction of the tested mutants that survived
q: fraction of the tested mutants that were killed (i.e. 1 - p)
n: number of mutants tested (the sample size)
N: total number of mutants
z: z-score for the chosen confidence level (1.65 for 90%, 1.96 for 95%, 2.58 for 99%)
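So if I have 8000 mutants, test 40 of them chosen at random, and 20% of those survive, then to get a 95% confidence interval for that 20% I calculate:
sqrt((p * q) / n) * z * sqrt((N - n) / (N - 1)) = sqrt(0.2 * 0.8 / 40) * 1.96 * sqrt(7960 / 7999) ≈ 0.124
where the last factor is the finite population correction, which shrinks the interval as n approaches N.
So by executing only 40 of the 8000 mutation tests, I know that the true survival rate for this test suite is 20% ± 12.4% (95% confidence), i.e. a mutation score of 80% ± 12.4%.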
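The calculation above as a small Python function, assuming the normal approximation with the standard finite population correction sqrt((N - n) / (N - 1)). The name margin_of_error is hypothetical.

```python
import math


def margin_of_error(p: float, n: int, N: int, z: float = 1.96) -> float:
    """Half-width of the confidence interval for a survival rate p
    measured on n randomly chosen mutants out of N in total."""
    q = 1.0 - p
    standard_error = math.sqrt(p * q / n)
    fpc = math.sqrt((N - n) / (N - 1))  # finite population correction
    return z * standard_error * fpc


# Worked example: 40 of 8000 mutants tested, 20% survived, 95% confidence.
print(round(margin_of_error(0.2, 40, 8000), 3))  # 0.124
```

The normal approximation is only reasonable when n * p and n * q are not too small (a common rule of thumb is at least 5 each); here they are 8 and 32. When every mutant is tested (n = N), the correction factor is 0 and so is the margin.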
The nice thing is that if the jobs were executed in random order, this estimate could simply be a switch to cr-report that bases it on total jobs vs. completed jobs and the already-calculated survival rate.
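A sketch of how such a report switch might compute the estimate from the counts it already has. estimate_from_partial_run is hypothetical, not an existing cr-report option, and it assumes the completed jobs were executed in random order.

```python
import math
from typing import Tuple


def estimate_from_partial_run(
    total_jobs: int, completed_jobs: int, surviving: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Hypothetical report helper: (survival_rate, lower, upper) for all mutants,
    estimated from the completed jobs; bounds are clipped to [0, 1]."""
    if completed_jobs <= 0 or total_jobs < 2:
        raise ValueError("need at least one completed job and two jobs in total")
    p = surviving / completed_jobs  # observed survival rate in the sample
    margin = (
        z
        * math.sqrt(p * (1 - p) / completed_jobs)
        * math.sqrt((total_jobs - completed_jobs) / (total_jobs - 1))
    )
    return p, max(0.0, p - margin), min(1.0, p + margin)


rate, low, high = estimate_from_partial_run(total_jobs=8000, completed_jobs=40, surviving=8)
print(f"survival rate {rate:.1%}, 95% CI [{low:.1%}, {high:.1%}]")
# survival rate 20.0%, 95% CI [7.6%, 32.4%]
```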