Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

effect sizes vs power #28

Open
coralzhang opened this issue Sep 9, 2024 · 3 comments
Open

effect sizes vs power #28

coralzhang opened this issue Sep 9, 2024 · 3 comments

Comments

@coralzhang
Copy link

I am trying to understand the power calculation function power.general.withDoublets. I created 3 reference studies, with low, medium and high fold change values; I expect the power to increase with every setting being fixed. But the power provided was lowest for the medium fold changes. Can you explain why?
`
dat=scPower::de.ref.study
dat2=dat[dat$name=="Blueprint (CLL) uCLL-iCLL",]
dat2=dat2[order(dat2$FoldChange),]

dat3=dat2[11:15,]; dat3$rank=10001:10005
dat4=dat2[121:125,]; dat4$rank=10001:10005
dat5=dat2[331:335,]; dat5$rank=10001:10005

power3<-power.general.withDoublets(nSamples=10,nCells=7000,readDepth=25000,ct.freq=0.3,
type="de", ref.study=dat3, ref.study.name="Blueprint (CLL) uCLL-iCLL",
samplesPerLane=2,read.umi.fit = scPower::read.umi.fit[read.umi.fit$type=="10X_PBMC_1",],
gamma.mixed.fits = scPower::gamma.mixed.fits,ct="CD4 T cells",
disp.fun.param=scPower::disp.fun.param,mappingEfficiency = 1,
min.UMI.counts = 3,perc.indiv.expr = 0.9,sign.threshold = 0.05,MTmethod="Bonferroni",
multipletRateGrowth="constant")

power4<-power.general.withDoublets(nSamples=10,nCells=7000,readDepth=25000,ct.freq=0.3,
type="de", ref.study=dat4, ref.study.name="Blueprint (CLL) uCLL-iCLL",
samplesPerLane=2,read.umi.fit = scPower::read.umi.fit[read.umi.fit$type=="10X_PBMC_1",],
gamma.mixed.fits = scPower::gamma.mixed.fits,ct="CD4 T cells",
disp.fun.param=scPower::disp.fun.param,mappingEfficiency = 1,
min.UMI.counts = 3,perc.indiv.expr = 0.9,sign.threshold = 0.05,MTmethod="Bonferroni",
multipletRateGrowth="constant")

power5<-power.general.withDoublets(nSamples=10,nCells=7000,readDepth=25000,ct.freq=0.3,
type="de", ref.study=dat5, ref.study.name="Blueprint (CLL) uCLL-iCLL",
samplesPerLane=2,read.umi.fit = scPower::read.umi.fit[read.umi.fit$type=="10X_PBMC_1",],
gamma.mixed.fits = scPower::gamma.mixed.fits,ct="CD4 T cells",
disp.fun.param=scPower::disp.fun.param,mappingEfficiency = 1,
min.UMI.counts = 3,perc.indiv.expr = 0.9,sign.threshold = 0.05,MTmethod="Bonferroni",
multipletRateGrowth="constant")

`

here is the output:
power3:
name powerDetect exp.probs power sampleSize
1 Blueprint (CLL) uCLL-iCLL 0.9889792 0.9900965 0.9988715 10
totalCells usableCells multipletFraction ctCells readDepth readDepthSinglet
1 7000 7000 7.67e-06 2100 25000 25000
mappedReadDepth expressedGenes
1 25000 11210

power4:
name powerDetect exp.probs power sampleSize
1 Blueprint (CLL) uCLL-iCLL 0.501316 0.9900965 0.5063288 10
totalCells usableCells multipletFraction ctCells readDepth readDepthSinglet
1 7000 7000 7.67e-06 2100 25000 25000
mappedReadDepth expressedGenes
1 25000 11210

power5:
name powerDetect exp.probs power sampleSize totalCells
1 Blueprint (CLL) uCLL-iCLL 0.9900965 0.9900965 1 10 7000
usableCells multipletFraction ctCells readDepth readDepthSinglet
1 7000 7.67e-06 2100 25000 25000
mappedReadDepth expressedGenes
1 25000 11210

@KatharinaSchmid
Copy link
Collaborator

Hey,
thanks for your question and your interest in our tool :)
The reason for this behavior is the meaning behind the FoldChange: a FoldChange of 1 means no difference between the two conditions (expression_1/expression_2), and in the end the deviation from 1 is influencing the power. In your example, the FoldChanges in dat4 are the closest to 1, so the most difficult to detect, resulting in lower power.

It is much better visible when converting the FoldChanges to LogFoldChanges:

> dat2$LogFoldChange<-log2(dat2$FoldChange)
> dat3=dat2[11:15,]; dat3$rank=10001:10005
> dat4=dat2[121:125,]; dat4$rank=10001:10005
> dat5=dat2[331:335,]; dat5$rank=10001:10005
> dat3
                         name FoldChange  rank LogFoldChange
552 Blueprint (CLL) uCLL-iCLL 0.08987483 10001     -3.475939
362 Blueprint (CLL) uCLL-iCLL 0.09053963 10002     -3.465307
516 Blueprint (CLL) uCLL-iCLL 0.09447516 10003     -3.403921
484 Blueprint (CLL) uCLL-iCLL 0.09537010 10004     -3.390319
507 Blueprint (CLL) uCLL-iCLL 0.09725035 10005     -3.362153
> dat4
                         name FoldChange  rank LogFoldChange
433 Blueprint (CLL) uCLL-iCLL  0.3523094 10001     -1.505085
550 Blueprint (CLL) uCLL-iCLL  0.3585967 10002     -1.479566
584 Blueprint (CLL) uCLL-iCLL  0.3601297 10003     -1.473412
617 Blueprint (CLL) uCLL-iCLL  0.3607892 10004     -1.470772
454 Blueprint (CLL) uCLL-iCLL  0.3720297 10005     -1.426510
> dat5
                         name FoldChange  rank LogFoldChange
407 Blueprint (CLL) uCLL-iCLL   18.18620 10001      4.184772
388 Blueprint (CLL) uCLL-iCLL   20.49132 10002      4.356941
549 Blueprint (CLL) uCLL-iCLL   20.49420 10003      4.357143
442 Blueprint (CLL) uCLL-iCLL   23.92357 10004      4.580361
528 Blueprint (CLL) uCLL-iCLL   27.34854 10005      4.773392

Now, it is visible that dat4 has the lowest absolute LogFoldChanges, i.e. gets the lowest power.

I hope this explanation was helpful, please let me know if you have more questions.

Best regards,
Katharina

@coralzhang
Copy link
Author

Thank you for the response!

@coralzhang
Copy link
Author

I have a follow up question: the nSamples in this function, is it total sample size, or per group? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants