effect sizes vs power #28

coralzhang · 2024-09-09T20:06:45Z

I am trying to understand the power calculation function power.general.withDoublets. I created 3 reference studies, with low, medium and high fold change values; I expect the power to increase with every setting being fixed. But the power provided was lowest for the medium fold changes. Can you explain why?
`
dat=scPower::de.ref.study
dat2=dat[dat$name=="Blueprint (CLL) uCLL-iCLL",]
dat2=dat2[order(dat2$FoldChange),]

dat3=dat2[11:15,]; dat3$rank=10001:10005
dat4=dat2[121:125,]; dat4$rank=10001:10005
dat5=dat2[331:335,]; dat5$rank=10001:10005

power3<-power.general.withDoublets(nSamples=10,nCells=7000,readDepth=25000,ct.freq=0.3,
type="de", ref.study=dat3, ref.study.name="Blueprint (CLL) uCLL-iCLL",
samplesPerLane=2,read.umi.fit = scPower::read.umi.fit[read.umi.fit$type=="10X_PBMC_1",],
gamma.mixed.fits = scPower::gamma.mixed.fits,ct="CD4 T cells",
disp.fun.param=scPower::disp.fun.param,mappingEfficiency = 1,
min.UMI.counts = 3,perc.indiv.expr = 0.9,sign.threshold = 0.05,MTmethod="Bonferroni",
multipletRateGrowth="constant")

power4<-power.general.withDoublets(nSamples=10,nCells=7000,readDepth=25000,ct.freq=0.3,
type="de", ref.study=dat4, ref.study.name="Blueprint (CLL) uCLL-iCLL",
samplesPerLane=2,read.umi.fit = scPower::read.umi.fit[read.umi.fit$type=="10X_PBMC_1",],
gamma.mixed.fits = scPower::gamma.mixed.fits,ct="CD4 T cells",
disp.fun.param=scPower::disp.fun.param,mappingEfficiency = 1,
min.UMI.counts = 3,perc.indiv.expr = 0.9,sign.threshold = 0.05,MTmethod="Bonferroni",
multipletRateGrowth="constant")

power5<-power.general.withDoublets(nSamples=10,nCells=7000,readDepth=25000,ct.freq=0.3,
type="de", ref.study=dat5, ref.study.name="Blueprint (CLL) uCLL-iCLL",
samplesPerLane=2,read.umi.fit = scPower::read.umi.fit[read.umi.fit$type=="10X_PBMC_1",],
gamma.mixed.fits = scPower::gamma.mixed.fits,ct="CD4 T cells",
disp.fun.param=scPower::disp.fun.param,mappingEfficiency = 1,
min.UMI.counts = 3,perc.indiv.expr = 0.9,sign.threshold = 0.05,MTmethod="Bonferroni",
multipletRateGrowth="constant")

`

here is the output:
power3:
name powerDetect exp.probs power sampleSize
1 Blueprint (CLL) uCLL-iCLL 0.9889792 0.9900965 0.9988715 10
totalCells usableCells multipletFraction ctCells readDepth readDepthSinglet
1 7000 7000 7.67e-06 2100 25000 25000
mappedReadDepth expressedGenes
1 25000 11210

power4:
name powerDetect exp.probs power sampleSize
1 Blueprint (CLL) uCLL-iCLL 0.501316 0.9900965 0.5063288 10
totalCells usableCells multipletFraction ctCells readDepth readDepthSinglet
1 7000 7000 7.67e-06 2100 25000 25000
mappedReadDepth expressedGenes
1 25000 11210

power5:
name powerDetect exp.probs power sampleSize totalCells
1 Blueprint (CLL) uCLL-iCLL 0.9900965 0.9900965 1 10 7000
usableCells multipletFraction ctCells readDepth readDepthSinglet
1 7000 7.67e-06 2100 25000 25000
mappedReadDepth expressedGenes
1 25000 11210

KatharinaSchmid · 2024-09-10T15:26:03Z

Hey,
thanks for your question and your interest in our tool :)
The reason for this behavior is the meaning behind the FoldChange: a FoldChange of 1 means no difference between the two conditions (expression_1/expression_2), and in the end the deviation from 1 is influencing the power. In your example, the FoldChanges in dat4 are the closest to 1, so the most difficult to detect, resulting in lower power.

It is much better visible when converting the FoldChanges to LogFoldChanges:

> dat2$LogFoldChange<-log2(dat2$FoldChange)
> dat3=dat2[11:15,]; dat3$rank=10001:10005
> dat4=dat2[121:125,]; dat4$rank=10001:10005
> dat5=dat2[331:335,]; dat5$rank=10001:10005
> dat3
                         name FoldChange  rank LogFoldChange
552 Blueprint (CLL) uCLL-iCLL 0.08987483 10001     -3.475939
362 Blueprint (CLL) uCLL-iCLL 0.09053963 10002     -3.465307
516 Blueprint (CLL) uCLL-iCLL 0.09447516 10003     -3.403921
484 Blueprint (CLL) uCLL-iCLL 0.09537010 10004     -3.390319
507 Blueprint (CLL) uCLL-iCLL 0.09725035 10005     -3.362153
> dat4
                         name FoldChange  rank LogFoldChange
433 Blueprint (CLL) uCLL-iCLL  0.3523094 10001     -1.505085
550 Blueprint (CLL) uCLL-iCLL  0.3585967 10002     -1.479566
584 Blueprint (CLL) uCLL-iCLL  0.3601297 10003     -1.473412
617 Blueprint (CLL) uCLL-iCLL  0.3607892 10004     -1.470772
454 Blueprint (CLL) uCLL-iCLL  0.3720297 10005     -1.426510
> dat5
                         name FoldChange  rank LogFoldChange
407 Blueprint (CLL) uCLL-iCLL   18.18620 10001      4.184772
388 Blueprint (CLL) uCLL-iCLL   20.49132 10002      4.356941
549 Blueprint (CLL) uCLL-iCLL   20.49420 10003      4.357143
442 Blueprint (CLL) uCLL-iCLL   23.92357 10004      4.580361
528 Blueprint (CLL) uCLL-iCLL   27.34854 10005      4.773392

Now, it is visible that dat4 has the lowest absolute LogFoldChanges, i.e. gets the lowest power.

I hope this explanation was helpful, please let me know if you have more questions.

Best regards,
Katharina

coralzhang · 2024-09-12T17:39:23Z

Thank you for the response!

coralzhang · 2024-09-16T20:02:33Z

I have a follow up question: the nSamples in this function, is it total sample size, or per group? Thanks!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

effect sizes vs power #28

effect sizes vs power #28

coralzhang commented Sep 9, 2024

KatharinaSchmid commented Sep 10, 2024

coralzhang commented Sep 12, 2024

coralzhang commented Sep 16, 2024

effect sizes vs power #28

effect sizes vs power #28

Comments

coralzhang commented Sep 9, 2024

KatharinaSchmid commented Sep 10, 2024

coralzhang commented Sep 12, 2024

coralzhang commented Sep 16, 2024