Skip to content

Commit 915e856

Browse files
committed
Merge branch 'master' into fgwas_v2_paper_sumstats_merge
2 parents 86b9d09 + 444afa3 commit 915e856

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

61 files changed

+587
-5825
lines changed

.gitignore

+5
Original file line numberDiff line numberDiff line change
@@ -22,4 +22,9 @@ outputs/**
2222
**.mmm
2323
**.metafile
2424
docs/source/**
25+
example_data
26+
sim
27+
Icon
28+
**/.DS_Store
2529
example_data/*
30+
snipar/tests/tmp/*

docs/simulation.rst

+89-8
Original file line numberDiff line numberDiff line change
@@ -15,9 +15,18 @@ To simulate data, please first create a directory to store the data:
1515

1616
``mkdir sim``
1717

18-
Now, we are going to simulate data for 3000 families genotyped at 1000 independent SNPs. We are going to simulate 20 generations of assortative mating with parental phenotype correlation 0.5.
18+
Now, we are going to simulate data for 3000 families, each with two full-siblings, genotyped at 1000 independent SNPs.
19+
We simulate a phenotype affected by direct genetic effects and assortative mating.
20+
We are going to simulate 20 generations of assortative mating with parental phenotype correlation 0.5, reaching an approximate equilibrium.
21+
The command for this is:
1922

20-
``simulate.py 1000 0.5 sim/ --nfam 3000 --impute --n_am 20 --r_par 0.5``
23+
``simulate.py 1000 0.5 sim/ --nfam 3000 --impute --n_am 20 --r_par 0.5 --save_par_gts``
24+
25+
where the first argument gives the number of causal SNPs, the second argument gives the
26+
random mating heritability, the third gives the output directory, --nfam gives the number of families, --impute
27+
tells the script to impute missing parental genotypes, --n_am gives the number of generations of assortative mating,
28+
the parental phenotype correlation is given by --r_par, and --save_par_gts tells the script to output the
29+
genotypes of the parents of the final generation in addition to the genotypes of the final generation.
2130

2231
Please change your working directory to sim/:
2332

@@ -81,12 +90,14 @@ and the direct and population effects should both be the same (1 in expectation)
8190
average parental NTC should be zero (in expectation). To check this, read in the
8291
effect estimate output files in *R* or look at them using a text viewer (e.g. less -S on a unix system).
8392

84-
To compute the PGS from the true direct genetic effects+estimation error (such as would be obtained from a GWAS),
93+
To compute the PGS from the true direct genetic effects+estimation error (such as would be obtained from a family-based GWAS),
8594
use the following command:
8695

8796
``pgs.py direct_v1 --bed chr_@ --imp chr_@ --weights causal_effects.txt --beta_col direct_v1``
8897

89-
It outputs the PGS to a :ref:`PGS file <pgs_file>`: direct_v1.pgs.txt.
98+
It outputs the PGS to a :ref:`PGS file <pgs_file>`: direct_v1.pgs.txt. (Notice also that the inferred
99+
correlation between parents' PGSs is lower than when using the true direct genetic effects as weights due to
100+
estimation error in the weights.)
90101

91102
To estimate direct effect and average NTC of the PGS, use the following command:
92103

@@ -100,12 +111,82 @@ from noisy weights (in direct_v1.1.effects.txt) will be smaller than the populat
100111
This is because the PGS does not capture all of the heritability due to estimation error in the weights.
101112
The PGS has its population effect inflated (relative to its
102113
direct effect) by assortative mating, which induces a correlation of the PGS with the component of the heritability
103-
not captured by the PGS due to estimation error. This inflation is not captured by the direct effect of the PGS
104-
because chromosomes segregate independently during meiosis. (In this simulation, all causal SNPs segregate independently.)
105-
Here, the ratio between direct and population effects of the PGS should be around 0.87.
114+
not directly captured by the PGS due to estimation error. This inflation is not captured by the direct effect of the PGS
115+
because of the within-family variation used to estimate the direct effect is due to the random segregation of genetic material during meiosis.
116+
Here, the ratio between direct and population effects of the PGS should be around 0.86.
106117

107118
One should also observe a statistically significant average parental NTC (in direct_v1.2.effects.txt) of the PGS from
108119
the two-generation model despite the absence of parental indirect genetic effects in this simulation. Here,
109120
the ratio between the average NTC and the direct effect should be around 0.15. This demonstrates
110121
that statistically significant average NTC estimates cannot be interpreted as demonstrating
111-
parental indirect genetic effects, especially for phenotypes affected by assortative mating.
122+
parental indirect genetic effects, especially for phenotypes affected by assortative mating.
123+
124+
Adjusting for assortative mating
125+
--------------------------------
126+
127+
We now show how to adjust two-generation PGI results for assortative mating
128+
using the procedure outlined in `Estimation of indirect genetic effects and heritability under assortative mating <https://www.biorxiv.org/content/10.1101/2023.07.10.548458v1.abstract>`_.
129+
The estimation procedure is summarized in this diagram:
130+
131+
.. image:: two_gen_estimation.png
132+
:scale: 30 %
133+
:align: center
134+
:alt: Two-generation estimation procedure accouting for assortative mating
135+
136+
The estimation requires as inputs: an estimate of the correlation between parents' scores, :math:`r_k`;
137+
the regression coefficients from two-generation PGI analysis, (:math:`\delta_{\text{PGI}:k},\alpha_{\text{PGI}:k}`);
138+
and a heritability estimate, :math:`h^2_f`,from MZ-DZ twin comparisons, `RDR <https://www.nature.com/articles/s41588-018-0178-9>`_, or sib-regression.
139+
140+
The estimation procedure outputs estimates of: :math:`k`, the fraction of heritability the PGI would explain in a random mating population;
141+
:math:`r_\delta`, the correlation between parents' true direct genetic effect components;
142+
:math:`h^2_\text{eq}`, the equilibrium heritability, adjusting for the downward bias in heritability estimates from
143+
MZ-DZ comparisons, RDR, and sib-regression;
144+
:math:`\alpha_\delta`, the indirect genetic effect of true direct genetic effect PGI;
145+
and :math:`v_{\eta:\delta}`, the fraction of phenotypic variance contribued by the indirect genetic effect component
146+
that is correlated with the direct genetic effect component.
147+
148+
We can use *snipar* to compute the two-generation PGI estimates and the correlation between parents' scores,
149+
and we can input a heritability estimate into *pgs.py* script to complete the inputs, so that
150+
*snipar* will perform the two-generation analysis adjusting for assortative mating.
151+
152+
To perform the estimation, we will combine the offspring and parental genotype files.
153+
This enables us to estimate the correlation between parents' scores
154+
using the observed parental genotypes. (This is better than using the sibling
155+
genotypes because the correlation estimate from observed parental genotypes is uncorrelated with the PGS regression coefficients.)
156+
157+
``plink --bfile chr_1 --bmerge chr_1_par --out chr_1_combined``
158+
159+
We now compute the noisy PGI using the observed offspring and parental genotypes:
160+
161+
``pgs.py direct_v1_obs --bed chr_@_combined --weights causal_effects.txt --beta_col direct_v1 --pedigree pedigree.txt``
162+
163+
To complete the inputs to two-generation PGI analysis, we need an estimate of heritability,
164+
as one would obtain from sib-regression, RDR, or MZ-DZ twin comparisons. This estimate is
165+
a downard biased estimate of the equilibrium heritability, :math:`h^2_\text{eq}`, by a factor of :math:`(1-r_\delta)`, where
166+
:math:`r_\delta` is the correlation between the parents' direct genetic effect components.
167+
168+
We can obtain this from the VCs.txt output of the simulation, which can be read into R/Python/etc as table.
169+
Each row gives, for each generation,
170+
the variance of the direct genetic effect component, the phenotypic variance, and the
171+
correlation between parents' direct genetic effect components. The equilibrium heritability is
172+
obtained by using the values in the last row:
173+
dividing the variance of the direct genetic effect component (first column) by the phenotypic variance
174+
(second column). To then obtain the heritability as estimated by sib-regression, RDR, and MZ-DZ twin comparisons,
175+
we multiply the equilibrium heritability by :math:`(1-r_\delta)`, where :math:`r_\delta` is obtained from the third column of
176+
the last row. The equilibrium heritability should be around 0.59, and :math:`r_\delta` should be around 0.29, so the heritability as estimated
177+
by sib-regression, RDR, MZ-DZ twin comparisons should be around :math:`h^2_f \approx (1-0.29) \times 0.59=0.42`.
178+
179+
We can now perform two-generation PGI analysis accounting for assortative mating using the following command,
180+
with the h2f argument set to the number computed from your VCs.txt file as outlined above (here we use 0.42):
181+
182+
``pgs.py direct_v1_obs --pgs direct_v1_obs.pgs.txt --phenofile phenotype.txt --h2f 0.42,0``
183+
184+
This script will take the input heritability estimate (0.42) and the standard error of the estimate (here 0 since we used the true value)
185+
to estimate the fraction of heritability the PGI would explain in a random mating population,
186+
:math:`k`, which should be around 0.5; the correlation between parents' direct genetic effect components, :math:`r_\delta`,
187+
which should be around 0.29; the equilibrium heritability, :math:`h^2_\text{eq}`, which should be around 0.59;
188+
the ratio between direct and population effects that would be expected based on assortative mating alone, :math:`\rho_k`,
189+
which should be around 0.86; the indirect genetic effect of true direct genetic effect PGI, :math:`\alpha_\delta`, which should not be
190+
statistically significantly different from zero (with high probability) because there are no parental indirect genetic effects in this simulation;
191+
and :math:`v_{\eta:\delta}`, the contribution to the phenotypic variance from the indirect genetic effect component correlated with direct genetic effect component,
192+
which should also not be statistically indistinguishable from zero (with high probability). These estimates are output to direct_v1_obs.am_adj_pars.txt.

0 commit comments

Comments
 (0)