Clarification on input parameters MAF, N, and sdY in coloc for GWAS and eQTL data #178

Alice9503 · 2024-11-19T10:42:50Z

Hi,

I am a new user of the coloc package and need clarification on how to correctly set up input parameters, particularly MAF, N, and sdY. I am working with GWAS and eQTL data, where the GWAS dataset is much larger than the eQTL dataset.

Example of my data:

GWAS Data:

head(gwas)
   CHR_gwas    SNP_gwas POS_gwas A1_gwas A2_gwas N_gwas AF1_gwas   T_gwas SE_T_gwas P_noSPA_gwas  BETA_gwas
      <int>      <char>    <int>  <char>  <char>  <int>    <num>    <num>     <num>        <num>      <num>
1:       18   rs1573362 45455641       A       G 282601 0.501888 -42.2791   43.1588     0.327275 -0.0226980
2:       18  rs11874858 45457818       A       G 282263 0.451565  55.4193   42.9665     0.197111  0.0300194
3:       18   rs4940109 45458078       G       A 282282 0.480057  53.2239   43.1160     0.217040  0.0286306
4:       18   rs4940110 45458519       T       C 282275 0.480053  52.3038   43.1097     0.225026  0.0281438
5:       18  rs57620563 45458763       A       C 282070 0.446226 -48.6179   42.8916     0.257002 -0.0264272
6:       18 rs201752156 45458821     CAT       C 281829 0.485550  51.3425   43.0838     0.233383  0.0276598
     SE_gwas   P_gwas CONVERGE_gwas varbeta_gwas       rs_id
       <num>    <num>         <int>        <num>      <char>
1: 0.0231702 0.327275             1 0.0005368582   rs1573362
2: 0.0232740 0.197111             1 0.0005416791  rs11874858
3: 0.0231933 0.217040             1 0.0005379292   rs4940109
4: 0.0231966 0.225026             1 0.0005380823   rs4940110
5: 0.0233146 0.257002             1 0.0005435706  rs57620563
6: 0.0232106 0.233383             1 0.0005387320 rs201752156

eQTL Data:

head(eqtl)
   phenotype_id  variant_id start_distance         af ma_samples ma_count pval_nominal       slope   slope_se
         <char>      <char>          <int>      <num>      <int>    <int>        <num>       <num>      <num>
1:         AGRN rs757557694        -991531 0.01089498        612      615   0.50559341  0.02634526 0.03957399
2:         AGRN    rs806731        -989198 0.03244476       1263     1295   0.01467575 -0.06758886 0.02769577
3:         AGRN rs540662756        -972962 0.01857967        973      979   0.69826483  0.01224067 0.03157521
4:         AGRN rs114420996        -961307 0.03668183       1395     1419   0.37630071 -0.02341123 0.02646101
5:         AGRN  rs62637817        -959770 0.02768842       1195     1205   0.30569805 -0.02937410 0.02867709
6:         AGRN  rs62639104        -955190 0.02649279       1150     1158   0.54228284 -0.01783010 0.02925986
     chr
   <int>
1:     1
2:     1
3:     1
4:     1
5:     1
6:     1

My Questions and Observations:

1. MAF Calculation

I calculated MAF for the eQTL dataset as eqtl$MAF <- pmin(eqtl$af, 1 - eqtl$af) based on af (the ALT allele frequency). However:

The GWAS dataset is much larger than the eQTL dataset. Would it be more accurate to calculate MAF using AF1_gwas from the GWAS data instead of af from the eQTL data?
And is this the correct way to calculate MAF from AF?

2. Sample Size (𝑁)

In my eQTL dataset:

The intersection of samples between the expression and genotype data has 12,345 individuals.
Expression data has no missing values, but genotype data does, meaning each SNP could have a different effective sample size.
Should I use the overall sample size (12,345) for all SNPs, or calculate 𝑁 individually for each SNP (similar to N_gwas in the GWAS dataset)?

3. Understanding `sdY`

I understand that sdY refers to the standard deviation of the trait values (here, the gene expression levels in eQTL data). Right?
Why can it be estimated using 𝑁 and MAF? Isn’t MAF a concept specific to genotype data, not expression data? Could you explain the relationship between these parameters?

I appreciate any clarification and guidance on these issues!

Thank you!


Alice9503 · 2024-11-19T11:35:16Z

Additionally, I have gene expression data for 12,345 individuals, and I am calculating sdY based on this dataset. I have the following questions:

Should sdY be calculated using all 12,345 individuals (since the expression data has no missing values), regardless of missing genotypes for specific SNPs? Or should sdY be calculated separately for each SNP, considering only the intersecting samples (as some SNPs may have missing genotypes)?
If sdY should be calculated across all individuals, I computed it using the following code:

sd_per_gene <- apply(pro_nor_dat[, -1], 2, sd)

Here are the results for some genes:

> head(sd_per_gene)
     A1BG     AAMDC    AARSD1     ABCA2   ABHD14B      ABL1 
0.9997928 1.0000313 0.9998905 1.0001168 0.9999087 1.0000225

Most genes have values very close to 1. In this case:

Is it valid to simplify the analysis by setting sdY = 1 for all genes?
Or do I need to use the precise sdY values calculated for each gene?

Thank you!

chr1swallace · 2024-11-19T17:25:14Z

For sdY, it looks like your data are standardised, which means sdY = 1 for all genes.
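
On the earlier question of why sdY can be estimated from N and MAF, the relationship can be sketched as follows. This is a rough approximation, assuming an additive 0/1/2 genotype coding, Hardy-Weinberg equilibrium, and a SNP that explains only a small fraction of the trait variance. MAF enters because it determines the variance of the genotype, $2\,\mathrm{MAF}(1-\mathrm{MAF})$, and the variance of a regression slope depends on the variance of the predictor:

$$
\operatorname{var}(\hat\beta) \approx \frac{\mathrm{sdY}^2}{2N \cdot \mathrm{MAF}\,(1-\mathrm{MAF})}
\quad\Longrightarrow\quad
\mathrm{sdY} \approx \sqrt{2N \cdot \mathrm{MAF}\,(1-\mathrm{MAF}) \cdot \operatorname{var}(\hat\beta)}
$$

So when sdY is not known, it can be approximated from varbeta, N, and MAF; when sdY is known, as for standardised expression data, MAF is not needed for this purpose.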

chr1swallace · 2024-11-19T17:27:30Z

Hi Alice,

Thanks for a very clear question with examples of data; it makes answering much easier! As you have beta and se (or slope and se) and sdY, you won't need MAF. For sample size, just supply one value: the size of the sample, not per SNP.

hth,
Chris
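
A minimal sketch of how that advice might translate into a coloc-style dataset list for the eQTL data. The toy data frame copies the first three AGRN rows from the question; the object name `eqtl_dataset` is hypothetical, and the choice of `type = "quant"`, `N = 12345`, and `sdY = 1` follows the thread rather than anything verified against the real data.

```r
# Toy copy of the first three eQTL rows shown in the question (gene AGRN).
eqtl <- data.frame(
  variant_id = c("rs757557694", "rs806731", "rs540662756"),
  slope      = c(0.02634526, -0.06758886, 0.01224067),
  slope_se   = c(0.03957399, 0.02769577, 0.03157521)
)

# Dataset list in the form coloc expects for a quantitative trait.
eqtl_dataset <- list(
  snp     = eqtl$variant_id,   # SNP identifiers
  beta    = eqtl$slope,        # effect estimates
  varbeta = eqtl$slope_se^2,   # variance of beta = squared standard error
  type    = "quant",           # expression is a quantitative trait
  N       = 12345,             # one overall sample size, not per SNP
  sdY     = 1                  # expression is standardised, so no MAF needed
)

str(eqtl_dataset)
```

The GWAS list would be built the same way from `BETA_gwas` and `varbeta_gwas`, with `type` and the remaining fields depending on whether the GWAS trait is quantitative or case-control, which the thread does not state. Both lists would then be passed to `coloc.abf(dataset1 = ..., dataset2 = ...)` after restricting to a shared region; note that the example rows shown in the question are on different chromosomes (18 and 1), so they would not overlap as posted.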
