Commit 5bd5506

gwas by subtraction doc edit (#869)
- Edit some of the GWAS-by-subtraction document. Includes rewording and clarifications.
1 parent ed4d209 commit 5bd5506

3 files changed

Lines changed: 28 additions & 22 deletions

docs/Bioinformatics_Concepts/GWAS_By_Subtraction.md

Lines changed: 17 additions & 15 deletions
@@ -14,8 +14,8 @@ It is useful to understand GWAS-by-subtraction via linear algebra.
 Consider a [Euclidian space](https://en.wikipedia.org/wiki/Euclidean_space) in which:
 
 - GWAS traits are vectors.
-- The [inner product](https://en.wikipedia.org/wiki/Inner_product_space) of two traits is their [genetic covariance](Genetic_Correlation.md). Denote the inner product of $u$ and $v$ as $\langle u,v \rangle$.
-- We assume all phenotypes have been normalized to have variance of 1. Under this assumption, a trait's squared [Euclidian norm](https://en.wikipedia.org/wiki/Inner_product_space#Norm_properties) is its heritability: $\lVert v \rVert^2=h^2_v$ where $h^2_v$ is the heritability of $v$.
+- The [inner product](https://en.wikipedia.org/wiki/Inner_product_space) of two traits is their [genetic covariance](Genetic_Correlation.md#genetic-covariance). Denote the inner product of traits $u$ and $v$ as $\langle u,v \rangle$.
+- We assume all phenotypes have been normalized to have variance of 1. Under this assumption, a trait's squared [Euclidian norm](https://en.wikipedia.org/wiki/Inner_product_space#Norm_properties) is its heritability: $\lVert v \rVert^2=h^2_v$ where $h^2_v$ is the heritability of trait $v$.
 
 
 
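The Euclidean picture in this hunk can be made concrete with a minimal sketch. The sketch makes one assumption that the text above does not state: each trait's genetic component is a linear combination $x^\top\beta$ of mean-zero genotypes $x$ with covariance (LD) matrix $\Sigma$. Under that assumption, $\langle u,v\rangle=\mathrm{Cov}(x^\top\beta_u,x^\top\beta_v)=\beta_u^\top\Sigma\beta_v$. The squared norm is then the genetic variance, which is the heritability when phenotypes have variance 1. The cosine of the angle between two traits is their genetic correlation. `Sigma`, the effect vectors and `genetic_inner_product` are hypothetical.

```python
import numpy as np

def genetic_inner_product(beta_u, beta_v, Sigma):
    """<u, v> = GCov(u, v) = beta_u^T Sigma beta_v, ASSUMING G_u = x^T beta_u and G_v = x^T beta_v."""
    return float(beta_u @ Sigma @ beta_v)

rng = np.random.default_rng(0)
M = 5
A = rng.normal(size=(M, M))
Sigma = A @ A.T / M + np.eye(M)   # hypothetical positive-definite genotype covariance
beta_u = rng.normal(0, 0.1, M)    # hypothetical per-variant effects on trait u
beta_v = rng.normal(0, 0.1, M)    # hypothetical per-variant effects on trait v

h2_u = genetic_inner_product(beta_u, beta_u, Sigma)  # ||u||^2 (heritability of u if Var(u) = 1)
h2_v = genetic_inner_product(beta_v, beta_v, Sigma)  # ||v||^2
gcov = genetic_inner_product(beta_u, beta_v, Sigma)  # <u, v>, the genetic covariance
r_g = gcov / np.sqrt(h2_u * h2_v)                    # cos(angle between u and v) = genetic correlation
print(f"||u||^2 = {h2_u:.4f}, ||v||^2 = {h2_v:.4f}, <u,v> = {gcov:.4f}, r_g = {r_g:.3f}")
```

Because $\Sigma$ is positive definite, this is a genuine inner product on effect vectors. The Cauchy-Schwarz inequality therefore guarantees $|r_g|\le 1$.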
@@ -90,13 +90,14 @@ Where:
 - $x\in\mathbb{R}^M$ is the random genotype. We assume $x$ has mean zero, but unlike in [LDSC](LDSC.md), we do not assume it has been variance standardized. Let $H_i$ be the variance of the $i$th variant.
 - $\beta_F,\beta_R\in\mathbb{R}^M$ are the underlying causal effects of the genetic variants.
 - $F,R$ are the two orthonormal underlying factors.
-- $\delta_1, \delta_2$ are the non-genetic components of the two traits. We assume these effects are independent of all genotypes.
+- $a_F,a_R,b\in\mathbb{R}$ are the scalar multipliers that relate the normalized factors $F,R$ to the unnormalized factors $F',R'$.
+- $\delta_1, \delta_2\in\mathbb{R}$ are the random non-genetic components of the two traits. We assume these effects are independent of all genotypes.
 
 
 ### Marginal Model
 
 
-Let's now focus on SNP $i$, and develop a model around the marginal GWAS regression on this SNP.
+Let's now focus on an arbitrary SNP $i$, and model the marginal GWAS regression on this SNP.
 
 
 Define
@@ -125,32 +126,32 @@ R &= \hat\beta_{R,i}x_i+\zeta_{R,i}\\
 \end{align}
 $$
 
-We assume $\zeta_{F,i},\zeta_{R_i}$ are approximately independent of $x_i$. This is a good approximation so long as individual variant effects ($\beta_{R,i},\beta_{R,i}$) are small, as is the case for most non-Mendelian traits.
+We assume $\zeta_{F,i},\zeta_{R,i}$ are approximately independent of $x_i$. While not strictly true, this is a good approximation so long as individual variant effects ($\beta_{F,i},\beta_{R,i}$) are small, as is the case for polygenic traits.
 
 ### Theoretical covariance
 
-Next, let us examine the genetic covariance structure of the random variables $(x_i, T_1, T_2)$.
+Next, let us examine the genetic covariance structure of the scalar random variables $(x_i, T_1, T_2)$.
 
-We will denote by $\mathrm{GCov}$ and $\mathrm{GVar}$ the genetic covariance and variance respectively\footnote{Because of our earlier assumption that phenotype variance has been normalized to 1, genetic variance equals heritability.}.
+We will denote by $\mathrm{GCov}$ and $\mathrm{GVar}$ the genetic covariance and variance respectively[^covnote].
 
 
 $$
 \begin{align}
-&\mathrm{GCov}(X_i, T_1)\\
-&=\mathrm{GCov}(X_i, a_F F + a_R R + \delta_1)\\
-&=\mathrm{Cov}(X_i, a_F F + a_R R ) & \text{Since $\delta_1$ is non-genetic}\\
-&=\mathrm{Cov}(X_i, a_F (\hat\beta_{F,i}X_i+\zeta_{F,i}) + a_R (\hat\beta_{R,i}X_i+\zeta_{R,i})+\delta_1)\\
+&\mathrm{GCov}(x_i, T_1)\\
+&=\mathrm{GCov}(x_i, a_F F + a_R R + \delta_1)\\
+&=\mathrm{Cov}(x_i, a_F F + a_R R ) & \text{Since $\delta_1$ is non-genetic}\\
+&=\mathrm{Cov}(x_i, a_F (\hat\beta_{F,i}x_i+\zeta_{F,i}) + a_R (\hat\beta_{R,i}x_i+\zeta_{R,i}))\\
 &\approx \left(a_F\hat\beta_{F,i}+a_R\hat\beta_{R,i}\right) H_i & \text{By approximate independence}
 \end{align}
 $$
 
 
 $$
 \begin{align}
-&\mathrm{GCov}(X_i, T_2)\\
-&=\mathrm{GCov}(X_i, b F+\delta_2)\\
-&=\mathrm{Cov}(X_i, b F) & \text{Since $\delta_2$ is non-genetic}\\
-&=\mathrm{Cov}(X_i, b (\hat\beta_{F,i}X_i+\zeta_{F,i}))\\
+&\mathrm{GCov}(x_i, T_2)\\
+&=\mathrm{GCov}(x_i, b F+\delta_2)\\
+&=\mathrm{Cov}(x_i, b F) & \text{Since $\delta_2$ is non-genetic}\\
+&=\mathrm{Cov}(x_i, b (\hat\beta_{F,i}x_i+\zeta_{F,i}))\\
 &\approx b\hat\beta_{F,i} H_i & \text{By approximate independence}
 \end{align}
 $$
@@ -361,3 +362,4 @@ Of the components of $\theta_i$ and $Q_i$, the most interesting is $\hat\beta_{R
 
 
 
+[^covnote]: Because of our earlier assumption that phenotype variance has been normalized to 1, genetic variance equals heritability.
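The two approximations $\mathrm{GCov}(x_i,T_1)\approx(a_F\hat\beta_{F,i}+a_R\hat\beta_{R,i})H_i$ and $\mathrm{GCov}(x_i,T_2)\approx b\hat\beta_{F,i}H_i$ can be checked with a minimal Monte Carlo sketch. All parameter values are hypothetical toy choices. The genotype is a centred 0/1/2 count, so $H_i=2p(1-p)$ under an assumed Hardy-Weinberg equilibrium. The Gaussian residuals stand in for $\zeta_{F,i},\zeta_{R,i},\delta_1,\delta_2$ and are drawn independently of $x_i$, so the approximate-independence assumption holds exactly here. $F$ and $R$ are only approximately unit-variance and uncorrelated.

```python
import numpy as np

def marginal_genetic_covariances(n=200_000, p=0.3, beta_F=0.2, beta_R=-0.1,
                                 a_F=0.8, a_R=0.5, b=0.6, seed=0):
    """Monte Carlo check of GCov(x_i, T_1) and GCov(x_i, T_2) for one SNP (toy parameters)."""
    rng = np.random.default_rng(seed)
    # Additively coded genotype (0/1/2), centred but NOT variance-standardized.
    x = rng.binomial(2, p, size=n).astype(float)
    x -= x.mean()
    H = 2 * p * (1 - p)  # variance of x_i, assuming Hardy-Weinberg equilibrium
    # Factors = marginal SNP effect + residual zeta drawn independently of x_i.
    F = beta_F * x + rng.normal(size=n)
    R = beta_R * x + rng.normal(size=n)
    # Traits: T_1 = a_F F + a_R R + delta_1, T_2 = b F + delta_2; deltas independent of genotype.
    T1 = a_F * F + a_R * R + rng.normal(size=n)
    T2 = b * F + rng.normal(size=n)
    # x_i is purely genetic and the deltas are independent of it,
    # so the ordinary covariance with x_i estimates the genetic covariance.
    empirical = (np.cov(x, T1)[0, 1], np.cov(x, T2)[0, 1])
    predicted = ((a_F * beta_F + a_R * beta_R) * H, b * beta_F * H)
    return empirical, predicted

(emp1, emp2), (pred1, pred2) = marginal_genetic_covariances()
print(f"T_1: simulated {emp1:.4f}, predicted {pred1:.4f}")
print(f"T_2: simulated {emp2:.4f}, predicted {pred2:.4f}")
```

The simulated values should agree with the predictions up to Monte Carlo error, which is of order $\sqrt{H_i\,\mathrm{Var}(T)/n}$.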

docs/Bioinformatics_Concepts/Genetic_Correlation.md

Lines changed: 7 additions & 3 deletions
@@ -8,8 +8,8 @@ The model is
 
 $$
 \begin{align}
-Y_A &= E_A + G_A,\\
-Y_B &= E_B + G_B.
+Y_A &= E_A + G_A, \label{model1} \\
+Y_B &= E_B + G_B. \label{model2}
 \end{align}
 $$
 
@@ -27,4 +27,8 @@ What does it mean biologically when two traits are genetically correlated? The m
 Besides these straightforward cases, there are more exotic possible causes of genetic correlation, as discussed [here](https://gcbias.org/2016/04/19/what-is-genetic-correlation/). Briefly,
 
 - Two traits can be genetically correlated because genetics affects the behavior of a parent, which affects the phenotype of their child.
-- Two traits can be genetically correlated because individuals with these traits tend to mate at a higher rate than would be expected under random mating.
+- Two traits can be genetically correlated because individuals with these traits tend to mate at a higher rate than would be expected under random mating.
+
+## Genetic Covariance
+
+Some applications require the genetic covariance between two traits. In the model $(\ref{model1},\ref{model2})$, the genetic covariance is $\mathrm{Cov}(G_A, G_B)$. Unlike genetic correlation, genetic covariance depends on how the traits are scaled: multiplying $Y_A$ by a constant $c$ multiplies $G_A$, and hence the genetic covariance, by $c$.
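A minimal sketch of the scaling dependence, using simulated genetic components with a hypothetical true genetic correlation of 0.5. Rescaling trait $A$ by $c$, for example through a change of units, maps $G_A\mapsto cG_A$. The genetic covariance scales by $c$, while the genetic correlation is unchanged. The helper `genetic_cov_and_corr` is illustrative, not an existing API.

```python
import numpy as np

def genetic_cov_and_corr(G_A, G_B):
    """Sample genetic covariance Cov(G_A, G_B) and genetic correlation of two genetic components."""
    gcov = np.cov(G_A, G_B)[0, 1]
    gcorr = gcov / np.sqrt(np.var(G_A, ddof=1) * np.var(G_B, ddof=1))
    return gcov, gcorr

rng = np.random.default_rng(42)
n = 50_000
# Hypothetical genetic components with unit variance and true genetic correlation 0.5.
G_A = rng.normal(size=n)
G_B = 0.5 * G_A + np.sqrt(0.75) * rng.normal(size=n)

gcov, gcorr = genetic_cov_and_corr(G_A, G_B)
# Rescale trait A: Y_A -> c * Y_A, hence G_A -> c * G_A.
c = 10.0
gcov_scaled, gcorr_scaled = genetic_cov_and_corr(c * G_A, G_B)

print(gcov_scaled / gcov)    # ratio of genetic covariances: equals c up to rounding
print(gcorr_scaled - gcorr)  # difference in genetic correlation: zero up to rounding
```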

docs/Bioinformatics_Concepts/Heritability.md

Lines changed: 4 additions & 4 deletions
@@ -22,7 +22,7 @@ Note that in the uncorrelated-additive model above, $h^2$ is equal to the [coeff
 
 $$
 \begin{align}
-\mathrm{cor}(G,Y)^2&=\frac{\mathrm{Cov}(Y,G)^2}{\mathrm{Var}(G) \mathrm{Var(Y)}}\\
+\mathrm{Corr}(G,Y)^2&=\frac{\mathrm{Cov}(Y,G)^2}{\mathrm{Var}(G) \mathrm{Var(Y)}}\\
 &=\frac{\mathrm{Cov}(E+G,G)^2}{\mathrm{Var}(G) \mathrm{Var(Y)}}& \text{ by }(\ref{model})\\
 &=\frac{\mathrm{Var}(G)^2}{\mathrm{Var}(G) \mathrm{Var(Y)}} & \text{$G$ and $E$ uncorrelated}\\
 &=\frac{\mathrm{Var}(G)}{\mathrm{Var}(Y)}\\
@@ -74,20 +74,20 @@ $$
 \begin{align}
 G&:=\mathbb{E} (Y|g),\\
 E&:=Y-G,\\
-h^2&:= \frac{\mathbb{Var}(G) }{\mathbb{Var}(Y)}.
+h^2&:= \frac{\mathrm{Var}(G) }{\mathrm{Var}(Y)}.
 \end{align}
 $$
 
 We have
 
 $$
 \begin{align}
-\mathbb{Cov}(G,E)&=\mathbb{E}( \mathbb{E}(Y|g) -\mathbb{E}Y )( Y- \mathbb{E}(Y|g) )\\
+\mathrm{Cov}(G,E)&=\mathbb{E}( \mathbb{E}(Y|g) -\mathbb{E}Y )( Y- \mathbb{E}(Y|g) )\\
 &=0,
 \end{align}
 $$
 
-where the last line follows from the Projection Theorem (pg. 345 in Grimmet and Stirzaker[@grimmett2020probability]). Where before we needed to assume $\mathbb{Cov}(E,G)=0$, here this property is automatic.
+where the last line follows from the Projection Theorem (pg. 345 in Grimmett and Stirzaker[@grimmett2020probability]). Where before we needed to assume $\mathrm{Cov}(E,G)=0$, here this property is automatic.
 
 
 - This approach has the **advantage** of its mathematical clarity. Whereas the standard definition of heritability requires some fairly restrictive assumptions, this alternative definition is applicable to any phenotype representable by a random variable in $L_2$. Mathematically, it is now crystal clear what we mean when we speak of $G$ and $E$.
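For a genotype taking finitely many values, the definitions $G:=\mathbb{E}(Y|g)$ and $E:=Y-G$ have a direct sample analogue: replace $\mathbb{E}(Y|g)$ by the mean of $Y$ within each genotype group. The minimal sketch below uses simulated data with a hypothetical non-additive (dominance-like) genotype effect. It checks that $\mathrm{Cov}(G,E)$ vanishes without being assumed, mirroring the Projection Theorem. It also checks that $h^2=\mathrm{Var}(G)/\mathrm{Var}(Y)$ then equals $\mathrm{Corr}(G,Y)^2$, as in the earlier derivation. The single-locus genotype, the effect sizes and the helper `conditional_mean_decomposition` are illustrative assumptions.

```python
import numpy as np

def conditional_mean_decomposition(g, Y):
    """Sample analogue of G := E(Y | g) and E := Y - G for a discrete genotype g."""
    G = np.empty_like(Y)
    for value in np.unique(g):
        mask = g == value
        G[mask] = Y[mask].mean()  # within-group mean estimates E(Y | g = value)
    E = Y - G
    return G, E

rng = np.random.default_rng(1)
n = 10_000
g = rng.binomial(2, 0.4, size=n)  # hypothetical single-locus genotype (0/1/2)
# Non-additive genotype effect plus environmental noise; no additivity is assumed.
Y = 0.5 * g + 0.4 * (g == 2) + rng.normal(0, 1, n)

G, E = conditional_mean_decomposition(g, Y)
cov_GE = np.cov(G, E)[0, 1]                 # zero up to floating-point rounding
h2 = np.var(G) / np.var(Y)                  # h^2 := Var(G) / Var(Y)
corr_sq = np.corrcoef(G, Y)[0, 1] ** 2      # Corr(G, Y)^2
print(f"Cov(G,E) = {cov_GE:.2e}, h2 = {h2:.4f}, Corr(G,Y)^2 = {corr_sq:.4f}")
```

Within-group residuals sum to zero, and $G$ is constant within each group. The sample covariance of $G$ and $E$ is therefore exactly zero in exact arithmetic, just as its population counterpart is.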
