Commit b5b5dee

Yet more doc edits 2 (#857)
- Complete draft of GWAS by subtraction documentation
1 parent 3dc6b03 commit b5b5dee

2 files changed

Lines changed: 131 additions & 6 deletions

docs/Bioinformatics_Concepts/GWAS_By_Subtraction.md

Lines changed: 130 additions & 5 deletions
@@ -74,8 +74,8 @@ $$
7474
\begin{align}
7575
F &= x^T \beta_{F} \\
7676
R &= x^T \beta_{R} \\
77-
T_1 &= \underbrace{a_F F}_{=:F'} + \underbrace{a_R R}_{=:R'} + \delta_1 \\
78-
T_2&= bF +\delta_2 \\
77+
T_1 &= \underbrace{a_F F}_{=:F'} + \underbrace{a_R R}_{=:R'} + \delta_1 \label{joint_t_1} \\
78+
T_2&= bF +\delta_2 \label{joint_t_2} \\
7979
\mathbb{Cov}(F,R)&=0\\
8080
\mathbb{Var}(F)&=1\\
8181
\mathbb{Var}(R)&=1.
@@ -169,7 +169,7 @@ $$
169169
\begin{align}
170170
&\mathrm{GCov}(T_1,T_2)\\
171171
&= \mathrm{GCov}(a_F F + a_R R+\delta_1, bF+\delta_2)\\
172-
&= \mathrm{Cov}(a_F F + a_R R, bF) & \text{Since $\delta_1,\deta_2$ are non-genetic}\\
172+
&= \mathrm{Cov}(a_F F + a_R R, bF) & \text{Since $\delta_1,\delta_2$ are non-genetic}\\
173173
&=a_Fb & \text{Since $F$ and $R$ are uncorrelated}
174174
\end{align}
175175
$$
@@ -212,7 +212,7 @@ $$
212212
\end{align}
213213
$$
214214

215-
Furthermore, we can apply [LDSC](LDSC.md) and [CT-LDSC](Cross_Trait_LDSC.md) to the $T_1$ and $T_2$ summary statistics to estimate their [genetic covariance](Genetic_Correlation.md) and [heritabilities](Heritability.md) (again, heritability equals genetic variance, since we have assumed that phenotype variances are normalized to 1). Denote these estimates as $L_{1,2},L_{1,1},L_{2,2}$.
215+
Furthermore, we can apply [LDSC](LDSC.md)[@bulik2015ld] and [CT-LDSC](Cross_Trait_LDSC.md)[@bulik2015atlas] to the $T_1$ and $T_2$ summary statistics to estimate their [genetic covariance](Genetic_Correlation.md) and [heritabilities](Heritability.md) (again, heritability equals genetic variance, since we have assumed that phenotype variances are normalized to 1). Denote these estimates as $L_{1,2},L_{1,1},L_{2,2}$.
216216

217217

218218
Combining the above, we have that the empirical covariance matrix of $(x_i, T_1, T_2)$ is
@@ -228,11 +228,136 @@ H_i & \hat\beta_{T_1,i} H_i & \hat\beta_{T_2,i} H_i\\
228228
\end{align}
229229
$$
230230

231+
### Solution
232+
233+
234+
We can equate $\Sigma_{\text{Empirical}}$ and $\Sigma_{\text{Theoretical}}$ to solve for $a_F, a_R, b, \hat\beta_{F,i}, \hat\beta_{R,i}$. We have:
235+
236+
237+
$$
238+
\begin{align}
239+
\Sigma_{\text{Theoretical}} & = \Sigma_{\text{Empirical}}\\
240+
\begin{bmatrix}
241+
H_i & (a_F\hat\beta_{F,i}+a_R\hat\beta_{R,i})H_i & b\hat\beta_{F,i}H_i \\
242+
(a_F\hat\beta_{F,i}+a_R\hat\beta_{R,i})H_i& a_F^2+a_R^2 & a_F b \\
243+
b\hat\beta_{F,i}H_i &a_F b & b^2
244+
\end{bmatrix}
245+
&=
246+
\begin{bmatrix}
247+
H_i & \hat\beta_{T_1,i} H_i & \hat\beta_{T_2,i} H_i\\
248+
\hat\beta_{T_1,i}H_i & L_{1,1} & L_{1,2}\\
249+
\hat\beta_{T_2,i} H_i & L_{1,2} & L_{2,2}
250+
\end{bmatrix}
251+
\end{align}
252+
$$
253+
254+
Solving the lower-right $2\times 2$ submatrix, we have:
255+
256+
$$
257+
\begin{align}
258+
b&=\sqrt{L_{2,2}} \label{b_solve} \\
259+
a_F&= \frac{L_{1,2}}{\sqrt{L_{2,2}}} \label{a_F_solve} \\
260+
a_R&=\sqrt{L_{1,1}-\frac{L_{1,2}^2}{L_{2,2}}} \label{a_R_solve} .
261+
\end{align}
262+
$$
263+
264+
265+
Equating the first columns of the two matrices yields
266+
267+
$$
268+
\begin{align}
269+
\hat\beta_{F,i}&=\frac{\hat\beta_{ T_2,i} }{b} \label{beta_F_solve}\\
270+
\hat\beta_{R,i}&=\frac{1}{a_R}\left(\hat\beta_{ T_1,i} -a_F\frac{\hat\beta_{ T_2,i} }{b}\right) \label{beta_R_solve}.
271+
\end{align}
272+
$$
273+
274+
Note from $(\ref{b_solve}, \ref{a_F_solve}, \ref{a_R_solve})$ that $a_F, a_R$ and $b$ do not depend on the specific genetic variant $i$ under consideration. This is consistent with the model specified in $(\ref{joint_t_1}, \ref{joint_t_2})$, in which $a_F, a_R$ and $b$ are global.
275+
276+
277+
To recap, given summary statistics for traits $T_1$ and $T_2$, we can:
278+
279+
- Run LDSC and CT-LDSC to estimate $L_{1,1},L_{1,2}, L_{2,2}$.
280+
- Apply $(\ref{b_solve},\ref{a_F_solve}, \ref{a_R_solve})$ to estimate $a_F,a_R,$ and $b$.
281+
- Apply $(\ref{beta_F_solve}, \ref{beta_R_solve})$ to estimate $\hat\beta_{F,i}, \hat\beta_{R,i}$ for each genetic variant $i$.
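
The recap above can be sketched in a few lines of Python. This is a minimal illustration, not part of the documented pipeline: the function name `solve_subtraction` and its argument names are hypothetical, and the LDSC estimates $L_{1,1}, L_{1,2}, L_{2,2}$ are assumed to have been computed elsewhere and passed in as plain floats.

```python
import math


def solve_subtraction(beta_t1, beta_t2, l11, l12, l22):
    """Solve for (a_F, a_R, b, beta_F, beta_R) for a single variant.

    Hypothetical helper. beta_t1 and beta_t2 are the variant's effect
    estimates on T_1 and T_2; l11, l12, l22 are the LDSC / CT-LDSC
    estimates of the genetic variances and covariance.
    """
    if l22 <= 0:
        raise ValueError("L_{2,2} must be positive to solve for b")
    b = math.sqrt(l22)
    a_f = l12 / b

    # Genetic variance of T_1 left over after removing the F component.
    residual = l11 - l12 ** 2 / l22
    if residual <= 0:
        raise ValueError("L_{1,1} - L_{1,2}^2 / L_{2,2} must be positive")
    a_r = math.sqrt(residual)

    beta_f = beta_t2 / b
    beta_r = (beta_t1 - a_f * beta_f) / a_r
    return a_f, a_r, b, beta_f, beta_r
```

Because $a_F$, $a_R$ and $b$ are global, a real implementation would presumably compute them once and reuse them across variants; the sketch recomputes them per call only to stay short. The explicit checks reflect that noisy LDSC estimates can make either square-root argument non-positive, in which case the closed-form solution does not exist.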
282+
283+
284+
We would like to synthesize summary statistics for $R$ in order to pass them to downstream analysis tools like [MAGMA](MAGMA_Overview.md) and [S-LDSC](S_LDSC_For_Cell_And_Tissue_ID.md). This requires estimates of the standard errors of $\hat\beta_{R,i}$.
285+
286+
287+
### Uncertainty
288+
289+
To estimate these standard errors, define $\nu_i\in\mathbb{R}^5$ to be the key non-redundant entries of $\Sigma_{\text{Empirical}}$. That is,
290+
291+
$$
292+
\begin{align}
293+
\nu_i &:= (\Sigma_{\text{Empirical}, (1,2) }, \Sigma_{\text{Empirical}, (1,3)},
294+
\Sigma_{\text{Empirical}, (2,2)},
295+
\Sigma_{\text{Empirical}, (2,3)},
296+
\Sigma_{\text{Empirical}, (3,3)}
297+
)^T\\
298+
&= (
299+
\hat\beta_{T_1,i}H_i,
300+
\hat\beta_{T_2,i}H_i,
301+
L_{1,1},
302+
L_{1,2},
303+
L_{2,2}
304+
)^T.
305+
\end{align}
306+
$$
307+
308+
Let $\theta_i\in\mathbb{R}^5$ denote the key parameters we solve for. That is,
309+
310+
$$
311+
\begin{align}
312+
\theta_i&:= (a_F,a_R,b, \hat\beta_{F,i}, \hat\beta_{R,i})^T
313+
\end{align}
314+
$$
315+
316+
317+
Let $g:\mathbb{R}^5 \to \mathbb{R}^5$ denote the function mapping $\nu_i$ to $\theta_i$ via the solution method [above](#solution).
318+
319+
We estimate the standard errors of the components of $\theta_i$ using the [delta method](https://en.wikipedia.org/wiki/Delta_method).
320+
321+
The delta method says that if $K_i$ is the sampling covariance matrix of $\nu_i$, and $J_i$ is the Jacobian of $g$ evaluated at $\nu_i$, then the sampling covariance matrix of $\theta_i$ can be estimated as
322+
323+
$$
324+
\begin{align}
325+
Q_i:=J_iK_iJ_i^T.
326+
\end{align}
327+
$$
328+
329+
330+
- Computing $J_i$ requires only elementary calculus.
331+
- To simplify matters, we approximate $K_i$ as block diagonal. That is,
332+
333+
$$
334+
\begin{align}
335+
K_i \approx
336+
\begin{bmatrix}
337+
V_{\text{SNP},i} & 0 \\
338+
0 & V_{\text{LD}}
339+
\end{bmatrix}
340+
\end{align}
341+
$$
342+
343+
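where $V_{\text{SNP},i}\in\mathbb{R}^{2\times 2}$ is the sampling covariance of the two variant-level entries of $\nu_i$ and $V_{\text{LD}}\in\mathbb{R}^{3 \times 3}$ is the sampling covariance of $(L_{1,1}, L_{1,2}, L_{2,2})$. This amounts to the assumption that, to a first approximation, the global linkage-disequilibrium score regression outputs and the local $\hat\beta_i$ do not covary.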
344+
345+
- Standard linkage-disequilibrium score regression uses [the jackknife](https://en.wikipedia.org/wiki/Jackknife_resampling) to estimate the sampling covariance of its outputs. We can use these estimates to populate $V_{\text{LD}}$.
346+
- We can populate $V_{\text{SNP},i}$ using the approach described in [the notes on LDSC](LDSC.md#sampling-noise-and-ldsc).
347+
348+
349+
Combining the above produces an estimate of $K_i$, to which we can apply the delta method to estimate $Q_i$, the sampling covariance of $\theta_i$.
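
The delta-method step can be sketched as follows. This is an illustration under stated assumptions rather than the reference implementation: the names `g`, `numerical_jacobian` and `delta_method_cov` are hypothetical, $H_i$ is treated as a known constant `h`, and the Jacobian is approximated by central finite differences instead of the closed-form derivatives mentioned above.

```python
import numpy as np


def g(nu, h):
    """Map nu_i = (beta_T1*H, beta_T2*H, L11, L12, L22) to theta_i."""
    u1, u2, l11, l12, l22 = nu
    b = np.sqrt(l22)
    a_f = l12 / b
    a_r = np.sqrt(l11 - l12 ** 2 / l22)
    beta_f = (u2 / h) / b
    beta_r = (u1 / h - a_f * beta_f) / a_r
    return np.array([a_f, a_r, b, beta_f, beta_r])


def numerical_jacobian(func, nu, rel_step=1e-6):
    """Central-difference approximation to the Jacobian of func at nu."""
    nu = np.asarray(nu, dtype=float)
    n_out = func(nu).size
    jac = np.empty((n_out, nu.size))
    for k in range(nu.size):
        # Step scaled to the entry's magnitude, with a floor for tiny entries.
        step = rel_step * max(abs(nu[k]), 1e-3)
        up, down = nu.copy(), nu.copy()
        up[k] += step
        down[k] -= step
        jac[:, k] = (func(up) - func(down)) / (2.0 * step)
    return jac


def delta_method_cov(nu, h, v_snp, v_ld):
    """Estimate Q_i = J_i K_i J_i^T with a block-diagonal K_i."""
    k_mat = np.zeros((5, 5))
    k_mat[:2, :2] = v_snp  # 2x2 block: variant-level sampling covariance
    k_mat[2:, 2:] = v_ld   # 3x3 block: jackknife covariance of (L11, L12, L22)
    jac = numerical_jacobian(lambda v: g(v, h), nu)
    return jac @ k_mat @ jac.T
```

Under these assumptions, the standard error of $\hat\beta_{R,i}$ is the square root of the last diagonal entry of the returned matrix, for example `np.sqrt(q[4, 4])` with `q = delta_method_cov(nu, h, v_snp, v_ld)`. An analytic Jacobian would be cheaper when looping over many variants; finite differences are used here only to keep the sketch short.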
350+
351+
352+
### Output
353+
354+
Of the components of $\theta_i$ and $Q_i$, the most interesting are $\hat\beta_{R,i}$ and its standard error. By repeating the procedure described above for each variant $i$, we can estimate $\hat\beta_{R,i}$ and its standard error for all variants. This provides us with a full set of GWAS summary statistics for $R$, the GWAS-by-subtraction component of $T_1$ orthogonal to $T_2$. We can then analyze these summary statistics using standard post-GWAS tools.
355+
356+
231357

232358

233359

234360

235361

236-
To be continued $\ldots$
237362

238363

docs/Bioinformatics_Concepts/LDSC.md

Lines changed: 1 addition & 1 deletion
@@ -676,7 +676,7 @@ $$
676676
[^MHC_Note]: LDSC implementations usually exclude the MHC region, partially for this reason.
677677

678678

679-
## Sampling noise and LDSC
679+
## Sampling noise and LDSC
680680

681681

682682
For some applications such as Genomic SEM[@grotzinger2019genomic], it is of interest to use terms in the LDSC equation to
