&= \mathrm{GCov}(a_F F + a_R R+\delta_1, bF+\delta_2)\\
&= \mathrm{Cov}(a_F F + a_R R, bF) & \text{Since $\delta_1,\delta_2$ are non-genetic}\\
&=a_Fb & \text{Since $F$ and $R$ are uncorrelated}
\end{align}
$$
@@ -212,7 +212,7 @@ $$
\end{align}
$$

Furthermore, we can apply [LDSC](LDSC.md)[@bulik2015ld] and [CT-LDSC](Cross_Trait_LDSC.md)[@bulik2015atlas] to the $T_1$ and $T_2$ summary statistics to estimate their [genetic covariance](Genetic_Correlation.md) and [heritabilities](Heritability.md) (again, heritability equals genetic variance, since we have assumed that phenotype variances are normalized to 1). Denote these estimates as $L_{1,2},L_{1,1},L_{2,2}$.

Combining the above, we have that the empirical covariance matrix of $(x_i, T_1, T_2)$ is

Note from $(\ref{b_solve}, \ref{a_F_solve}, \ref{a_R_solve})$ that $a_F, a_R$ and $b$ do not depend on the specific genetic variant $i$ under consideration. This is consistent with the model specified in $(\ref{joint_t_1}, \ref{joint_t_2})$, in which $a_F, a_R$ and $b$ are global.

To recap, given summary statistics for traits $T_1$ and $T_2$, we can:

- Run LDSC and CT-LDSC to estimate $L_{1,1},L_{1,2}, L_{2,2}$.
- Apply $(\ref{b_solve},\ref{a_F_solve}, \ref{a_R_solve})$ to estimate $a_F,a_R,$ and $b$.
- Apply $(\ref{beta_F_solve}, \ref{beta_R_solve})$ to estimate $\hat\beta_{F,i}, \hat\beta_{R,i}$ for each genetic variant $i$.

We would like to synthesize summary statistics for $R$ in order to pass them to downstream analysis tools like [MAGMA](MAGMA_Overview.md) and [S-LDSC](S_LDSC_For_Cell_And_Tissue_ID.md). This requires estimates of the standard errors of $\hat\beta_{R,i}$.
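
The recap steps above can be sketched in Python. This is a minimal sketch, not the notes' own implementation: the referenced solution equations are not reproduced in this excerpt, so the formulas below are an assumption. They take $F$ and $R$ to have unit genetic variance and to be uncorrelated, take $a_R, b > 0$, and take the per-variant effects to satisfy $\hat\beta_{1,i} = a_F\beta_{F,i} + a_R\beta_{R,i}$ and $\hat\beta_{2,i} = b\,\beta_{F,i}$. The function names `solve_loadings` and `solve_variant_effects` are hypothetical.

```python
import numpy as np


def solve_loadings(L11, L12, L22):
    """Solve for the global loadings (a_F, a_R, b) from LDSC estimates.

    Assumed identities (F and R uncorrelated, unit genetic variance):
        GVar(T2)     = b^2          = L22
        GCov(T1, T2) = a_F * b      = L12
        GVar(T1)     = a_F^2 + a_R^2 = L11
    """
    b = np.sqrt(L22)
    a_F = L12 / b
    # The argument of this square root is negative if the estimates are
    # inconsistent (L12^2 > L11 * L22); the result is then NaN.
    a_R = np.sqrt(L11 - a_F**2)
    return a_F, a_R, b


def solve_variant_effects(beta1, beta2, a_F, a_R, b):
    """Solve for the per-variant effects on F and R.

    beta1 and beta2 are the GWAS effect estimates for T1 and T2.
    They may be scalars or NumPy arrays with one entry per variant.
    """
    beta_F = beta2 / b
    beta_R = (beta1 - a_F * beta_F) / a_R
    return beta_F, beta_R


# Illustrative inputs only; these numbers are made up.
a_F, a_R, b = solve_loadings(L11=0.5, L12=0.2, L22=0.4)
beta_F, beta_R = solve_variant_effects(
    np.array([0.010, -0.020]), np.array([0.020, 0.005]), a_F, a_R, b
)
```

Because the loadings are global, `solve_loadings` runs once, while `solve_variant_effects` is vectorized over all variants.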

### Uncertainty

To estimate these standard errors, define $\nu\in\mathbb{R}^5$ to be the key non-redundant entries of $\Sigma_{\text{Empirical}}$. That is

Let $g:\mathbb{R}^5 \to \mathbb{R}^5$ denote the function mapping $\nu_i$ to $\theta_i$ via the solution method [above](#solution).

We estimate the standard errors of the components of $\theta_i$ using the [delta method](https://en.wikipedia.org/wiki/Delta_method).

The delta method says that if $K_i$ is the sampling covariance matrix of $\nu_i$, and $J_i$ is the Jacobian of $g$ evaluated at $\nu_i$, then the sampling covariance matrix of $\theta_i = g(\nu_i)$ can be estimated as

$$
\begin{align}
Q_i:=J_iK_iJ_i^T.
\end{align}
$$

- Computing $J_i$ requires only elementary calculus.
- To simplify matters, we approximate $K_i$ as block diagonal. That is,

$$
\begin{align}
K_i \approx
\begin{bmatrix}
V_{\text{SNP},i} & 0 \\
0 & V_{\text{LD}}
\end{bmatrix}
\end{align}
$$

where $V_{\text{SNP},i}\in\mathbb{R}^{2\times 2}$ and $V_{\text{LD}}\in\mathbb{R}^{3 \times 3}$, so that $K_i$ is $5\times 5$, matching $\nu_i\in\mathbb{R}^5$. This amounts to the assumption that, to a first approximation, the global linkage-disequilibrium score regression outputs and the local $\hat\beta_i$ do not covary.

- Standard linkage-disequilibrium score regression uses [the jackknife](https://en.wikipedia.org/wiki/Jackknife_resampling) to estimate the sampling covariance of its outputs. We can use these estimates to populate $V_{\text{LD}}$.
- We can populate $V_{\text{SNP},i}$ using the approach described in [the notes on LDSC](LDSC.md#sampling-noise-and-ldsc).

Combining the above produces an estimate of $K_i$, to which we can apply the delta method to estimate $Q_i$, the sampling covariance of $\theta_i$.
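
The assembly of $K_i$ and the delta-method step can be sketched as follows. Several things here are assumptions rather than the notes' own choices: the ordering $\nu_i = (\hat\beta_{1,i}, \hat\beta_{2,i}, L_{1,1}, L_{1,2}, L_{2,2})$ and $\theta_i = (\hat\beta_{F,i}, \hat\beta_{R,i}, a_F, a_R, b)$, the closed form used for `g` (unit-variance, uncorrelated $F$ and $R$), and $V_{\text{LD}}$ being the $3\times 3$ jackknife covariance of the three LDSC estimates. The Jacobian is approximated by central finite differences purely for brevity; the analytic Jacobian described above would serve equally well.

```python
import numpy as np


def g(nu):
    """Map nu to theta under the assumed solution (see lead-in)."""
    beta1, beta2, L11, L12, L22 = nu
    b = np.sqrt(L22)
    a_F = L12 / b
    a_R = np.sqrt(L11 - a_F**2)
    beta_F = beta2 / b
    beta_R = (beta1 - a_F * beta_F) / a_R
    return np.array([beta_F, beta_R, a_F, a_R, b])


def numerical_jacobian(func, x, eps=1e-6):
    """Central finite-difference Jacobian of func at x."""
    x = np.asarray(x, dtype=float)
    f0 = func(x)
    J = np.zeros((f0.size, x.size))
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps * max(1.0, abs(x[k]))
        J[:, k] = (func(x + step) - func(x - step)) / (2.0 * step[k])
    return J


def delta_method_cov(nu, V_snp, V_ld):
    """Estimate Q = J K J^T with K approximated as block diagonal."""
    K = np.zeros((5, 5))
    K[:2, :2] = V_snp  # sampling covariance of the two per-variant effects
    K[2:, 2:] = V_ld   # jackknife covariance of (L11, L12, L22)
    J = numerical_jacobian(g, nu)
    return J @ K @ J.T


# Illustrative inputs only; these numbers are made up.
nu = np.array([0.01, 0.02, 0.5, 0.2, 0.4])
V_snp = np.array([[1e-4, 2e-5], [2e-5, 1e-4]])
V_ld = np.diag([1e-3, 5e-4, 1e-3])
Q = delta_method_cov(nu, V_snp, V_ld)
se_beta_R = np.sqrt(Q[1, 1])  # standard error of the R effect
```

Since $V_{\text{LD}}$ is global, only $V_{\text{SNP},i}$ and the Jacobian change from variant to variant.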

### Output

Of the components of $\theta_i$ and $Q_i$, the most interesting are $\hat\beta_{R,i}$ and its standard error. By repeating the above-described procedure for each variant $i$, we can estimate $\hat\beta_{R,i}$ and its standard error for all variants $i$. This provides us with a full set of GWAS summary statistics for $R$, the GWAS-by-subtraction component of $T_1$ orthogonal to $T_2$. We can then analyze these summary statistics using standard post-GWAS tools.
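
As a small illustration of the output step, the sketch below turns one variant's $\theta_i$ and $Q_i$ into a summary-statistics row for $R$. It assumes $\hat\beta_{R,i}$ sits at index 1 of $\theta_i$ (a hypothetical ordering, not fixed by the text above) and adds a two-sided normal-approximation p-value, which the notes do not discuss but which many post-GWAS tools expect. The function name `r_summary_stat` is hypothetical.

```python
import math


def r_summary_stat(theta, Q, r_index=1):
    """Return (beta_R, se_R, z, p) for a single variant.

    theta   : sequence of point estimates for this variant
    Q       : delta-method covariance matrix (nested lists or a 2D array)
    r_index : position of beta_R within theta (an assumption)
    """
    beta_R = float(theta[r_index])
    se_R = math.sqrt(float(Q[r_index][r_index]))
    z = beta_R / se_R
    # Two-sided p-value under a standard normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta_R, se_R, z, p
```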