Commit 3c25eb7

GiaoLee and claude committed
docs: translate gsea skill documentation to English
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent b380e72 commit 3c25eb7

4 files changed

Lines changed: 195 additions & 195 deletions

Lines changed: 134 additions & 134 deletions
Original file line number | Diff line number | Diff line change
@@ -1,182 +1,182 @@
11
---
22
name: gsea
3-
description: 对按统计量排序的基因列表执行 GSEA 分析,输出富集结果表、运行分数表和绘图结果。
3+
description: Run GSEA on a ranked gene list and produce the enrichment table, running-score table, and enrichment plots.
44
license: MIT
55
author: AIPOCH
66
---
77
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
88
9-
## 何时读取外部文件
9+
## When to read external files
1010

11-
| 情况 | 读取文件 | 目的 |
11+
| Situation | Read | Purpose |
1212
|---|---|---|
13-
| 需要了解算法细节 | `references/algorithm.md` | 统计方法与公式 |
14-
| 需要执行分析 | `scripts/main.R` | 获取完整命令 |
15-
| 遇到报错 | `references/troubleshooting.md` | 查找解决方案 |
16-
| 需要 CLI 示例 | `references/cli-guide.md` | 参数用法示例 |
13+
| Need algorithm details | `references/algorithm.md` | Statistical method and formulas |
14+
| Need to run an analysis | `scripts/main.R` | Full command reference |
15+
| Hit an error | `references/troubleshooting.md` | Look up error codes and fixes |
16+
| Need CLI examples | `references/cli-guide.md` | Worked argument examples |
1717

18-
## 适用场景
18+
## Scope
1919

20-
适用于:
21-
- 对按统计量排序的基因列表执行 GSEA 分析
22-
- 基于已有 `enrichGSEA.csv` `gsea_running_scores.csv` 生成富集曲线图
23-
- 使用 `tests/data/sample_deg_results.csv` 做最小可运行验证
20+
Use this skill for:
21+
- Running GSEA on a gene list ranked by a statistic
22+
- Generating enrichment curve plots from existing `enrichGSEA.csv` and `gsea_running_scores.csv`
23+
- Smoke-testing the pipeline with `tests/data/sample_deg_results.csv`
2424

25-
不适用于:
26-
- 原始表达矩阵的差异分析
27-
- 单样本 ssGSEA
28-
- 网络分析或多组学整合分析
25+
Do not use it for:
26+
- Differential expression on raw expression matrices
27+
- Single-sample GSEA (ssGSEA)
28+
- Network analysis or multi-omics integration
2929

30-
## 使用方法
30+
## Usage
3131

32-
分析模式:
32+
Analysis mode:
3333
`Rscript scripts/main.R --input tests/data/sample_deg_results.csv --outdir ./GSEA_analysis --type KEGG --species human --seed 42 --timeout 300`
3434

35-
绘图模式:
35+
Plot mode:
3636
`Rscript scripts/main.R --running_file ./GSEA_analysis/Table/gsea_running_scores.csv --enrich_file ./GSEA_analysis/Table/enrichGSEA.csv --plot_output ./GSEA_analysis/plot/gsea_plot.pdf --top_n 5 --plot_format pdf --seed 42 --timeout 300`
3737

38-
说明:详见 `references/cli-guide.md`
38+
See `references/cli-guide.md` for more argument examples.
3939

40-
模式选择说明:
41-
- 仅提供 `--input` 时进入分析模式
42-
- 同时提供 `--running_file` `--enrich_file` 时进入绘图模式
43-
- 若同时提供分析参数与绘图参数,则绘图模式优先,分析模式会被跳过,并输出警告信息
40+
Mode selection:
41+
- Passing only `--input` runs analysis mode
42+
- Passing both `--running_file` and `--enrich_file` runs plot mode
43+
- If both sets of arguments are provided, plot mode takes precedence; analysis mode is skipped and a warning is logged
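
The mode-selection rules above can be sketched as a small decision function. This is a Python illustration only, not the logic in `scripts/main.R`; the name `select_mode` and the returned strings are hypothetical. The documentation does not say what happens when only one of the two plot files is given, so the sketch assumes that case does not trigger plot mode.

```python
from typing import Optional, Tuple


def select_mode(input_path: Optional[str],
                running_file: Optional[str],
                enrich_file: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (mode, warning) following the documented precedence."""
    # Plot mode needs both files (assumption: one file alone is not enough).
    has_plot_args = running_file is not None and enrich_file is not None
    has_analysis_args = input_path is not None

    if has_plot_args:
        warning = None
        if has_analysis_args:
            # Documented behaviour: plot mode wins and a warning is logged.
            warning = "both argument sets given; analysis mode skipped"
        return "plot", warning
    if has_analysis_args:
        return "analysis", None
    # Not covered by the documentation; raising here is an assumption.
    raise ValueError("provide --input, or both --running_file and --enrich_file")
```

Calling `select_mode("deg.csv", "running.csv", "enrich.csv")` returns plot mode together with a warning string, mirroring the third rule.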
4444

45-
## 参数说明
45+
## Arguments
4646

47-
### 分析模式参数
47+
### Analysis-mode arguments
4848

49-
| 短参数 | 长参数 | 类型 | 默认值 | 是否必填 | 说明 |
49+
| Short | Long | Type | Default | Required | Description |
5050
|---|---|---|---|---|---|
51-
| `-i` | `--input` | character | `NULL` | | 输入 CSV 文件 |
52-
| `-o` | `--outdir` | character | `GSEA_analysis` | | 输出目录 |
53-
| `-g` | `--gene_col` | character | `name` | | 基因列名 |
54-
| `-f` | `--fc_col` | character | `logFC` | | 排序统计量列名 |
55-
| `-t` | `--type` | character | `KEGG` | | 基因集类型:`KEGG``HALLMARKS``GO_BP``GO_MF``GO_CC`;预载 RDS 中会自动将 `HALLMARKS` 映射到资产键 `Hallmarks` |
56-
| `-s` | `--species` | character | `human` | | 物种:`human``mouse``rat` |
57-
| `-p` | `--pvalue_cutoff` | numeric | `0.05` | | 显著性阈值 |
58-
| `-m` | `--method` | character | `fgsea` | | GSEA 方法:`fgsea` `DOSE` |
59-
| `-c` | `--chunk_size` | numeric | `1000` | | 大基因集转换时的分块大小 |
60-
| `-r` | `--rds_path` | character | `NULL` | | 预存基因集 RDS 路径 |
61-
| `-v` | `--verbose` | logical | `FALSE` | | 输出详细日志 |
62-
| | `--seed` | integer | `42` | | 随机种子 |
63-
| | `--timeout` | integer | `300` | | 超时秒数,`<=0` 表示不限制 |
64-
| `-h` | `--help` | logical | `FALSE` | | 显示帮助 |
65-
66-
### 绘图模式参数
67-
68-
| 短参数 | 长参数 | 类型 | 默认值 | 是否必填 | 说明 |
51+
| `-i` | `--input` | character | `NULL` | yes | Input CSV file |
52+
| `-o` | `--outdir` | character | `GSEA_analysis` | no | Output directory |
53+
| `-g` | `--gene_col` | character | `name` | no | Gene column name |
54+
| `-f` | `--fc_col` | character | `logFC` | no | Ranking-statistic column name |
55+
| `-t` | `--type` | character | `KEGG` | no | Gene-set type: `KEGG`, `HALLMARKS`, `GO_BP`, `GO_MF`, `GO_CC`. With a preloaded RDS, `HALLMARKS` is automatically mapped to the asset key `Hallmarks` |
56+
| `-s` | `--species` | character | `human` | no | Species: `human`, `mouse`, `rat` |
57+
| `-p` | `--pvalue_cutoff` | numeric | `0.05` | no | Significance threshold |
58+
| `-m` | `--method` | character | `fgsea` | no | GSEA backend: `fgsea` or `DOSE` |
59+
| `-c` | `--chunk_size` | numeric | `1000` | no | Chunk size for large gene-set conversion |
60+
| `-r` | `--rds_path` | character | `NULL` | no | Path to a pre-stored gene-set RDS |
61+
| `-v` | `--verbose` | logical | `FALSE` | no | Verbose logging |
62+
| | `--seed` | integer | `42` | no | Random seed |
63+
| | `--timeout` | integer | `300` | no | Timeout in seconds; `<=0` disables it |
64+
| `-h` | `--help` | logical | `FALSE` | no | Show help |
65+
66+
### Plot-mode arguments
67+
68+
| Short | Long | Type | Default | Required | Description |
6969
|---|---|---|---|---|---|
70-
| | `--running_file` | character | `NULL` | | `gsea_running_scores.csv` 路径 |
71-
| | `--enrich_file` | character | `NULL` | | `enrichGSEA.csv` 路径 |
72-
| | `--plot_output` | character | `gsea_plot.pdf` | | 输出图文件路径 |
73-
| | `--plot_width` | numeric | `8` | | 图宽 |
74-
| | `--plot_height` | numeric | `6` | | 图高 |
75-
| | `--plot_format` | character | `pdf` | | 输出格式:`pdf` `png` |
76-
| | `--top_n` | numeric | `1` | | 未指定 `geneSetID` 时绘制前 N 条通路 |
77-
| | `--rank_by` | character | `p.adjust` | | 通路排序列 |
78-
| | `--geneSetID` | character | `""` | | 逗号分隔的通路 ID |
79-
| | `--plot_title` | character | `""` | | 图标题 |
80-
| | `--colors` | character | `#4DBBD5,#E64B35,#00A087,#F39B7F,#3C5488,#8491B4` | | 颜色列表 |
81-
| | `--base_size` | numeric | `11` | | 基础字号 |
82-
| | `--subplots` | character | `1,2,3` | | 显示子图编号 |
83-
| | `--rel_heights` | character | `1.5,0.8,1` | | 子图高度比例 |
84-
| | `--NES_table` | logical | `TRUE` | | 显示 NES 注释 |
85-
| | `--no_NES_table` | logical | `FALSE` | | 关闭 NES 注释 |
86-
| | `--NES_label_size` | numeric | `4` | | NES 注释字号 |
87-
| | `--NES_label_x` | numeric | `0.75` | | NES 注释横向位置 |
88-
| | `--NES_label_y` | numeric | `0.75` | | NES 注释纵向位置 |
89-
| | `--NES_label_color` | character | `black` | | NES 注释颜色 |
90-
| | `--NES_label_hjust` | numeric | `0` | | NES 注释水平对齐 |
91-
| | `--NES_label_vjust` | numeric | `1` | | NES 注释垂直对齐 |
92-
| | `--line_width` | numeric | `1` | | ES 线宽 |
93-
| | `--dot_size` | numeric | `1.2` | | ES 点大小 |
94-
| | `--legend_position` | character | `auto` | | 图例位置 |
95-
| | `--legend_x` | numeric | `0.02` | | 内嵌图例横坐标 |
96-
| | `--legend_y` | numeric | `0.02` | | 内嵌图例纵坐标 |
97-
| | `--legend_just_x` | numeric | `0` | | 图例横向对齐 |
98-
| | `--legend_just_y` | numeric | `0` | | 图例纵向对齐 |
99-
| | `--legend_text_size` | numeric | `9` | | 图例文字大小 |
100-
| | `--legend_key_size` | numeric | `0.6` | | 图例键大小 |
101-
| | `--legend_bg_alpha` | numeric | `0` | | 图例背景透明度 |
102-
| | `--grid_major_color` | character | `grey92` | | 主网格颜色 |
103-
| | `--grid_minor_color` | character | `grey92` | | 次网格颜色 |
104-
| | `--ylab_es` | character | `Enrichment Score` | | ES 面板纵轴标题 |
105-
| | `--ylab_rank` | character | `Ranked List Metric` | | 排名面板纵轴标题 |
106-
| | `--xlab_rank` | character | `Rank in Ordered Dataset` | | 排名面板横轴标题 |
107-
| | `--hit_height` | numeric | `1` | | 命中条高度 |
108-
| | `--hit_gap` | numeric | `0` | | 命中条间距 |
109-
| | `--hit_linewidth` | numeric | `0.5` | | 命中条线宽 |
110-
| | `--rank_bar_alpha` | numeric | `0.9` | | 排名条透明度 |
111-
| | `--rank_bar_height_ratio` | numeric | `0.3` | | 排名条高度比例 |
112-
| | `--rank_metric_segment_color` | character | `grey` | | 排名线颜色 |
113-
| | `--rank_metric_segment_width` | numeric | `0.3` | | 排名线宽 |
114-
| | `--rank_metric_segment_alpha` | numeric | `1` | | 排名线透明度 |
115-
| | `--pvalue_table` | logical | `FALSE` | | 显示 P 值表 |
116-
| | `--ES_geom` | character | `line` | | ES 绘制方式:`line` `dot` |
117-
| | `--verbose` | logical | `FALSE` | | 输出详细日志 |
118-
| | `--seed` | integer | `42` | | 随机种子 |
119-
| | `--timeout` | integer | `300` | | 超时秒数,`<=0` 表示不限制 |
120-
| `-h` | `--help` | logical | `FALSE` | | 显示帮助 |
121-
122-
## 输入格式
123-
124-
分析模式输入为 CSV 文件,至少包含两列:
125-
- 基因列,默认列名为 `name`
126-
- 排序统计量列,默认列名为 `logFC`
127-
128-
示例:
70+
| | `--running_file` | character | `NULL` | yes | Path to `gsea_running_scores.csv` |
71+
| | `--enrich_file` | character | `NULL` | yes | Path to `enrichGSEA.csv` |
72+
| | `--plot_output` | character | `gsea_plot.pdf` | no | Output plot path |
73+
| | `--plot_width` | numeric | `8` | no | Plot width |
74+
| | `--plot_height` | numeric | `6` | no | Plot height |
75+
| | `--plot_format` | character | `pdf` | no | Output format: `pdf` or `png` |
76+
| | `--top_n` | numeric | `1` | no | Number of top pathways to plot when `geneSetID` is not given |
77+
| | `--rank_by` | character | `p.adjust` | no | Column used to rank pathways |
78+
| | `--geneSetID` | character | `""` | no | Comma-separated pathway IDs |
79+
| | `--plot_title` | character | `""` | no | Plot title |
80+
| | `--colors` | character | `#4DBBD5,#E64B35,#00A087,#F39B7F,#3C5488,#8491B4` | no | Color list |
81+
| | `--base_size` | numeric | `11` | no | Base font size |
82+
| | `--subplots` | character | `1,2,3` | no | Sub-panel indices to display |
83+
| | `--rel_heights` | character | `1.5,0.8,1` | no | Relative panel heights |
84+
| | `--NES_table` | logical | `TRUE` | no | Show NES annotation |
85+
| | `--no_NES_table` | logical | `FALSE` | no | Disable NES annotation |
86+
| | `--NES_label_size` | numeric | `4` | no | NES label font size |
87+
| | `--NES_label_x` | numeric | `0.75` | no | NES label x position |
88+
| | `--NES_label_y` | numeric | `0.75` | no | NES label y position |
89+
| | `--NES_label_color` | character | `black` | no | NES label color |
90+
| | `--NES_label_hjust` | numeric | `0` | no | NES label horizontal justification |
91+
| | `--NES_label_vjust` | numeric | `1` | no | NES label vertical justification |
92+
| | `--line_width` | numeric | `1` | no | ES line width |
93+
| | `--dot_size` | numeric | `1.2` | no | ES dot size |
94+
| | `--legend_position` | character | `auto` | no | Legend position |
95+
| | `--legend_x` | numeric | `0.02` | no | Inset legend x coordinate |
96+
| | `--legend_y` | numeric | `0.02` | no | Inset legend y coordinate |
97+
| | `--legend_just_x` | numeric | `0` | no | Legend horizontal justification |
98+
| | `--legend_just_y` | numeric | `0` | no | Legend vertical justification |
99+
| | `--legend_text_size` | numeric | `9` | no | Legend text size |
100+
| | `--legend_key_size` | numeric | `0.6` | no | Legend key size |
101+
| | `--legend_bg_alpha` | numeric | `0` | no | Legend background alpha |
102+
| | `--grid_major_color` | character | `grey92` | no | Major grid color |
103+
| | `--grid_minor_color` | character | `grey92` | no | Minor grid color |
104+
| | `--ylab_es` | character | `Enrichment Score` | no | ES panel y-axis title |
105+
| | `--ylab_rank` | character | `Ranked List Metric` | no | Rank panel y-axis title |
106+
| | `--xlab_rank` | character | `Rank in Ordered Dataset` | no | Rank panel x-axis title |
107+
| | `--hit_height` | numeric | `1` | no | Hit-bar height |
108+
| | `--hit_gap` | numeric | `0` | no | Hit-bar gap |
109+
| | `--hit_linewidth` | numeric | `0.5` | no | Hit-bar line width |
110+
| | `--rank_bar_alpha` | numeric | `0.9` | no | Rank-bar alpha |
111+
| | `--rank_bar_height_ratio` | numeric | `0.3` | no | Rank-bar height ratio |
112+
| | `--rank_metric_segment_color` | character | `grey` | no | Rank-line color |
113+
| | `--rank_metric_segment_width` | numeric | `0.3` | no | Rank-line width |
114+
| | `--rank_metric_segment_alpha` | numeric | `1` | no | Rank-line alpha |
115+
| | `--pvalue_table` | logical | `FALSE` | no | Show p-value table |
116+
| | `--ES_geom` | character | `line` | no | ES geometry: `line` or `dot` |
117+
| | `--verbose` | logical | `FALSE` | no | Verbose logging |
118+
| | `--seed` | integer | `42` | no | Random seed |
119+
| | `--timeout` | integer | `300` | no | Timeout in seconds; `<=0` disables it |
120+
| `-h` | `--help` | logical | `FALSE` | no | Show help |
121+
122+
## Input format
123+
124+
Analysis-mode input is a CSV file with at least two columns:
125+
- a gene column (default name `name`)
126+
- a ranking-statistic column (default name `logFC`)
127+
128+
Example:
129129
```csv
130130
name,logFC,pvalue,padj
131131
TP53,2.5,0.001,0.01
132132
BRCA1,1.8,0.005,0.02
133133
EGFR,-1.2,0.01,0.05
134134
```
135135

136-
取值约束:
137-
- `type` 支持 `KEGG``HALLMARKS``GO_BP``GO_MF``GO_CC`
138-
- 当使用预载 RDS 时,`HALLMARKS` 会自动匹配资产中的 `Hallmarks` 键名
139-
- `species` 支持 `human``mouse``rat`
136+
Value constraints:
137+
- `type` accepts `KEGG`, `HALLMARKS`, `GO_BP`, `GO_MF`, `GO_CC`
138+
- When using a preloaded RDS, `HALLMARKS` is automatically matched to the asset key `Hallmarks`
139+
- `species` accepts `human`, `mouse`, `rat`
140140

141-
## 输出文件
141+
## Output files
142142

143-
| 文件名 | 格式 | 内容说明 |
143+
| File | Format | Description |
144144
|---|---|---|
145-
| `data/GSEA_list.rda` | RDA | 完整 GSEA 结果对象 |
146-
| `Table/enrichGSEA.csv` | CSV | 富集结果表 |
147-
| `Table/gsea_running_scores.csv` | CSV | 运行分数表;若无富集结果则输出空表头文件 |
148-
| `plot/` | directory | 绘图输出目录 |
149-
| `session_info.txt` | TXT | R 版本与包版本信息 |
145+
| `data/GSEA_list.rda` | RDA | Full GSEA result object |
146+
| `Table/enrichGSEA.csv` | CSV | Enrichment result table |
147+
| `Table/gsea_running_scores.csv` | CSV | Running-score table; if there are no enrichment results, a header-only file is still written |
148+
| `plot/` | directory | Plot output directory |
149+
| `session_info.txt` | TXT | R version and package versions |
150150

151-
`enrichGSEA.csv` 主要包含:`ID``Description``NES``pvalue``p.adjust``core_enrichment`
151+
The main columns of `enrichGSEA.csv` are `ID`, `Description`, `NES`, `pvalue`, `p.adjust`, and `core_enrichment`.
152152

153-
## 错误处理
153+
## Error handling
154154

155-
常见错误码:
156-
- `SKILL_FILE_NOT_FOUND`:输入文件不存在
157-
- `SKILL_MISSING_COLUMNS`:缺少必要列
158-
- `SKILL_EMPTY_DATA`:输入数据为空或过滤后为空
159-
- `SKILL_INVALID_PARAMETER`:参数值不合法
160-
- `SKILL_PACKAGE_NOT_FOUND`:依赖包未安装
161-
- `SKILL_ANALYSIS_FAILED`:分析重试后仍失败
155+
Common error codes:
156+
- `SKILL_FILE_NOT_FOUND`: input file does not exist
157+
- `SKILL_MISSING_COLUMNS`: required columns are missing
158+
- `SKILL_EMPTY_DATA`: input is empty, or empty after filtering
159+
- `SKILL_INVALID_PARAMETER`: an argument has an invalid value
160+
- `SKILL_PACKAGE_NOT_FOUND`: a required package is not installed
161+
- `SKILL_ANALYSIS_FAILED`: GSEA still failed after retries
162162

163-
排查文档:`references/troubleshooting.md`
163+
Troubleshooting guide: `references/troubleshooting.md`
164164

165-
退出状态码:
166-
- `0`:运行成功
167-
- `1`:运行失败
165+
Exit codes:
166+
- `0`: success
167+
- `1`: failure
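
A wrapper that calls the script could combine the exit code with the error codes listed above. The sketch below is a hypothetical Python helper, not part of the skill. It assumes the error codes appear verbatim somewhere in the script's captured output, which this document does not confirm.

```python
import re
from typing import List

# Codes and meanings copied from the list above.
ERROR_CODES = {
    "SKILL_FILE_NOT_FOUND": "input file does not exist",
    "SKILL_MISSING_COLUMNS": "required columns are missing",
    "SKILL_EMPTY_DATA": "input is empty, or empty after filtering",
    "SKILL_INVALID_PARAMETER": "an argument has an invalid value",
    "SKILL_PACKAGE_NOT_FOUND": "a required package is not installed",
    "SKILL_ANALYSIS_FAILED": "GSEA still failed after retries",
}


def find_error_codes(log_text: str) -> List[str]:
    """Return the known codes that occur as whole words in the log text."""
    return [code for code in ERROR_CODES
            if re.search(r"\b" + re.escape(code) + r"\b", log_text)]


def summarize_run(exit_code: int, log_text: str) -> str:
    """Turn an exit code plus captured output into a one-line summary."""
    if exit_code == 0:
        return "success"
    codes = find_error_codes(log_text)
    if not codes:
        return "failure (no known error code found)"
    return "failure: " + "; ".join(f"{c} ({ERROR_CODES[c]})" for c in codes)
```

The exit code stays the primary signal; the text scan only adds detail when a code happens to be present.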
168168

169-
## 测试方法
169+
## Testing
170170

171-
最小测试数据集:`tests/data/sample_deg_results.csv`
171+
Minimal test dataset: `tests/data/sample_deg_results.csv`
172172

173-
最小运行命令:
173+
Minimal command:
174174
`Rscript scripts/main.R --input tests/data/sample_deg_results.csv --outdir ./test_output --type KEGG --species human --seed 42 --timeout 300 --verbose`
175175

176-
预期输出:
176+
Expected output:
177177
- `./test_output/data/GSEA_list.rda`
178178
- `./test_output/Table/enrichGSEA.csv`
179179
- `./test_output/Table/gsea_running_scores.csv`
180180
- `./test_output/session_info.txt`
181-
- 若无显著富集结果,`gsea_running_scores.csv` 仍会生成,但只包含表头
182-
- 退出状态码为 `0`
181+
- If no significant enrichment is found, `gsea_running_scores.csv` is still written but contains only the header
182+
- Exit code `0`
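
The expected-output checklist can be verified automatically after a test run. The following Python sketch is an assumption-labelled helper, not something shipped with the skill: `missing_outputs` and `is_header_only` are hypothetical names, and the relative paths are the ones listed above.

```python
import csv
from pathlib import Path
from typing import List

# Relative paths taken from the expected-output list above.
EXPECTED_OUTPUTS = [
    "data/GSEA_list.rda",
    "Table/enrichGSEA.csv",
    "Table/gsea_running_scores.csv",
    "session_info.txt",
]


def missing_outputs(outdir) -> List[str]:
    """Return the expected files that are absent under outdir."""
    root = Path(outdir)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]


def is_header_only(csv_path) -> bool:
    """True when the CSV holds exactly one non-empty row (the header)."""
    with open(csv_path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]
    return len(rows) == 1
```

A header-only `gsea_running_scores.csv` is a valid outcome according to the list above, so a check like this should report it rather than treat it as a failure.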
Lines changed: 27 additions & 27 deletions
@@ -1,38 +1,38 @@
1-
# GSEA 方法说明
1+
# Algorithm
22

3-
## 方法概述
3+
## Overview
44

5-
本 Skill 接收按统计量排序的基因列表,使用 `clusterProfiler::GSEA()` 进行基因集富集分析。
6-
支持 `fgsea` `DOSE` 两种后端,支持 `KEGG``HALLMARKS``GO_BP``GO_MF``GO_CC` 五类基因集。
5+
This skill takes a gene list ranked by a statistic and runs gene-set enrichment analysis with `clusterProfiler::GSEA()`.
6+
It supports the `fgsea` and `DOSE` backends, and five gene-set families: `KEGG`, `HALLMARKS`, `GO_BP`, `GO_MF`, `GO_CC`.
77

8-
## 输入与预处理
8+
## Input and preprocessing
99

10-
输入文件为 CSV,至少包含:
11-
- 基因列,默认 `name`
12-
- 排序统计量列,默认 `logFC`
10+
The input file is a CSV with at least:
11+
- a gene column (default `name`)
12+
- a ranking-statistic column (default `logFC`)
1313

14-
脚本会执行以下预处理:
15-
1. 校验输入文件存在
16-
2. 校验列名存在
17-
3. 去除空值与空字符串
18-
4. `logFC` 降序生成排名向量
14+
The script preprocesses it by:
15+
1. Verifying the input file exists
16+
2. Verifying the required columns exist
17+
3. Dropping NA and empty-string entries
18+
4. Building a ranked vector by sorting on `logFC` descending
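
The preprocessing steps above can be sketched in a few lines. This is a Python illustration under stated assumptions, not the R code in `scripts/main.R`: the function name `build_ranked_list` is hypothetical, the sketch takes CSV text rather than a path (so the file-existence check is left out), and it treats the literal string `NA` as a missing value. How the real script handles duplicate genes is not documented, so duplicates are kept here.

```python
import csv
import io
from typing import List, Tuple


def build_ranked_list(csv_text: str,
                      gene_col: str = "name",
                      stat_col: str = "logFC") -> List[Tuple[str, float]]:
    """Validate columns, drop missing entries, sort by statistic descending."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    missing = [col for col in (gene_col, stat_col) if col not in fields]
    if missing:
        raise KeyError(f"missing columns: {missing}")

    ranked = []
    for row in reader:
        gene = (row[gene_col] or "").strip()
        stat = (row[stat_col] or "").strip()
        # Skip empty strings and NA markers in either column.
        if not gene or not stat or stat.upper() == "NA":
            continue
        ranked.append((gene, float(stat)))

    # Highest statistic first, matching the descending sort in step 4.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

Python's sort is stable, so genes with equal statistics keep their input order.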
1919

20-
## 分析流程
20+
## Pipeline
2121

22-
1. 读取输入数据
23-
2. 加载基因集数据或读取 `--rds_path`
24-
3. 生成 `TERM2GENE`
25-
4. 运行 GSEA
26-
5. 导出结果表、运行分数表和会话信息
22+
1. Read input data
23+
2. Load the gene-set data, or read from `--rds_path`
24+
3. Build `TERM2GENE`
25+
4. Run GSEA
26+
5. Export the result table, running-score table, and session info
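
Step 3 refers to `TERM2GENE`, which in clusterProfiler is a two-column table pairing each term with one gene per row. A minimal Python sketch of that long-format flattening follows; the function name and the example gene sets are illustrative assumptions, and the actual conversion in `scripts/main.R` is not shown in this document.

```python
from typing import Dict, Iterable, List, Tuple


def build_term2gene(gene_sets: Dict[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """Flatten {term: genes} into (term, gene) rows, one gene per row."""
    rows = []
    for term, genes in gene_sets.items():
        for gene in genes:
            rows.append((term, gene))
    return rows


# Hypothetical gene sets, for illustration only.
example = {"PATHWAY_A": ["TP53", "BRCA1"], "PATHWAY_B": ["EGFR"]}
term2gene = build_term2gene(example)
```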
2727

28-
## 关键统计量
28+
## Key statistics
2929

30-
- `ES`:富集分数,表示运行曲线最大偏离量
31-
- `NES`:标准化富集分数,用于消除基因集大小影响
32-
- `p.adjust`:多重检验校正后的显著性指标
30+
- `ES`: enrichment score, the maximum deviation of the running-sum curve from zero
31+
- `NES`: normalized enrichment score, which adjusts `ES` for differences in gene-set size
32+
- `p.adjust`: significance after multiple-testing correction
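
To make `ES` concrete, here is a minimal Python sketch of the classic weighted running-sum statistic used in GSEA. It is an illustration only: the skill delegates the real computation to `clusterProfiler::GSEA()`, and the function names and the `weight` parameter (the usual exponent, 1 by default) are assumptions of this sketch.

```python
from typing import List, Set, Tuple


def running_scores(ranked: List[Tuple[str, float]],
                   gene_set: Set[str],
                   weight: float = 1.0) -> List[float]:
    """Running sum over a ranked list: step up on hits, down on misses."""
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_total = sum(abs(stat) ** weight
                    for (_, stat), hit in zip(ranked, hits) if hit)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    scores, current = [], 0.0
    for (_, stat), hit in zip(ranked, hits):
        if hit:
            # Hits are weighted by the magnitude of their ranking statistic.
            current += abs(stat) ** weight / hit_total
        else:
            # Each miss subtracts an equal share.
            current -= 1.0 / n_miss
        scores.append(current)
    return scores


def enrichment_score(scores: List[float]) -> float:
    """ES is the running-sum value with the largest absolute deviation."""
    return max(scores, key=abs)
```

The curve ends at zero by construction, since the hit increments and the miss decrements each sum to one.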
3333

34-
## 可复现性
34+
## Reproducibility
3535

36-
- 入口参数 `--seed` 默认值为 `42`
37-
- 运行结束写出 `session_info.txt`
38-
- 相同输入与参数组合应得到一致结果
36+
- The entry-point flag `--seed` defaults to `42`
37+
- `session_info.txt` is written at the end of the run
38+
- Identical input and arguments should yield identical results
