methylKit is an R/Bioconductor package for DNA methylation analysis, handling data import, quality control, normalization, and differential methylation analysis from bisulfite sequencing data.

    if (!require('BiocManager', quietly = TRUE))
        install.packages('BiocManager')
    BiocManager::install('methylKit')

Tell your AI agent what you want to do:
- "Load my Bismark coverage files into methylKit and run QC"
- "Find differentially methylated CpGs between treatment and control"
- "Generate PCA and correlation plots for my methylation samples"
- "Read my Bismark coverage files into methylKit with sample metadata"
- "Import methylation data for 4 controls and 4 treated samples"
- "Generate methylation and coverage statistics plots for all samples"
- "Show me PCA and sample correlation for my methylation data"
- "Find differentially methylated CpGs with at least 25% difference and q < 0.01"
- "Run differential methylation analysis between tumor and normal samples"
- "Identify hypermethylated and hypomethylated CpGs separately"
- "Filter CpGs by coverage (minimum 10x) and normalize samples"
- "Unite samples requiring CpG coverage in at least 3 samples per group"
- Create sample metadata with file paths, sample IDs, and treatment groups
- Import Bismark coverage files with methRead()
- Generate QC plots (coverage stats, methylation stats)
- Filter by coverage and normalize between samples
- Unite samples to get common CpGs
- Visualize sample relationships (PCA, correlation)
- Run differential methylation analysis
- Export significant differentially methylated CpGs
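
The steps above can be sketched as a single R function. This is a minimal sketch rather than a definitive pipeline: the function name, the sample sheet, the file paths, the `hg38` assembly label, the output file name, and the choice of `test = "Chisq"` are all assumptions made for illustration. The methylKit calls are namespaced so the block can be sourced before the package is loaded.

```r
# Sketch of the workflow steps. Function name, argument names, default
# assembly label, and output file name are illustrative assumptions.
run_methylkit_workflow <- function(file_paths, sample_ids, treatment,
                                   assembly = "hg38",
                                   destrand = FALSE,
                                   out_file = "dmc_significant.tsv") {
  if (!requireNamespace("methylKit", quietly = TRUE)) {
    stop("methylKit is not installed")
  }
  stopifnot(length(file_paths) == length(sample_ids),
            length(file_paths) == length(treatment))

  # Import Bismark coverage files; treatment is 0 = control, 1 = treated
  meth_raw <- methylKit::methRead(
    location  = as.list(file_paths),
    sample.id = as.list(sample_ids),
    assembly  = assembly,
    treatment = treatment,
    pipeline  = "bismarkCoverage",
    context   = "CpG"
  )

  # Per-sample QC plots: methylation and coverage distributions
  for (i in seq_along(sample_ids)) {
    methylKit::getMethylationStats(meth_raw[[i]], plot = TRUE)
    methylKit::getCoverageStats(meth_raw[[i]], plot = TRUE)
  }

  # Drop bases below 10x and the top 0.1% by coverage, then normalize
  meth_filt <- methylKit::filterByCoverage(meth_raw,
                                           lo.count = 10, hi.perc = 99.9)
  meth_norm <- methylKit::normalizeCoverage(meth_filt, method = "median")

  # Keep CpGs covered in every sample; destrand = TRUE merges both strands
  meth_united <- methylKit::unite(meth_norm, destrand = destrand)

  # Sample relationships: pairwise correlation and PCA
  methylKit::getCorrelation(meth_united, plot = TRUE)
  methylKit::PCASamples(meth_united)

  # Differential methylation with overdispersion correction
  diff_all <- methylKit::calculateDiffMeth(meth_united,
                                           overdispersion = "MN",
                                           test = "Chisq")
  diff_sig <- methylKit::getMethylDiff(diff_all, difference = 25,
                                       qvalue = 0.01, type = "all")

  # Export significant CpGs as a tab-separated table
  utils::write.table(methylKit::getData(diff_sig), file = out_file,
                     sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(diff_sig)
}

# Hypothetical sample sheet: 4 controls (0) and 4 treated samples (1)
samples <- data.frame(
  file      = file.path("bismark", sprintf("sample%d.bismark.cov", 1:8)),
  sample_id = c(paste0("ctrl", 1:4), paste0("treat", 1:4)),
  treatment = rep(c(0, 1), each = 4),
  stringsAsFactors = FALSE
)

# Uncomment once the (hypothetical) files exist:
# diff_sig <- run_methylkit_workflow(samples$file, samples$sample_id,
#                                    samples$treatment)
```

Keeping the sample sheet in one data frame makes it harder for file paths, sample IDs, and treatment labels to drift out of order.
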
| Column | Description |
|---|---|
| chr, start, end | Genomic position |
| meth.diff | Methylation difference (%) |
| pvalue | Raw p-value |
| qvalue | FDR-adjusted p-value |

A positive meth.diff means the CpG is hypermethylated in the treatment group relative to control; a negative meth.diff means it is hypomethylated.

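
To show how these columns are typically used, here is a small base R sketch that filters an exported results table and splits it by the sign of meth.diff. The table values are made up for illustration. On a methylDiff object, methylKit's getMethylDiff() with `type = "hyper"` or `type = "hypo"` performs the equivalent split.

```r
# Toy results table with the documented columns; all values are invented
dmc <- data.frame(
  chr       = c("chr1", "chr1", "chr2", "chr2"),
  start     = c(1000L, 2500L, 400L, 9000L),
  end       = c(1000L, 2500L, 400L, 9000L),
  meth.diff = c(32.5, -41.0, 12.0, -28.3),
  pvalue    = c(1e-6, 2e-8, 0.03, 1e-5),
  qvalue    = c(1e-4, 5e-6, 0.20, 0.002)
)

# Keep CpGs whose absolute difference exceeds 25 points with qvalue < 0.01
sig <- dmc[abs(dmc$meth.diff) > 25 & dmc$qvalue < 0.01, ]

# Split by direction of change in the treatment group
hyper <- sig[sig$meth.diff > 0, ]  # more methylated in treatment
hypo  <- sig[sig$meth.diff < 0, ]  # less methylated in treatment
```
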
- Use pipeline = 'bismarkCoverage' when reading Bismark .cov files
- Set destrand = TRUE in unite() to combine CpGs on both strands
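- Typical filters in filterByCoverage(): lo.count = 10 (minimum read coverage per base) and hi.perc = 99.9 (drops the top 0.1% highest-coverage bases, which are likely PCR artifacts)
- If memory is limited, set save.db = TRUE in functions such as filterByCoverage(), unite(), and calculateDiffMeth() to store results as database-backed objects instead of holding them in memory
- Use min.per.group in unite() if samples have variable coverage; it takes an integer (e.g. min.per.group = 3L) giving the minimum number of samples per group that must cover a CpG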
- overdispersion = 'MN' (a scaling correction after McCullagh and Nelder) is recommended for calculateDiffMeth()
- Common thresholds for getMethylDiff(): difference = 25 (percentage points), qvalue = 0.01
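
As a rough illustration of how several of these tips combine, the sketch below applies the coverage filters, database-backed storage, a per-group coverage requirement, and the overdispersion correction in one function. The function and argument names are hypothetical, `meth_raw` is assumed to be a methylRawList returned by methRead(), and `test = "Chisq"` is an assumption rather than something the tips prescribe.

```r
# Sketch: options for variable coverage and limited memory.
# `meth_raw` is assumed to be a methylRawList from methRead();
# the function name and its arguments are illustrative.
unite_and_test <- function(meth_raw, min_per_group = 3L, destrand = TRUE) {
  if (!requireNamespace("methylKit", quietly = TRUE)) {
    stop("methylKit is not installed")
  }

  # Coverage filter, written to a database-backed object
  meth_filt <- methylKit::filterByCoverage(meth_raw,
                                           lo.count = 10, hi.perc = 99.9,
                                           save.db = TRUE)
  meth_norm <- methylKit::normalizeCoverage(meth_filt, save.db = TRUE)

  # Keep CpGs covered in at least `min_per_group` samples of each group;
  # unite() expects an integer here
  meth_united <- methylKit::unite(meth_norm,
                                  destrand = destrand,
                                  min.per.group = as.integer(min_per_group),
                                  save.db = TRUE)

  # Differential methylation with the MN overdispersion correction
  methylKit::calculateDiffMeth(meth_united,
                               overdispersion = "MN",
                               test = "Chisq",
                               save.db = TRUE)
}

# Usage, assuming `meth_raw` already exists:
# diff_all <- unite_and_test(meth_raw, min_per_group = 3L)
```
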