All notable changes to the aa-tRNA-seq pipeline are documented in this file.
- Pipeline summary outputs (bcerror, odds_ratios) now report positions in tRNA-only coordinates (1-indexed) instead of full-reference coordinates that included adapter sequences. This fixes incorrect nucleotide positions in downstream tools such as clover's `plot_tRNA_structure()`.
- `trim_reference` rule produces a tRNA-only FASTA (`trna_only.fa`) by stripping 5'/3' adapter sequences from the adapted reference. This FASTA is used by clover for MODOMICS annotation and structure visualization.
- `build_trna_reference.py --mode trim` for generating adapter-stripped tRNA-only FASTA files.
- `get_bcerror_freqs.py` and `compute_odds_ratios.py` accept `--offset-5p` and `--offset-3p` to filter adapter positions and convert to tRNA-only coordinates.
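The coordinate conversion behind `--offset-5p` and `--offset-3p` can be sketched as below. This is a minimal illustration, not the scripts' actual code: it assumes the offsets are the number of adapter bases at each end of the adapted reference and that incoming positions are 1-based. The function name `to_trna_coords` is hypothetical.

```python
def to_trna_coords(positions, ref_length, offset_5p, offset_3p):
    """Convert 1-based full-reference positions to 1-based tRNA-only positions.

    Hypothetical helper. Assumes offset_5p/offset_3p are the lengths of the
    5' and 3' adapters in the adapted reference. Positions that fall inside
    either adapter are dropped.
    """
    trna_length = ref_length - offset_5p - offset_3p
    converted = []
    for pos in positions:
        trna_pos = pos - offset_5p  # shift past the 5' adapter
        if 1 <= trna_pos <= trna_length:  # discard 5' and 3' adapter positions
            converted.append(trna_pos)
    return converted


# A 120-nt adapted reference with a 20-nt 5' adapter and a 24-nt 3' adapter
# leaves a 76-nt tRNA; reference position 21 becomes tRNA position 1.
print(to_trna_coords([1, 20, 21, 96, 97, 120], ref_length=120, offset_5p=20, offset_3p=24))
```

Dropping adapter positions rather than clamping them keeps every reported position inside the tRNA body, which is what structure plots expect.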
- Updated dorado from 1.3.1 to 1.4.0, basecalling model from v5.1.0 to v5.3.0, and modkit from 0.6.0 to >=0.6.1.
- Shell scripts (`setup-env.sh`, `setup-tools.sh`) now read `dorado_version` and `dorado_model` from `config/config-base.yml` instead of hardcoding defaults.
- Updated stale documentation references for dorado and modkit paths/versions.
- Pipeline now warns on startup if the installed dorado version does not match `dorado_version` in the config.
- `bwa_align` out-of-memory (OOM) failure at 48 GB: decoupled dorado tags from alignment by stripping tags from FASTQ, dropping `bwa mem -C`, and injecting tags from the unaligned BAM afterward via a new `inject_ubam_tags` rule
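The idea behind `inject_ubam_tags`, re-attaching dorado tags to aligned reads by read ID after alignment, can be sketched with plain dictionaries. This is only an illustration of the join: the record shape (dicts with `read_id` and `tags` keys) and the function name `inject_tags` are hypothetical, and the actual rule operates on BAM files.

```python
def inject_tags(aligned, ubam_tags):
    """Attach tags from the unaligned BAM to aligned records by read ID.

    Hypothetical sketch. `aligned` is a list of dicts with a "read_id" key and
    an optional "tags" dict; `ubam_tags` maps read_id -> {tag: value}.
    In this sketch, uBAM values win when both sources carry the same tag.
    """
    merged = []
    for record in aligned:
        extra = ubam_tags.get(record["read_id"], {})
        # Build new dicts so the input records are not mutated.
        merged.append({**record, "tags": {**record.get("tags", {}), **extra}})
    return merged


aligned = [{"read_id": "r1", "ref": "tRNA-Ala-AGC"}, {"read_id": "r2", "ref": "tRNA-Gly-GCC"}]
ubam = {"r1": {"ML": [210], "MM": "N+a?,0;"}}
print(inject_tags(aligned, ubam))
```

Because the aligner never sees the tags, its memory use no longer scales with tag size. A real implementation over large BAM files would more likely stream name-sorted inputs than hold every read's tags in memory at once.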
- Quarto QC report with per-sample tabs (#81)
- Per-tRNA pairwise modification odds ratios (#85)
- Reference sequence similarity QC (#84)
- Squiggy session JSON export for Positron IDE
- Utility to collapse redundant GtRNAdb FASTA sequences
- Multiple 3' adapter support for PT tag detection
- Skip mode for reference validation
- Rule to pre-download dorado modified-base models (avoids race conditions)
- nvitop GPU monitoring dependency
- `classify_charging` switched from GPU to CPU with parallel workers (8 threads)
- WarpDemuX workflow simplified: eliminated `merge_pods_for_demux`; raw POD5 directories are now passed directly
- `bwa_align` filtering changed from `-F 4` to `-F 20` (also excludes reverse-strand reads)
- Removed redundant awk position filter from `bwa_align`
- Removed `protected()` directive from `rebasecall` output
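The `-F 20` filter is a SAM flag bitmask: 20 = 4 (read unmapped) + 16 (read reverse-complemented), so a read is kept only if neither bit is set, the same semantics as `samtools view -F`. The sketch below mirrors that check; the constant and function names are illustrative only.

```python
UNMAPPED = 0x4   # SAM FLAG bit 4: segment unmapped
REVERSE = 0x10   # SAM FLAG bit 16: SEQ is reverse-complemented


def keep_read(flag, exclude=UNMAPPED | REVERSE):
    """Mimic an `-F <mask>` filter: keep the read only if no excluded bit is set."""
    return (flag & exclude) == 0


print(keep_read(0))       # mapped, forward strand: kept
print(keep_read(16))      # mapped, reverse strand: dropped under -F 20
print(keep_read(16, 4))   # the old -F 4 filter would have kept it
```

Since tRNA reads should align to the forward strand of the reference, reverse-strand hits are treated as spurious and removed at the alignment step.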
- Race condition when parallel GPU jobs download dorado modification models simultaneously
- Reverse-strand reads not filtered at alignment step
- Redundant awk position filter in `bwa_align` superseded by adapter-based filtering
- Graceful fallback for `get_pipeline_commit` when git is unavailable
- Various snakefmt formatting and test corrections
- Run manifest generation for reproducibility tracking
- Native SLURM cluster support with GPU configuration
- Pytest unit tests and expanded CI coverage (#80)
- Adapter position tagging with parasail alignment - adds PT tags (#79)
- MkDocs documentation website (#77)
- Publication citation (White et al. 2025 Nat Commun)
- Reference validation and building step to ensure tRNA sequences have proper CCA endings and adapter structure required for charging classification
- Optional WarpDemuX barcode demultiplexing support for pooled/multiplexed sequencing runs (#74)
- Optimized modkit thresholds from ModkitOpt
- Pixi package manager support as primary environment manager
- Mermaid diagram for workflow visualization
- Standardized output directory structure to nested sample paths
- Updated dorado version from 0.9.1 to 1.3.1
- Migrated GitHub Actions CI from conda to pixi (#75)
- Separated tool installation from environment activation (`pixi run setup`)
- Updated README to use pixi instead of conda (#72)
- Normalized file permissions across repository
- Major fix: Improved 5' adapter detection from 0.04% to 82% (#82)
- Resolved GLIBCXX version errors with configurable CUDA support
- Shell glob pattern errors in tests
- Claude Code session start hook for automated development setup (#69)
- Comprehensive CI/CD build and test checks (#68)
- New project initialization structure (#65)
- Prefer pandas over polars for stability on some cluster nodes
- Reduced Remora logging level
- LSF-specific cluster configuration (#63)
- Renamed test directory to the recommended `.tests/` location
- Reduced dorado verbosity and made it optional
- Downgraded numpy to fix remora stats compatibility
- Modkit integration for RNA modification analysis (#59)
  - `modkit_pileup` rule for modification pileups
  - `modkit_summary` rule for modification summaries
  - `modkit_extract` and `modkit_extract_full` rules for detailed modification data
- Automatic dorado and model download/installation (#56)
- Modified base calling support (pseU, m5C, inosine_m6A)
- Full modkit outputs with optimized memory allocation
- Eliminated support for FAST5 files - pipeline now POD5-only (#43)
- Reorganized output directory structure
- Renamed charging tags during transfer (ML→CL, MM→CM)
- Updated model download strategy
- Increased memory allocation for modkit rules
- Restored `-v` option in dorado for proper verbosity control
- Rule for calculating CPM of charged/uncharged tRNAs (#28)
- Remora CCA classifier for charging state classification (#18)
- GPU pipeline support for `cca_classify` rule (#23)
- Charging probability extraction and analysis (#46)
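Counts per million (CPM) normalizes each count by the sample's library size so that samples sequenced to different depths can be compared. The sketch below assumes the denominator is the total number of classified reads in the sample (charged plus uncharged across all tRNAs); the pipeline's actual denominator and the function name `cpm` are assumptions.

```python
def cpm(counts):
    """Counts per million: count * 1e6 / total, computed over all keys in `counts`.

    Hypothetical helper. `counts` maps a label (e.g. tRNA + charging state)
    to a read count; the sum of all counts is used as the library size.
    """
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: n * 1_000_000 / total for label, n in counts.items()}


counts = {
    ("tRNA-Ala-AGC", "charged"): 300,
    ("tRNA-Ala-AGC", "uncharged"): 700,
}
print(cpm(counts))
```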
- Implemented ML threshold (≥200 = charged, <200 = uncharged)
- Compressed output files for storage efficiency
- Various tweaks to file handling (#27)
- Actually use the threshold value in classification
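The charging call applies the ML threshold (≥200 = charged, <200 = uncharged) to the classifier's per-read probability. In the SAM ML convention, an 8-bit value N covers the probability range N/256 to (N+1)/256, so 200 corresponds to roughly 0.78. The sketch below assumes one ML value (0 to 255) per read; the function name `call_charging_state` is hypothetical.

```python
CHARGED_THRESHOLD = 200  # ML >= 200 -> charged, < 200 -> uncharged


def call_charging_state(ml_value, threshold=CHARGED_THRESHOLD):
    """Return "charged" or "uncharged" for one 8-bit ML probability value.

    Hypothetical helper; assumes a single ML value per read in the range 0-255.
    """
    if not 0 <= ml_value <= 255:
        raise ValueError(f"ML value outside 8-bit range: {ml_value}")
    return "charged" if ml_value >= threshold else "uncharged"


for value in (0, 199, 200, 255):
    print(value, call_charging_state(value))
```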
- Alignment filtering capabilities with configurable parameters (#13)
- Optional Remora signal metrics extraction
- Kmer models included in pipeline
- Logging and optional failed BAM outputs
- New BAM tag indicating why reads are filtered
- Support for processing reads from both pass and fail directories
- Cleanup filtering approach for full-length tRNA reads
- Updated test data
- Ignore supplementary and secondary alignments
- Snakemake v8 compatibility (#10)
- Insertion double-counting bug (#15)
- Dropped redundant summary align stats (#14)
- Added `pod5` to the list of possible POD5 directory names
- Support for merging multiple sequencing runs per sample
- Support for unmapped BAM as input (#5)
- Pipeline commit and config recording for reproducibility (#9)
- Bedgraph output generation
- Alignment statistics calculations
- Base calling error frequency calculations
- Use v5.0.0 dorado models with modification calling
- Expose dorado and bwa command-line options
- Reworked alignment stats output (#8)
- Set rebasecalled outputs as read-only
- Keep additional BAM flags (e.g., pi) during processing (#1)
- Use -T → -C options to preserve all BAM tags from dorado
- Initial pipeline release
- Core workflow: POD5 merge → rebasecall → align → filter
- BWA MEM alignment to tRNA + adapter reference
- Post-alignment filtering for full-length tRNAs
- Basic summary statistics generation
- Snakemake workflow with modular rule structure
- Conda environment specification
- Sample configuration via TSV files