A workflow for gene family tree inference using GeneRax, from multiple sequence alignments.
Description
A shell pipeline that takes raw multiple sequence alignments (MSAs), filters and trims them, infers one gene tree per family, and runs GeneRax for species-tree-aware gene-tree reconciliation.
Pipeline overview
MSAs (.fa)
│
Filter MSAs — remove gene families with fewer than N species
│
trimAl — clean, autotrim
│
FastTreeMP — infer one maximum-likelihood gene tree per family
│
Build families file — assemble the GeneRax [FAMILIES] input file
│
GeneRax — reconcile gene trees with the species tree
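The filtering step can be sketched as below. This is a minimal illustration, not the script's actual code: the function names `count_species` and `filter_msas` are hypothetical, and it assumes FASTA headers of the form `>Species_geneID`, taking everything before the first underscore as the species name. Species names that themselves contain underscores would need a different rule.

```shell
# Count the distinct species in one FASTA MSA.
# ASSUMPTION: headers look like ">Species_geneID"; the species name is the
# text before the first underscore.
count_species() {
    awk '/^>/ { sub(/^>/, ""); split($0, parts, "_"); seen[parts[1]] = 1 }
         END  { n = 0; for (s in seen) n++; print n }' "$1"
}

# Print the path of every MSA in directory $1 that has at least $2 species.
filter_msas() {
    msa_dir="$1"
    min_taxa="$2"
    for msa in "$msa_dir"/*.fa; do
        [ -e "$msa" ] || continue          # directory without .fa files
        if [ "$(count_species "$msa")" -ge "$min_taxa" ]; then
            printf '%s\n' "$msa"
        fi
    done
}
```

Calling `filter_msas ./MultipleSequenceAlignments 7` would list the families that pass the default threshold.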
Input
MSAs
Per-family MSAs in FASTA format. The pipeline has been tested on OrthoFinder's output (MultipleSequenceAlignments/).
Species tree
A species tree in Newick format
Key parameters
| Parameter | Default | Description |
|---|---|---|
| MIN_TAXA | 7 | Minimum number of distinct species for a gene family to be retained |
| ncores | 40 | Number of parallel trimming jobs and GeneRax MPI ranks |
| msa_path | ./MultipleSequenceAlignments | Path to the MSA directory |
| species_tree | ./species_tree.nw | Path to the species tree file in Newick format |
To change these parameters, edit them in the configuration section of the script.
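For orientation, a configuration section with the documented defaults might look like the sketch below. The variable names and default values come from the parameter list; the `${VAR:-default}` form, which also lets an environment variable override a default, and the helper `is_positive_int` are assumptions added for illustration and may not match the script.

```shell
# Hypothetical configuration section using the documented defaults.
# ASSUMPTION: the ${VAR:-default} form (environment override) is added here
# for illustration; the real script may use plain assignments.
MIN_TAXA="${MIN_TAXA:-7}"
ncores="${ncores:-40}"
msa_path="${msa_path:-./MultipleSequenceAlignments}"
species_tree="${species_tree:-./species_tree.nw}"

# Return success only if $1 is an integer greater than zero.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) [ "$1" -gt 0 ] ;;
    esac
}

# Warn early about settings that would break later steps.
if ! is_positive_int "$MIN_TAXA" || ! is_positive_int "$ncores"; then
    echo "MIN_TAXA and ncores must be positive integers" >&2
fi
```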
Output
| Output | Description |
|---|---|
| trim_path/*.fa | Trimmed MSAs |
| tree_path/*.tree | Per-family FastTree gene trees (Newick) |
| out_file | GeneRax [FAMILIES] input file |
| generax_out/ | GeneRax reconciliation results |
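As a rough illustration of how the [FAMILIES] file could be assembled from the trimmed MSAs and gene trees, consider the sketch below. The function name `build_families` is hypothetical, the substitution model `LG+G` is an assumed placeholder for protein data, and no `mapping` entry (gene-to-species mapping file) is written, which your data may need. Check the GeneRax documentation for the keys your run requires.

```shell
# Write a GeneRax families file listing every trimmed MSA that has a tree.
# Usage: build_families TRIM_DIR TREE_DIR OUT_FILE
build_families() {
    trim_dir="$1"
    tree_dir="$2"
    out="$3"
    printf '[FAMILIES]\n' > "$out"
    for msa in "$trim_dir"/*.fa; do
        [ -e "$msa" ] || continue
        fam=$(basename "$msa" .fa)
        # Skip families whose tree is missing or empty.
        [ -s "$tree_dir/$fam.tree" ] || continue
        {
            printf '%s\n' "- $fam"
            printf 'starting_gene_tree = %s\n' "$tree_dir/$fam.tree"
            printf 'alignment = %s\n' "$msa"
            printf 'subst_model = %s\n' "LG+G"   # ASSUMPTION: placeholder model
        } >> "$out"
    done
}
```

Skipping families without a non-empty tree file keeps a failed tree inference from producing an entry that points at a missing file.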
Notes
Before trimming, stop codons (*) are replaced with gaps (-) and selenocysteine residues (U) are recoded as unknown (X) to prevent FastTree and GeneRax errors.
trimAl's -automated1 heuristic selects the trimming method automatically based on alignment properties.
The mpiexec -oversubscribe flag allows more MPI ranks than physical cores; remove it if your MPI implementation does not support it.
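The notes above can be sketched in shell as follows. The function names `recode_residues`, `trim_msa`, and `generax_cmd` are hypothetical, the recoding handles uppercase `U` only, and the GeneRax options shown (in particular `--rec-model UndatedDL`) are assumptions about how the script calls it, not a record of the actual command. `generax_cmd` only prints a command line so that it can be inspected before running.

```shell
# Recode residues on sequence lines only; header lines (">") stay untouched.
recode_residues() {
    sed -e '/^>/!s/\*/-/g' -e '/^>/!s/U/X/g' "$1"
}

# Recode one MSA, then trim it with trimAl's -automated1 heuristic.
# Usage: trim_msa IN.fa OUT.fa   (requires trimal on PATH)
trim_msa() {
    recoded=$(mktemp)
    recode_residues "$1" > "$recoded"
    trimal -in "$recoded" -out "$2" -automated1
    rm -f "$recoded"
}

# Print a GeneRax command line.
# Usage: generax_cmd RANKS FAMILIES SPECIES_TREE OUT_DIR [yes|no]
# The last argument controls whether -oversubscribe is included (default yes).
generax_cmd() {
    flag=""
    if [ "${5:-yes}" = yes ]; then
        flag=" -oversubscribe"
    fi
    # ASSUMPTION: --rec-model UndatedDL is a placeholder reconciliation model.
    printf 'mpiexec -np %s%s generax --families %s --species-tree %s --rec-model UndatedDL --prefix %s\n' \
        "$1" "$flag" "$2" "$3" "$4"
}
```

Passing `no` as the fifth argument to `generax_cmd` drops the flag for MPI implementations that reject it.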