ArtemisKotoula/MSA-to-GeneRax

MSA-to-GeneRax

A workflow for gene family tree inference using GeneRax, from multiple sequence alignments.

Description

A shell pipeline that takes raw multiple sequence alignments (MSAs), filters and trims them, infers one gene tree per family, and runs GeneRax for species-tree-aware gene-tree reconciliation.

Pipeline overview

    MSAs (.fa)
        │
    Filter MSAs           - remove gene families with fewer than N species
        │
    trimAl                - trim alignments (-automated1)
        │
    FastTreeMP            - infer one approximately-maximum-likelihood gene tree per family
        │
    Build families file   - assemble the GeneRax [FAMILIES] input file
        │
    GeneRax               - reconcile gene trees with the species tree
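The filtering step can be sketched as follows. This is a minimal illustration, not the script's actual code: the function names `count_species` and `filter_msas` are hypothetical, and it assumes FASTA headers of the form `>Species_geneID`, so that the species label is everything before the first underscore. Species names that themselves contain underscores would need a different rule.

```shell
#!/usr/bin/env bash

# Count distinct species in one FASTA alignment.
# Assumption: headers look like ">Species_geneID".
count_species() {
    grep '^>' "$1" | sed 's/^>//; s/_.*$//' | sort -u | wc -l | tr -d ' '
}

# Print the paths of alignments with at least min_taxa distinct species.
filter_msas() {
    local min_taxa=$1
    shift
    local fa
    for fa in "$@"; do
        if [ "$(count_species "$fa")" -ge "$min_taxa" ]; then
            printf '%s\n' "$fa"
        fi
    done
}
```

For example, `filter_msas 7 ./MultipleSequenceAlignments/*.fa` would list only the families that pass the default threshold.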

Usage

bash ./ortho_to_generax_workflow.sh --min_taxa <int> --ncores <int>

Note that, while

Input requirements

| Input | Description |
| --- | --- |
| msa_path/*.fa | Per-family MSAs in FASTA format; tested on OrthoFinder's MultipleSequenceAlignments/ output |
| Species tree | A species tree in Newick format |

Key parameters

| Parameter | Default | Description |
| --- | --- | --- |
| MIN_TAXA | 7 | Minimum number of distinct species for a gene family to be retained |
| ncores | 40 | Number of parallel trimming jobs and of GeneRax MPI ranks |
| msa_path | ./MultipleSequenceAlignments | Path to the MSA directory |
| species_tree | ./species_tree.nw | Path to the species tree file in Newick format |

Note that these parameters can be changed by editing them in the configuration section of the script.
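As a rough sketch of how such a configuration section might look, the block below sets the defaults from the table and lets the two command-line flags from the Usage section override them. The `parse_args` function is hypothetical and only illustrates the idea; the real script may handle its options differently.

```shell
#!/usr/bin/env bash

# Defaults taken from the parameter table.
MIN_TAXA=7
ncores=40
msa_path=./MultipleSequenceAlignments
species_tree=./species_tree.nw

# Hypothetical parser: override MIN_TAXA and ncores from the command line.
parse_args() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --min_taxa|--ncores)
                # Each flag needs a value after it.
                [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
                if [ "$1" = --min_taxa ]; then MIN_TAXA=$2; else ncores=$2; fi
                shift 2
                ;;
            *)
                echo "unknown option: $1" >&2
                return 1
                ;;
        esac
    done
}
```

Calling `parse_args "$@"` near the top of a script would leave the defaults in place unless a flag is given.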

Output

| Output | Description |
| --- | --- |
| trim_path/*.fa | Trimmed MSAs |
| tree_path/*.tree | Per-family FastTree gene trees (Newick) |
| out_file | GeneRax [FAMILIES] input file |
| generax_out/ | GeneRax reconciliation results |
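To illustrate what the families file contains, here is a minimal sketch that writes one entry per family from the trimmed alignments and gene trees. The function name `build_families` is hypothetical, the pairing of `<family>.fa` with `<family>.tree` is an assumption, and the `LG+G` substitution model is only a placeholder for protein data. The optional `mapping` entry is left out because the README does not say whether the script writes one.

```shell
#!/usr/bin/env bash

# Write a GeneRax families file listing every trimmed alignment that has a tree.
# Assumptions: alignments are <family>.fa, trees are <family>.tree,
# and the substitution model is a placeholder.
build_families() {
    local trim_path=$1 tree_path=$2 out_file=$3 model=${4:-LG+G}
    local fa fam
    echo '[FAMILIES]' > "$out_file"
    for fa in "$trim_path"/*.fa; do
        [ -e "$fa" ] || continue                    # no alignments at all
        fam=$(basename "$fa" .fa)
        [ -s "$tree_path/$fam.tree" ] || continue   # skip families without a tree
        {
            printf '%s\n' "- $fam"
            printf '%s\n' "starting_gene_tree = $tree_path/$fam.tree"
            printf '%s\n' "alignment = $fa"
            printf '%s\n' "subst_model = $model"
        } >> "$out_file"
    done
}
```

Skipping families whose tree file is missing or empty keeps a failed tree inference from producing a broken entry.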

Notes

  • Before trimming, stop codons (*) are replaced with gaps (-) and selenocysteine residues (U) are recoded as unknown (X) to prevent FastTree and GeneRax errors.
  • trimAl's -automated1 heuristic selects the trimming method automatically based on alignment properties.
  • The mpiexec -oversubscribe flag allows more MPI ranks than physical cores; remove it if your MPI implementation does not support it.
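The residue recoding described in the first note could be done along these lines. This is a sketch rather than the script's own command, and `recode_residues` is a hypothetical name. It only touches sequence lines, so a `*` or `U` in a FASTA header is preserved.

```shell
#!/usr/bin/env bash

# Recode residues on sequence lines only; header lines (starting with ">")
# are left untouched. Reads the given files, or stdin if none are given.
recode_residues() {
    sed -e '/^>/!s/\*/-/g' -e '/^>/!s/U/X/g' "$@"
}
```

For instance, `recode_residues family.fa > family.recoded.fa` would write a cleaned copy that can then be passed to trimAl (the file names here are placeholders).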

Dependencies

| Tool | Purpose |
| --- | --- |
| GNU parallel | Parallel trimming |
| trimAl | Alignment trimming |
| FastTree (FastTreeMP) | Gene-tree inference |
| GeneRax | Gene/species-tree reconciliation |
| mpiexec (OpenMPI / MPICH) | MPI launcher for GeneRax |
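Before starting a long run, it can help to confirm that these tools are on the PATH. The check below is a small sketch: `missing_tools` is a hypothetical helper, and the executable names in the commented usage line (`parallel`, `trimal`, `FastTreeMP`, `generax`, `mpiexec`) are assumptions that may differ on your installation.

```shell
#!/usr/bin/env bash

# Print each tool that is not found on PATH; return non-zero if any is missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example call (executable names are assumptions; adjust as needed):
# missing_tools parallel trimal FastTreeMP generax mpiexec || exit 1
```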
