This repository contains an end-to-end notebook workflow for analyzing SARS-CoV-2 genomic sequences from California, New York, and Texas. The workflow covers:
- Data extraction and filtering by state
- Sequence cleaning and FASTA export
- Alignment and guide-tree construction
- Consensus sequence generation
- Mutation hotspot visualization
- Comparative phylogenetic artifacts
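
The cleaning, FASTA export, consensus, and hotspot steps above can be sketched in plain Python on a toy alignment. This is only an illustration under stated assumptions, not the notebook's actual code: the function names, the allowed-symbol set, and the majority-rule consensus are all hypothetical choices made for this example.

```python
from collections import Counter

# Assumption: only these symbols are kept; anything else becomes N.
VALID_BASES = set("ACGTN-")


def clean_sequence(raw):
    """Uppercase, drop whitespace, and replace unexpected symbols with N."""
    seq = "".join(raw.split()).upper()
    return "".join(base if base in VALID_BASES else "N" for base in seq)


def to_fasta(records, width=60):
    """Format (identifier, sequence) pairs as FASTA text."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def consensus(aligned):
    """Majority-rule consensus over equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must share one length")
    result = []
    for column in zip(*aligned):
        # Ignore gaps and ambiguous bases when voting.
        counts = Counter(b for b in column if b not in "N-")
        result.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(result)


def mismatch_counts(aligned, reference):
    """Per-column count of sequences whose base differs from the reference."""
    return [
        sum(1 for b in column if b not in "N-" and b != ref)
        for column, ref in zip(zip(*aligned), reference)
    ]


# Toy, already-aligned sequences (hypothetical data, not from the repository).
toy = ["ACGTAC", "ACGTTC", "ACCTAC"]
cons = consensus(toy)
hotspots = mismatch_counts(toy, cons)
```

Columns with high mismatch counts are candidate mutation hotspots; plotting `hotspots` against position gives a simple version of the hotspot visualization.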

Key files:
- Main notebook: bio-finall.ipynb
- Detailed run and output guide: OUTPUT_AND_RUN_GUIDE.md

Folders:
- CALIFORNIA
- NEWYORK
- TEXAS
- consenses

Generated outputs include:
- State-specific cleaned CSV and FASTA files
- Consensus FASTA sequences
- Multiple sequence alignment artifacts
- Newick tree files (.nwk)
- Plot images (.png/.jpg)
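
One quick way to sanity-check an exported FASTA file is to parse it and report each record's length and fraction of ambiguous bases. The sketch below uses only the standard library; `read_fasta` and `summarize` are hypothetical helper names for this example and are not part of the repository.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping record identifier to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def summarize(records):
    """Return {identifier: (length, fraction of N bases)} for each record."""
    return {
        name: (len(seq), seq.upper().count("N") / len(seq) if seq else 0.0)
        for name, seq in records.items()
    }


# Inline example text; for a real file, pass the contents read from disk.
example = ">s1 sample one\nACGT\nNN\n>s2\nAC\n"
summary = summarize(read_fasta(example))
```

For larger files or richer formats, Biopython's `SeqIO.parse(handle, "fasta")` is a more complete alternative, assuming Biopython is among the installed dependencies.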
Install dependencies and run the notebook cells in order:
pip install -r requirements.txt
jupyter notebook

No license has been set yet. Add a LICENSE file if you plan to share publicly.