This R project supports the analysis of sewage-based antibiotic resistance surveillance data generated using two approaches: an isolate-based approach and a gene-based metagenomic approach. The overall aim of the project is to directly compare, combine, and evaluate these approaches to determine how well they reflect clinical antibiotic resistance rates in Escherichia coli.
The study uses municipal sewage samples collected across ten European countries. Antibiotic resistance in these samples was assessed with two strategies. First, an isolate-based approach applied antibiotic susceptibility testing to E. coli isolates recovered from the sewage. Second, a gene-based approach used metagenomic sequencing to quantify antibiotic resistance genes in the sewage.
This project uses data from three main sources:
- Antibiotic susceptibility testing results from E. coli isolates recovered from municipal sewage samples.
- Metagenomic sequencing outputs describing the abundance and distribution of antibiotic resistance genes in sewage samples.
- Country-level clinical resistance prevalence estimates for E. coli, covering aminopenicillins, fluoroquinolones, third-generation cephalosporins, and aminoglycosides.
The analysis is implemented in R and covers data cleaning, integration, statistical modelling, and visualization. The core modelling framework is beta regression, which suits proportional outcomes bounded between 0 and 1, such as resistance prevalence.
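The sketch below illustrates beta regression with the betareg package and how isolate-based, gene-based, and combined predictors could be compared. It runs on simulated toy data only; the column names iso_prop, gene_abund, and clin_prev are hypothetical, and the boundary "squeeze" and the AIC/pseudo R-squared comparison are common choices that this project may or may not use.

```r
library(betareg)

# Simulated toy data (NOT study data); column names are hypothetical.
set.seed(42)
n <- 40
toy <- data.frame(
  iso_prop   = runif(n, 0.05, 0.6),  # hypothetical share of resistant sewage isolates
  gene_abund = rnorm(n, 0, 1)        # hypothetical standardised resistance gene abundance
)
mu <- plogis(-2 + 3 * toy$iso_prop + 0.5 * toy$gene_abund)  # mean on the logit scale
toy$clin_prev <- rbeta(n, mu * 30, (1 - mu) * 30)           # precision phi = 30

# betareg requires outcomes strictly inside (0, 1). The Smithson and
# Verkuilen (2006) squeeze is one common fix, applied only if needed.
squeeze <- function(y) (y * (length(y) - 1) + 0.5) / length(y)
if (any(toy$clin_prev <= 0 | toy$clin_prev >= 1)) {
  toy$clin_prev <- squeeze(toy$clin_prev)
}

# One model per approach (logit link is the betareg default)
fits <- list(
  isolate  = betareg(clin_prev ~ iso_prop, data = toy),
  gene     = betareg(clin_prev ~ gene_abund, data = toy),
  combined = betareg(clin_prev ~ iso_prop + gene_abund, data = toy)
)

# Lower AIC indicates better fit; pseudo R-squared is stored on each fit
comparison <- data.frame(
  model     = names(fits),
  AIC       = sapply(fits, AIC),
  pseudo_R2 = sapply(fits, function(f) f$pseudo.r.squared)
)
print(comparison)
```

Because the three models share the same outcome and observations, their AIC values are directly comparable. A real analysis would also need to consider repeated measures per country.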
The workflow includes:
- Importing and cleaning sewage isolate, metagenomic, and clinical datasets.
- Aggregating resistance indicators by country, sample, and antibiotic class.
- Linking each sewage-derived indicator to the clinical resistance outcome for the same country and antibiotic class.
- Fitting beta regression models to quantify associations between sewage-based indicators and clinical resistance prevalence.
- Comparing model performance across isolate-based, gene-based, and combined approaches.
- Generating figures and summary tables for interpretation and reporting.
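The aggregation and matching steps above can be sketched with dplyr as follows. This is a minimal illustration on hypothetical toy inputs, not the project's own code. The column names (country, sample_id, antibiotic_class, resistant, clinical_prev) and the two-stage averaging are assumptions.

```r
library(dplyr)

# Hypothetical toy inputs (NOT study data); column names are assumptions.
# Isolate-level susceptibility results: one row per isolate and antibiotic class.
isolates <- data.frame(
  country          = c("A", "A", "A", "B", "B", "B"),
  sample_id        = c("s1", "s1", "s2", "s3", "s3", "s4"),
  antibiotic_class = "fluoroquinolones",
  resistant        = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)
)
# Country-level clinical resistance prevalence (as a proportion)
clinical <- data.frame(
  country          = c("A", "B"),
  antibiotic_class = "fluoroquinolones",
  clinical_prev    = c(0.15, 0.30)
)

# Step 1: proportion of resistant isolates per sample and antibiotic class
per_sample <- isolates |>
  group_by(country, sample_id, antibiotic_class) |>
  summarise(prop_resistant = mean(resistant), .groups = "drop")

# Step 2: average the sample-level proportions per country and antibiotic class
per_country <- per_sample |>
  group_by(country, antibiotic_class) |>
  summarise(prop_resistant = mean(prop_resistant),
            n_samples = n(), .groups = "drop")

# Step 3: match each sewage indicator to the clinical outcome for the same
# country and antibiotic class; rows without a clinical match are dropped
matched <- inner_join(per_country, clinical,
                      by = c("country", "antibiotic_class"))
print(matched)
```

Averaging sample-level proportions gives each sample equal weight; pooling all isolates per country instead would weight samples by their isolate count. Which choice fits depends on how isolates were collected.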
The analysis requires R version 4.3 or later.
To rerun this analysis, follow the steps below.
- Clone the repository:
git clone https://github.com/gilbertella/semar_wp1.git
- Install the required R packages, then load them:
install.packages(c("ggplot2", "pals", "knitr", "cowplot", "betareg", "tidyverse"))
library(ggplot2)
library(pals)
library(knitr)
library(cowplot)
library(betareg)
library(tidyverse)
- Open analysis_file.Rmd and run all chunks.
- View the outputs in the tables and figures folders.
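As an alternative to running chunks by hand, the setup steps can be scripted. This is a sketch under stated assumptions: it expects to be run from the root of the cloned repository, and it uses rmarkdown::render, which requires the rmarkdown package (not in the package list above). The R version check and the install-only-missing logic are conveniences added here, not part of the original instructions.

```r
# Warn if R is older than the version this project targets
if (getRversion() < "4.3.0") warning("This analysis expects R 4.3 or later.")

# Install only the required packages that are not already available
pkgs <- c("ggplot2", "pals", "knitr", "cowplot", "betareg", "tidyverse")
missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing) > 0) install.packages(missing)

# Knit the full analysis non-interactively (assumes the working directory
# is the repository root and that rmarkdown is installed)
if (file.exists("analysis_file.Rmd")) {
  rmarkdown::render("analysis_file.Rmd")
}
```

Rendering runs every chunk in a fresh session, which helps confirm the analysis does not depend on objects left over in an interactive workspace.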