Document ID: RA-DESEQ2-001
Version: 1.0
Date: 2026-05-24
Author: Bhavitha Kandru (kandru.b@northeastern.edu)
Status: Approved
Framework: GAMP 5 risk-based approach + ICH Q9
| System component | GAMP Category | Justification |
|---|---|---|
| R interpreter | Category 1 | Infrastructure software |
| DESeq2 package | Category 3 | Non-configured COTS, used as published |
| run_deseq2.R | Category 5 | Custom application; full validation required |
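Validation effort focuses on the Category 5 component, run_deseq2.R. The R interpreter and the DESeq2 package are used as published; their versions are controlled under R-006 and R-009.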
| Severity / Likelihood | Low (1) | Medium (2) | High (3) |
|---|---|---|---|
| Low (1) | Low | Low | Medium |
| Medium (2) | Low | Medium | High |
| High (3) | Medium | High | Critical |
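
The matrix lookup can be written as a small function, which is convenient for checking that each Risk Level in a register agrees with its Severity and Likelihood ratings. The following is a minimal Python sketch for illustration only and is not part of the validated system. The names `SCORES`, `RISK_MATRIX`, and `risk_level` are hypothetical, and rows are read as severity and columns as likelihood, following the header cell of the matrix.

```python
# Illustrative sketch only; SCORES, RISK_MATRIX and risk_level are hypothetical names.
# Keys are (severity, likelihood) pairs using the 1-3 scores from the matrix.
SCORES = {"Low": 1, "Medium": 2, "High": 3}

RISK_MATRIX = {
    (1, 1): "Low",    (1, 2): "Low",    (1, 3): "Medium",
    (2, 1): "Low",    (2, 2): "Medium", (2, 3): "High",
    (3, 1): "Medium", (3, 2): "High",   (3, 3): "Critical",
}


def risk_level(severity: str, likelihood: str) -> str:
    """Return the matrix risk level for a severity/likelihood rating pair."""
    if severity not in SCORES or likelihood not in SCORES:
        raise ValueError(f"unknown rating: {severity!r}, {likelihood!r}")
    return RISK_MATRIX[(SCORES[severity], SCORES[likelihood])]
```

For example, `risk_level("High", "Low")` evaluates to `"Medium"`, which is the cell in the High (3) severity row and Low (1) likelihood column.
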
| Risk ID | Description | Severity | Likelihood | Risk Level | Mitigation | Residual Risk |
|---|---|---|---|---|---|---|
| R-001 | Sample sheet labels swapped (control labeled as treated, or vice versa) | High | Medium | High | TC-PQ-003 verifies fold-change direction against known data; PCA visualization recommended at time of use | Medium |
| R-002 | Input count matrix contains non-integer values | Medium | Low | Low | DESeq2 raises explicit error; documented as user responsibility | Low |
| R-003 | Sample names in sample sheet do not match counts file | High | Medium | High | TC-DI-003 detects mismatch and aborts before analysis | Low |
| R-004 | Insufficient replicates for dispersion estimation | High | Low | Medium | DESeq2 raises explicit error; PQ uses reference dataset with sufficient replicates | Low |
| R-005 | Multiple-testing correction not applied | High | Low | Medium | TC-PQ-004 verifies padj column properties | Low |
| R-006 | Software version drift between runs | Medium | Medium | Medium | Every run logs versions; revalidation triggered by version change (see 09_revalidation_triggers.md) | Low |
| R-007 | Reference level for comparison mis-specified | Medium | Medium | Medium | Reference level defaults to "control" (documented in URS-005) and can be set through an explicit CLI argument | Low |
| R-008 | Output file overwritten without versioning | Medium | High | High | Output directory must be specified per run; downstream consumer responsible for archival | Medium |
| R-009 | Underlying DESeq2 statistical method changes between versions | High | Low | Medium | DESeq2 version pinned in environment; revalidation on major version change | Low |
| R-010 | Input file corrupted/truncated during transfer | Medium | Low | Low | TC-DI-004 verifies expected dimensions; upstream system responsibility | Low |
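
Two of the mitigations above (R-002, R-003) depend on rejecting bad input before any analysis runs. The following Python sketch illustrates the logic of such checks. It is an assumption-laden illustration, not the implementation behind TC-DI-003: run_deseq2.R is an R script whose code is not shown in this document, and the function name `check_inputs` and the in-memory data layout are hypothetical.

```python
# Illustrative sketch only; check_inputs and its data layout are hypothetical.
import math
from typing import Dict, List


def check_inputs(sheet_samples: List[str], counts: Dict[str, List[float]]) -> None:
    """Raise ValueError if the inputs are unfit for analysis.

    sheet_samples: sample names listed in the sample sheet.
    counts: mapping of sample name (count-matrix column) to its count values.
    """
    sheet, cols = set(sheet_samples), set(counts)
    if len(sheet) != len(sheet_samples):
        raise ValueError("duplicate sample names in sample sheet")
    if sheet != cols:
        missing = sorted(sheet - cols)  # in the sheet, absent from the counts
        extra = sorted(cols - sheet)    # in the counts, absent from the sheet
        raise ValueError(f"sample mismatch: missing={missing}, extra={extra}")
    for name, values in counts.items():
        for v in values:
            # Counts must be finite, non-negative whole numbers.
            if not math.isfinite(v) or v < 0 or v != int(v):
                raise ValueError(f"invalid count in sample {name!r}: {v}")
```

A call returns `None` when the inputs pass and raises `ValueError` otherwise, so a calling script can abort before model fitting starts.
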
All risks are reduced to a residual level of "Low" or "Medium"; R-001 and R-008 remain at Medium. No Critical or High residual risks remain.
The system is acceptable for use in research and pre-clinical development.
For GxP production use, additional risk controls would be required:
- R-001: Mandatory pre-analysis PCA review
- R-008: Output directory naming convention with timestamps and write-once enforcement
- R-006: Reproducible execution environment (Docker container with pinned dependencies)
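
The R-008 control can be illustrated with a short sketch. The Python function below creates a new UTC-timestamped output directory for each run and refuses to reuse an existing one. The function name `make_run_dir`, the `deseq2` label, and the timestamp pattern are hypothetical choices, not an established convention of this system.

```python
# Illustrative sketch only; make_run_dir and the naming pattern are hypothetical.
import os
from datetime import datetime, timezone
from typing import Optional


def make_run_dir(base_dir: str, run_label: str = "deseq2",
                 now: Optional[datetime] = None) -> str:
    """Create and return a new timestamped output directory under base_dir.

    Raises FileExistsError instead of reusing an existing directory, so the
    results of an earlier run are not silently overwritten.
    """
    moment = now if now is not None else datetime.now(timezone.utc)
    stamp = moment.strftime("%Y%m%dT%H%M%SZ")
    run_dir = os.path.join(base_dir, f"{run_label}_{stamp}")
    os.makedirs(run_dir, exist_ok=False)  # fail if the directory already exists
    return run_dir
```

This prevents directory reuse only. Files inside the directory could still be modified afterwards, so genuine write-once enforcement would additionally need filesystem permissions or storage-level controls.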
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | Bhavitha Kandru | 2026-05-24 | _____ |
| QA Reviewer | _____ | _____ | _____ |
| Approver | _____ | _____ | _____ |