Fix: Udichi issues#177
Conversation
|
Warning Rate limit exceeded
To continue reviewing without waiting, purchase usage credits in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughRefactors curve fitting and input sanitization (adds hill/biphasic fit pipeline), introduces exported curveFittingPGX, adds AAC/ACn helpers and ICn/IC50 reference modes, updates many compute/sensitivity functions to use normalized fits, broad documentation/test additions, and minor CI/metadata and formatting edits. ChangesRepository & Metadata
Core curve-fitting & sanitization
AUC/AAC / related compute changes
ICn / IC50 / ACn / AC50 changes
Other compute and sensitivity adapations
PharmacoSet, methods, subsetting, and workflows
Documentation, vignettes & minor fixes
Sequence Diagram(s)sequenceDiagram
participant User as User/Caller
participant Sanitize as sanitizeInput()
participant Normalize as .normalizeHillPars/.normalizeBiphasicPars
participant LogReg as logLogisticRegression()
participant Curve as .pgx_hill_curve/.pgx_biphasic_curve
participant Optim as optim/.pgx_mesh_eval
User->>Sanitize: provide conc, viability, Hill_fit, flags
Sanitize->>Normalize: detect & normalize Hill/biphasic fit (if provided)
Normalize-->>Sanitize: normalized params (named vector)
Sanitize-->>User: sanitized conc/viability and normalized fit
User->>LogReg: call with conc, viability, fit_type
LogReg->>Optim: prepare bounds & initial guess, run optim
Optim->>Curve: evaluate curve at conc (iteratively)
Curve-->>Optim: fitted predictions
Optim->>LogReg: return optimized params or trigger mesh fallback
LogReg-->>User: return named params + Rsquare
sequenceDiagram
participant Caller as curveFittingPGX()
participant Split as per-experiment splitter
participant Fit as logLogisticRegression()
participant AUC as computeAUC/computeAAC
participant IC as computeIC50/computeAC50
participant Output as rbindlist/data.table
Caller->>Split: validate scrn_data and group by experiment
Split->>Fit: for each experiment, call logLogisticRegression(fit_type)
Fit-->>Split: fit object or NULL on failure
Split->>AUC: computeAUC/computeAAC (handle warnings->NA)
Split->>IC: computeIC50/computeAC50 (hill or fallback)
Split->>Output: assemble per-experiment metrics and curve points
Output-->>Caller: return requested output_type (metrics / curves / all)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 7
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
.github/workflows/check-bioc.yml (1)
309-315:⚠️ Potential issue | 🟡 MinorUpdate
docker-build-and-pushmatrix to align withbuild-check: useRELEASE_3_22for R 4.5.The
build-checkjob uses Bioconductor 3.22 (RELEASE_3_22) with R 4.5, but thedocker-build-and-pushjob (lines 311-315) still usesRELEASE_3_21with R 4.5 and marks it asis-latest: true. This inconsistency should be resolved.🔧 Suggested fix
matrix: bioc-version: - { tag: "devel", r-version: "4.5.0", is-latest: false } - { - tag: "RELEASE_3_21", + tag: "RELEASE_3_22", r-version: "4.5.0", is-latest: true, } - { tag: "RELEASE_3_20", r-version: "4.4.2", is-latest: false, }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/check-bioc.yml around lines 309 - 315, Update the docker-build-and-push job's matrix entry for bioc-version so it matches build-check: change the tag from "RELEASE_3_21" to "RELEASE_3_22", keep r-version "4.5.0", and ensure is-latest is true for the RELEASE_3_22 entry; modify the matrix key "bioc-version" in the workflow so both jobs use the same RELEASE_3_22 / R 4.5 configuration.R/rankGeneDrugSensitivity.R (1)
83-125:⚠️ Potential issue | 🟠 MajorFlatten
ncto actual column names before building the fallback matrix.Line 94 appends
nnandNAvalues to a list returned bylapply(). In the< 3samples path at Line 120, that gives the fallback matrix the wrong column count and malformeddimnameswhendrugphenohas multiple columns.Suggested fix
nc <- lapply(seq_len(ncol(drugpheno)), function(i) { est <- paste("estimate", i, sep = ".") se <- paste("se", i, sep = ".") tstat <- paste("tstat", i, sep = ".") nc <- c(est, se, tstat) return(nc) }) - nc <- c(nc, n = nn, "fstat" = NA, "pvalue" = NA, "fdr") + nc <- c( + unlist(nc, use.names = FALSE), + "n", + "fstat", + "pvalue", + "fdr" + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/rankGeneDrugSensitivity.R` around lines 83 - 125, The variable nc is a list when drugpheno has multiple columns (constructed by lapply), so later using nc as dimnames yields wrong column counts; convert the list to a flat character vector before appending extra names. Replace the block that builds nc in the multi-column branch (the lapply(...) block and the following c(...)) with code that unlists the lapply result (e.g., nc <- unlist(nc) or build a character vector directly) and then append the "n" (nn), "fstat", "pvalue", and "fdr" names so nc is a single character vector used as dimnames when creating the fallback matrix.
🟠 Major comments (24)
vignettes/pharmacogx.bib-30-30 (1)
30-30:⚠️ Potential issue | 🟠 MajorRemove local
filefields from committed BibTeX entries.These fields leak local user/machine path data (
/Users/petr/...) and are non-portable. They should not be versioned in shared bibliography files.Suggested cleanup
- file = {Full Text PDF:/Users/petr/Library/Application Support/Zotero/Profiles/vplr590v.default/zotero/storage/JVR857CJ/Subramanian et al. - 2005 - Gene set enrichment analysis A knowledge-based ap.pdf:application/pdf;Snapshot:/Users/petr/Library/Application Support/Zotero/Profiles/vplr590v.default/zotero/storage/J2PIQCGR/15545.html:text/html}- file = {Snapshot:/Users/petr/Library/Application Support/Zotero/Profiles/vplr590v.default/zotero/storage/DJDXBSUX/ng1180.html:text/html}- file = {Full Text PDF:/Users/petr/Library/Application Support/Zotero/Profiles/vplr590v.default/zotero/storage/UN964QIJ/Lamb et al. - 2006 - The Connectivity Map Using Gene-Expression Signat.pdf:application/pdf;Snapshot:/Users/petr/Library/Application Support/Zotero/Profiles/vplr590v.default/zotero/storage/F4JHRMVW/Lamb et al. - 2006 - The Connectivity Map Using Gene-Expression Signat.html:text/html}Also applies to: 60-60, 102-102
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vignettes/pharmacogx.bib` at line 30, Remove the local "file" BibTeX fields that leak user machine paths: search the .bib for any lines like file = {…} (the entries shown in the diff) and delete those entire "file" fields so only portable metadata remains; ensure you remove all other occurrences of the same "file" field in this bibliography (the other two reported occurrences) before committing.R/zzz.R-8-10 (1)
8-10:⚠️ Potential issue | 🟠 Major
on.exit(options(oldOpts))will undo thebhklab.startup_setting.Capturing all options with
options()and restoring them viaon.exit()will also resetbhklab.startup_back toNULLwhen the function exits, defeating the purpose of preventing repeated startup messages. Only thewarnoption should be saved and restored.🐛 Proposed fix to only save/restore the warn option
if (interactive() && is.null(getOption('bhklab.startup_'))) { - oldOpts <- options() - options(warn = -1) - on.exit(options(oldOpts)) + oldWarn <- getOption("warn") + options(warn = -1) + on.exit(options(warn = oldWarn))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/zzz.R` around lines 8 - 10, The current code captures all options into oldOpts and restores them with on.exit(options(oldOpts)), which inadvertently resets bhklab.startup_ to NULL; instead only save and restore the warn option: capture the current warn with a variable (e.g., oldWarn), call options(warn = -1), and use on.exit(options(warn = oldWarn)) so you don't clobber bhklab.startup_ or other global options—replace usages of oldOpts and on.exit(options(oldOpts)) accordingly.R/filterNoisyCurves.R-83-91 (1)
83-91:⚠️ Potential issue | 🟠 MajorHandle empty pair-set in
.computeCumSumDeltato avoid-Inf.For
length(xx) == 2, Line 87 removes all pairs, socum.sumbecomes empty and Line 91 returns-Infwith a warning.Proposed fix
tt <- t(combn(seq_along(xx), 2, simplify = TRUE)) tt <- tt[which(((tt[, 2] - tt[, 1]) >= 2) == TRUE), ] + if (nrow(tt) == 0) { + return(0) + } cum.sum <- unlist(lapply(seq_len(nrow(tt)), function(x) { xx[tt[x, 2]] - xx[tt[x, 1]] }))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/filterNoisyCurves.R` around lines 83 - 91, The .computeCumSumDelta logic can produce an empty pair-set (tt) for short inputs and lead to cum.sum being empty and max(cum.sum) returning -Inf; modify the function to check after filtering tt (or after computing cum.sum) whether there are zero rows/elements and return 0 (or a sensible default) in that case; specifically update the block that builds tt and cum.sum (variables tt, cum.sum and input xx) to return 0 when nrow(tt) == 0 or length(cum.sum) == 0 before calling max().R/methods-summarizeMolecularProfiles.R-190-200 (1)
190-200:⚠️ Potential issue | 🟠 MajorThe new missing-
ucellfallback is still bypassed before it runs.Line 194 force-reindexes
ddbyrownames(pp), so identifiers that exist only incolnames(dd)can still fail there, and Line 197 filterscell.linesagainstsampleidonly, so rowname/assay-name matches are dropped whenfill.missing = FALSE. Both happen before thealt_idx/alt_ddrecovery below, so the new resolver never sees those cases. Please resolve/filter against the combinedsampleid+rownames(pp)+colnames(dd)name set before reordering.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/methods-summarizeMolecularProfiles.R` around lines 190 - 200, The reindexing of dd and filtering of cell.lines happens before the ucell fallback (alt_idx/alt_dd) can run, so identifiers present only in colnames(dd) or in rownames(pp) get dropped; to fix, compute a unified name set first (e.g., union of pp rownames, dd colnames and pp[,"sampleid"] non-NA values), then use that combined set to filter and reorder dd and to intersect cell.lines (respecting fill.missing) so that alt_idx/alt_dd can still find matches; update the logic around dd <- dd[, rownames(pp), drop = FALSE], the cell.lines <- intersect(...) line, and any downstream uses so they operate on the combined name set before invoking alt_idx/alt_dd.tests/testthat/test_summarizeMolecularProfiles_missing_ucell.R-5-18 (1)
5-18:⚠️ Potential issue | 🟠 MajorThese tests never execute
summarizeMolecularProfiles.
matching_logic()is a copy of the new branch, and the fixtures are plain matrices/data.frames, so this file can go green even if the real method still fails earlier duringddreindexing or later when it rebuildscolData. Please drive these cases through a minimalPharmacoSetfixture and assert on the public method instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/testthat/test_summarizeMolecularProfiles_missing_ucell.R` around lines 5 - 18, The test currently exercises a local copy matching_logic() instead of the public summarizeMolecularProfiles API; replace the direct matching_logic usage with a minimal PharmacoSet fixture (populate assayMatrix/dd as a matrix, colData with sample ids matching/mismatching ucell, and featureData/rowRanges as needed) and call summarizeMolecularProfiles(...) with that PharmacoSet and the same ucell inputs so the real reindexing and colData reconstruction run; assert on the returned PharmacoSet (or its assays/colData) to verify dd column alignment and rebuilt colData rather than inspecting the internal matching_logic function.R/geneDrugPerturbation.R-50-55 (1)
50-55:⚠️ Potential issue | 🟠 MajorKeep
durationin the model when there are two time points.Line 52 now requires more than two distinct durations. Two non-missing time points are already enough to estimate a numeric duration effect, so common 2-timepoint studies will stop adjusting for time and can shift the concentration coefficient.
Proposed fix
- ### add experiment duration if the vector consists of more than two different values + ### add experiment duration if the vector consists of more than one different value - if (length(sort(unique(duration))) > 2) { + unique_duration <- unique(duration[!is.na(duration)]) + if (length(unique_duration) > 1) { ff0 <- sprintf("%s + duration", ff0) ff <- sprintf("%s + duration", ff) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/geneDrugPerturbation.R` around lines 50 - 55, The model currently only adds duration when there are more than two distinct durations; change it to include duration whenever there are at least two non-missing distinct time points. Replace the condition in the block that updates ff0 and ff to check length(unique(duration[!is.na(duration)])) >= 2 (or > 1) so two timepoints will trigger adding " + duration" to ff0 and ff; reference the ff0, ff and duration symbols when making this change.R/plotPSig.R-115-116 (1)
115-116:⚠️ Potential issue | 🟠 MajorAdd rlang import for
.datapronoun usage.Lines 116 and 131 use
.data$X,.data$Y,.data$Cutoff, and.datadirectly without importing the.datapronoun from rlang. This will cause R CMD check to warn about undefined global variables.Add one of the following:
importFrom(rlang, .data)to NAMESPACE, or@importFrom rlang .datato the roxygen block for this function, or- Replace
.datawithrlang::.datathroughout🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/plotPSig.R` around lines 115 - 116, The use of the .data pronoun in the plotting code (e.g., inside geom_point and other ggplot layers referencing .data$X, .data$Y, .data$Cutoff) will trigger R CMD check warnings; fix by importing the pronoun from rlang or qualifying it: add importFrom(rlang, .data) to NAMESPACE or add `@importFrom` rlang .data to the roxygen block for the function that contains plot.elements, or alternatively replace occurrences of .data with rlang::.data in the plot code (references: plot.elements, geom_point, any other ggplot layers using .data).vignettes/DetectingDrugSynergyAndAntagonism.Rmd-162-169 (1)
162-169:⚠️ Potential issue | 🟠 MajorThe no-replicate branch is still inconsistent with downstream code.
This branch tries to remove the synthetic replicate column, but Line 259 later groups by
tech_repunconditionally. On datasets without undocumented replicates, the vignette will either leave the temporary column behind (tech_repstypo) or fail during normalization. Keeptech_repin the mapped data for all datasets, or make the laterby =vectors conditional.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vignettes/DetectingDrugSynergyAndAntagonism.Rmd` around lines 162 - 169, The branch that handles no technical replicates is inconsistent: it tries to delete a synthetic column using the wrong name ("tech_reps") and later code unconditionally groups by tech_rep; instead ensure mathews_griner always has a tech_rep column and that groups[["colDataMap"]] and groups[["assayMap"]] consistently include "tech_rep". Fix the typo (remove "tech_reps") and, in the else branch of the mathews_griner[, tech_rep := ...] block, set mathews_griner[["tech_rep"]] <- 1 (or fill with a single replicate value) and append "tech_rep" to groups[["colDataMap"]] and groups[["assayMap"]] so downstream grouping (the later by = tech_rep) works without conditional changes.R/computeAmax.R-59-76 (1)
59-76:⚠️ Potential issue | 🟠 MajorCheck distinct dose levels before fitting.
The current guard counts cleaned points, not unique non-zero concentrations. Two replicates at the same dose still reach
logLogisticRegression(), but that input cannot identify a Hill curve and can return unstableAmaxvalues instead of the intendedNA.📉 Suggested guard
- if (length(log_conc) < 2L) { + if (length(unique(log_conc)) < 2L) { if (verbose) { - warning("Insufficient non-zero concentrations for curve fitting") + warning("Insufficient distinct non-zero concentrations for curve fitting") } x <- NA_real_ names(x) <- "Amax" return(x) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/computeAmax.R` around lines 59 - 76, The guard before fitting uses length(log_conc) which counts points not unique non-zero concentrations, so computeAmax may proceed with multiple replicates at the same dose; change the check to count distinct non-zero concentration levels (e.g., use unique(log_conc[!is.na(log_conc) & log_conc != 0]) or equivalent) and return NA for "Amax" when fewer than two unique non-zero dose levels exist before calling logLogisticRegression; ensure this check references the log_conc vector and short-circuits the same early-return behavior currently implemented.R/computeGR.R-88-95 (1)
88-95:⚠️ Potential issue | 🟠 MajorValidate
durationanddbl_timeas positive scalars.These checks only reject missing values.
0, negative,NA_real_, or length>1 inputs still reachtau <- duration / dbl_time, which can yield invalid or vectorized GR outputs instead of a clear error.⏱️ Suggested validation
- if (missing(duration)) { - stop("Cannot calculate GR without duration of experiment") - } - if (missing(dbl_time)) { - stop("Cannot calculate GR without cell doubling time") - } + if ( + missing(duration) || + !is.numeric(duration) || + length(duration) != 1L || + is.na(duration) || + !is.finite(duration) || + duration <= 0 + ) { + stop("'duration' must be a single positive finite numeric value.") + } + if ( + missing(dbl_time) || + !is.numeric(dbl_time) || + length(dbl_time) != 1L || + is.na(dbl_time) || + !is.finite(dbl_time) || + dbl_time <= 0 + ) { + stop("'dbl_time' must be a single positive finite numeric value.") + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/computeGR.R` around lines 88 - 95, The inputs duration and dbl_time must be validated as positive numeric scalars before computing tau; add checks (before the line computing tau <- duration / dbl_time) that both duration and dbl_time are numeric, length 1, not NA, finite, and > 0, and call stop() with clear messages if any condition fails so you never divide by zero, negative, NA or vector inputs; reference the variables duration, dbl_time and the assignment to tau when adding the validation.R/computeICn.R-79-107 (1)
79-107:⚠️ Potential issue | 🟠 MajorValidate
nbefore converting it to a target viability.Line 79 maps any numeric
nstraight intotarget. Values liken = 150with percent input orn = 1.2with proportional input currently fall through to the endpoint guards and return0/Infas if they were valid ICn values.Suggested fix
+ if (!is.numeric(n) || length(n) != 1L || !is.finite(n)) { + stop("`n` must be a single finite numeric value.") + } + if (viability_as_pct && (n < 0 || n > 100)) { + stop("`n` must be between 0 and 100 when `viability_as_pct = TRUE`.") + } + if (!viability_as_pct && (n < 0 || n > 1)) { + stop("`n` must be between 0 and 1 when `viability_as_pct = FALSE`.") + } + target <- if (viability_as_pct) { 1 - (n / 100) } else {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/computeICn.R` around lines 79 - 107, Add explicit validation of the input n at the start of computeICn: before computing target, check that n is finite and within the expected range depending on viability_as_pct (0<=n<=100 when TRUE, 0<=n<=1 when FALSE); if invalid return NA_real_ (or NA_real_ when conc_as_log logic expects numeric) instead of proceeding to compute target and hitting the endpoint guards. Reference the variables n, viability_as_pct, target, and the function computeICn when adding this guard so invalid inputs like n=150 or n=1.2 are caught early.R/rankGeneDrugSensitivity.R-109-113 (1)
109-113:⚠️ Potential issue | 🟠 MajorInvert the row-validity mask back to the intended logic.
Lines 109-112 are the opposite of the commented predicate just below them: they keep rows when
type/batchare missing and when eitherdataordrugphenohas any value. That inflatessum(ccix)and can bypass the insufficient-data guard with incomplete experiments.Suggested fix
- ccix <- rowSums(!is.na(data)) > 0 | - rowSums(!is.na(drugpheno)) > 0 | - is.na(type) | - is.na(batch) + ccix <- rowSums(!is.na(data)) > 0 & + rowSums(!is.na(drugpheno)) > 0 & + !is.na(type) & + !is.na(batch)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/rankGeneDrugSensitivity.R` around lines 109 - 113, The row-validity mask ccix is inverted: replace the current "rowSums(!is.na(data)) > 0 | rowSums(!is.na(drugpheno)) > 0 | is.na(type) | is.na(batch)" with a mask that flags invalid rows when both data and drugpheno are empty OR type/batch are missing, e.g. use "(rowSums(!is.na(data)) == 0 & rowSums(!is.na(drugpheno)) == 0) | is.na(type) | is.na(batch)" so ccix correctly marks rows to exclude before subsetting with iix.R/computeABC.R-57-117 (1)
57-117:⚠️ Potential issue | 🟠 MajorHandle
Hill_fit1andHill_fit2independently.Lines 57-94 recompute both fits whenever either one is missing. That discards a supplied fit and also forces callers to keep both viability vectors around, even when only one curve actually needs fitting.
Suggested fix
- if (missing(Hill_fit1) || missing(Hill_fit2)) { + if (missing(Hill_fit1)) { Hill_fit1 <- logLogisticRegression( conc1, viability1, @@ ) - cleanData <- sanitizeInput( - conc = conc1, - Hill_fit = Hill_fit1, - conc_as_log = conc_as_log, - viability_as_pct = viability_as_pct, - trunc = trunc, - verbose = verbose - ) - pars1 <- cleanData[["Hill_fit"]] - log_conc1 <- cleanData[["log_conc"]] - Hill_fit2 <- logLogisticRegression( + cleanData <- sanitizeInput( + conc = conc1, + Hill_fit = Hill_fit1, + conc_as_log = conc_as_log, + viability_as_pct = viability_as_pct, + trunc = trunc, + verbose = verbose + ) + } else { + cleanData <- sanitizeInput( + conc = conc1, + viability = viability1, + Hill_fit = Hill_fit1, + conc_as_log = conc_as_log, + viability_as_pct = viability_as_pct, + trunc = trunc, + verbose = verbose + ) + } + pars1 <- cleanData[["Hill_fit"]] + log_conc1 <- cleanData[["log_conc"]] + + if (missing(Hill_fit2)) { + Hill_fit2 <- logLogisticRegression( conc2, viability2, @@ - cleanData <- sanitizeInput( - conc = conc2, - Hill_fit = Hill_fit2, - conc_as_log = conc_as_log, - viability_as_pct = viability_as_pct, - trunc = trunc, - verbose = verbose - ) - pars2 <- cleanData[["Hill_fit"]] - log_conc2 <- cleanData[["log_conc"]] + cleanData <- sanitizeInput( + conc = conc2, + Hill_fit = Hill_fit2, + conc_as_log = conc_as_log, + viability_as_pct = viability_as_pct, + trunc = trunc, + verbose = verbose + ) } else { cleanData <- sanitizeInput( - conc = conc1, - viability = viability1, - Hill_fit = Hill_fit1, + conc = conc2, + viability = viability2, + Hill_fit = Hill_fit2, conc_as_log = conc_as_log, viability_as_pct = viability_as_pct, trunc = trunc, verbose = verbose ) - pars1 <- cleanData[["Hill_fit"]] - log_conc1 <- cleanData[["log_conc"]] - cleanData <- sanitizeInput( - conc = conc2, - viability = viability2, - Hill_fit = Hill_fit2, - conc_as_log = conc_as_log, - viability_as_pct = viability_as_pct, - trunc = trunc, - verbose = verbose - ) - pars2 <- cleanData[["Hill_fit"]] - log_conc2 <- cleanData[["log_conc"]] } + pars2 <- cleanData[["Hill_fit"]] + log_conc2 <- cleanData[["log_conc"]]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/computeABC.R` around lines 57 - 117, The current block recomputes both Hill_fit1 and Hill_fit2 whenever either is missing, discarding a provided fit; change the logic to handle each fit independently: for Hill_fit1, if missing(Hill_fit1) call logLogisticRegression(conc1, viability1, ...) and then sanitizeInput(conc=conc1, Hill_fit=Hill_fit1, ...) to set pars1 and log_conc1; separately, for Hill_fit2, if missing(Hill_fit2) call logLogisticRegression(conc2, viability2, ...) and then sanitizeInput(conc=conc2, Hill_fit=Hill_fit2, ...) to set pars2 and log_conc2; if a fit is supplied skip logLogisticRegression and just call sanitizeInput with the provided Hill_fit; keep sanitizeInput parameter names (conc, viability, Hill_fit, conc_as_log, viability_as_pct, trunc, verbose) and assign pars1/pars2 and log_conc1/log_conc2 accordingly.R/methods-drugSensitivitySig.R-432-434 (1)
432-434:⚠️ Potential issue | 🟠 MajorKeep
ddtwo-dimensional when applyingsensitivity.cutoff.
factor(ifelse(dd > sensitivity.cutoff, 1, 0), ...)drops the matrix/data-frame shape. With multiple sensitivity measures, that merges all response columns into one vector, sodrugphenono longer lines up withtype/batchand the downstream fit runs on corrupted input.Proposed fix
if (!is.na(sensitivity.cutoff)) { - dd <- factor(ifelse(dd > sensitivity.cutoff, 1, 0), levels = c(0, 1)) + dd <- as.data.frame(dd, check.names = FALSE) + dd[] <- lapply(dd, function(col) { + factor(ifelse(col > sensitivity.cutoff, 1, 0), levels = c(0, 1)) + }) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/methods-drugSensitivitySig.R` around lines 432 - 434, The current conversion `factor(ifelse(dd > sensitivity.cutoff, 1, 0), ...)` collapses dd to a vector and loses its matrix/data.frame shape; instead, preserve dd's 2D structure by creating a same-dimension binary result from dd (using ifelse or equivalent) and then convert each column to a factor with levels = c(0,1) while preserving dimnames; update the block around dd and sensitivity.cutoff so dd remains a matrix/data.frame with the same row/column names (so downstream objects like drugpheno, type, batch still align) — e.g., produce a binary matrix of the same dimensions and apply per-column factorization rather than wrapping the whole matrix in a single factor.R/computeDSS.R-95-117 (1)
95-117:⚠️ Potential issue | 🟠 MajorReturn
0once the clamped DSS interval is empty.After clamping,
x1can legitimately equalx2. In that case there is no response area left to score, but this still flows intocomputeAUC(c(x1, x2), ...), and the fitted AUC path now returnsNA_real_for zero-width bounds.Proposed fix
if (censor) { if (e_inf_pct > 50) { return(NA) } else if (all(concentration < log_ec50)) { return(0) } } + + if (x1 >= x2) { + return(0) + } auc_fraction <- computeAUC( concentration = c(x1, x2),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/computeDSS.R` around lines 95 - 117, After clamping x1 to the concentration bounds in computeDSS, handle the empty interval case: if x1 >= x2 (i.e., the clamped DSS interval has zero width) return 0 immediately before calling computeAUC. This prevents computeAUC being invoked with c(x1, x2) when x1 == x2 and avoids downstream NA_real_ results; reference the variables x1, x2 and the call to computeAUC(...) in computeDSS when adding this early return.R/geneDrugSensitivityPBCorr.R-327-355 (1)
327-355:⚠️ Potential issue | 🟠 MajorRebuild
dddafter filtering collapsed covariates.This in-place assignment breaks when every resampled covariate drops out: the RHS has zero columns, but the LHS still selects the original covariate block. The bootstrap statistic can then fail before the
ncol(ddd)branches handle the reduced design.Proposed fix
pcor.boot <- function(ddd, w) { ddd <- ddd[w, ] ## Taking care of an edge case where only one covariate factor level is left after resampling - ddd[, -c(1, 2)] <- ddd[, -c(1, 2), drop = FALSE][, - apply(ddd[, -c(1, 2), drop = FALSE], 2, function(x) { - return(length(unique(x))) - }) >= - 2 - ] + keep_covariates <- apply( + ddd[, -c(1, 2), drop = FALSE], + 2, + function(x) length(unique(x)) >= 2 + ) + ddd <- ddd[, c(TRUE, TRUE, keep_covariates), drop = FALSE]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/geneDrugSensitivityPBCorr.R` around lines 327 - 355, In pcor.boot, the in-place filtering of covariate columns (ddd[, -c(1, 2)] <- ...) can fail when all resampled covariates collapse; instead build a new covariate block and then reconstruct ddd: extract covars <- ddd[, -c(1,2), drop = FALSE], filter them with covars <- covars[, apply(covars, 2, function(x) length(unique(x)) >= 2), drop = FALSE], and then recreate ddd <- cbind(ddd[, 1:2, drop = FALSE], covars) so downstream ncol(ddd) checks handle zero covariates correctly (function: pcor.boot, variable: ddd).R/geneDrugSensitivity.R-190-264 (1)
190-264:⚠️ Potential issue | 🟠 MajorPreserve the fitted model when a warning is raised.
When
glm()/lm()issues a warning, the current warning handlers returnNULL(in non-verbose mode) or a message string (in verbose mode), replacing the fitted model object. This causes the downstream checks to fail: the condition!is(rr0, "try-error") && !is(rr1, "try-error") & !is(rr0, "character") && !is(rr1, "character")evaluates to TRUE for NULL values, allowing the code to proceed tosummary(rr1)andanova(rr0, rr1), which fail on non-model objects.Suggested direction
- rr0 <- tryCatch( - try(glm( - formula(drugpheno.1 ~ . - x), - data = dd, - model = FALSE, - x = FALSE, - y = FALSE, - family = "binomial" - )), - warning = function(w) { - if (verbose) { - msg <- "Null model did not converge" - message(msg) - if ("type" %in% colnames(dd)) { - tt <- table(dd[, "type"]) - message("Type distribution: ", toString(tt)) - } - return(msg) - } - return(NULL) - } - ) + rr0 <- withCallingHandlers( + try(glm( + formula(drugpheno.1 ~ . - x), + data = dd, + model = FALSE, + x = FALSE, + y = FALSE, + family = "binomial" + )), + warning = function(w) { + if (verbose) { + message("Null model did not converge") + if ("type" %in% colnames(dd)) { + tt <- table(dd[, "type"]) + message("Type distribution: ", toString(tt)) + } + } + invokeRestart("muffleWarning") + } + )Apply the same pattern to the other three fit calls.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/geneDrugSensitivity.R` around lines 190 - 264, The warning handlers for the four fit attempts (rr0 and rr1 for both glm and lm) currently return NULL or a message string on warnings, replacing the fitted model; change each block to capture the result of the try(...) into a temporary variable (e.g. tmp <- try(glm(...)) / tmp <- try(lm(...))) and have the warning handler return that tmp so the actual fitted model is preserved (and still log the messages when verbose). Apply this pattern to rr0 and rr1 for both the binomial (glm) and linear (lm) branches so downstream checks see the model object rather than NULL/character.R/curveFittingPGX.R-77-78 (1)
77-78:⚠️ Potential issue | 🟠 MajorBuild the support grid from sanitized positive doses.
logLogisticRegression()dropsconc == 0, but this path still calls.getSupportVec(log10(conc))on the raw vector. A single control dose of 0 makes the support input-Inf, so curve generation can fail even when the fit itself succeeded.Suggested fix
fit_results <- lapply(experiment_list, function(exp_dt) { conc <- exp_dt$conc viability_pct <- pmin(exp_dt$viability * 100, 100) + positive_conc <- conc[is.finite(conc) & conc > 0] @@ - support <- CoreGx::.getSupportVec(log10(conc)) + support <- CoreGx::.getSupportVec(log10(positive_conc))Also applies to: 107-108
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/curveFittingPGX.R` around lines 77 - 78, logLogisticRegression() currently calls .getSupportVec(log10(conc)) using the raw conc vector, which may contain zeros and produce -Inf; before calling .getSupportVec() filter or sanitize conc to only positive doses (e.g., use conc_pos <- conc[conc > 0] and call .getSupportVec(log10(conc_pos))) so the support grid is built from strictly positive concentrations; apply the same sanitization at the other occurrence (the second call around where viability_pct is computed) so no -Inf values are passed to .getSupportVec().R/PharmacoSet-class.R-344-352 (1)
344-352:⚠️ Potential issue | 🟠 MajorReserve
warning()for failures only.These
warning(ifelse(...))calls now warn on the success path as well ("colData dimension is OK"/"BEST is OK"/ blank warnings when a column exists). Sinceverify = TRUEis the default constructor path, validPharmacoSets will start emitting spurious warnings.Also applies to: 355-372, 379-396
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/PharmacoSet-class.R` around lines 344 - 352, The code currently calls warning(ifelse(...)) which emits warnings on both failure and success; update the checks in PharmacoSet-class.R (the blocks that reference nn and check nrow(colData(profile)) != ncol(assays(profile)[[1]]), and the similar blocks at the other noted locations) to only call warning() on the failure path and use message() or no output on success; implement this by replacing warning(ifelse(...)) with an explicit if/else: if (mismatch) warning(sprintf(...)) else message(sprintf(...)) or nothing, ensuring only the mismatch branch uses warning().R/logLogisticRegression.R-96-98 (1)
96-98:⚠️ Potential issue | 🟠 MajorCheck the number of unique doses, not just rows.
This guard lets three replicates at one or two concentrations reach the optimizer even though the error says “three unique concentrations.” That makes the fit underdetermined and pushes failures deeper into the optimization path.
Suggested fix
- if (length(log_conc) < 3L) { + if (length(unique(log_conc)) < 3L) { stop("At least three unique concentrations are required to fit a curve.") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/logLogisticRegression.R` around lines 96 - 98, The guard currently checks length(log_conc) but should check the number of unique concentrations to prevent fitting on replicated doses; update the check to use length(unique(log_conc)) wherever the code validates concentrations (the variable log_conc in this file, likely inside the logLogisticRegression fitting function) and keep the same stop message so the function halts when fewer than three unique concentrations are present.R/curveFittingPGX.R-53-55 (1)
53-55:⚠️ Potential issue | 🟠 MajorUse a collision-free experiment key.
paste(cell_id, drug_id, sep = "_")can merge distinct pairs when either identifier already contains underscores. That will mix measurements across experiments and corrupt both the fit and the returned tables.Suggested fix
- dt[, experiment := paste(cell_id, drug_id, sep = "_")] + dt[, experiment := as.character( + interaction(cell_id, drug_id, drop = TRUE, lex.order = TRUE) + )]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/curveFittingPGX.R` around lines 53 - 55, The experiment key built with paste(cell_id, drug_id, sep = "_") can collide when IDs contain underscores; replace it with a collision-free key (e.g. use paste0(cell_id, "\u0001", drug_id) or interaction(cell_id, drug_id, sep = "\u0001")) and then sort by the original identifiers instead of the fragile combined field (use data.table::setorder(dt, cell_id, drug_id, conc)); update the creation of dt[, experiment := ...] and the subsequent setorder call accordingly.R/sanityCheck.R-575-588 (1)
575-588:⚠️ Potential issue | 🟠 MajorSort
concbefore derivinglog_conc.This branch computes
log_concfrom the original order and only sortsconcafterwards. Callers receiving bothHill_fitandlog_conctherefore still get an unsorted x-vector, which can break downstream interpolation/integration.Suggested fix
- if (conc_as_log == FALSE) { + if (is.unsorted(conc)) { + conc <- conc[order(conc)] + } + + if (conc_as_log == FALSE) { ii <- which(conc == 0) if (length(ii) > 0) { conc <- conc[-ii] } log_conc <- log10(conc) } else { log_conc <- conc } - if (is.unsorted(conc)) { - myx <- order(conc) - conc <- conc[myx] - } return(list("Hill_fit" = Hill_fit, "log_conc" = log_conc))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/sanityCheck.R` around lines 575 - 588, The branch computes log_conc from the unsorted conc and only sorts conc afterward, so callers get mismatched x order; ensure conc is sorted before deriving log_conc by moving or applying the is.unsorted/order logic (or calling conc <- conc[order(conc)]) before the conc_as_log conditional (or at least before log_conc <- log10(conc) / log_conc <- conc), and keep the zero-removal (ii <- which(conc == 0); conc <- conc[-ii]) applied to the sorted conc so that the returned list("Hill_fit" = Hill_fit, "log_conc" = log_conc) contains log_conc consistent with the sorted conc used elsewhere.R/drugDoseResponseCurve.R-592-611 (1)
592-611:⚠️ Potential issue | 🟠 MajorPlot the fitted 4-parameter curve instead of reconstructing the legacy 3-parameter shape.
The overlay drops
E0and relies on.Hill()’s fallbackE0 = 1, so any fit with a top asymptote different from 100% is drawn incorrectly. The nested fit call also ignores the caller’struncargument.Suggested fix
fit <- try( logLogisticRegression( conc = filtered_dose, - viability = filtered_resp + viability = filtered_resp, + trunc = trunc ), silent = TRUE ) if (inherits(fit, "try-error")) { return() } log10_x_vals <- .getSupportVec(log10(filtered_dose)) - pred <- .Hill( - log10_x_vals, - pars = c( - fit$HS, - fit$E_inf / 100, - log10(fit$EC50) - ) - ) * - 100 + internal <- .normalizeHillPars( + hill_fit = fit, + conc_as_log = FALSE, + viability_as_pct = TRUE + ) + pred <- .pgx_hill_curve( + log10_x_vals, + unname(internal[c("HS", "E0", "E_inf", "log10EC50")]) + ) * 100🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/drugDoseResponseCurve.R` around lines 592 - 611, The overlay is reconstructing a legacy 3-parameter curve and drops the fitted E0 and the caller's trunc handling; call logLogisticRegression with the same trunc argument (so the nested fit uses the caller's trunc) and compute predictions from the fitted 4-parameter model by passing the full set of fitted parameters (including E0) into .Hill (e.g., use pars built from fit$HS, fit$E_inf/100, log10(fit$EC50), and fit$E0/100 or whatever order .Hill expects) instead of assuming E0 = 1 so the plotted curve matches the fitted top asymptote.R/computeSynergy.R-46-64 (1)
46-64:⚠️ Potential issue | 🟠 MajorValidate broadcast lengths and keep the zero-dose boundary valid.
rep_len()will silently recycle any non-scalar parameter vector here, soEC50,HS, orE_inflengths other than1orlength(viability)can corrupt the elementwise pairing without failing fast. Also, the newratio <= 0check turnseffectToDose(viability = 1, ...)intoNA, even though the inverse Hill solution is0dose whenE_inf < 1andHS > 0. Both cases can leak bad inputs intoloeweCI()/computeLoewe().🛠️ Suggested fix
len <- length(viability) - EC50 <- rep_len(EC50, len) - HS <- rep_len(HS, len) - E_inf <- rep_len(E_inf, len) + expand_param <- function(param, name) { + if (length(param) == len) { + return(param) + } + if (length(param) == 1L) { + return(rep_len(param, len)) + } + stop(sprintf("`%s` must have length 1 or %d.", name, len), call. = FALSE) + } + EC50 <- expand_param(EC50, "EC50") + HS <- expand_param(HS, "HS") + E_inf <- expand_param(E_inf, "E_inf") @@ invalid <- invalid_param | !is.finite(denom) | denom == 0 | !is.finite(ratio) | - ratio <= 0 + ratio < 0 | + (ratio == 0 & HS <= 0)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/computeSynergy.R` around lines 46 - 64, Ensure EC50, HS, and E_inf are either length 1 or length(viability) and throw an error if any has a non-conforming length instead of silently recycling via rep_len; replace the rep_len(EC50, len)/rep_len(HS, len)/rep_len(E_inf, len) pattern with an explicit length check and only then expand scalars. Preserve the zero-dose boundary: do not treat ratio <= 0 as universally invalid—detect the special case where viability == 1, E_inf < 1, and HS > 0 and set dose = 0 for those indices (e.g., before the invalid/ration checks or by creating a valid_idx_zero mask); keep the existing invalid_param and denom/ratio checks for other cases and compute dose for valid_idx with dose[valid_idx] <- EC50[valid_idx] * ratio[valid_idx]^(1 / HS[valid_idx]).
🧹 Nitpick comments (11)
R/methods-intersect.R (1)
7-7: Link this TODO to a tracked issue.This is fine as a placeholder, but please add an issue/ticket reference (e.g.,
#123) so it doesn’t get lost and has clear ownership.If you want, I can draft a concise GitHub issue template for implementing
LongTableintersection.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/methods-intersect.R` at line 7, Replace the placeholder TODO in methods-intersect.R that currently reads "Implement intersection of LongTable objects" with a TODO that includes a tracked issue reference (e.g., "TODO:: Implement intersection of LongTable objects #<issue-number>") so the work is tied to an owner; if the issue does not exist yet, create a GitHub issue describing the required behavior for LongTable intersection and then update the TODO to reference that issue number (mentioning LongTable and the intended intersection behavior in the issue title/body).tests/testthat/test_intersectPSet.R (1)
6-14: Replace commented-out tests with explicit skipped testsThese blocks are effectively dead code right now. Please convert them to active
test_that(...)blocks with an explicitskip("reason")(or scoped skip condition) so the suite documents intended coverage and debt in a trackable way.Also applies to: 16-25
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/testthat/test_intersectPSet.R` around lines 6 - 14, Uncomment the disabled test block and convert it into an active test_that() with an explicit skip call (e.g., skip("temporarily skipped: reason")), so the intent and coverage debt are recorded; specifically locate the commented block that calls intersectPSet(list('GDSC' = GDSCsmall, 'CCLE' = CCLEsmall), intersectOn = c("drugs", "cell.lines", "concentrations")) and the subsequent expects that reference sensitivityProfiles(CCLE)$auc_recomputed_star, treatmentNames(common$CCLE)/treatmentNames(common$GDSC) and sampleNames(common$CCLE)/sampleNames(common$GDSC) and wrap them in test_that("...", { skip("explain reason"); ... }) instead of leaving them commented; do the same conversion for the other commented block around lines 16-25 so both become active skipped tests.man/cosinePerm.Rd (1)
17-19: Consider improving grammatical consistency in documentation.While reviewing the corrected typo, a few additional grammatical issues were noticed:
- Lines 17, 19: "is the factors" could be "are the factors" or rephrased to "factor representing"
- Line 38: "in include.perm is set" should be "if include.perm is set"
These are minor documentation quality improvements that could be addressed in a future update.
Also applies to: 38-38
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@man/cosinePerm.Rd` around lines 17 - 19, Update the man page wording for consistency: change the param descriptions for x and y (the \item{x} and \item{y} entries in man/cosinePerm.Rd) from "is the factors" to grammatically correct phrasing such as "are the factors" or "factor representing" (pick one consistent style for both), and correct the sentence using "in include.perm is set" to "if include.perm is set" (refer to the include.perm mention in the same file) so the documentation reads clearly and consistently.R/getRawSensitivityMatrix.R (1)
26-27: Consider usingseq_along()for idiomatic R.
seq_along(exp.id)is the idiomatic R equivalent ofseq_len(length(exp.id))and slightly more readable.The switch from
grep()togrepl(..., fixed = TRUE)is a good change—it returns a boolean directly and avoids regex interpretation.♻️ Optional simplification
- for (i in seq_len(length(exp.id))) { + for (i in seq_along(exp.id)) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/getRawSensitivityMatrix.R` around lines 26 - 27, Replace the non-idiomatic loop iterator seq_len(length(exp.id)) with seq_along(exp.id) in the for loop (the line using for (i in seq_len(length(exp.id)))) to simplify and clarify iteration over exp.id; keep the existing grepl(..., fixed = TRUE) check for drug.id as-is.R/filterNoisyCurves.R (1)
24-30: Consider a deprecation shim for the renamed public argument.Renaming
mean.viablitytomean.viabilityis a breaking API change for existing named callers. A one-release compatibility alias would reduce downstream breakage.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/filterNoisyCurves.R` around lines 24 - 30, Add a one-release deprecation shim inside filterNoisyCurves to accept the old misspelled named argument mean.viablity: detect if mean.viablity was passed (e.g. via missing(mean.viability) and !missing(mean.viablity) or checking list(...) for "mean.viablity"), assign its value to mean.viability, and emit a warning about the deprecated name; keep the canonical parameter mean.viability as the authoritative variable and proceed using it so existing named callers continue to work for one release.R/updateObject-methods.R (1)
18-22: Remove the no-opcolnamesassignment.Line 21 writes identical column names back, which has no migration effect and adds avoidable churn. Consider dropping this branch until an actual rename is needed.
♻️ Proposed cleanup
- if ("treatment" %in% names(curation(pSet))) { - # Column names already match current schema; historically this was a gsub. - # Retain the branch in case future migrations need to adjust names. - colnames(curation(pSet)$treatment) <- colnames(curation(pSet)$treatment) - } + # No treatment-column migration is currently required.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/updateObject-methods.R` around lines 18 - 22, Remove the no-op column assignment in the branch that checks for "treatment" in names(curation(pSet)); specifically delete the line that sets colnames(curation(pSet)$treatment) <- colnames(curation(pSet)$treatment) (or remove the whole branch if nothing else remains), leaving the check out or adding a TODO comment for future schema migrations related to curation(pSet)$treatment so no redundant churn remains.R/methods-[.R (2)
71-74: Minor inconsistency:resolve_namesnot used for cell character indices.The
resolve_nameshelper (lines 44-61) filters outNAand empty strings from character indices, but it's only used for drug indices (line 89). Cell character indices use direct%in%matching without the same filtering. This could lead to subtle differences in behavior whenNAor empty strings are passed as cell indices vs drug indices.Consider using
resolve_namesconsistently for both dimensions:♻️ Suggested consistency improvement
} else if (is.character(i)) { - matched <- i[i %in% samples] - cell_names <- matched + cell_names <- resolve_names(i, samples, "cell") cells_requested_empty <- cells_requested_empty || length(cell_names) == 0🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/methods-`[.R around lines 71 - 74, Use the existing resolve_names helper to normalize character cell indices the same way drugs are handled: replace the direct matching logic that computes matched <- i[i %in% samples] and assigns cell_names with a call to resolve_names(i, samples) (or similar) and then update cells_requested_empty appropriately (cells_requested_empty || length(cell_names) == 0) so NA and empty-string values are filtered consistently; reference resolve_names, samples, cell_names, and cells_requested_empty when making the change.
82-84: Minor redundancy in empty-check logic.The
cells_requested_emptyanddrugs_requested_emptyflags are already set within each branch (lines 74, 78, 90, 94), making the subsequent checks at lines 82-84 and 98-100 redundant. This doesn't affect correctness but adds unnecessary complexity.♻️ Optional cleanup
- if (!missing(i) && length(cell_names) == 0) { - cells_requested_empty <- TRUE - }- if (!missing(j) && length(drug_names) == 0) { - drugs_requested_empty <- TRUE - }Also applies to: 98-100
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/methods-`[.R around lines 82 - 84, Remove the redundant post-branch empty-checks for cells and drugs: the flags cells_requested_empty and drugs_requested_empty are already assigned inside their respective branches (where i and j are handled), so delete the extra if-blocks that re-set these flags (the checks that test !missing(i) && length(cell_names) == 0 and !missing(j) && length(drug_names) == 0). Leave the original branch logic intact and ensure no other code depends on those duplicate checks.tests/testthat/test_PharmacoSetClass.R (1)
101-102: Remove duplicated assertion to keep the test focused.Both lines assert the exact same condition; keeping one is sufficient and reduces maintenance noise.
✂️ Suggested cleanup
expect_true(all(unique(sensitivityInfo(GDSCsmall)$sampleid) %in% newNames)) - expect_true(all(unique(sensitivityInfo(GDSCsmall)$sampleid) %in% newNames))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/testthat/test_PharmacoSetClass.R` around lines 101 - 102, The test contains a duplicated assertion: remove one of the identical expect_true lines that check the same condition so only a single assertion remains; specifically keep one expect_true(all(unique(sensitivityInfo(GDSCsmall)$sampleid) %in% newNames)) and delete the duplicate to reduce redundancy and maintenance noise.R/downloadSignatures.R (1)
60-68: Fragile column indexing with magic number.The hardcoded
pSetTable[, 3]relies on the table structure remaining constant. Consider usingpSetTable[["Dataset Name"]]or the column name directly for more robust code.💡 Suggested improvement
# pick a signature from the list - whichx <- match(name, pSetTable[, 3]) + whichx <- match(name, pSetTable[["Dataset Name"]])🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/downloadSignatures.R` around lines 60 - 68, Replace the fragile column index pSetTable[, 3] with a column-name lookup to avoid breaking if column order changes: use pSetTable[["Dataset Name"]] (or pSetTable$`Dataset Name`) in the call to match(name, ...) where whichx is set, and ensure the function that uses whichx (the match assignment and the subsequent is.na(whichx) stop) still behaves the same; this keeps the lookup robust and keeps availablePSets() as the authoritative source of column names.tests/testthat/test_bracket_pharmacoset.R (1)
38-48: Cover the treatment-axis error paths too.This block exercises negative/zero/out-of-range failures only for the sample index. The treatment side currently checks
NA_integer_only, so regressions in the drug-index validation branch could still slip through.🧪 Suggested additions
test_that("invalid numeric indices raise informative errors", { data(CCLEsmall) expect_error(CCLEsmall[-1, 1], "positive") expect_error(CCLEsmall[0, 1], "positive") expect_error(CCLEsmall[NA_integer_, 1], "must not contain NA") expect_error( CCLEsmall[length(sampleNames(CCLEsmall)) + 1, 1], "exceed" ) + expect_error(CCLEsmall[1, -1], "positive") + expect_error(CCLEsmall[1, 0], "positive") + expect_error( + CCLEsmall[1, length(treatmentNames(CCLEsmall)) + 1], + "exceed" + ) expect_error(CCLEsmall[1, NA_integer_], "must not contain NA") })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/testthat/test_bracket_pharmacoset.R` around lines 38 - 48, Add mirror tests for the treatment-axis (column) in the same block: exercise negative index (e.g., CCLEsmall[1, -1]), zero index (CCLEsmall[1, 0]), NA (CCLEsmall[1, NA_integer_]), and out-of-range (CCLEsmall[1, length(treatmentNames(CCLEsmall)) + 1]) and assert the same informative error messages ("positive", "must not contain NA", "exceed") as used for the sample-axis so drug-index validation branches are covered.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1bef0b89-351f-44d5-b3e4-a013f9df527a
📒 Files selected for processing (114)
.github/workflows/check-bioc.yml.gitignoreDESCRIPTIONNAMESPACER/GR.RR/GWC.RR/PharmacoSet-accessors.RR/PharmacoSet-class.RR/PharmacoSet-utils.RR/adaptiveMatthewCor.RR/callingWaterfall.RR/class-SignatureClass.RR/computeABC.RR/computeAUC.RR/computeAUC_old.RR/computeAmax.RR/computeDSS.RR/computeDrugSensitivity.RR/computeGR.RR/computeIC50.RR/computeICn.RR/computeSlope.RR/computeSynergy.RR/connectivityScore.RR/cosinePerm.RR/curveFittingPGX.RR/datasets.RR/downloadPSet.RR/downloadSignatures.RR/drugDoseResponseCurve.RR/drugPerturbationSig.RR/filterNoisyCurves.RR/geneDrugPerturbation.RR/geneDrugSensitivity.RR/geneDrugSensitivityPBCorr.RR/geneDrugSensitivityPCorr.RR/getRawSensitivityMatrix.RR/globals.RR/intersectPSets.RR/logLogisticRegression.RR/matthewCor.RR/mergePSets.RR/methods-[.RR/methods-drugSensitivitySig.RR/methods-intersect.RR/methods-subsetTo.RR/methods-summarizeMolecularProfiles.RR/methods-summarizeSensitivityProfiles.RR/plotPSig.RR/rankGeneDrugPerturbation.RR/rankGeneDrugSensitivity.RR/sanityCheck.RR/updateObject-methods.RR/zzz.Rcodecover.Rman/CCLEsmall.Rdman/CMAPsmall.Rdman/GDSCsmall.Rdman/PharmacoSet-accessors.Rdman/PharmacoSet-class.Rdman/PharmacoSet2.Rdman/computeAUC.Rdman/computeAmax.Rdman/computeICn.Rdman/computeLoewe.Rdman/computeSlope.Rdman/computeZIPdelta-TreatmentResponseExperiment-method.Rdman/cosinePerm.Rdman/curveFittingPGX.Rdman/dot-computeZIPdelta.Rdman/dot-normalizeBiphasicPars.Rdman/dot-normalizeHillPars.Rdman/dot-pgx_biphasic_curve.Rdman/dot-pgx_hill_curve.Rdman/dot-summarizeSensProfiles.Rdman/downloadPSet.Rdman/downloadPertSig.Rdman/drugDoseResponseCurve.Rdman/drugSensitivitySig-PharmacoSet-method.Rdman/effectToDose.Rdman/filterNoisyCurves.Rdman/fitTwowayZIP.Rdman/hillCurve.Rdman/intersectPSet.Rdman/loeweCI.Rdman/logLogisticRegression.Rdman/mcc.Rdman/plot.PharmacoSig.Rdman/summarizeMolecularProfiles-PharmacoSet-method.Rdtests/testthat/test_Hill.Rtests/testthat/test_PharmacoSetClass.Rtests/testthat/test_PharmacoSet_utils.Rtests/testthat/test_bracket_pharmacoset.Rtests/testthat/test_computeABC.Rtests/testthat/test_computeAmax.Rtests/testthat/test_computeIC50.Rtests/testthat/test_computeSlope.Rtests/testthat/test_downloadPSet.Rtests/testthat/test_drugDoseResponseCurve.Rtests/testthat/test_drugPerturbationSig.Rtests/testthat/test_drugSensitivitySig.Rtests/testthat/test_intersectPSet.Rtests/testthat/test_logLogisticRegression.Rtests/testthat/test_sanitizeInput.Rtests/testthat/test_subsetTo.Rtests/testthat/test_summarizeMolecularProfiles.Rtests/testthat/test_summarizeMolecularProfiles_missing_ucell.Rtests/testthat/test_summarizeSensitivityProfiles.Rtests/testthat/test_unionList.Rvignettes/CreatingPharmacoSet.Rmdvignettes/DetectingDrugSynergyAndAntagonism.Rvignettes/DetectingDrugSynergyAndAntagonism.Rmdvignettes/PharmacoGx.Rmdvignettes/pharmacogx.bib
| drug.responses <- drug.responses[complete.cases(drug.responses), ] | ||
| doses.no <- nrow(drug.responses) | ||
|
|
||
| acceptable <- mclapply(rownames(sensitivityInfo(pSet)), function(xp) { | ||
| #for(xp in rownames(sensitivityInfo(pSet))){ | ||
| drug.responses <- as.data.frame(apply(sensitivityRaw(pSet)[xp , ,], 2, as.numeric), stringsAsFactors=FALSE) | ||
| drug.responses <- drug.responses[complete.cases(drug.responses), ] | ||
| doses.no <- nrow(drug.responses) | ||
| drug.responses[, "delta"] <- .computeDelta(drug.responses$Viability) | ||
|
|
||
| drug.responses[, "delta"] <- .computeDelta(drug.responses$Viability) | ||
| delta.sum <- sum(drug.responses$delta, na.rm = TRUE) | ||
|
|
||
| delta.sum <- sum(drug.responses$delta, na.rm = TRUE) | ||
| max.cum.sum <- .computeCumSumDelta(drug.responses$Viability) | ||
|
|
||
| max.cum.sum <- .computeCumSumDelta(drug.responses$Viability) | ||
|
|
||
| if ( | ||
| (table(drug.responses$delta < epsilon)["TRUE"] >= | ||
| (doses.no * positive.cutoff.percent)) & | ||
| (delta.sum < epsilon) & | ||
| (max.cum.sum < (2 * epsilon)) & | ||
| (mean(drug.responses$Viability) < mean.viablity) | ||
| ) { | ||
| return (xp) | ||
| } | ||
| }, mc.cores=nthread) | ||
| acceptable <- unlist(acceptable) | ||
| noisy <- setdiff(rownames(sensitivityInfo(pSet)), acceptable) | ||
| return(list("noisy"=noisy, "ok"=acceptable)) | ||
| if ( | ||
| (table(drug.responses$delta < epsilon)["TRUE"] >= | ||
| (doses.no * positive.cutoff.percent)) & | ||
| (delta.sum < epsilon) & | ||
| (max.cum.sum < (2 * epsilon)) & | ||
| (mean(drug.responses$Viability) < mean.viability) | ||
| ) { |
There was a problem hiding this comment.
Guard against NA condition in curve acceptability check.
At Line 49, table(drug.responses$delta < epsilon)["TRUE"] can be NA when no TRUE bucket exists (including empty post-filter curves), making the if condition invalid at Line 48.
Proposed fix
drug.responses <- drug.responses[complete.cases(drug.responses), ]
doses.no <- nrow(drug.responses)
+ if (doses.no == 0) {
+ return(NULL)
+ }
drug.responses[, "delta"] <- .computeDelta(drug.responses$Viability)
delta.sum <- sum(drug.responses$delta, na.rm = TRUE)
max.cum.sum <- .computeCumSumDelta(drug.responses$Viability)
+ n.within.epsilon <- sum(drug.responses$delta < epsilon, na.rm = TRUE)
if (
- (table(drug.responses$delta < epsilon)["TRUE"] >=
+ (n.within.epsilon >=
(doses.no * positive.cutoff.percent)) &
(delta.sum < epsilon) &
(max.cum.sum < (2 * epsilon)) &🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/filterNoisyCurves.R` around lines 39 - 54, The condition uses
table(drug.responses$delta < epsilon)["TRUE"] which can be NA if no TRUE bucket
exists; replace this NA-prone expression with a robust count such as
sum(drug.responses$delta < epsilon, na.rm = TRUE) (or use ifelse(is.na(...), 0,
...)) and use that count in the if check (the block using drug.responses,
.computeDelta, .computeCumSumDelta, epsilon, positive.cutoff.percent, doses.no)
so empty or all-FALSE cases evaluate correctly; also ensure doses.no is checked
(>0) before comparing to the cutoff to avoid meaningless comparisons on empty
curves.
| .computeDelta <- function(xx, trunc = TRUE) { | ||
| xx <- as.numeric(xx) | ||
| if (trunc) { | ||
| return(c(pmin(100, xx[seq(2,length(xx))]) - pmin(100, xx[seq_along(xx)-1]), 0)) | ||
| } else { | ||
| return(c(xx[seq(2, length(xx))] - xx[seq_along(xx) - 1]), 0) | ||
| } | ||
| xx <- as.numeric(xx) | ||
| if (trunc) { | ||
| return(c( | ||
| pmin(100, xx[seq(2, length(xx))]) - pmin(100, xx[seq_along(xx) - 1]), | ||
| 0 | ||
| )) | ||
| } else { | ||
| return(c(xx[seq(2, length(xx))] - xx[seq_along(xx) - 1]), 0) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate definition
rg -n '^\s*\.computeDelta\s*<-\s*function'
# Inspect all callsites and specifically trunc=FALSE usage
rg -n '\.computeDelta\s*\(' -C2
rg -n '\.computeDelta\s*\([^)]*trunc\s*=\s*FALSE' -C2Repository: bhklab/PharmacoGx
Length of output: 431
🏁 Script executed:
# First, verify the exact code at lines 65-74
sed -n '65,74p' R/filterNoisyCurves.R
# Check file length to understand context better
wc -l R/filterNoisyCurves.R
# Search for all calls to .computeDelta (more comprehensive)
rg -n '\.computeDelta' R/filterNoisyCurves.R
# Check if there are any calls outside this file
rg -r '\.computeDelta' R/
# Look for any function that sets trunc to FALSE
rg -n 'trunc\s*=\s*FALSE' R/Repository: bhklab/PharmacoGx
Length of output: 6996
🏁 Script executed:
# Check the specific calls in the identified files
sed -n '605,615p' R/computeSynergy.R
sed -n '290,295p' R/computeDrugSensitivity.R
sed -n '320,330p' R/computeDrugSensitivity.R
sed -n '70,80p' R/computeAmax.RRepository: bhklab/PharmacoGx
Length of output: 931
🏁 Script executed:
# Find ALL calls to .computeDelta
rg -n '\.computeDelta\s*\(' -A 3
# Search for any pattern where .computeDelta might receive trunc parameter
rg -n 'computeDelta.*trunc' R/
# Check if computeDelta is called within functions that accept trunc parameter
rg -B 5 -A 5 '\.computeDelta\(' R/filterNoisyCurves.RRepository: bhklab/PharmacoGx
Length of output: 725
🏁 Script executed:
# Create a test script to verify the R syntax
python3 << 'EOF'
# Check if the syntax in line 73 is actually invalid R
test_code = """
return(c(1, 2), 0)
"""
print("Analyzing R syntax in line 73:")
print("Code: return(c(xx[seq(2, length(xx))] - xx[seq_along(xx) - 1]), 0)")
print("\nStructure: return(ARGUMENT1, ARGUMENT2)")
print("- ARGUMENT1: c(xx[seq(2, length(xx))] - xx[seq_along(xx) - 1])")
print("- ARGUMENT2: 0")
print("\nIn R, return() is a control function that accepts exactly ONE argument.")
print("Passing two comma-separated arguments is a syntax error.")
print("\nFor comparison, the if block at line 70-72 has:")
print("return(c(...))")
print("which is valid (one argument to return).")
EOF
# Also check the function structure
echo "=== Verifying code structure ==="
sed -n '65,74p' R/filterNoisyCurves.R | cat -nRepository: bhklab/PharmacoGx
Length of output: 884
🏁 Script executed:
# Comprehensive search: Find all function calls containing .computeDelta
rg -n '\.computeDelta' R/ --type r | grep -v '\.computeDelta\s*<-\s*function' | grep -v '^\s*#'
# Specifically look for any .computeDelta call that passes arguments
rg -B 2 -A 2 '\.computeDelta\(' R/ --type r | head -30Repository: bhklab/PharmacoGx
Length of output: 445
Fix critical syntax error in .computeDelta return statement.
Line 73 contains invalid R syntax: return(c(...), 0) passes two arguments to return() which accepts only one. Move the 0 inside the c() call. While the else block is currently unreachable (only called with default trunc = TRUE), the function's API signature allows trunc = FALSE, making this a latent bug that will crash when invoked with that parameter.
Proposed fix
.computeDelta <- function(xx, trunc = TRUE) {
xx <- as.numeric(xx)
+ if (length(xx) < 2) {
+ return(0)
+ }
if (trunc) {
return(c(
- pmin(100, xx[seq(2, length(xx))]) - pmin(100, xx[seq_along(xx) - 1]),
+ pmin(100, xx[seq.int(2, length(xx))]) - pmin(100, xx[seq_len(length(xx) - 1)]),
0
))
} else {
- return(c(xx[seq(2, length(xx))] - xx[seq_along(xx) - 1]), 0)
+ return(c(xx[seq.int(2, length(xx))] - xx[seq_len(length(xx) - 1)], 0))
}
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/filterNoisyCurves.R` around lines 65 - 74, The .computeDelta function has
an invalid return in the else branch where return(c(xx[seq(2, length(xx))] -
xx[seq_along(xx) - 1]), 0) supplies two args to return; fix it by moving the
trailing 0 inside the combined vector so the else returns c(..., 0). Update the
else branch in .computeDelta to return a single vector (with the final 0
appended) to match the trunc = TRUE branch.
| if (nn < 3) { | ||
| tt <- c( | ||
| "estimate" = NA, | ||
| "se" = NA, | ||
| "n" = nn, | ||
| "tstat" = NA, | ||
| "fstat" = NA, | ||
| "pvalue" = NA | ||
| ) |
There was a problem hiding this comment.
The sparse-data fallback still dereferences mm0 and mm.
Lines 65-73 build an NA result when nn < 3, but Lines 107-125 still call summary(mm0) and/or return mm. With model = TRUE or multiple type levels, those inputs will error instead of returning the intended NA stub.
Proposed fix
nn <- sum(complete.cases(dd))
+ mm0 <- NULL
+ mm <- NULL
+ ttype <- c("type.fstat" = NA, "type.pvalue" = NA)
if (nn < 3) {
tt <- c(
"estimate" = NA,
@@
mmc <- stats::anova(mm0, mm)
mm_summary <- summary(mm)
## extract statistics
tt <- c(
"estimate" = mm_summary$coefficients["concentration", "Estimate"],
@@
"fstat" = mmc$F[2],
"pvalue" = mmc$'Pr(>F)'[2]
)
+
+ if (length(sort(unique(type))) > 1) {
+ rr <- summary(mm0)
+ ttype <- c(
+ "type.fstat" = rr$fstatistic["value"],
+ "type.pvalue" = pf(
+ q = rr$fstatistic["value"],
+ df1 = rr$fstatistic["numdf"],
+ df2 = rr$fstatistic["dendf"],
+ lower.tail = FALSE
+ )
+ )
+ }
}
names(tt) <- nc
- ## add tissue type/cell line statistics
- if (length(sort(unique(type))) > 1) {
- rr <- summary(mm0)
- ttype <- c(
- "type.fstat" = rr$fstatistic["value"],
- "type.pvalue" = pf(
- q = rr$fstatistic["value"],
- df1 = rr$fstatistic["numdf"],
- df2 = rr$fstatistic["dendf"],
- lower.tail = FALSE
- )
- )
- names(ttype) <- c("type.fstat", "type.pvalue")
- } else {
- ttype <- c("type.fstat" = NA, "type.pvalue" = NA)
- }
tt <- c(tt, ttype)
## add model
if (model) {
tt <- list("stats" = tt, "model" = mm)
}Also applies to: 107-125
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/geneDrugPerturbation.R` around lines 65 - 73, When nn < 3, do not call or
summarize the model objects mm0 or mm; instead, return the NA stub (tt)
augmented to match the function's expected return structure (e.g. include
model=NULL or summaries=NULL) so that requests for model=TRUE or multi-level
type do not attempt to dereference mm0/mm. Update the conditional handling
around mm0/mm creation and the later branches that call summary(mm0) or
return(mm) to first check nn >= 3 (or that mm0/mm are non-NULL) before
summarizing/returning, and ensure the final returned object contains
placeholders (NULLs or NAs) for model-related fields when nn < 3 so callers get
the intended NA result without errors.
| default_lower <- c( | ||
| 0.1, | ||
| 0, | ||
| 0, | ||
| min(log_conc) - 3, | ||
| 0.1, | ||
| 0, | ||
| min(log_conc) - 3, | ||
| 0 | ||
| ) | ||
| default_upper <- c( | ||
| 5, | ||
| 1.5, | ||
| 1.5, | ||
| max(log_conc) + 3, | ||
| 5, | ||
| 1.5, | ||
| max(log_conc) + 3, | ||
| 1 | ||
| ) |
There was a problem hiding this comment.
The biphasic bounds are in the wrong parameter order.
.pgx_initial_guess() and .pgx_biphasic_curve() use (HS1, E0, E_inf1, HS2, E_inf2, log10EC50_1, log10EC50_2, Frac), but these defaults put log10EC50_1 in slot 4 and HS2 in slot 5. That lets HS2 roam over concentration-sized bounds and constrains log10EC50_1 to [0, 1.5], which can easily produce nonsense biphasic fits.
Suggested fix
} else {
default_lower <- c(
0.1,
0,
0,
- min(log_conc) - 3,
0.1,
0,
+ min(log_conc) - 3,
min(log_conc) - 3,
0
)
default_upper <- c(
5,
1.5,
1.5,
- max(log_conc) + 3,
5,
1.5,
+ max(log_conc) + 3,
max(log_conc) + 3,
1
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| default_lower <- c( | |
| 0.1, | |
| 0, | |
| 0, | |
| min(log_conc) - 3, | |
| 0.1, | |
| 0, | |
| min(log_conc) - 3, | |
| 0 | |
| ) | |
| default_upper <- c( | |
| 5, | |
| 1.5, | |
| 1.5, | |
| max(log_conc) + 3, | |
| 5, | |
| 1.5, | |
| max(log_conc) + 3, | |
| 1 | |
| ) | |
| default_lower <- c( | |
| 0.1, | |
| 0, | |
| 0, | |
| 0.1, | |
| 0, | |
| min(log_conc) - 3, | |
| min(log_conc) - 3, | |
| 0 | |
| ) | |
| default_upper <- c( | |
| 5, | |
| 1.5, | |
| 1.5, | |
| 5, | |
| 1.5, | |
| max(log_conc) + 3, | |
| max(log_conc) + 3, | |
| 1 | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/logLogisticRegression.R` around lines 211 - 230, The biphasic default
bounds in default_lower/default_upper are mis-ordered relative to the parameter
order used by .pgx_initial_guess() and .pgx_biphasic_curve (HS1, E0, E_inf1,
HS2, E_inf2, log10EC50_1, log10EC50_2, Frac); fix by reordering the vectors so
slot 4 contains HS2 bounds (use 0.1..5 like HS1), slots 6 and 7 contain the EC50
bounds (use min(log_conc)-3 and max(log_conc)+3 respectively), and E_inf2
remains in slot 5 and Frac in slot 8—update both default_lower and default_upper
accordingly.
| .pgx_biphasic_curve <- function(x, pars) { | ||
| hs1 <- pars[1] | ||
| e0 <- pars[2] | ||
| einf1 <- pars[3] | ||
| hs2 <- pars[4] | ||
| einf2 <- pars[5] | ||
| log_ec50_1 <- pars[6] | ||
| log_ec50_2 <- pars[7] | ||
| frac <- pars[8] | ||
|
|
||
| span <- max(einf1, einf2) - e0 | ||
| section1 <- span * frac / (1 + 10^((log_ec50_1 - x) * hs1)) | ||
| section2 <- span * (1 - frac) / (1 + 10^((log_ec50_2 - x) * hs2)) | ||
| e0 + section1 + section2 |
There was a problem hiding this comment.
Use each component’s asymptote in the biphasic evaluator.
span <- max(einf1, einf2) - e0 collapses both branches onto a single bottom. As written, one of E_inf1/E_inf2 is effectively ignored and the high-dose limit always becomes max(E_inf1, E_inf2) instead of the weighted biphasic asymptote.
Suggested fix
- span <- max(einf1, einf2) - e0
- section1 <- span * frac / (1 + 10^((log_ec50_1 - x) * hs1))
- section2 <- span * (1 - frac) / (1 + 10^((log_ec50_2 - x) * hs2))
- e0 + section1 + section2
+ section1 <- frac * (einf1 - e0) / (1 + 10^((log_ec50_1 - x) * hs1))
+ section2 <- (1 - frac) * (einf2 - e0) / (1 + 10^((log_ec50_2 - x) * hs2))
+ e0 + section1 + section2🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/logLogisticRegression.R` around lines 821 - 834, The biphasic evaluator
.pgx_biphasic_curve currently uses a single shared span (span <- max(einf1,
einf2) - e0) which forces both branches toward the same high-dose asymptote;
change it to use each component's own asymptote by computing span1 <- einf1 - e0
and span2 <- einf2 - e0 (or similar names) and use span1 in section1 and span2
in section2 so section1 <- span1 * frac / (1 + 10^((log_ec50_1 - x) * hs1)) and
section2 <- span2 * (1 - frac) / (1 + 10^((log_ec50_2 - x) * hs2)), keeping the
final return as e0 + section1 + section2.
| } else if ( | ||
| (datasetType(object) == 'sensitivity' | datasetType(object) == 'both') & | ||
| (length(drugs) != 0 | length(cells) != 0) | ||
| ) { | ||
| drugs_index <- which(sensitivityInfo(object)[, "treatmentid"] %in% drugs) | ||
| cell_line_index <- which(sensitivityInfo(object)[, "sampleid"] %in% cells) | ||
| if (length(drugs_index) != 0 & length(cell_line_index) != 0) { | ||
| if (length(intersect(drugs_index, cell_line_index)) == 0) { | ||
| stop('This Drug - Cell Line combination was not tested together.') | ||
| } | ||
| sensitivityProfiles(object) <- sensitivityProfiles(object)[exps, , drop=drop] | ||
| rownames(sensitivityProfiles(object)) <- names(exps) | ||
|
|
||
| sensNumber(object) <- .summarizeSensitivityNumbers(object) | ||
| row_indices <- intersect(drugs_index, cell_line_index) | ||
| } else { | ||
| if (length(drugs_index) != 0 & length(cells) == 0) { | ||
| row_indices <- drugs_index | ||
| } else { | ||
| if (length(cell_line_index) != 0 & length(drugs) == 0) { | ||
| row_indices <- cell_line_index | ||
| } else { | ||
| row_indices <- vector() | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Explicit empty dimensions should yield zero sensitivity rows.
Lines 233-239 still fall back to the non-empty axis when the other side is deliberately empty. With allow.empty.drugs = TRUE or allow.empty.cells = TRUE, subsetTo() can therefore keep experiments the caller explicitly excluded, while later slots are trimmed to zero drugs/cells.
Proposed fix
} else if (
(datasetType(object) == 'sensitivity' | datasetType(object) == 'both') &
(length(drugs) != 0 | length(cells) != 0)
) {
- drugs_index <- which(sensitivityInfo(object)[, "treatmentid"] %in% drugs)
- cell_line_index <- which(sensitivityInfo(object)[, "sampleid"] %in% cells)
- if (length(drugs_index) != 0 & length(cell_line_index) != 0) {
- if (length(intersect(drugs_index, cell_line_index)) == 0) {
- stop('This Drug - Cell Line combination was not tested together.')
- }
- row_indices <- intersect(drugs_index, cell_line_index)
+ if ((allow.empty.drugs && length(drugs) == 0) ||
+ (allow.empty.cells && length(cells) == 0)) {
+ row_indices <- integer(0)
} else {
- if (length(drugs_index) != 0 & length(cells) == 0) {
- row_indices <- drugs_index
+ drugs_index <- which(sensitivityInfo(object)[, "treatmentid"] %in% drugs)
+ cell_line_index <- which(sensitivityInfo(object)[, "sampleid"] %in% cells)
+ if (length(drugs_index) != 0 & length(cell_line_index) != 0) {
+ if (length(intersect(drugs_index, cell_line_index)) == 0) {
+ stop('This Drug - Cell Line combination was not tested together.')
+ }
+ row_indices <- intersect(drugs_index, cell_line_index)
} else {
- if (length(cell_line_index) != 0 & length(drugs) == 0) {
- row_indices <- cell_line_index
+ if (length(drugs_index) != 0 & length(cells) == 0) {
+ row_indices <- drugs_index
} else {
- row_indices <- vector()
+ if (length(cell_line_index) != 0 & length(drugs) == 0) {
+ row_indices <- cell_line_index
+ } else {
+ row_indices <- integer(0)
+ }
}
}
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/methods-subsetTo.R` around lines 221 - 242, The code in subsetTo (method in
methods-subsetTo.R) currently falls back to the non-empty axis when the other
axis is deliberately empty, causing experiments to be kept despite
allow.empty.drugs or allow.empty.cells being TRUE; change the logic around
drugs_index / cell_line_index and row_indices so that if length(drugs)==0 and
allow.empty.drugs is TRUE then row_indices is set to integer(0) (and analogously
if length(cells)==0 and allow.empty.cells is TRUE set row_indices to integer(0))
before doing any intersection logic; otherwise still compute drugs_index <-
which(sensitivityInfo(object)[, "treatmentid"] %in% drugs) and cell_line_index
<- which(sensitivityInfo(object)[, "sampleid"] %in% cells) and then apply the
existing intersection/union logic, ensuring explicit empty-requested axes
produce zero-length row_indices rather than falling back to the other axis.
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
R/sanityCheck.R (2)
117-135: Inconsistent handling of unnamed/NA-named parameters.
.normalizeHillPars(lines 33-35) handles the case where all names areNAby converting them to empty strings, but.normalizeBiphasicParsdoes not include this handling. This creates inconsistent behavior between the two normalizers when given an unnamed numeric vector.♻️ Suggested fix for consistency
rsq <- attr(hill_fit, "Rsquare", exact = TRUE) hill <- unlist(hill_fit, use.names = TRUE) lower_names <- tolower(names(hill)) + if (all(is.na(lower_names))) { + lower_names <- rep("", length(hill)) + } get_param <- function(Additionally, the
get_paraminner function is duplicated across.normalizeHillPars,.normalizeBiphasicPars, and.pgx_extract_fit_response_params. Consider extracting it to a shared internal helper to reduce duplication.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/sanityCheck.R` around lines 117 - 135, .normalizeBiphasicPars is missing the all-NA-name handling present in .normalizeHillPars, causing inconsistent handling of unnamed numeric vectors; update .normalizeBiphasicPars to detect when all names(hill) are NA and replace them with empty strings (like in .normalizeHillPars) before using lower_names and get_param logic, and then remove the duplicated get_param inner function from .normalizeHillPars, .normalizeBiphasicPars, and .pgx_extract_fit_response_params by extracting that logic into a single internal helper (e.g., pgx_get_param) that takes (hill, lower_names, candidates, fallback_index, default) and returns the numeric value to avoid duplication.
276-281: Redundant null check ontop_param.
top_paramis assigned viaunname(response_params["E0"]), which returnsNA_real_(notNULL) when "E0" is absent. The!is.null(top_param)check at line 278 will always beTRUE. The code still works correctly becauseis.finite(top_param)handles theNAcase, but the null check is misleading.♻️ Suggested simplification
} else if ( viability_as_pct && - !is.null(top_param) && is.finite(top_param) && top_param <= 1 + tol ) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/sanityCheck.R` around lines 276 - 281, The condition block checking viability_as_pct currently includes a redundant !is.null(top_param) test; since top_param is obtained via unname(response_params["E0"]) which yields NA_real_ when missing, the null check is always true and misleading—remove the !is.null(top_param) clause from the if condition so it reads: if (viability_as_pct && is.finite(top_param) && top_param <= 1 + tol) (leave the surrounding logic in the same block, referencing the same symbols top_param, viability_as_pct, tol).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/sanityCheck.R`:
- Around line 565-569: Fix the typo in the diagnostic message: replace the
incorrect "conC_as_log" string with "conc_as_log" so the message call inside the
conditional that checks conc_as_log, min(conc) and verbose prints the correct
variable name; update the message(...) invocation that currently uses
"conC_as_log" to "conc_as_log" to match the variable used in the condition.
- Around line 461-464: The debug message contains a typo: the second message
prints "conC_as_log" instead of the variable name; update the message call that
references conc_as_log (the message("conC_as_log: ", conc_as_log) invocation) to
use the correct string "conc_as_log" so the logged text matches the variable
name.
- Around line 575-588: The code computes log_conc before ensuring conc is
sorted, causing log_conc to remain unsorted when conc is reordered later; move
the unsorted check and reordering (the is.unsorted(conc) block that creates myx
via order(conc) and reassigns conc) to occur before computing log_conc (the
branch using conc_as_log, conc, and log10(conc)), so that log_conc is derived
from the already-sorted conc; ensure references to conc_as_log, conc, log_conc,
myx, order(), is.unsorted(), and the returned Hill_fit remain intact.
---
Nitpick comments:
In `@R/sanityCheck.R`:
- Around line 117-135: .normalizeBiphasicPars is missing the all-NA-name
handling present in .normalizeHillPars, causing inconsistent handling of unnamed
numeric vectors; update .normalizeBiphasicPars to detect when all names(hill)
are NA and replace them with empty strings (like in .normalizeHillPars) before
using lower_names and get_param logic, and then remove the duplicated get_param
inner function from .normalizeHillPars, .normalizeBiphasicPars, and
.pgx_extract_fit_response_params by extracting that logic into a single internal
helper (e.g., pgx_get_param) that takes (hill, lower_names, candidates,
fallback_index, default) and returns the numeric value to avoid duplication.
- Around line 276-281: The condition block checking viability_as_pct currently
includes a redundant !is.null(top_param) test; since top_param is obtained via
unname(response_params["E0"]) which yields NA_real_ when missing, the null check
is always true and misleading—remove the !is.null(top_param) clause from the if
condition so it reads: if (viability_as_pct && is.finite(top_param) && top_param
<= 1 + tol) (leave the surrounding logic in the same block, referencing the same
symbols top_param, viability_as_pct, tol).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6962e39d-ed21-4b6d-a734-196bf7ac99bf
📒 Files selected for processing (2)
R/sanityCheck.Rtests/testthat/test_sanitizeInput.R
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/testthat/test_sanitizeInput.R
| if (identical(verbose, 2)) { | ||
| message("Concentration input: ", toString(conc)) | ||
| message("conC_as_log flag: ", conc_as_log) | ||
| } |
There was a problem hiding this comment.
Typo in debug message: "conC_as_log" should be "conc_as_log".
✏️ Proposed fix
if (identical(verbose, 2)) {
message("Concentration input: ", toString(conc))
- message("conC_as_log flag: ", conc_as_log)
+ message("conc_as_log flag: ", conc_as_log)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (identical(verbose, 2)) { | |
| message("Concentration input: ", toString(conc)) | |
| message("conC_as_log flag: ", conc_as_log) | |
| } | |
| if (identical(verbose, 2)) { | |
| message("Concentration input: ", toString(conc)) | |
| message("conc_as_log flag: ", conc_as_log) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/sanityCheck.R` around lines 461 - 464, The debug message contains a typo:
the second message prints "conC_as_log" instead of the variable name; update the
message call that references conc_as_log (the message("conC_as_log: ",
conc_as_log) invocation) to use the correct string "conc_as_log" so the logged
text matches the variable name.
| if (conc_as_log == FALSE && min(conc) < 0) { | ||
| if (identical(verbose, 2)) { | ||
| message("Concentration input: ", toString(conc)) | ||
| message("conC_as_log flag: ", conc_as_log) | ||
| } |
There was a problem hiding this comment.
Same typo: "conC_as_log" should be "conc_as_log".
✏️ Proposed fix
if (identical(verbose, 2)) {
message("Concentration input: ", toString(conc))
- message("conC_as_log flag: ", conc_as_log)
+ message("conc_as_log flag: ", conc_as_log)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (conc_as_log == FALSE && min(conc) < 0) { | |
| if (identical(verbose, 2)) { | |
| message("Concentration input: ", toString(conc)) | |
| message("conC_as_log flag: ", conc_as_log) | |
| } | |
| if (conc_as_log == FALSE && min(conc) < 0) { | |
| if (identical(verbose, 2)) { | |
| message("Concentration input: ", toString(conc)) | |
| message("conc_as_log flag: ", conc_as_log) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/sanityCheck.R` around lines 565 - 569, Fix the typo in the diagnostic
message: replace the incorrect "conC_as_log" string with "conc_as_log" so the
message call inside the conditional that checks conc_as_log, min(conc) and
verbose prints the correct variable name; update the message(...) invocation
that currently uses "conC_as_log" to "conc_as_log" to match the variable used in
the condition.
| if (conc_as_log == FALSE) { | ||
| ii <- which(conc == 0) | ||
| if (length(ii) > 0) { | ||
| conc <- conc[-ii] | ||
| } | ||
| log_conc <- log10(conc) | ||
| } else { | ||
| log_conc <- conc | ||
| } | ||
| if (is.unsorted(conc)) { | ||
| myx <- order(conc) | ||
| conc <- conc[myx] | ||
| } | ||
| return(list("Hill_fit" = Hill_fit, "log_conc" = log_conc)) |
There was a problem hiding this comment.
Bug: log_conc not sorted when conc is unsorted.
The sorting check at lines 584-587 occurs after log_conc is computed at lines 575-583. If conc is unsorted, log_conc will be returned unsorted while conc is sorted but discarded.
Compare to the viability branch (lines 486-506) where sorting correctly happens before computing log_conc.
🐛 Proposed fix: move sorting before log_conc computation
+ if (is.unsorted(conc)) {
+ warning(
+ "Concentration Values were unsorted. Sorting concentration values."
+ )
+ conc <- sort(conc)
+ }
+
if (conc_as_log == FALSE) {
ii <- which(conc == 0)
if (length(ii) > 0) {
conc <- conc[-ii]
}
log_conc <- log10(conc)
} else {
log_conc <- conc
}
- if (is.unsorted(conc)) {
- myx <- order(conc)
- conc <- conc[myx]
- }
return(list("Hill_fit" = Hill_fit, "log_conc" = log_conc))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/sanityCheck.R` around lines 575 - 588, The code computes log_conc before
ensuring conc is sorted, causing log_conc to remain unsorted when conc is
reordered later; move the unsorted check and reordering (the is.unsorted(conc)
block that creates myx via order(conc) and reassigns conc) to occur before
computing log_conc (the branch using conc_as_log, conc, and log10(conc)), so
that log_conc is derived from the already-sorted conc; ensure references to
conc_as_log, conc, log_conc, myx, order(), is.unsorted(), and the returned
Hill_fit remain intact.
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (5)
tests/testthat/test_summarizeSensitivityProfiles_longtable.R (1)
87-95: Error-message regex is brittle.The
regexpstitches three tokens with.*into a single fixed expected pattern. Any benign rewording of the error message (e.g., removing the word "explicit", or suggesting "sensitivity measures" plural) will break the test. Consider relaxing to match a single stable token (e.g., just"sensitivity\\.measure"), or useexpect_error(..., class = "…")if a condition class is assigned.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/testthat/test_summarizeSensitivityProfiles_longtable.R` around lines 87 - 95, The test for summarizeSensitivityProfiles uses a brittle regexp that concatenates multiple tokens with ".*"; update the test to assert a more stable error indicator by matching a single reliable token such as "sensitivity\\.measure" (or prefer using expect_error(..., class = "your_condition_class") if the function signals a specific condition class) so the test only depends on a stable error identifier; locate the expect_error call in test_summarizeSensitivityProfiles_longtable.R and replace the complex regexp with a simpler pattern or switch to the class-based expect_error check referencing summarizeSensitivityProfiles and the "sensitivity.measure" token.tests/testthat/test_drugSensitivitySig.R (1)
56-83: Good coverage — silent fallback whensProfilessupplied.This exercise of the "when
sProfilesis provided, an invalidsensitivity.measureis ignored" path is exactly the sort of behavior that's easy to regress silently. If possible, also assert that no warning/message is emitted in this case (expect_silent(...)orexpect_warning(..., regexp = NA)), so the fallback remains truly silent rather than accidentally becoming noisy.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/testthat/test_drugSensitivitySig.R` around lines 56 - 83, The test should assert that supplying sProfiles causes the invalid sensitivity.measure to be ignored silently; wrap the drugSensitivitySig invocation (which returns res) in an assertion that no warnings/messages are emitted—e.g. replace the plain call with expect_silent(drugSensitivitySig(...)) or use expect_warning(..., regexp = NA) around the drugSensitivitySig(...) call—so the test explicitly checks that drugSensitivitySig with sProfiles and an invalid sensitivity.measure produces no output.R/computeSynergy.R (1)
1723-1738: Mixed use of.Hill()andhillCurve()within the same function.
viability_1/viability_2(monotherapy baselines) use.Hill()while the projected curves usehillCurve(). After this PR,computeZIP()was migrated tohillCurve()for the exact same monotherapy calculation. For consistency (and so the two baselines used in delta-score computation match the ZIP reference monotherapy baseline exactly), consider usinghillCurve()here too.♻️ Suggested diff
- viability_1 <- .Hill(log10(treatment1dose), c(HS_1, E_inf_1, log10(EC50_1))) - viability_2 <- .Hill(log10(treatment2dose), c(HS_2, E_inf_2, log10(EC50_2))) + viability_1 <- hillCurve( + dose = log10(treatment1dose), + HS = HS_1, EC50 = log10(EC50_1), E_inf = E_inf_1 + ) + viability_2 <- hillCurve( + dose = log10(treatment2dose), + HS = HS_2, EC50 = log10(EC50_2), E_inf = E_inf_2 + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/computeSynergy.R` around lines 1723 - 1738, The monotherapy baselines (viability_1, viability_2) are computed with .Hill() while projected curves use hillCurve(), causing inconsistency with computeZIP; replace the .Hill(...) calls with hillCurve(...) so both baselines use the same implementation: call hillCurve(dose = log10(treatment1dose), HS = HS_1, EC50 = log10(EC50_1), E_ninf = 1, E_inf = E_inf_1) for viability_1 and similarly for viability_2 (use HS_2, EC50_2, E_inf_2), keeping the variable names viability_1 and viability_2 unchanged so downstream delta-score logic and computeZIP align.R/curveFittingPGX.R (1)
44-78: Minor: one-sided viability cap and bare data.table column references.
- Line 78:
pmin(exp_dt$viability * 100, 100)clamps only the upper bound. Negative viabilities (a common QC artefact) pass through silently. Considerpmin(pmax(exp_dt$viability * 100, 0), 100)for symmetry, even though downstreamtrunc = TRUEwill truncate during fitting.- Lines 54, 68:
dt[, experiment := paste(cell_id, drug_id, ...)]anddt[, .N, by = experiment][N >= 3, experiment]use bare names that will triggerR CMD check"no visible binding for global variable" NOTEs forcell_id,drug_id,experiment, andN. Either declare them viautils::globalVariables(c("cell_id", "drug_id", "experiment", "N"))at the top of the file or reference them defensively (e.g. through..required_cols/.SD).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/curveFittingPGX.R` around lines 44 - 78, The code currently only clamps viability upward and uses bare data.table columns causing R CMD check NOTEs; change the viability calculation in the fit loop to clamp both bounds (use pmin(pmax(exp_dt$viability * 100, 0), 100) instead of pmin(...,100)) and eliminate bare-variable NOTES by either declaring utils::globalVariables(c("cell_id","drug_id","experiment","N")) at the top of the file or switching the data.table calls to defensive referencing (e.g. use .SD/..required_cols or explicit column access) for the places using dt[, experiment := paste(cell_id, drug_id, ...)] and dt[, .N, by = experiment][N >= 3, experiment].R/methods-drugSensitivitySig.R (1)
378-385: Optional: validatemProfilesdimensions before subscripting.
SummarizedExperiment::assay(molecular_profiles) <- mProfiles[features, colnames(molecular_profiles), drop = FALSE]will throw a rawsubscript out of boundsif the caller-suppliedmProfilesis missing any of the requested features or sample columns. A short up-frontcheckmate::assert_subset(or similar) gives a much clearer error for this advanced-user override path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/methods-drugSensitivitySig.R` around lines 378 - 385, Validate caller-supplied mProfiles before subscripting to avoid raw "subscript out of bounds" errors: check that all features exist in rownames(mProfiles) and that all sample column names (colnames(molecular_profiles)) exist in colnames(mProfiles) (e.g., with checkmate::assert_subset or similar), and if the checks fail emit a clear error explaining the missing rows/columns; only after these assertions assign SummarizedExperiment::assay(molecular_profiles) <- mProfiles[features, colnames(molecular_profiles), drop = FALSE].
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/computeDrugSensitivity.R`:
- Around line 139-145: Add a clear comment at both assignment sites where
auc_recomputed_star and the returned "AUC" list element are set to AAC (the
lines assigning sensitivityProfiles(... )$auc_recomputed_star <-
aac_recomputed_star and the return(list("AAC" = AAC, "AUC" = AAC, ...))),
stating this is an intentional one-release back-compat alias and documenting the
semantic difference (AAC vs AUC). In addition, update the resolution path in
.resolveSensitivityMeasureName to emit a run-time deprecation warning (e.g., via
.Deprecated or warning()) whenever a value is resolved using the legacy alias
(not just the argument name), so callers accessing the old column/element get a
clear signal; reference the symbols sensitivityProfiles, auc_recomputed_star,
and the returned "AUC" list key when implementing the warning.
In `@R/computeDSS.R`:
- Around line 73-86: The code currently only treats fractional t_param as
deprecated when !viability_as_pct; instead detect t_param <= 1 regardless of
viability_as_pct and handle it consistently: if t_param <= 1, emit a warning
(mentioning viability_as_pct when TRUE) and set t_param_pct <- t_param * 100 so
downstream t_param_fraction <- t_param_pct / 100 is correct; update the warning
text and the condition (replace the !viability_as_pct && t_param <= 1 check with
t_param <= 1 and include viability_as_pct in the message) so calls with
viability_as_pct = TRUE and t_param = 0.1 are not silently misinterpreted.
- Around line 73-102: The change shifts threshold semantics by calling
computeACn with viability_as_pct = FALSE and n = t_param_fraction (see
t_param_fraction, t_param_pct, hill_external and the computeACn(...) call in
computeDSS), so add a regression test that compares DSS outputs before-and-after
for representative curves (including cases where E0 != 1 and E_inf != 0) to
detect numeric shifts, and update NEWS.md (or relevant changelog) describing
that t_param is now treated as an absolute % inhibition (affecting DSS on
existing PSets); ensure tests live in tests/testthat/test_computeDSS.R and
reference computeDSS behavior for reproducibility.
In `@R/computeSynergy.R`:
- Around line 40-67: The function effectToDose currently uses rep_len(EC50,
len)/rep_len(HS, len)/rep_len(E_inf, len) which silently truncates vectors
longer than viability; update effectToDose to validate input lengths up front by
asserting that length(EC50), length(HS), and length(E_inf) are each either 1 or
equal to length(viability) (len) and throw a clear error if not, then expand
scalars to length len (or use a helper like computeLoewe::expand_param) before
continuing; reference the effectToDose parameters EC50, HS, E_inf and the
rep_len usage when locating the change.
In `@R/rankGeneDrugSensitivity.R`:
- Around line 185-189: The ccix boolean is inverted: it currently marks rows as
usable if any data exists or if type/batch are NA; change it to mark rows usable
only when the row is not entirely NA in both data and drugpheno AND type and
batch are present. Concretely, replace the ccix assignment so it requires
non-all-NA in data and drugpheno and non-missing type and batch (for example:
ccix <- !(rowSums(is.na(data)) == ncol(data) & rowSums(is.na(drugpheno)) ==
ncol(drugpheno)) & !is.na(type) & !is.na(batch)) and then keep ccix <-
ccix[iix]; this ensures sum(ccix) correctly reflects the number of complete
cases used by geneDrugSensitivity* and the subsequent if (sum(ccix) < 3) guard
works as intended.
- Around line 73-86: Pre-ranking drugpheno globally causes mismatched
denominators vs per-feature ranks; instead compute ranks on the intersected
complete-case pairs inside the correlation routine: remove or skip the global
ranking in the spearman branch (the block that transforms drugpheno) and move
the rank() calls into geneDrugSensitivityPCorr (and/or fit_feature) so that for
each feature you first subset both vectors to complete.cases and then rank each
vector using ties.method="average" and na.last="keep" before computing Spearman;
reference the variables drugpheno and functions fit_feature and
geneDrugSensitivityPCorr when making this change.
In `@tests/testthat/test_computeAUC.R`:
- Around line 62-64: The test assumes computeIC50() uses reference="relative";
make this explicit to avoid fragility: when computing metrics inside
test_computeAUC.R call computeIC50(..., reference="relative") (or otherwise
compute ic50_relative first) and assert metrics$ic50_relative == metrics$EC50
(tolerance=0.1), or alternatively change the dose-response fit parameters to
symmetric values (e.g., E0=1, E_inf=0) so relative and absolute IC50 are
identical before keeping the existing equality assertion; reference computeIC50
and the test variables metrics / EC50 to locate the change.
In `@tests/testthat/test_summarizeSensitivityProfiles_longtable.R`:
- Around line 26-48: Add a parallel test fixture in
tests/testthat/test_summarizeSensitivityProfiles_longtable.R that mirrors the
existing profiles case but uses NA_character_ for treatment2id in monotherapy
rows instead of "" so summarizeSensitivityProfiles' monotherapy detection (the
mono_rows check that uses is.na(...) | == "") is exercised for both forms;
update or add a new test case that builds an assay with treatment2id =
c(NA_character_, NA_character_, "drugX", "drugX") (and corresponding
sampleid/SCORE) and assert the same expected summarization outputs as the
empty-string variant.
---
Nitpick comments:
In `@R/computeSynergy.R`:
- Around line 1723-1738: The monotherapy baselines (viability_1, viability_2)
are computed with .Hill() while projected curves use hillCurve(), causing
inconsistency with computeZIP; replace the .Hill(...) calls with hillCurve(...)
so both baselines use the same implementation: call hillCurve(dose =
log10(treatment1dose), HS = HS_1, EC50 = log10(EC50_1), E_ninf = 1, E_inf =
E_inf_1) for viability_1 and similarly for viability_2 (use HS_2, EC50_2,
E_inf_2), keeping the variable names viability_1 and viability_2 unchanged so
downstream delta-score logic and computeZIP align.
In `@R/curveFittingPGX.R`:
- Around line 44-78: The code currently only clamps viability upward and uses
bare data.table columns causing R CMD check NOTEs; change the viability
calculation in the fit loop to clamp both bounds (use pmin(pmax(exp_dt$viability
* 100, 0), 100) instead of pmin(...,100)) and eliminate bare-variable NOTES by
either declaring utils::globalVariables(c("cell_id","drug_id","experiment","N"))
at the top of the file or switching the data.table calls to defensive
referencing (e.g. use .SD/..required_cols or explicit column access) for the
places using dt[, experiment := paste(cell_id, drug_id, ...)] and dt[, .N, by =
experiment][N >= 3, experiment].
In `@R/methods-drugSensitivitySig.R`:
- Around line 378-385: Validate caller-supplied mProfiles before subscripting to
avoid raw "subscript out of bounds" errors: check that all features exist in
rownames(mProfiles) and that all sample column names
(colnames(molecular_profiles)) exist in colnames(mProfiles) (e.g., with
checkmate::assert_subset or similar), and if the checks fail emit a clear error
explaining the missing rows/columns; only after these assertions assign
SummarizedExperiment::assay(molecular_profiles) <- mProfiles[features,
colnames(molecular_profiles), drop = FALSE].
In `@tests/testthat/test_drugSensitivitySig.R`:
- Around line 56-83: The test should assert that supplying sProfiles causes the
invalid sensitivity.measure to be ignored silently; wrap the drugSensitivitySig
invocation (which returns res) in an assertion that no warnings/messages are
emitted—e.g. replace the plain call with expect_silent(drugSensitivitySig(...))
or use expect_warning(..., regexp = NA) around the drugSensitivitySig(...)
call—so the test explicitly checks that drugSensitivitySig with sProfiles and an
invalid sensitivity.measure produces no output.
In `@tests/testthat/test_summarizeSensitivityProfiles_longtable.R`:
- Around line 87-95: The test for summarizeSensitivityProfiles uses a brittle
regexp that concatenates multiple tokens with ".*"; update the test to assert a
more stable error indicator by matching a single reliable token such as
"sensitivity\\.measure" (or prefer using expect_error(..., class =
"your_condition_class") if the function signals a specific condition class) so
the test only depends on a stable error identifier; locate the expect_error call
in test_summarizeSensitivityProfiles_longtable.R and replace the complex regexp
with a simpler pattern or switch to the class-based expect_error check
referencing summarizeSensitivityProfiles and the "sensitivity.measure" token.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a60c17ec-4557-4970-b205-d7a5f3336a17
📒 Files selected for processing (30)
NAMESPACER/computeAUC.RR/computeDSS.RR/computeDrugSensitivity.RR/computeIC50.RR/computeICn.RR/computeSynergy.RR/curveFittingPGX.RR/drugDoseResponseCurve.RR/globals.RR/methods-drugSensitivitySig.RR/methods-summarizeMolecularProfiles.RR/methods-summarizeSensitivityProfiles.RR/rankGeneDrugSensitivity.Rman/computeAUC.Rdman/computeICn.Rdman/dot-summarizeSensProfiles.Rdman/drugDoseResponseCurve.Rdman/drugSensitivitySig-PharmacoSet-method.Rdman/summarizeSensitivityProfiles-PharmacoSet-method.Rdtests/testthat/test_computeAUC.Rtests/testthat/test_computeDSS.Rtests/testthat/test_computeIC50.Rtests/testthat/test_computeZIPdelta.Rtests/testthat/test_drugSensitivitySig.Rtests/testthat/test_summarizeSensitivityProfiles.Rtests/testthat/test_summarizeSensitivityProfiles_longtable.Rvignettes/DetectingDrugSynergyAndAntagonism.Rvignettes/DetectingDrugSynergyAndAntagonism.Rmdvignettes/PharmacoGx.Rmd
✅ Files skipped from review due to trivial changes (3)
- man/summarizeSensitivityProfiles-PharmacoSet-method.Rd
- vignettes/DetectingDrugSynergyAndAntagonism.R
- tests/testthat/test_computeDSS.R
🚧 Files skipped from review as they are similar to previous changes (10)
- man/drugSensitivitySig-PharmacoSet-method.Rd
- NAMESPACE
- man/computeAUC.Rd
- R/methods-summarizeMolecularProfiles.R
- man/computeICn.Rd
- tests/testthat/test_summarizeSensitivityProfiles.R
- R/globals.R
- man/drugDoseResponseCurve.Rd
- tests/testthat/test_computeIC50.R
- R/computeIC50.R
| sensitivityProfiles(pSets[[study]])$aac_recomputed_star <- | ||
| aac_recomputed_star | ||
| sensitivityProfiles(pSets[[study]])$auc_recomputed_star <- | ||
| aac_recomputed_star | ||
| } | ||
| return(pSets) | ||
| } |
There was a problem hiding this comment.
Silent AAC-for-AUC aliasing in legacy output columns deserves a comment and/or runtime deprecation signal.
Two places quietly overwrite the "AUC" output with AAC values:
- Lines 139–142:
sensitivityProfiles(...)$auc_recomputed_star <- aac_recomputed_star. - Line 301:
return(list("AAC" = AAC, "AUC" = AAC, "IC50" = IC50, "pars" = pars)).
Downstream code (or user scripts) reading the auc_recomputed_star column or the AUC list element on a freshly-recomputed PSet will now receive response-area (AAC) values instead of viability-area (AUC) values. Because this reverses the sign/scale meaning of the number, consumers comparing against previously-recomputed or published AUC values will silently get different semantics.
Suggested mitigations:
- Add an explicit comment at both sites documenting that these are intentional aliases for one-release back-compat.
- Consider emitting a
.Deprecated-style warning from.resolveSensitivityMeasureNamethat fires when the value (not just the argument name) is resolved from the legacy alias, so users get a signal on read paths that query the old column name.
Also applies to: 298-302
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/computeDrugSensitivity.R` around lines 139 - 145, Add a clear comment at
both assignment sites where auc_recomputed_star and the returned "AUC" list
element are set to AAC (the lines assigning sensitivityProfiles(...
)$auc_recomputed_star <- aac_recomputed_star and the return(list("AAC" = AAC,
"AUC" = AAC, ...))), stating this is an intentional one-release back-compat
alias and documenting the semantic difference (AAC vs AUC). In addition, update
the resolution path in .resolveSensitivityMeasureName to emit a run-time
deprecation warning (e.g., via .Deprecated or warning()) whenever a value is
resolved using the legacy alias (not just the argument name), so callers
accessing the old column/element get a clear signal; reference the symbols
sensitivityProfiles, auc_recomputed_star, and the returned "AUC" list key when
implementing the warning.
| if (!viability_as_pct && t_param <= 1) { | ||
| warning( | ||
| paste( | ||
| "Fractional `t_param` values are deprecated when", | ||
| "`viability_as_pct = FALSE`.", | ||
| "Interpreting the input as a fractional inhibition threshold." | ||
| ), | ||
| call. = FALSE | ||
| ) | ||
| t_param_pct <- t_param * 100 | ||
| } else { | ||
| t_param_pct <- t_param | ||
| } | ||
|
|
||
| x2 = max(concentration) | ||
| x1 = computeICn(concentration = concentration, Hill_fit = unlist(pars), n = t_param, conc_as_log = TRUE, viability_as_pct = TRUE) | ||
| if(!is.finite(x1)){return(0)} | ||
|
|
||
| x1 <- max(x1, min(concentration)) | ||
|
|
||
| t_param_fraction <- t_param_pct / 100 |
There was a problem hiding this comment.
Fractional t_param silently misinterpreted when viability_as_pct = TRUE.
The deprecation branch is gated on !viability_as_pct && t_param <= 1. A user calling with the default viability_as_pct = TRUE but passing t_param = 0.1 (intending 10 %) falls through to t_param_pct <- 0.1 and t_param_fraction <- 0.001 — i.e. a 0.1 % threshold, with no warning. Consider also warning (or rejecting) t_param <= 1 when viability_as_pct = TRUE, since no realistic DSS analysis wants a sub-1 % cut-off.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/computeDSS.R` around lines 73 - 86, The code currently only treats
fractional t_param as deprecated when !viability_as_pct; instead detect t_param
<= 1 regardless of viability_as_pct and handle it consistently: if t_param <= 1,
emit a warning (mentioning viability_as_pct when TRUE) and set t_param_pct <-
t_param * 100 so downstream t_param_fraction <- t_param_pct / 100 is correct;
update the warning text and the condition (replace the !viability_as_pct &&
t_param <= 1 check with t_param <= 1 and include viability_as_pct in the
message) so calls with viability_as_pct = TRUE and t_param = 0.1 are not
silently misinterpreted.
| if (!viability_as_pct && t_param <= 1) { | ||
| warning( | ||
| paste( | ||
| "Fractional `t_param` values are deprecated when", | ||
| "`viability_as_pct = FALSE`.", | ||
| "Interpreting the input as a fractional inhibition threshold." | ||
| ), | ||
| call. = FALSE | ||
| ) | ||
| t_param_pct <- t_param * 100 | ||
| } else { | ||
| t_param_pct <- t_param | ||
| } | ||
|
|
||
| x2 = max(concentration) | ||
| x1 = computeICn(concentration = concentration, Hill_fit = unlist(pars), n = t_param, conc_as_log = TRUE, viability_as_pct = TRUE) | ||
| if(!is.finite(x1)){return(0)} | ||
|
|
||
| x1 <- max(x1, min(concentration)) | ||
|
|
||
| t_param_fraction <- t_param_pct / 100 | ||
|
|
||
| hill_external <- list( | ||
| HS = pars[["HS"]], | ||
| E0 = pars[["E0"]], | ||
| E_inf = pars[["E_inf"]], | ||
| log10EC50 = log_ec50 | ||
| ) | ||
|
|
||
| x2 <- max(concentration) | ||
| x1 <- computeACn( | ||
| concentration = concentration, | ||
| Hill_fit = hill_external, | ||
| n = t_param_fraction, | ||
| conc_as_log = TRUE, | ||
| viability_as_pct = FALSE | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Look for DSS expected-value tests that may pin pre-PR numeric outputs.
rg -nP -C3 'computeDSS\s*\(' --type=r
rg -nP -C3 '(expect_equal|expect_identical)\s*\([^)]*dss' --type=r -iRepository: bhklab/PharmacoGx
Length of output: 2105
🏁 Script executed:
cat -n tests/testthat/test_computeDSS.RRepository: bhklab/PharmacoGx
Length of output: 1622
🏁 Script executed:
# Check the implementation of computeICn to understand the old "relative" semantics
rg -A 20 "function.*computeICn" R/Repository: bhklab/PharmacoGx
Length of output: 43
🏁 Script executed:
# Check the implementation of computeACn to understand the new "absolute" semantics
rg -A 20 "function.*computeACn" R/Repository: bhklab/PharmacoGx
Length of output: 43
🏁 Script executed:
# Search for changelog/NEWS files that might document this change
fd -i "news|changelog" .Repository: bhklab/PharmacoGx
Length of output: 70
🏁 Script executed:
# Find files containing computeICn or computeACn
fd -i "compute" R/ | head -20Repository: bhklab/PharmacoGx
Length of output: 248
🏁 Script executed:
# Check NEWS.md to see if this change is documented
head -100 NEWS.mdRepository: bhklab/PharmacoGx
Length of output: 2886
🏁 Script executed:
# Search for ICn or ACn in the R directory to find these functions
rg "computeICn|computeACn" R/Repository: bhklab/PharmacoGx
Length of output: 975
🏁 Script executed:
cat -n R/computeICn.R | head -150Repository: bhklab/PharmacoGx
Length of output: 4458
🏁 Script executed:
cat -n R/computeDSS.R | head -140Repository: bhklab/PharmacoGx
Length of output: 4573
Threshold semantics changed from relative to absolute — please confirm regression test coverage and changelog documentation.
The new code uses computeACn(..., viability_as_pct=FALSE, n=t_param_fraction) (line 96), which targets an absolute viability threshold 1 - t_param_fraction. The previous implementation would have used relative-reference calculation: E0 − frac · (E0 − E_inf).
For any curve where E0 ≠ 1 or E_inf ≠ 0, the concentration at which the threshold is reached — and therefore the resulting DSS — will differ from prior releases. This aligns with DSS literature definitions of t_param as absolute % inhibition, but it is a behavior change that affects downstream recomputed DSS values for existing PSets.
The current test suite (tests/testthat/test_computeDSS.R) verifies internal consistency only; it does not pin numeric outputs against pre-PR values and would not catch this semantic shift. Additionally, this change is not documented in NEWS.md.
Please confirm:
- Regression tests exist comparing DSS values to pre-PR baselines (or document why this is not needed)
- Changelog entry exists to notify downstream users of shifted DSS values on existing PSets
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/computeDSS.R` around lines 73 - 102, The change shifts threshold semantics
by calling computeACn with viability_as_pct = FALSE and n = t_param_fraction
(see t_param_fraction, t_param_pct, hill_external and the computeACn(...) call
in computeDSS), so add a regression test that compares DSS outputs
before-and-after for representative curves (including cases where E0 != 1 and
E_inf != 0) to detect numeric shifts, and update NEWS.md (or relevant changelog)
describing that t_param is now treated as an absolute % inhibition (affecting
DSS on existing PSets); ensure tests live in tests/testthat/test_computeDSS.R
and reference computeDSS behavior for reproducibility.
| effectToDose <- function(viability, EC50, HS, E_inf, is_pct = FALSE) { | ||
| assertLogical(is_pct, len = 1) | ||
| if (is_pct) { | ||
| viability <- viability / 100 | ||
| E_inf <- E_inf / 100 | ||
| } | ||
| EC50 * ((1 - viability) / (viability - E_inf))^(1 / HS) | ||
| assertLogical(is_pct, len = 1) | ||
| if (is_pct) { | ||
| viability <- viability / 100 | ||
| E_inf <- E_inf / 100 | ||
| } | ||
| len <- length(viability) | ||
| EC50 <- rep_len(EC50, len) | ||
| HS <- rep_len(HS, len) | ||
| E_inf <- rep_len(E_inf, len) | ||
|
|
||
| dose <- rep(NA_real_, len) | ||
|
|
||
| invalid_param <- !is.finite(EC50) | EC50 <= 0 | !is.finite(HS) | HS == 0 | ||
| denom <- viability - E_inf | ||
| ratio <- (1 - viability) / denom | ||
| invalid <- invalid_param | | ||
| !is.finite(denom) | | ||
| denom == 0 | | ||
| !is.finite(ratio) | | ||
| ratio <= 0 | ||
| valid_idx <- !invalid | ||
|
|
||
| if (any(valid_idx)) { | ||
| dose[valid_idx] <- EC50[valid_idx] * ratio[valid_idx]^(1 / HS[valid_idx]) | ||
| } | ||
| dose | ||
| } |
There was a problem hiding this comment.
rep_len silently truncates parameter vectors longer than viability.
If a caller passes scalar viability with vector EC50/HS/E_inf, rep_len(EC50, len) will silently drop all but the first element rather than error. The previous behaviour (scalar arithmetic with recycling) would at least warn. Consider asserting length(EC50) %in% c(1L, len) (and same for HS, E_inf) like computeLoewe::expand_param does, to fail loudly on mismatched lengths.
🛠️ Suggested guard
len <- length(viability)
- EC50 <- rep_len(EC50, len)
- HS <- rep_len(HS, len)
- E_inf <- rep_len(E_inf, len)
+ expand <- function(x, name) {
+ if (length(x) == 1L || length(x) == len) return(rep_len(x, len))
+ stop(sprintf("`%s` must have length 1 or %d.", name, len))
+ }
+ EC50 <- expand(EC50, "EC50")
+ HS <- expand(HS, "HS")
+ E_inf <- expand(E_inf, "E_inf")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| effectToDose <- function(viability, EC50, HS, E_inf, is_pct = FALSE) { | |
| assertLogical(is_pct, len = 1) | |
| if (is_pct) { | |
| viability <- viability / 100 | |
| E_inf <- E_inf / 100 | |
| } | |
| EC50 * ((1 - viability) / (viability - E_inf))^(1 / HS) | |
| assertLogical(is_pct, len = 1) | |
| if (is_pct) { | |
| viability <- viability / 100 | |
| E_inf <- E_inf / 100 | |
| } | |
| len <- length(viability) | |
| EC50 <- rep_len(EC50, len) | |
| HS <- rep_len(HS, len) | |
| E_inf <- rep_len(E_inf, len) | |
| dose <- rep(NA_real_, len) | |
| invalid_param <- !is.finite(EC50) | EC50 <= 0 | !is.finite(HS) | HS == 0 | |
| denom <- viability - E_inf | |
| ratio <- (1 - viability) / denom | |
| invalid <- invalid_param | | |
| !is.finite(denom) | | |
| denom == 0 | | |
| !is.finite(ratio) | | |
| ratio <= 0 | |
| valid_idx <- !invalid | |
| if (any(valid_idx)) { | |
| dose[valid_idx] <- EC50[valid_idx] * ratio[valid_idx]^(1 / HS[valid_idx]) | |
| } | |
| dose | |
| } | |
| effectToDose <- function(viability, EC50, HS, E_inf, is_pct = FALSE) { | |
| assertLogical(is_pct, len = 1) | |
| if (is_pct) { | |
| viability <- viability / 100 | |
| E_inf <- E_inf / 100 | |
| } | |
| len <- length(viability) | |
| expand <- function(x, name) { | |
| if (length(x) == 1L || length(x) == len) return(rep_len(x, len)) | |
| stop(sprintf("`%s` must have length 1 or %d.", name, len)) | |
| } | |
| EC50 <- expand(EC50, "EC50") | |
| HS <- expand(HS, "HS") | |
| E_inf <- expand(E_inf, "E_inf") | |
| dose <- rep(NA_real_, len) | |
| invalid_param <- !is.finite(EC50) | EC50 <= 0 | !is.finite(HS) | HS == 0 | |
| denom <- viability - E_inf | |
| ratio <- (1 - viability) / denom | |
| invalid <- invalid_param | | |
| !is.finite(denom) | | |
| denom == 0 | | |
| !is.finite(ratio) | | |
| ratio <= 0 | |
| valid_idx <- !invalid | |
| if (any(valid_idx)) { | |
| dose[valid_idx] <- EC50[valid_idx] * ratio[valid_idx]^(1 / HS[valid_idx]) | |
| } | |
| dose | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/computeSynergy.R` around lines 40 - 67, The function effectToDose currently
uses rep_len(EC50, len)/rep_len(HS, len)/rep_len(E_inf, len) which silently
truncates vectors longer than viability; update effectToDose to validate input
lengths up front by asserting that length(EC50), length(HS), and length(E_inf)
are each either 1 or equal to length(viability) (len) and throw a clear error if
not, then expand scalars to length len (or use a helper like
computeLoewe::expand_param) before continuing; reference the effectToDose
parameters EC50, HS, E_inf and the rep_len usage when locating the change.
| if (modeling.method == "spearman") { | ||
| if (any(unlist(lapply(drugpheno, is.factor)))) { | ||
| stop( | ||
| "Spearman modeling requires continuous sensitivity inputs. Use 'pearson', 'anova', or 'lm' for discrete sensitivity data." | ||
| ) | ||
| } | ||
| drugpheno <- data.frame( | ||
| lapply(drugpheno, function(x) { | ||
| rank(as.numeric(x), na.last = "keep", ties.method = "average") | ||
| }), | ||
| check.names = FALSE | ||
| ) | ||
| rownames(drugpheno) <- rownames(data) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm geneDrugSensitivityPCorr does not re-rank inputs internally.
ast-grep --pattern $'geneDrugSensitivityPCorr <- function($$$) {
$$$
}'Repository: bhklab/PharmacoGx
Length of output: 43
🏁 Script executed:
find . -name "*.R" -type f | head -20Repository: bhklab/PharmacoGx
Length of output: 458
🏁 Script executed:
rg "geneDrugSensitivityPCorr" -A 5 -B 2Repository: bhklab/PharmacoGx
Length of output: 2311
🏁 Script executed:
rg "fit_feature" -A 20 -B 2 R/rankGeneDrugSensitivity.RRepository: bhklab/PharmacoGx
Length of output: 1327
🏁 Script executed:
cat -n R/geneDrugSensitivityPCorr.R | head -100Repository: bhklab/PharmacoGx
Length of output: 4395
🏁 Script executed:
sed -n '95,140p' R/rankGeneDrugSensitivity.RRepository: bhklab/PharmacoGx
Length of output: 1199
🏁 Script executed:
sed -n '100,150p' R/geneDrugSensitivityPCorr.RRepository: bhklab/PharmacoGx
Length of output: 1507
Ranking bias in Spearman modeling: pre-ranked vectors use different denominators than paired analysis.
The global pre-rank of drugpheno (lines 73-86) and per-feature rank in fit_feature (line ~115) are computed on their full vectors with potentially different missing patterns. When geneDrugSensitivityPCorr later filters both vectors to complete cases (line 78), the paired Spearman correlation is computed on ranks derived from different denominators—introducing bias relative to standard paired Spearman on the intersected complete cases.
For example, if feature_data has 10 NAs and drugpheno has 5 NAs, ranks in feature_data reflect the 90-value denominator while ranks in drugpheno reflect the 95-value denominator. The subsequent intersection to complete cases (~85 values) pairs these incomparable ranks.
Consider ranking both vectors together on their intersected complete-case rows inside geneDrugSensitivityPCorr after the complete.cases filter, or document this semantic difference.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/rankGeneDrugSensitivity.R` around lines 73 - 86, Pre-ranking drugpheno
globally causes mismatched denominators vs per-feature ranks; instead compute
ranks on the intersected complete-case pairs inside the correlation routine:
remove or skip the global ranking in the spearman branch (the block that
transforms drugpheno) and move the rank() calls into geneDrugSensitivityPCorr
(and/or fit_feature) so that for each feature you first subset both vectors to
complete.cases and then rank each vector using ties.method="average" and
na.last="keep" before computing Spearman; reference the variables drugpheno and
functions fit_feature and geneDrugSensitivityPCorr when making this change.
| ccix <- rowSums(!is.na(data)) > 0 | | ||
| rowSums(!is.na(drugpheno)) > 0 | | ||
| is.na(type) | | ||
| is.na(batch) | ||
| ccix <- ccix[iix] |
There was a problem hiding this comment.
ccix logic is inverted — the guard almost never triggers.
The variable is named ccix (complete-case index) but this computes the opposite polarity:
rowSums(!is.na(data)) > 0 | rowSums(!is.na(drugpheno)) > 0⇒ TRUE if any column has data (instead of all)is.na(type) | is.na(batch)⇒ keeps rows with missing type/batch (should reject them)
Compare with the commented alternative at Lines 190-192, which correctly rejects a row if all of data/drugpheno are NA or type/batch are NA. As written, sum(ccix) will essentially always meet the >= 3 check, so the if (sum(ccix) < 3) short-circuit to an all-NA result block (Lines 196-202) is effectively dead. Result correctness is preserved only because geneDrugSensitivity* re-apply complete.cases internally, but any experiment with too few usable rows will now go through the full parallel fit path and may return noise-level estimates instead of the all-NA placeholder.
🛠️ Suggested fix
- ccix <- rowSums(!is.na(data)) > 0 |
- rowSums(!is.na(drugpheno)) > 0 |
- is.na(type) |
- is.na(batch)
- ccix <- ccix[iix]
+ ccix <- rowSums(!is.na(data)) > 0 &
+ rowSums(!is.na(drugpheno)) > 0 &
+ !is.na(type) &
+ !is.na(batch)
+ ccix <- ccix[iix]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ccix <- rowSums(!is.na(data)) > 0 | | |
| rowSums(!is.na(drugpheno)) > 0 | | |
| is.na(type) | | |
| is.na(batch) | |
| ccix <- ccix[iix] | |
| ccix <- rowSums(!is.na(data)) > 0 & | |
| rowSums(!is.na(drugpheno)) > 0 & | |
| !is.na(type) & | |
| !is.na(batch) | |
| ccix <- ccix[iix] |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/rankGeneDrugSensitivity.R` around lines 185 - 189, The ccix boolean is
inverted: it currently marks rows as usable if any data exists or if type/batch
are NA; change it to mark rows usable only when the row is not entirely NA in
both data and drugpheno AND type and batch are present. Concretely, replace the
ccix assignment so it requires non-all-NA in data and drugpheno and non-missing
type and batch (for example: ccix <- !(rowSums(is.na(data)) == ncol(data) &
rowSums(is.na(drugpheno)) == ncol(drugpheno)) & !is.na(type) & !is.na(batch))
and then keep ccix <- ccix[iix]; this ensures sum(ccix) correctly reflects the
number of complete cases used by geneDrugSensitivity* and the subsequent if
(sum(ccix) < 3) guard works as intended.
| expect_true(all(c("auc", "aac", "EC50", "ic50") %in% colnames(metrics))) | ||
| expect_equal(metrics$auc + metrics$aac, 100, tolerance = 1e-6) | ||
| expect_equal(metrics$ic50, metrics$EC50, tolerance = 0.1) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm what ic50 column curveFittingPGX produces
ast-grep --pattern $'curveFittingPGX <- function($$$) {
$$$
}'
rg -nP -C2 '\bic50\b' R/Repository: bhklab/PharmacoGx
Length of output: 922
🏁 Script executed:
# Get the full computeIC50 call in curveFittingPGX.R
sed -n '119,135p' R/curveFittingPGX.R
# Find computeIC50 function definition
rg -n 'computeIC50\s*<-\s*function' R/Repository: bhklab/PharmacoGx
Length of output: 604
🏁 Script executed:
# Get the computeIC50 function signature and initial lines
sed -n '7,50p' R/computeIC50.RRepository: bhklab/PharmacoGx
Length of output: 1094
🏁 Script executed:
# Check test file context around lines 62-64
sed -n '50,75p' tests/testthat/test_computeAUC.R
# Look for any comments about ic50 or EC50 in test file
rg -n '(ic50|EC50)' tests/testthat/test_computeAUC.R -B2 -A2Repository: bhklab/PharmacoGx
Length of output: 895
🏁 Script executed:
# Check what EC50 column curveFittingPGX returns
sed -n '90,210p' R/curveFittingPGX.R | head -100
# Search for EC50 assignment in curveFittingPGX
rg -n 'EC50.*=' R/curveFittingPGX.R | head -20Repository: bhklab/PharmacoGx
Length of output: 2630
Test relies on implicit assumption that computeIC50 uses reference="relative" by default.
The assertion metrics$ic50 == metrics$EC50 (tolerance 0.1) passes only because computeIC50() defaults to reference="relative". For the fit with HS=1, E0=1, E_inf=0.2, EC50=1:
- Relative IC50 (50% of dynamic range): solving the Hill equation gives exactly
x=1, matchingEC50. - Absolute IC50 (50% viability): solving gives
x ≈ 1.667, which would exceed the tolerance of 0.1.
If computeIC50's default ever changes to reference="absolute", this test will fail. Consider:
- Explicitly computing absolute IC50 and comparing if that becomes the intended semantic, or
- Switching to symmetric parameters (e.g.,
E0=1, E_inf=0) where relative and absolute IC50 are identical regardless.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/testthat/test_computeAUC.R` around lines 62 - 64, The test assumes
computeIC50() uses reference="relative"; make this explicit to avoid fragility:
when computing metrics inside test_computeAUC.R call computeIC50(...,
reference="relative") (or otherwise compute ic50_relative first) and assert
metrics$ic50_relative == metrics$EC50 (tolerance=0.1), or alternatively change
the dose-response fit parameters to symmetric values (e.g., E0=1, E_inf=0) so
relative and absolute IC50 are identical before keeping the existing equality
assertion; reference computeIC50 and the test variables metrics / EC50 to locate
the change.
| treatment1id = c("drugA", "drugB"), | ||
| treatment2id = c("", "drugX"), | ||
| stringsAsFactors = FALSE | ||
| ), | ||
| rowIDs = c("treatment1id", "treatment2id"), | ||
| colData = data.frame( | ||
| sampleid = c("cell1", "cell2"), | ||
| stringsAsFactors = FALSE | ||
| ), | ||
| colIDs = "sampleid", | ||
| assays = list( | ||
| profiles = data.frame( | ||
| treatment1id = c("drugA", "drugA", "drugB", "drugB"), | ||
| treatment2id = c("", "", "drugX", "drugX"), | ||
| sampleid = c("cell1", "cell2", "cell1", "cell2"), | ||
| SCORE = c(1, 2, 3, 4), | ||
| stringsAsFactors = FALSE | ||
| ) | ||
| ), | ||
| assayIDs = list( | ||
| profiles = c("treatment1id", "treatment2id", "sampleid") | ||
| ) | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Look for monotherapy detection logic in summarizeSensitivityProfiles
rg -nP -C3 'treatment2id' R/methods-summarizeSensitivityProfiles.RRepository: bhklab/PharmacoGx
Length of output: 802
Add a test case with NA_character_ for monotherapy rows to improve coverage.
The fixture correctly uses treatment2id = "" to mark monotherapy. However, summarizeSensitivityProfiles() explicitly accepts both empty strings and NA values as monotherapy markers:
mono_rows <- is.na(sens_profiles[["treatment2id"]]) |
sens_profiles[["treatment2id"]] == ""Consider adding a parallel test case that uses NA_character_ instead of "" for treatment2id to ensure the monotherapy detection logic is exercised for both valid forms.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/testthat/test_summarizeSensitivityProfiles_longtable.R` around lines 26
- 48, Add a parallel test fixture in
tests/testthat/test_summarizeSensitivityProfiles_longtable.R that mirrors the
existing profiles case but uses NA_character_ for treatment2id in monotherapy
rows instead of "" so summarizeSensitivityProfiles' monotherapy detection (the
mono_rows check that uses is.na(...) | == "") is exercised for both forms;
update or add a new test case that builds an assay with treatment2id =
c(NA_character_, NA_character_, "drugX", "drugX") (and corresponding
sampleid/SCORE) and assert the same expected summarization outputs as the
empty-string variant.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
R/computeAUC.R (1)
216-218: 💤 Low valueParam docs inherited via
@describeIndescribe viability-area semantics; AAC needs its own scale clarification.
@describeIn computeAUCcausescomputeAACto inherit the@param viability_as_pctdescription from line 166–169, which speaks of "the normalized area on the 0-1 scale" referring to the viability area. ForcomputeAACthe result is the response area (1 - viability_area), so a reader of?computeAACmay still infer they're getting the viability quantity. Consider adding an AAC-specific note (e.g., a short@detailsfor the AAC entry, or overriding via@param viability_as_pcton thecomputeAACblock) clarifying that the same scaling applies but the metric is1 - <viability area>.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@R/computeAUC.R` around lines 216 - 218, The computeAAC documentation currently inherits viability-area wording from `@describeIn` computeAUC which can mislead readers; update the computeAAC Rd block (the documentation for computeAAC) to override or add a short `@details` or a specific `@param` viability_as_pct note clarifying that AAC = 1 - (viability area) and that the same 0–1 scaling applies (i.e., viability_as_pct controls whether inputs are treated as percentages) so users know computeAAC returns the response-area rather than the viability-area; target the computeAAC doc block that uses `@describeIn` computeAUC and add the explicit clarification there.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@R/computeAUC.R`:
- Around line 142-214: computeAUC now returns the viability-area (not the old
response-area), which silently flips semantics; update callers by warning users
and documenting the change: add a one-release warning at the start of computeAUC
(use .Deprecated or a packageStartupMessage/warning inside computeAUC) that
states the function now returns viability-area and directs users to computeAAC
for the old response-area semantics, update documentation (man/computeAUC.Rd)
and NEWS.md with a prominent "Breaking change" note and migration instructions,
and confirm references to .compute_normalized_viability_area and computeAAC in
the message so users can find the new API.
In `@tests/testthat/test_computeIC50.R`:
- Around line 22-29: Test currently expects an error from computeIC50 for a
conc_as_log mismatch but the code emits a warning (from sanityCheck.R), so
change the assertion in tests/testthat/test_computeIC50.R to use
expect_warning(...) instead of expect_error(...), keeping the same call to
computeIC50 and the same matching message "'conc_as_log' flag may be set
incorrectly" to validate the warning path for computeIC50.
---
Nitpick comments:
In `@R/computeAUC.R`:
- Around line 216-218: The computeAAC documentation currently inherits
viability-area wording from `@describeIn` computeAUC which can mislead readers;
update the computeAAC Rd block (the documentation for computeAAC) to override or
add a short `@details` or a specific `@param` viability_as_pct note clarifying that
AAC = 1 - (viability area) and that the same 0–1 scaling applies (i.e.,
viability_as_pct controls whether inputs are treated as percentages) so users
know computeAAC returns the response-area rather than the viability-area; target
the computeAAC doc block that uses `@describeIn` computeAUC and add the explicit
clarification there.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bc78c9ee-9b5d-4d1d-9ffd-7aee651cf681
📒 Files selected for processing (7)
R/computeAUC.RR/computeIC50.RR/computeICn.Rman/computeAUC.Rdman/computeICn.Rdtests/testthat/test_computeAUC.Rtests/testthat/test_computeIC50.R
✅ Files skipped from review due to trivial changes (1)
- R/computeIC50.R
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/testthat/test_computeAUC.R
- man/computeAUC.Rd
- man/computeICn.Rd
- R/computeICn.R
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests