Skip to content

Commit ad7df79

Browse files
Add CSF/PV exclusion to detection + reconcile fallback SD multiplier (#114)
Change A — CSF / partial-volume exclusion (posterior-fossa false positives): - The FSL FAST CSF PVE map (*_csf_prob.nii.gz) was computed but never used. Posterior-fossa CSF pulsation/inflow (4th ventricle, basal cisterns) is the dominant false-positive source for brainstem FLAIR. - New build_csf_exclusion_mask() builds a binary CSF mask (PVE > CSF_PVE_THRESHOLD) ONCE per subject in FLAIR space (region-independent → no per-region resampling). - New apply_csf_pv_exclusion() erodes each region mask by a partial-volume band (PV_EROSION_MM) and subtracts the CSF mask before z-scoring/GMM, using safe_fslmaths and logging voxels excluded per region. - Wired into apply_per_region_gmm_analysis() with a post-exclusion voxel guard. - Gated by CSF_EXCLUSION_ENABLED (default true); pass-through when disabled or when the CSF map is missing. - New config: CSF_EXCLUSION_ENABLED, CSF_PVE_THRESHOLD, PV_EROSION_MM. Change B — single authoritative fallback SD multiplier: - gmm_threshold.py DEFAULTS['fallback_threshold'] 1.5 -> 1.2 to match config THRESHOLD_WM_SD_MULTIPLIER; corrected the "must match config" comment. - analysis.sh: replaced inline 1.25 (analyze_region_modality, talairach report) and 1.5 GMM-path fallbacks with 1.2 to match the config authoritative value. - Added pytest test_fallback_threshold_matches_config asserting the Python default equals config THRESHOLD_WM_SD_MULTIPLIER so they can't silently drift. Also moved connectivity-weighting SD multipliers (2.0/1.5) into config vars CONNECTIVITY_HIGH_SD_MULT / CONNECTIVITY_CONNECTED_SD_MULT. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
1 parent 87c2185 commit ad7df79

4 files changed

Lines changed: 235 additions & 14 deletions

File tree

config/default_config.sh

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -243,6 +243,26 @@ export REG_LABEL_INTERPOLATION="GenericLabel"
243243
export THRESHOLD_WM_SD_MULTIPLIER=1.2 # SD multiplier from local norm; used by GMM fallback + legacy path
244244
export MIN_HYPERINTENSITY_SIZE=3 # Minimum cluster size in voxels (FSL cluster --minextent)
245245

246+
# ---------------------------------------------------------------------------
247+
# CSF / partial-volume exclusion (posterior-fossa false-positive reduction)
248+
# ---------------------------------------------------------------------------
249+
# Posterior-fossa CSF pulsation/inflow around the 4th ventricle and basal
250+
# cisterns is the dominant FALSE-POSITIVE source for brainstem FLAIR. FSL FAST
251+
# already produces a CSF PVE map (fast_pve_0 -> *_csf_prob.nii.gz) that was
252+
# previously computed but never used for detection. When enabled, the per-region
253+
# GMM/z-score path removes high-CSF-probability voxels and the CSF-parenchyma
254+
# partial-volume boundary band from each region mask BEFORE z-scoring/GMM.
255+
export CSF_EXCLUSION_ENABLED=true # Master switch; false = legacy behaviour
256+
export CSF_PVE_THRESHOLD=0.5 # Voxels with CSF PVE > this are excluded from detection
257+
export PV_EROSION_MM=1 # Erode region mask by this many mm to drop CSF-parenchyma PV boundary
258+
259+
# Connectivity-weighting SD multipliers (apply_connectivity_weighting in analysis.sh).
260+
# A voxel is kept if it is connected to a hyperintense seed AND above
261+
# CONNECTIVITY_CONNECTED_SD_MULT, OR is very high intensity (above
262+
# CONNECTIVITY_HIGH_SD_MULT) regardless of connectivity.
263+
export CONNECTIVITY_HIGH_SD_MULT=2.0 # mean + this*std: standalone "very high" threshold
264+
export CONNECTIVITY_CONNECTED_SD_MULT=1.5 # mean + this*std: lower threshold for connected voxels
265+
246266
# ---------------------------------------------------------------------------
247267
# GMM per-region thresholding (gmm_threshold.py --help for full docs)
248268
# ---------------------------------------------------------------------------

src/modules/analysis.sh

Lines changed: 172 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -612,7 +612,8 @@ apply_gaussian_mixture_thresholding() {
612612

613613
if [ "$mask_volume" -lt "${GMM_MIN_VOXELS:-20}" ]; then
614614
log_formatted "ERROR" "Region mask too small ($mask_volume voxels) for meaningful GMM analysis"
615-
local bash_fallback="${GMM_FALLBACK_THRESHOLD:-${THRESHOLD_WM_SD_MULTIPLIER:-1.5}}"
615+
# Literal must equal config THRESHOLD_WM_SD_MULTIPLIER (single authoritative fallback)
616+
local bash_fallback="${GMM_FALLBACK_THRESHOLD:-${THRESHOLD_WM_SD_MULTIPLIER:-1.2}}"
616617
fslmaths "$zscore_image" -mas "$region_mask" -thr "$bash_fallback" -bin "$output_mask"
617618
return 1
618619
fi
@@ -647,7 +648,7 @@ apply_gaussian_mixture_thresholding() {
647648
--moderate-weight-sd "${GMM_MODERATE_WEIGHT_SD:-2.0}" \
648649
--floor-percentile "${GMM_FLOOR_PERCENTILE:-95}" \
649650
--fallback-percentile "${GMM_FALLBACK_PERCENTILE:-97.5}" \
650-
--fallback-threshold "${GMM_FALLBACK_THRESHOLD:-${THRESHOLD_WM_SD_MULTIPLIER:-1.5}}" \
651+
--fallback-threshold "${GMM_FALLBACK_THRESHOLD:-${THRESHOLD_WM_SD_MULTIPLIER:-1.2}}" \
651652
2>"${gmm_temp_dir}/gmm_stderr.log")
652653
local gmm_exit=$?
653654

@@ -683,7 +684,7 @@ apply_gaussian_mixture_thresholding() {
683684
# Validate threshold
684685
if [ -z "$threshold" ] || ! echo "$threshold" | grep -E '^[0-9]+\.?[0-9]*$' >/dev/null; then
685686
log_formatted "WARNING" "Invalid threshold value '$threshold', using fallback"
686-
threshold="${GMM_FALLBACK_THRESHOLD:-${THRESHOLD_WM_SD_MULTIPLIER:-1.5}}"
687+
threshold="${GMM_FALLBACK_THRESHOLD:-${THRESHOLD_WM_SD_MULTIPLIER:-1.2}}"
687688
fi
688689

689690
if [ "$gmm_failed" = "true" ]; then
@@ -738,25 +739,162 @@ apply_gaussian_mixture_thresholding() {
738739

739740

740741

742+
# Build a binary CSF exclusion mask (in the target/FLAIR space) from the FSL FAST
743+
# CSF PVE map. The CSF map is region-independent, so this is computed ONCE per
744+
# subject and reused for every region (avoids re-resampling the whole-brain PVE
745+
# map per region). Posterior-fossa CSF (4th ventricle, basal cisterns) is the
746+
# dominant FALSE-POSITIVE source for brainstem FLAIR.
747+
#
748+
# Usage: build_csf_exclusion_mask <csf_prob_map> <reference_image> <output_mask>
749+
# Returns 0 on success (output written), 1 if no usable exclusion mask could be
750+
# built (output not guaranteed to exist; caller must handle).
751+
build_csf_exclusion_mask() {
752+
local csf_prob="$1"
753+
local reference_image="$2"
754+
local output_mask="$3"
755+
756+
log_message "Building CSF exclusion mask from FAST CSF PVE map..."
757+
758+
if [ -z "$csf_prob" ] || [ ! -f "$csf_prob" ]; then
759+
log_formatted "WARNING" "CSF PVE map not available ($csf_prob); CSF subtraction will be skipped"
760+
return 1
761+
fi
762+
763+
local csf_pve_threshold="${CSF_PVE_THRESHOLD:-0.5}"
764+
765+
local work_dir
766+
work_dir=$(mktemp -d)
767+
768+
# CSF PVE map may be in a different space than the reference; resample if so.
769+
local csf_resampled="$csf_prob"
770+
local ref_dims csf_dims
771+
ref_dims=$(fslinfo "$reference_image" | grep -E "^dim[123]" | awk '{print $2}' | tr '\n' 'x' | sed 's/x$//')
772+
csf_dims=$(fslinfo "$csf_prob" | grep -E "^dim[123]" | awk '{print $2}' | tr '\n' 'x' | sed 's/x$//')
773+
if [ "$ref_dims" != "$csf_dims" ]; then
774+
csf_resampled="${work_dir}/csf_resampled.nii.gz"
775+
log_message "Resampling CSF PVE map to reference space..."
776+
if ! flirt -in "$csf_prob" -ref "$reference_image" -out "$csf_resampled" -interp trilinear -dof 6; then
777+
log_formatted "WARNING" "CSF PVE resampling failed; CSF subtraction will be skipped"
778+
rm -rf "$work_dir"
779+
return 1
780+
fi
781+
fi
782+
783+
# CSF voxels: PVE > threshold.
784+
if ! safe_fslmaths "Build CSF exclusion mask" \
785+
"$csf_resampled" -thr "$csf_pve_threshold" -bin "$output_mask"; then
786+
log_formatted "WARNING" "Failed to threshold CSF PVE map; CSF subtraction will be skipped"
787+
rm -rf "$work_dir"
788+
return 1
789+
fi
790+
791+
rm -rf "$work_dir"
792+
log_message "✓ CSF exclusion mask built (PVE > ${csf_pve_threshold})"
793+
return 0
794+
}
795+
796+
# Function to remove CSF and CSF-parenchyma partial-volume voxels from a region
797+
# mask before z-scoring/GMM. Excluding posterior-fossa CSF materially reduces
798+
# spurious brainstem-FLAIR detections. Gated by CSF_EXCLUSION_ENABLED.
799+
#
800+
# Usage: apply_csf_pv_exclusion <region_mask_in> <csf_exclusion_mask> <region_mask_out> [<region_name>]
801+
# <csf_exclusion_mask> is the pre-built binary CSF mask (build_csf_exclusion_mask)
802+
# in the SAME space as <region_mask_in>; pass "" to skip CSF subtraction and
803+
# only apply partial-volume erosion.
804+
# Returns 0 and writes the cleaned mask to <region_mask_out>. On any failure (or
805+
# when disabled) it copies the input through unchanged so the caller always has a
806+
# usable mask.
807+
apply_csf_pv_exclusion() {
808+
local region_mask_in="$1"
809+
local csf_exclusion_mask="$2"
810+
local region_mask_out="$3"
811+
local region_name="${4:-region}"
812+
813+
log_message "Applying CSF / partial-volume exclusion for $region_name..."
814+
815+
# Feature gate: when disabled, pass the mask through unchanged.
816+
if [ "${CSF_EXCLUSION_ENABLED:-true}" != "true" ]; then
817+
log_message "CSF/PV exclusion disabled (CSF_EXCLUSION_ENABLED=${CSF_EXCLUSION_ENABLED:-true}); using region mask unchanged"
818+
cp "$region_mask_in" "$region_mask_out"
819+
return 0
820+
fi
821+
822+
local pv_erosion_mm="${PV_EROSION_MM:-1}"
823+
824+
local voxels_before
825+
voxels_before=$(fslstats "$region_mask_in" -V | awk '{print $1}')
826+
# Coerce to a safe integer so downstream arithmetic/comparisons can't abort
827+
[[ "$voxels_before" =~ ^[0-9]+$ ]] || voxels_before=0
828+
829+
local work_dir
830+
work_dir=$(mktemp -d)
831+
local working_mask="${work_dir}/region_pv_eroded.nii.gz"
832+
833+
# 1) Erode the region mask by the partial-volume band (mm) to drop
834+
# CSF-parenchyma boundary voxels. -kernel sphere uses mm radius.
835+
if ! safe_fslmaths "Erode $region_name region mask by PV band" \
836+
"$region_mask_in" -kernel sphere "$pv_erosion_mm" -ero "$working_mask"; then
837+
log_formatted "WARNING" "PV-band erosion failed for $region_name; using un-eroded mask"
838+
cp "$region_mask_in" "$working_mask"
839+
fi
840+
841+
# 2) Subtract the pre-built CSF exclusion mask (same space as region mask).
842+
if [ -n "$csf_exclusion_mask" ] && [ -f "$csf_exclusion_mask" ]; then
843+
if ! safe_fslmaths "Subtract CSF from $region_name region mask" \
844+
"$working_mask" -sub "$csf_exclusion_mask" -thr 0 -bin "$region_mask_out"; then
845+
log_formatted "WARNING" "CSF subtraction failed for $region_name; using PV-eroded mask only"
846+
cp "$working_mask" "$region_mask_out"
847+
fi
848+
else
849+
log_message "No CSF exclusion mask for $region_name; applying PV erosion only"
850+
cp "$working_mask" "$region_mask_out"
851+
fi
852+
853+
rm -rf "$work_dir"
854+
855+
local voxels_after
856+
voxels_after=$(fslstats "$region_mask_out" -V | awk '{print $1}')
857+
[[ "$voxels_after" =~ ^[0-9]+$ ]] || voxels_after=0
858+
local excluded=$(( voxels_before - voxels_after ))
859+
log_message "✓ CSF/PV exclusion for $region_name: ${voxels_before}${voxels_after} voxels (${excluded} excluded)"
860+
861+
return 0
862+
}
863+
741864
# Function to apply per-region GMM analysis to all atlas regions
742865
apply_per_region_gmm_analysis() {
743866
local flair_image="$1"
744867
local -n regions_ref=$2
745868
local temp_dir="$3"
746869
local out_prefix="$4"
747-
870+
871+
# CSF PVE map produced by FSL FAST in detect_hyperintensities() (fast_pve_0).
872+
# Used to exclude CSF / partial-volume voxels from each region before GMM.
873+
local csf_prob="${out_prefix}_csf_prob.nii.gz"
874+
748875
log_message "Applying per-region GMM analysis to ${#regions_ref[@]} atlas regions..."
749-
876+
750877
# Create PERMANENT per-region analysis directory for debugging
751878
local per_region_dir="${RESULTS_DIR}/per_region_analysis"
752879
mkdir -p "$per_region_dir"
753-
880+
754881
# Store results for each region
755882
local region_results=()
756883
local combined_result="${per_region_dir}/atlas_gmm_combined.nii.gz"
757-
884+
758885
# Initialize combined result as zeros
759886
fslmaths "$flair_image" -mul 0 "$combined_result"
887+
888+
# Build the CSF exclusion mask ONCE (FLAIR space) since it is region-independent.
889+
# Region masks are resampled to FLAIR space below, so this mask aligns with all
890+
# of them. Empty string => CSF subtraction is skipped (PV erosion still applies).
891+
local csf_exclusion_mask=""
892+
if [ "${CSF_EXCLUSION_ENABLED:-true}" = "true" ]; then
893+
local csf_exclusion_candidate="${per_region_dir}/csf_exclusion_mask.nii.gz"
894+
if build_csf_exclusion_mask "$csf_prob" "$flair_image" "$csf_exclusion_candidate"; then
895+
csf_exclusion_mask="$csf_exclusion_candidate"
896+
fi
897+
fi
760898

761899
# Create or find brain mask for filtering segmentation masks
762900
local brain_mask=""
@@ -873,7 +1011,22 @@ apply_per_region_gmm_analysis() {
8731011

8741012
# Use brain-masked region for all subsequent analysis
8751013
region_resampled="$region_brain_masked"
876-
1014+
1015+
# Remove CSF and CSF-parenchyma partial-volume voxels (posterior-fossa
1016+
# false-positive reduction) BEFORE z-scoring/GMM. No-op when disabled.
1017+
local region_csf_excluded="${region_work_dir}/${region_base}_csf_excluded.nii.gz"
1018+
if apply_csf_pv_exclusion "$region_resampled" "$csf_exclusion_mask" "$region_csf_excluded" "$region_base"; then
1019+
local csf_excluded_voxels
1020+
csf_excluded_voxels=$(fslstats "$region_csf_excluded" -V | awk '{print $1}')
1021+
# Coerce to a safe integer so the -lt comparison can't error out
1022+
[[ "$csf_excluded_voxels" =~ ^[0-9]+$ ]] || csf_excluded_voxels=0
1023+
if [ "$csf_excluded_voxels" -lt 50 ]; then
1024+
log_formatted "WARNING" "$region_base has insufficient voxels ($csf_excluded_voxels) after CSF/PV exclusion - skipping"
1025+
continue
1026+
fi
1027+
region_resampled="$region_csf_excluded"
1028+
fi
1029+
8771030
# Perform region-specific z-score normalization
8781031
local region_zscore="${region_work_dir}/${region_base}_zscore.nii.gz"
8791032
log_message "Normalizing FLAIR intensities for $region_base using region-specific statistics..."
@@ -993,9 +1146,11 @@ apply_connectivity_weighting() {
9931146
local region_mean=$(echo "$region_stats" | awk '{print $1}')
9941147
local region_std=$(echo "$region_stats" | awk '{print $2}')
9951148

996-
# Calculate adaptive thresholds
997-
local base_threshold=$(echo "$region_mean + 2.0 * $region_std" | bc -l)
998-
local connected_threshold=$(echo "$region_mean + 1.5 * $region_std" | bc -l)
1149+
# Calculate adaptive thresholds (SD multipliers are configurable)
1150+
local high_sd_mult="${CONNECTIVITY_HIGH_SD_MULT:-2.0}"
1151+
local connected_sd_mult="${CONNECTIVITY_CONNECTED_SD_MULT:-1.5}"
1152+
local base_threshold=$(echo "$region_mean + $high_sd_mult * $region_std" | bc -l)
1153+
local connected_threshold=$(echo "$region_mean + $connected_sd_mult * $region_std" | bc -l)
9991154

10001155
# Validate thresholds are valid numbers
10011156
if ! echo "$base_threshold" | grep -E '^-?[0-9]+\.?[0-9]*$' >/dev/null; then
@@ -1741,7 +1896,9 @@ analyze_region_modality() {
17411896
log_message "$region ${modality} statistics - Mean: $region_mean, StdDev: $region_std"
17421897

17431898
# Apply modality-specific thresholding with different multipliers for T1 vs FLAIR
1744-
local base_threshold_multiplier="${THRESHOLD_WM_SD_MULTIPLIER:-1.25}"
1899+
# Fallback literal must equal config THRESHOLD_WM_SD_MULTIPLIER (the single
1900+
# authoritative fallback) so the legacy path agrees with the GMM path.
1901+
local base_threshold_multiplier="${THRESHOLD_WM_SD_MULTIPLIER:-1.2}"
17451902
local threshold_multiplier
17461903
local threshold_val
17471904
local abnormality_mask="${region_output_dir}/${region}_${modality}_${intensity_type}intensities.nii.gz"
@@ -2047,7 +2204,7 @@ analyze_talairach_hyperintensities() {
20472204

20482205
echo ""
20492206
echo "Analysis Parameters:"
2050-
echo " Threshold multiplier: ${THRESHOLD_WM_SD_MULTIPLIER:-1.25}"
2207+
echo " Threshold multiplier: ${THRESHOLD_WM_SD_MULTIPLIER:-1.2}"
20512208
echo " Minimum cluster size: ${MIN_HYPERINTENSITY_SIZE:-4} voxels"
20522209
echo " FLAIR thresholding: mean + multiplier × std (above threshold)"
20532210
echo " T1 thresholding: mean - multiplier × std (below threshold)"
@@ -2918,6 +3075,8 @@ EOF
29183075
export -f detect_hyperintensities
29193076
export -f create_supratentorial_mask
29203077
export -f find_all_atlas_regions
3078+
export -f build_csf_exclusion_mask
3079+
export -f apply_csf_pv_exclusion
29213080
export -f apply_per_region_gmm_analysis
29223081
export -f normalize_flair_brainstem_zscore
29233082
export -f apply_gaussian_mixture_thresholding

src/modules/gmm_threshold.py

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -47,6 +47,12 @@
4747
# ---------------------------------------------------------------------------
4848
# Defaults — must match config/default_config.sh GMM_* variables
4949
# ---------------------------------------------------------------------------
50+
# The fallback threshold has ONE authoritative source: config/default_config.sh
51+
# THRESHOLD_WM_SD_MULTIPLIER (1.2). In the pipeline, analysis.sh always passes
52+
# this value explicitly via --fallback-threshold, so the literal below is only a
53+
# last-resort default for standalone invocation. It MUST equal that config
54+
# value; tests/test_gmm_threshold.py enforces this so the two can't silently
55+
# drift apart again.
5056
DEFAULTS = {
5157
"max_components": 3,
5258
"min_voxels": 20,
@@ -59,7 +65,7 @@
5965
"moderate_weight_sd": 2.0,
6066
"floor_percentile": 95.0,
6167
"fallback_percentile": 97.5,
62-
"fallback_threshold": 1.5,
68+
"fallback_threshold": 1.2,
6369
}
6470

6571

tests/test_gmm_threshold.py

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@
1111
"""
1212

1313
import os
14+
import re
1415
import subprocess
1516
import sys
1617

@@ -72,6 +73,28 @@ def default_cfg(**overrides):
7273
return cfg
7374

7475

76+
def read_config_threshold_wm_sd_multiplier():
77+
"""Parse THRESHOLD_WM_SD_MULTIPLIER from config/default_config.sh.
78+
79+
Parsed with a regex (not by sourcing bash) so the test has no FSL/ANTs
80+
dependency. Returns the value as a float.
81+
"""
82+
config_path = os.path.join(
83+
os.path.dirname(__file__), "..", "config", "default_config.sh"
84+
)
85+
with open(config_path, encoding="utf-8") as handle:
86+
for line in handle:
87+
match = re.match(
88+
r"^\s*export\s+THRESHOLD_WM_SD_MULTIPLIER=([0-9]+(?:\.[0-9]+)?)",
89+
line,
90+
)
91+
if match:
92+
return float(match.group(1))
93+
raise AssertionError(
94+
"THRESHOLD_WM_SD_MULTIPLIER not found in config/default_config.sh"
95+
)
96+
97+
7598
# ---------------------------------------------------------------------------
7699
# extract_values
77100
# ---------------------------------------------------------------------------
@@ -442,6 +465,19 @@ def test_percentiles_in_range(self):
442465
for key in ["floor_percentile", "fallback_percentile"]:
443466
assert 0 < DEFAULTS[key] < 100, f"{key} should be in (0, 100)"
444467

468+
def test_fallback_threshold_matches_config(self):
469+
"""The Python fallback default MUST equal config THRESHOLD_WM_SD_MULTIPLIER.
470+
471+
There is one authoritative fallback SD multiplier. This guards against
472+
the two values silently drifting apart (the historical 1.5 vs 1.2 bug).
473+
"""
474+
config_value = read_config_threshold_wm_sd_multiplier()
475+
assert DEFAULTS["fallback_threshold"] == pytest.approx(config_value), (
476+
f"gmm_threshold.py DEFAULTS['fallback_threshold']="
477+
f"{DEFAULTS['fallback_threshold']} must equal config "
478+
f"THRESHOLD_WM_SD_MULTIPLIER={config_value}"
479+
)
480+
445481

446482
# ---------------------------------------------------------------------------
447483
# Standalone runner

0 commit comments

Comments
 (0)