perf(pmt_afterpulses): two-pass numba scheme for photon_afterpulse by DonNabla · Pull Request #385 · XENONnT/fuse

DonNabla · 2026-05-15T09:02:22Z

Summary

Replaces the per-element Python orchestration in
PMTAfterPulses.photon_afterpulse with a two-pass @numba.njit scheme.
Bit-identical to the unmodified plugin (verified field-by-field on
~30 M pmt_afterpulses rows across multiple Geant4 simulations).

What changes

photon_afterpulse loops over ~16 PMT element types per chunk. In the
existing code, the inner work for each element is a chain of NumPy
ops — in-place divide, fancy-indexed gather of prob_ap, np.where
selection, np.argmin of a broadcast |cdf - r| — each allocating
intermediate arrays of size N_photons or (N_sel, K_bins). On
high-yield sources the orchestration around those NumPy calls
dominates the plugin's wall time.
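For reference, the per-element NumPy chain described above can be sketched as follows. This is a hypothetical reconstruction from the description, not the plugin's code. The function name, the argument layout, and the separate per-channel `ceiling` table are all illustrative assumptions. The comments mark where each temporary array is allocated.

```python
import numpy as np


def vanilla_style_step(rU0, dpe, channel, ceiling, delay_cdf, modifier):
    """Hypothetical NumPy version of one element's inner work.

    rU0: (N,) uniforms, modified in place; dpe: (N,) bool; channel: (N,) int
    ceiling: (n_channels,) probability ceiling; delay_cdf: (n_channels, K)
    Assumes N >= 1 (prob_ap.max() is undefined for an empty array).
    """
    rU0 /= modifier                      # in-place divide
    rU0[dpe] /= 2                        # halve the DPE rows
    prob_ap = ceiling[channel]           # fancy-indexed gather: new (N,) array
    sel = np.where(rU0 <= prob_ap)[0]    # (N,) bool mask, then an index array
    # The gathered CDF rows, the difference and its absolute value are each
    # a fresh (N_sel, K) array before argmin reduces them to (N_sel,).
    diff = np.abs(delay_cdf[channel[sel]] - rU0[sel][:, None])
    return sel, np.argmin(diff, axis=-1), prob_ap.max()
```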

The new design splits the inner work into two compiled kernels:

_ap_select_kernel: one tight loop over N rows that scales
rU0 in place by the per-element modifier (halving the DPE rows),
compares each row against its per-channel probability ceiling, and
emits the indices that pass the selection plus the running max of
the ceiling (for the existing > 0.5 warning check). Replaces four
separate NumPy passes with a single integer-emitting loop.
_ap_kernel_nonuniform: for the non-Uniform branch, two
register-resident O(K) scans per selected photon (one over the
delay-time CDF, one over the amplitude CDF). Replaces the
broadcast-then-np.argmin pattern that materialised an
(N_sel, K_bins) intermediate per call.
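A single-pass kernel in the spirit of _ap_select_kernel might look like the sketch below. It is not the PR's kernel: the name `ap_select_sketch`, its argument list and the per-channel `ceiling` table are assumptions. It performs the same two divisions as the NumPy chain, in the same order, so the scaled values and the selection should match the NumPy result bit for bit. The fallback decorator exists only so the sketch runs where numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback so the sketch still runs without numba
    def njit(func):
        return func


@njit
def ap_select_sketch(rU0, dpe, channel, ceiling, modifier):
    """One pass: scale rU0 in place, select rows, track the max ceiling.

    Hypothetical stand-in for _ap_select_kernel. Returns the indices of the
    rows with rU0 <= ceiling[channel] (after scaling) and the largest
    ceiling seen, for the > 0.5 warning check.
    """
    n = rU0.shape[0]
    out = np.empty(n, dtype=np.int64)  # worst case: every row is selected
    n_sel = 0
    max_ceiling = -np.inf
    for i in range(n):
        r = rU0[i] / modifier          # same op as `rU0 /= modifier`
        if dpe[i]:
            r = r / 2                  # same op as `rU0[dpe] /= 2`
        rU0[i] = r                     # keep the in-place side effect
        c = ceiling[channel[i]]        # per-channel probability ceiling
        if c > max_ceiling:
            max_ceiling = c
        if r <= c:
            out[n_sel] = i
            n_sel += 1
    return out[:n_sel], max_ceiling
```

Preallocating the index buffer at its worst-case size N and returning a slice avoids a second counting pass, at the cost of one int64 buffer of N entries per call.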

The amplitude CDF is sometimes 1D and sometimes 2D depending on the
element; numba does not accept a single parameter whose rank varies
per call, so both shapes are always passed and the kernel branches
on a flag.
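The second pass and the rank flag can be sketched together. The names (`ap_nonuniform_sketch`, `first_argmin_abs`) and the argument order are hypothetical. The scan uses a strict `<`, so ties resolve to the first index, which matches `np.argmin`. It assumes the CDFs contain no NaNs: `np.argmin` returns the first NaN, and this loop does not. Both amplitude arrays are always passed. The one not selected by `amp_is_2d` can be a small dummy array of the right rank, which keeps both branches type-checkable under numba.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback so the sketch still runs without numba
    def njit(func):
        return func


@njit
def first_argmin_abs(cdf_row, r):
    # O(K) scan equal to np.argmin(np.abs(cdf_row - r)); strict '<' keeps
    # the first minimum, NumPy's tie-breaking rule. Assumes K >= 1, no NaNs.
    best = 0
    best_d = abs(cdf_row[0] - r)
    for k in range(1, cdf_row.shape[0]):
        d = abs(cdf_row[k] - r)
        if d < best_d:
            best_d = d
            best = k
    return best


@njit
def ap_nonuniform_sketch(sel, channel, r_delay, r_amp,
                         delay_cdf, amp_cdf_1d, amp_cdf_2d, amp_is_2d):
    # sel: selected row indices; r_delay: per-row uniforms (indexed by row);
    # r_amp: one uniform per selected row (indexed by position in sel).
    n_sel = sel.shape[0]
    delay_idx = np.empty(n_sel, dtype=np.int64)
    amp_idx = np.empty(n_sel, dtype=np.int64)
    for j in range(n_sel):
        ch = channel[sel[j]]
        delay_idx[j] = first_argmin_abs(delay_cdf[ch], r_delay[sel[j]])
        if amp_is_2d:   # per-channel amplitude CDF, shape (n_channels, K)
            amp_idx[j] = first_argmin_abs(amp_cdf_2d[ch], r_amp[j])
        else:           # shared amplitude CDF, shape (K,)
            amp_idx[j] = first_argmin_abs(amp_cdf_1d, r_amp[j])
    return delay_idx, amp_idx
```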

All rng.random and rng.uniform draws stay in Python, in the same
order with the same argument shapes and the same conditional skip on
empty per-element selections — so the PCG64 sequence is preserved
exactly. The continue on n_sel == 0 must stay before the second
draw or the random sequence diverges; a comment in the source flags
this.
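The ordering constraint can be illustrated with a toy loop that draws from a seeded `np.random.default_rng` (which uses PCG64). Nothing here is taken from fuse. The function name, element sizes and threshold selection are illustrative. The same seed and the same draw order reproduce every value, while postponing the second draws shifts the stream position of every later first draw.

```python
import numpy as np


def run_elements(rows_per_element, threshold, seed, defer_second=False):
    """Toy per-element draw loop; names and sizes are hypothetical.

    defer_second=False: first draw, select, skip empty, second draw, per
    element (the order described above).
    defer_second=True: the same draws, but every second draw is postponed
    until after all first draws, i.e. a reordering that changes the stream.
    """
    rng = np.random.default_rng(seed)   # default_rng uses the PCG64 bit generator
    first, second, deferred = [], [], []
    for n_rows in rows_per_element:
        r0 = rng.random(n_rows)                          # one value per row
        n_sel = int(np.count_nonzero(r0 <= threshold))   # stand-in for selection
        if n_sel == 0:
            continue                                     # skip this element entirely
        first.append(r0)
        if defer_second:
            deferred.append(n_sel)
        else:
            second.append(rng.random(n_sel))             # one value per selected row
    for n_sel in deferred:
        second.append(rng.random(n_sel))
    return first, second
```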

Bit-identity verification

pmt_afterpulses output was compared field-by-field against the
unmodified plugin on K40, Co60, Th232, Ra226, and Pb212
Geant4 simulations — ~30 M total rows, every field
(channel, dpe, photon_gain, cluster_id, photon_type,
time, endtime) byte-identical.
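A field-by-field check of this kind can be written by comparing the raw bytes of each field. The helper below is a hypothetical sketch, not the script behind the numbers above. The field list comes from the text, and any dtypes used with it are assumptions.

```python
import numpy as np

# Field list from the verification described above.
FIELDS = ("channel", "dpe", "photon_gain", "cluster_id",
          "photon_type", "time", "endtime")


def mismatched_fields(a, b, fields=FIELDS):
    """Return the names of fields whose raw bytes differ between a and b.

    a, b: structured NumPy arrays (e.g. two pmt_afterpulses outputs).
    """
    if a.shape != b.shape:
        raise ValueError(f"row counts differ: {a.shape} vs {b.shape}")
    bad = []
    for name in fields:
        if a.dtype[name] != b.dtype[name]:
            bad.append(name)            # different dtype: not byte-identical
        elif a[name].tobytes() != b[name].tobytes():
            bad.append(name)            # same dtype, at least one byte differs
    return bad
```

Comparing bytes instead of using `==` (or `np.array_equal`) treats a NaN as equal to an identical NaN and tells `-0.0` apart from `0.0`, which is the stricter sense of byte-identical.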

Performance

Per-source wall time of the pmt_afterpulses target, before → after:

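| source | vanilla (s) | this PR (s) | speedup |
|--------|-------------|-------------|---------|
| K40    | 49.7        | 30.3        | 1.64×   |
| Co60   | 650.1       | 380.3       | 1.71×   |
| Th232  | 28.7        | 17.1        | 1.68×   |
| Ra226  | 35.9        | 21.6        | 1.66×   |
| Pb212  | 1046.4      | 771.2       | 1.36×   |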
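The speedup column is the ratio of the two wall times. Converting it to the fraction of wall time removed gives another view of the same numbers: the 1.36× on Pb212, for instance, is roughly a quarter less wall time. A few lines recompute both from the table values:

```python
# Wall times in seconds from the table above: (vanilla, this PR).
WALL_S = {
    "K40": (49.7, 30.3),
    "Co60": (650.1, 380.3),
    "Th232": (28.7, 17.1),
    "Ra226": (35.9, 21.6),
    "Pb212": (1046.4, 771.2),
}

# speedup = before / after; time_saved = fraction of the vanilla wall removed
speedup = {src: before / after for src, (before, after) in WALL_S.items()}
time_saved = {src: 1.0 - after / before for src, (before, after) in WALL_S.items()}

for src in WALL_S:
    print(f"{src:6s} speedup {speedup[src]:.2f}x, wall time cut by {time_saved[src]:.1%}")
```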

Notes

Single-file diff: only
fuse/plugins/pmt_and_daq/pmt_afterpulses.py is touched.
The two new kernels are JIT-compiled on their first call; with
cache=True the compiled code is persisted to disk, so later
invocations load it from the cache instead of recompiling.

Replace the per-element Python orchestration inside `photon_afterpulse` (per-element NumPy ops: scale rU0, fancy-index prob_ap, compute the selection mask, gather the selected channels, argmin-based inverse CDF) with a two-pass numba scheme:

Pass 1 (`_ap_select_kernel`): one tight loop over N rows that scales rU0 in place by the per-element modifier (halving the DPE entries), compares against the per-channel probability ceiling, and emits the indices that pass the selection plus the running max for the > 0.5 warning check. Eliminates the per-element NumPy intermediates.

Pass 2 (`_ap_kernel_nonuniform`): two register-resident O(K) scans per selected photon (one over the delay-time CDF, one over the amplitude CDF). Replaces the broadcast-then-argmin pattern that materialised an (N_sel, K) intermediate per call. The amplitude CDF is sometimes 1D and sometimes 2D; both shapes are passed unconditionally and the kernel branches on a flag, because numba does not accept a parameter whose rank varies per call.

All `rng.random` and `rng.uniform` draws stay in Python in the same order, with the same argument shapes and the same conditional skips on empty selections, so the PCG64 sequence is preserved exactly.

Bit-identical to vanilla on `pmt_afterpulses`: every per-row float op (scalar division, halving, comparison, argmin) follows the same IEEE-754 sequence as the equivalent NumPy expression.

Result: a ~1.4-1.7x speedup of the PMTAfterPulses plugin's wall time on K40-class workloads, across ER backgrounds and Pb212.

pre-commit.ci runs hook environments under python3.14 by default. docformatter v1.7.7's transitive dependency `untokenize 0.1.1` uses `ast.Constant.s`, an attribute removed in python3.14, so the install fails when pre-commit tries to build it. Pin docformatter's hook to python3.12 until `untokenize` (or docformatter's dependency choice) catches up.

Maxime added 2 commits May 15, 2026 05:33

DonNabla force-pushed the pr/pmt-afterpulses branch from 573a1c5 to 2303fb1 May 15, 2026 10:33

HenningSE approved these changes May 18, 2026

DonNabla wants to merge 2 commits into main from pr/pmt-afterpulses
