perf(pmt_afterpulses): two-pass numba scheme for photon_afterpulse #385
Open
DonNabla wants to merge 2 commits into
Added 2 commits on May 15, 2026 05:33
Replace the per-element Python orchestration inside `photon_afterpulse`
(per-element numpy ops: scale rU0, fancy-index prob_ap, compute selection
mask, gather selected channels, argmin-based inverse CDF) with a two-pass
numba scheme:
Pass 1 — `_ap_select_kernel`: one tight loop over N rows that scales
rU0 in place by the per-element modifier (halving the DPE entries),
compares against the per-channel probability ceiling, and emits the
indices that pass the selection plus the running max of that ceiling
for the > 0.5 warning check. Eliminates the per-element numpy intermediates.
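A minimal sketch of what a single selection pass in the spirit of `_ap_select_kernel` could look like. It assumes a scalar per-element `modifier` that divides `rU0`, a boolean `dpe` flag per row, and a per-channel ceiling array `prob_ap`; the name `ap_select_kernel`, its signature, and the exact scaling formula are illustrative assumptions, not the plugin's code. Numba is optional here: the decorator falls back to plain Python if it is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # keep the sketch runnable without numba installed
    def njit(func):
        return func


@njit
def ap_select_kernel(rU0, channel, dpe, modifier, prob_ap):
    """Hypothetical single pass over N rows.

    Scales rU0 in place, compares each row with its channel's ceiling, and
    returns the passing row indices plus the running max of the ceiling.
    """
    n = rU0.shape[0]
    sel = np.empty(n, dtype=np.int64)  # upper bound: every row could pass
    n_sel = 0
    max_ceiling = -np.inf
    for i in range(n):
        r = rU0[i] / modifier          # assumed form of the per-element scaling
        if dpe[i]:
            r = r / 2.0                # DPE rows are halved
        rU0[i] = r                     # in-place update of the caller's array
        ceiling = prob_ap[channel[i]]
        if ceiling > max_ceiling:      # feeds the > 0.5 warning check
            max_ceiling = ceiling
        if r <= ceiling:
            sel[n_sel] = i
            n_sel += 1
    return sel[:n_sel], max_ceiling
```

In module code one would typically write `@numba.njit(cache=True)` so the compiled kernel is cached on disk; it is omitted here because numba's on-disk cache needs a source file to locate.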
Pass 2 — `_ap_kernel_nonuniform`: two register-resident O(K) scans per
selected photon (one over the delay-time CDF, one over the amplitude
CDF). Replaces the broadcast-+-argmin pattern that materialised an
(N_sel, K) intermediate per call. The amplitude CDF is sometimes 1D
and sometimes 2D — both shapes are passed unconditionally and the
kernel branches on a flag, because numba does not accept a parameter
whose rank varies per call.
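The per-photon scans can be sketched as follows, assuming a 2D delay-time CDF indexed by channel, an amplitude CDF passed in both a 1D and a 2D slot with a flag choosing between them, and the same `argmin(|cdf - r|)` inverse-CDF rule as the broadcast code. `nearest_bin`, `ap_kernel_nonuniform`, and all parameter names are hypothetical. The strict `<` keeps the first minimum, which matches `np.argmin`'s tie-breaking for NaN-free CDFs.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # keep the sketch runnable without numba installed
    def njit(func):
        return func


@njit
def nearest_bin(cdf_row, r):
    """First index minimising |cdf_row[k] - r| (np.argmin semantics, no NaNs)."""
    best_k = 0
    best_d = abs(cdf_row[0] - r)
    for k in range(1, cdf_row.shape[0]):
        d = abs(cdf_row[k] - r)
        if d < best_d:  # strict '<' keeps the first minimum on ties
            best_d = d
            best_k = k
    return best_k


@njit
def ap_kernel_nonuniform(sel, channel, rU0, rU1, delay_cdf, amp_cdf_1d, amp_cdf_2d,
                         amp_is_2d, delay_bin, amp_bin, t_offset):
    """Hypothetical kernel: two O(K) scans per selected photon, no (N_sel, K) array."""
    n = sel.shape[0]
    delay = np.empty(n, dtype=np.float64)
    amplitude = np.empty(n, dtype=np.float64)
    for j in range(n):
        i = sel[j]
        ch = channel[i]
        # Scan 1: delay-time CDF of this photon's channel.
        delay[j] = nearest_bin(delay_cdf[ch], rU0[i]) * delay_bin - t_offset
        # Scan 2: amplitude CDF, per-channel (2D slot) or shared (1D slot).
        if amp_is_2d:
            k = nearest_bin(amp_cdf_2d[ch], rU1[j])
        else:
            k = nearest_bin(amp_cdf_1d, rU1[j])
        amplitude[j] = k * amp_bin
    return delay, amplitude
```

Each scan holds only the current best distance and index, so nothing proportional to `N_sel * K` is allocated.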
All `rng.random` and `rng.uniform` draws stay in Python in the same
order, with the same argument shapes and the same conditional skips on
empty selections — so the PCG64 sequence is preserved exactly.
Bit-identical to vanilla on `pmt_afterpulses`: every per-row float op
(scalar division, halving, comparison, argmin) follows the same IEEE-754
sequence as the equivalent numpy expression.
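"Bit-identical" is a stricter check than `np.array_equal` or `np.allclose`: `-0.0 == 0.0` is true even though the sign bit differs, and NaNs never compare equal by value. One way to compare structured outputs field by field on raw bytes is sketched below; the helper name `mismatched_fields` and the example dtype are hypothetical and are not the real `pmt_afterpulses` dtype.

```python
import numpy as np


def mismatched_fields(a, b):
    """Return the names of fields whose raw bytes differ between two structured arrays."""
    if a.dtype != b.dtype or a.shape != b.shape:
        raise ValueError("dtype or shape mismatch")
    # tobytes() serialises each field's values, so sign bits and NaN payloads count.
    return [name for name in a.dtype.names if a[name].tobytes() != b[name].tobytes()]


# Illustrative dtype only; the real output has more fields.
example_dtype = np.dtype([
    ("channel", np.int16),
    ("dpe", np.bool_),
    ("time", np.int64),
    ("photon_gain", np.float64),
])
```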
Result on K40-class workloads: ~1.4-1.7x reduction in the PMTAfterPulses
plugin's wall time across ER backgrounds and Pb212.
pre-commit.ci runs hook environments under python3.14 by default. docformatter v1.7.7's transitive dependency `untokenize 0.1.1` uses `ast.Constant.s`, an attribute removed in python3.14, so the install fails when pre-commit tries to build it. Pin docformatter's hook to python3.12 until `untokenize` (or docformatter's dependency choice) catches up.
Branch updated from 573a1c5 to 2303fb1.
HenningSE approved these changes on May 18, 2026.
Summary
Replaces the per-element Python orchestration in `PMTAfterPulses.photon_afterpulse` with a two-pass `@numba.njit` scheme. Bit-identical to the unmodified plugin (verified field-by-field on ~30 M `pmt_afterpulses` rows across multiple Geant4 simulations).
What changes
`photon_afterpulse` loops over ~16 PMT element types per chunk. In the existing code, the inner work for each element is a chain of NumPy ops (in-place divide, fancy-indexed gather of `prob_ap`, `np.where` selection, `np.argmin` of a broadcast `|cdf - r|`), each allocating intermediate arrays of size `N_photons` or `(N_sel, K_bins)`. On high-yield sources the orchestration around those NumPy calls dominates the plugin's wall time.
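For reference, the vectorised pattern described above can be sketched roughly as follows. This is a hypothetical reconstruction for illustration (the function name, the exact scaling, and the `delay_cdf` layout are assumptions), meant only to show where the `N`-sized temporaries and the `(N_sel, K)` broadcast come from.

```python
import numpy as np


def select_and_invert_numpy(rU0, channel, dpe, modifier, prob_ap, delay_cdf):
    """Vectorised baseline (hypothetical reconstruction, not the plugin's code).

    Mutates rU0 in place, matching the in-place divide in the existing chain.
    """
    rU0 /= modifier                    # in-place divide over all N rows
    rU0[dpe] /= 2.0                    # boolean-mask gather, divide, scatter
    ceiling = prob_ap[channel]         # N-sized fancy-indexed gather
    sel = np.where(rU0 <= ceiling)[0]  # N-sized bool mask, then an index array
    # Three (N_sel, K) temporaries: the row gather, the difference, and the abs.
    dist = np.abs(delay_cdf[channel[sel]] - rU0[sel][:, None])
    return sel, np.argmin(dist, axis=-1), dist.shape
```

Every line allocates at least one fresh array; on large chunks the three `(N_sel, K)` temporaries dominate before `np.argmin` reduces them to one index per row.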
The new design splits the inner work into two compiled kernels:
- `_ap_select_kernel`: one tight loop over `N` rows that scales `rU0` in place by the per-element modifier (halving the DPE rows), compares each row against its per-channel probability ceiling, and emits the indices that pass the selection plus the running max of the ceiling (for the existing `> 0.5` warning check). Replaces four separate numpy passes with a single integer-emitting loop.
- `_ap_kernel_nonuniform`: for the non-Uniform branch, two register-resident O(K) scans per selected photon (delay-time CDF and amplitude CDF). Replaces the broadcast-+-`np.argmin` pattern that materialised an `(N_sel, K_bins)` intermediate per call. The amplitude CDF is sometimes 1D and sometimes 2D depending on the element; numba does not accept a single parameter whose rank varies per call, so both shapes are always passed and the kernel branches on a flag.
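A hypothetical Python-side helper that normalises the amplitude CDF into fixed-rank arguments, putting an empty placeholder of the right rank in the unused slot. It only illustrates the "both shapes always passed" idea; the helper name and its return order are assumptions.

```python
import numpy as np


def amplitude_cdf_args(amplitude_cdf):
    """Return (cdf_1d, cdf_2d, is_2d) with fixed ranks for a compiled kernel.

    Hypothetical helper: the slot that is not used gets an empty float64 array
    of the expected rank, so every call passes the same argument types.
    """
    cdf = np.ascontiguousarray(amplitude_cdf, dtype=np.float64)
    if cdf.ndim == 2:
        return np.empty(0, dtype=np.float64), cdf, True
    if cdf.ndim == 1:
        return cdf, np.empty((0, 0), dtype=np.float64), False
    raise ValueError(f"amplitude CDF must be 1D or 2D, got ndim={cdf.ndim}")
```

Numba types an array argument by dtype, number of dimensions, and memory layout, so a 1D and a 2D array are different types. With both slots always present as contiguous `float64` arrays, every call has the same argument types, and the choice between the 1D and 2D scan becomes a run-time branch on `is_2d`.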
All `rng.random` and `rng.uniform` draws stay in Python, in the same order, with the same argument shapes and the same conditional skip on empty per-element selections, so the PCG64 sequence is preserved exactly. The `continue` on `n_sel == 0` must stay before the second draw or the random sequence diverges; a comment in the source flags this.
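The ordering constraint can be illustrated with a skeleton that keeps only the RNG calls. It assumes a `numpy.random.Generator` (for example from `np.random.default_rng`, whose default bit generator is PCG64), uses a plain NumPy comparison as a stand-in for the compiled selection pass, and omits the `rng.uniform` branch; the function name and arguments are hypothetical.

```python
import numpy as np


def afterpulse_draws(rng, ceilings, n_rows):
    """Skeleton of the per-element loop (hypothetical): only the draw order matters."""
    drawn = []
    for ceiling in ceilings:
        r_u0 = rng.random(n_rows)              # first draw: always, one value per row
        sel = np.flatnonzero(r_u0 <= ceiling)  # stand-in for the compiled selection pass
        if sel.size == 0:
            continue                           # skip BEFORE the second draw
        r_u1 = rng.random(sel.size)            # second draw: one value per selected row
        drawn.append((sel, r_u1))
    return drawn
```

Because every Python-side draw keeps its size and position in the loop, two runs from the same seed consume the generator identically, whichever implementation performs the selection in between.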
Bit-identity verification
`pmt_afterpulses` output was compared field-by-field against the unmodified plugin on `K40`, `Co60`, `Th232`, `Ra226`, and `Pb212` Geant4 simulations: ~30 M total rows, every field (`channel`, `dpe`, `photon_gain`, `cluster_id`, `photon_type`, `time`, `endtime`) byte-identical.
Performance
Per-source `pmt_afterpulses` target wall time, before → after:
Notes
- `fuse/plugins/pmt_and_daq/pmt_afterpulses.py` is touched.
- … process (`cache=True` persists across invocations).