
perf(pmt_afterpulses): two-pass numba scheme for photon_afterpulse #385

Open
DonNabla wants to merge 2 commits into main from pr/pmt-afterpulses


@DonNabla


Summary

Replaces the per-element Python orchestration in
PMTAfterPulses.photon_afterpulse with a two-pass @numba.njit scheme.
The output is bit-identical to that of the unmodified plugin (verified
field-by-field on ~30 M pmt_afterpulses rows across multiple Geant4
simulations).

What changes

photon_afterpulse loops over ~16 PMT element types per chunk. In the
existing code, the inner work for each element is a chain of NumPy
ops — in-place divide, fancy-indexed gather of prob_ap, np.where
selection, np.argmin of a broadcast |cdf - r| — each allocating
intermediate arrays of size N_photons or (N_sel, K_bins). On
high-yield sources the orchestration around those NumPy calls
dominates the plugin's wall time.
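
For illustration, the chain of NumPy operations described above can be sketched as follows. This is not the plugin's actual code: the function names, the argument layout, and the direction of the comparison (`<=`) are assumptions made only to show where the intermediate arrays come from.

```python
import numpy as np


def select_numpy(rU0, channels, prob_ap, modifier):
    """NumPy-style selection: several passes, each touching all N rows."""
    rU0 /= modifier                       # in-place divide over all N rows
    ceiling = prob_ap[channels]           # fancy-indexed gather, allocates size N
    return np.where(rU0 <= ceiling)[0]    # selected indices (assumed `<=` test)


def nearest_cdf_bin_broadcast(cdf, r):
    """Inverse-CDF lookup through a broadcast (N_sel, K_bins) intermediate."""
    dist = np.abs(cdf[None, :] - r[:, None])   # shape (N_sel, K_bins)
    return np.argmin(dist, axis=1)             # nearest CDF bin per selected row
```

Each call allocates at least one array of size N_photons or (N_sel, K_bins), which is the cost the two compiled kernels are meant to remove.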

The new design splits the inner work into two compiled kernels:

  • _ap_select_kernel — one tight loop over N rows that scales
    rU0 in place by the per-element modifier (halving the DPE rows),
    compares each row against its per-channel probability ceiling, and
    emits the indices that pass the selection plus the running max of
    the ceiling (for the existing > 0.5 warning check). Replaces four
    separate numpy passes with a single integer-emitting loop.
  • _ap_kernel_nonuniform — for the non-Uniform branch, two
    register-resident O(K) scans per selected photon (delay-time CDF
    and amplitude CDF). Replaces the broadcast-plus-np.argmin pattern
    that materialised an (N_sel, K_bins) intermediate per call.
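
A minimal sketch of what a single-loop selection pass of this kind might look like is given below. It is not the PR's `_ap_select_kernel`: the name `ap_select_sketch`, the argument order, the boolean `dpe` mask, and the `<=` comparison are all assumptions. The `numba` import falls back to a no-op decorator so the sketch also runs without numba installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:          # fallback so the sketch runs as plain Python
    def njit(func):
        return func


@njit
def ap_select_sketch(rU0, channels, dpe, prob_ap, modifier):
    """One pass over N rows: scale, halve DPE rows, compare, emit indices."""
    n = rU0.shape[0]
    sel = np.empty(n, dtype=np.int64)   # worst case: every row is selected
    n_sel = 0
    max_ceiling = 0.0
    for i in range(n):
        r = rU0[i] / modifier           # per-element modifier (assumed scalar)
        if dpe[i]:
            r = r * 0.5                 # halve the DPE rows
        rU0[i] = r                      # keep the in-place update of rU0
        ceiling = prob_ap[channels[i]]
        if ceiling > max_ceiling:
            max_ceiling = ceiling       # running max for the > 0.5 warning
        if r <= ceiling:                # assumed direction of the comparison
            sel[n_sel] = i
            n_sel += 1
    return sel[:n_sel], max_ceiling
```

The only allocation is the index buffer; the scaled values, the gathered ceilings, and the boolean mask never exist as separate arrays.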

The amplitude CDF is sometimes 1D and sometimes 2D depending on the
element; numba does not accept a single parameter whose rank varies
per call, so both shapes are always passed and the kernel branches
on a flag.
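
The "pass both shapes, branch on a flag" workaround can be sketched as below. The names (`ap_nonuniform_sketch`, `amp_is_2d`), the per-channel indexing of the 2D amplitude CDF, and the 1D delay-time CDF are assumptions for illustration, not the PR's actual signature. When one shape is unused, a small dummy array of the right rank is passed in its place.

```python
import numpy as np

try:
    from numba import njit
except ImportError:          # fallback so the sketch runs as plain Python
    def njit(func):
        return func


@njit
def nearest_bin(cdf, r):
    """O(K) scan for the bin minimising |cdf[k] - r|."""
    best = 0
    best_d = abs(cdf[0] - r)
    for k in range(1, cdf.shape[0]):
        d = abs(cdf[k] - r)
        if d < best_d:       # strict '<' keeps the first minimum, like np.argmin
            best_d = d
            best = k
    return best


@njit
def ap_nonuniform_sketch(sel_channels, r_delay, r_amp, delay_cdf,
                         amp_cdf_1d, amp_cdf_2d, amp_is_2d):
    n = sel_channels.shape[0]
    delay_bin = np.empty(n, dtype=np.int64)
    amp_bin = np.empty(n, dtype=np.int64)
    for i in range(n):
        delay_bin[i] = nearest_bin(delay_cdf, r_delay[i])
        if amp_is_2d:        # assumed: one amplitude CDF row per channel
            amp_bin[i] = nearest_bin(amp_cdf_2d[sel_channels[i]], r_amp[i])
        else:
            amp_bin[i] = nearest_bin(amp_cdf_1d, r_amp[i])
    return delay_bin, amp_bin
```

Because both array parameters keep a fixed rank across calls, numba compiles a single specialisation and the flag selects the path at run time.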

All rng.random and rng.uniform draws stay in Python, in the same
order with the same argument shapes and the same conditional skip on
empty per-element selections — so the PCG64 sequence is preserved
exactly. The continue on n_sel == 0 must stay before the second
draw or the random sequence diverges; a comment in the source flags
this.
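
Why draw order matters can be shown with a small stand-alone experiment. The helper `draw_stream` below is hypothetical and does not mirror the plugin's draws; it only demonstrates that a single extra draw from a PCG64-backed generator shifts every value drawn after it.

```python
import numpy as np


def draw_stream(seed, extra_draw):
    """Draw two batches; optionally consume one extra value in between."""
    rng = np.random.Generator(np.random.PCG64(seed))
    first = rng.random(4)
    if extra_draw:
        rng.uniform(0.0, 1.0)   # stands in for a draw the original code skipped
    second = rng.random(4)
    return first, second


ref_first, ref_second = draw_stream(1234, extra_draw=False)
alt_first, alt_second = draw_stream(1234, extra_draw=True)

print(np.array_equal(ref_first, alt_first))    # batches before the extra draw agree
print(np.array_equal(ref_second, alt_second))  # batches after it do not
```

Any refactor that adds, removes, or reorders a draw relative to the original code therefore breaks bit-identity for everything downstream of that point.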

Bit-identity verification

pmt_afterpulses output was compared field-by-field against the
unmodified plugin on K40, Co60, Th232, Ra226, and Pb212
Geant4 simulations — ~30 M total rows, every field
(channel, dpe, photon_gain, cluster_id, photon_type,
time, endtime) byte-identical.
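
A field-by-field byte comparison of this kind could be written roughly as follows. This is a sketch, not the script used for the verification: the helper name `fields_byte_identical` is hypothetical, and it assumes both outputs are NumPy structured arrays with the same dtype.

```python
import numpy as np


def fields_byte_identical(a, b):
    """Return {field_name: bool}, comparing the raw bytes of each field."""
    if a.dtype != b.dtype or a.shape != b.shape:
        raise ValueError("arrays must share dtype and shape")
    return {
        name: a[name].tobytes() == b[name].tobytes()
        for name in a.dtype.names
    }


def all_identical(a, b):
    """True only if every field matches byte for byte."""
    return all(fields_byte_identical(a, b).values())
```

Comparing bytes rather than using `==` on floats also treats NaN payloads and signed zeros strictly, which is the stronger notion of "bit-identical".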

Performance

Per-source pmt_afterpulses target wall, before → after:

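| source | vanilla (s) | this PR (s) | speedup |
|--------|-------------|-------------|---------|
| K40    | 49.7        | 30.3        | 1.64×   |
| Co60   | 650.1       | 380.3       | 1.71×   |
| Th232  | 28.7        | 17.1        | 1.68×   |
| Ra226  | 35.9        | 21.6        | 1.66×   |
| Pb212  | 1046.4      | 771.2       | 1.36×   |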

Notes

  • Single-file diff: only
    fuse/plugins/pmt_and_daq/pmt_afterpulses.py is touched.
  • First-call JIT compilation of the two new kernels is a one-time
    cost: with cache=True the compiled code is cached on disk and
    reused by later processes.

Maxime added 2 commits May 15, 2026 05:33
Replace the per-element Python orchestration inside `photon_afterpulse`
(per-element numpy ops: scale rU0, fancy-index prob_ap, compute selection
mask, gather selected channels, argmin-based inverse CDF) with a two-pass
numba scheme:

  Pass 1 — `_ap_select_kernel`: one tight loop over N rows that scales
    rU0 in place by the per-element modifier (halving the DPE entries),
    compares against the per-channel probability ceiling, and emits the
    indices that pass the selection plus the running max for the > 0.5
    warning check. Eliminates the per-element numpy intermediates.

  Pass 2 — `_ap_kernel_nonuniform`: two register-resident O(K) scans per
    selected photon (one over the delay-time CDF, one over the amplitude
    CDF). Replaces the broadcast-plus-argmin pattern that materialised an
    (N_sel, K) intermediate per call. The amplitude CDF is sometimes 1D
    and sometimes 2D — both shapes are passed unconditionally and the
    kernel branches on a flag, because numba does not accept a parameter
    whose rank varies per call.

All `rng.random` and `rng.uniform` draws stay in Python in the same
order, with the same argument shapes and the same conditional skips on
empty selections — so the PCG64 sequence is preserved exactly.

Bit-identical to vanilla on `pmt_afterpulses`: every per-row float op
(scalar division, halving, comparison, argmin) follows the same IEEE-754
sequence as the equivalent numpy expression.

Result on K40-class workloads: the PMTAfterPulses plugin runs ~1.4-1.7x
faster (wall time) across ER backgrounds and Pb212.

pre-commit.ci runs hook environments under python3.14 by default.
docformatter v1.7.7's transitive dependency `untokenize 0.1.1` uses
`ast.Constant.s`, an attribute removed in python3.14, so the install
fails when pre-commit tries to build it. Pin docformatter's hook to
python3.12 until `untokenize` (or docformatter's dependency choice)
catches up.
DonNabla force-pushed the pr/pmt-afterpulses branch from 573a1c5 to 2303fb1 on May 15, 2026 10:33