- Ensure `CUDA_INJECTION64_PATH` points to `lib/cutracer.so`
- Set `KERNEL_FILTERS` to match the actual kernel name (mangled or unmangled)
- Verify working directory has write permission
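
The checks above can be scripted as a preflight step before launching the traced program. This is a minimal sketch, not part of CUTracer: `preflight_problems` is a hypothetical helper, and treating an unset `KERNEL_FILTERS` as worth reporting is an assumption made for illustration.

```python
import os


def preflight_problems(env, workdir):
    """Return a list of human-readable findings; an empty list means all checks passed."""
    problems = []

    # The injection library path must be set and must name an existing file.
    lib = env.get("CUDA_INJECTION64_PATH", "")
    if not lib:
        problems.append("CUDA_INJECTION64_PATH is not set")
    elif not os.path.isfile(lib):
        problems.append(f"CUDA_INJECTION64_PATH does not point to a file: {lib}")

    # Assumption: an unset or empty filter is reported so it can be compared
    # against the real kernel name by hand.
    if not env.get("KERNEL_FILTERS"):
        problems.append("KERNEL_FILTERS is unset or empty")

    # Trace files are written relative to the working directory.
    if not os.access(workdir, os.W_OK):
        problems.append(f"working directory is not writable: {workdir}")

    return problems


if __name__ == "__main__":
    for problem in preflight_problems(os.environ, os.getcwd()):
        print("preflight:", problem)
```

The helper only inspects the environment. It cannot confirm that the filter string actually matches a kernel name, so that comparison still has to be done by hand.
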
- Ensure the kernel contains clock reads (e.g., Triton `pl.scope` adds them)
- Check that `CUTRACER_ANALYSIS=proton_instr_histogram` is set
- Use only `opcode_only`; avoid `reg_trace`/`mem_addr_trace`/`mem_value_trace` unless required
- Narrow `KERNEL_FILTERS` and use `INSTR_BEGIN`/`INSTR_END`
- Use `CUTRACER_INSTR_CATEGORIES` to limit instrumentation to specific instruction categories (e.g., `mma`, `tma`)
- The tool handles these paths; if outputs are missing, validate stream capture status and synchronization
- For captured graphs, data is flushed at `cuGraphLaunch` exit; ensure proper stream sync
- `nvcc`/`ptxas` minimums are enforced by `Makefile`; check errors and adjust `ARCH`
- Warp ID mismatch: ensure both runs target the same kernel launch; avoid filtering one side only.
- Missing `ipc` values: cycles or instruction counts missing/zero; re-check both inputs.
- Kernel hash ambiguity: pass `--kernel-hash` explicitly to the parser.
- `FATAL: CUTRACER_OUTPUT_DIR '...' does not exist` → Create the directory before running
- `FATAL: ... is not a directory` → The path points to a file, not a directory
- `FATAL: ... is not writable` → Check directory permissions (`chmod`)
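
The three `CUTRACER_OUTPUT_DIR` failures can be caught before the traced program starts. The sketch below mirrors the order of the messages above; `check_output_dir` is a hypothetical helper, and the tool's own checks may differ in detail.

```python
import os


def check_output_dir(path):
    """Return None if the directory looks usable, else a short failure description."""
    if not os.path.exists(path):
        return f"'{path}' does not exist"
    if not os.path.isdir(path):
        return f"'{path}' is not a directory"
    # Creating files in a directory needs both write and search (execute) permission.
    if not os.access(path, os.W_OK | os.X_OK):
        return f"'{path}' is not writable"
    return None


if __name__ == "__main__":
    out_dir = os.environ.get("CUTRACER_OUTPUT_DIR")
    if out_dir is not None:
        failure = check_output_dir(out_dir)
        print("ok" if failure is None else f"problem: {failure}")
```

If the directory is simply missing, `os.makedirs(out_dir, exist_ok=True)` creates it before the run.
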
- `WARNING: CUTRACER_INSTR_CATEGORIES set but no valid categories found` → Valid values: `mma`, `tma`, `sync` (case-insensitive)
- Ensure the target kernel actually contains instructions in the expected categories
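
To see which entries in a `CUTRACER_INSTR_CATEGORIES` value would be rejected, the value can be split and compared against the valid names listed above. This sketch assumes the variable holds a comma-separated list, which the text here does not confirm; `split_categories` is a hypothetical helper.

```python
VALID_CATEGORIES = {"mma", "tma", "sync"}


def split_categories(value):
    """Split a category string into (valid, invalid) lists of lower-cased names."""
    # Assumption: entries are comma-separated; surrounding whitespace is ignored.
    names = [part.strip().lower() for part in value.split(",") if part.strip()]
    valid = [name for name in names if name in VALID_CATEGORIES]
    invalid = [name for name in names if name not in VALID_CATEGORIES]
    return valid, invalid


if __name__ == "__main__":
    valid, invalid = split_categories("MMA, tma, foo")
    print("valid:", valid)
    print("invalid:", invalid)
```

An empty `valid` list corresponds to the situation the warning describes: the variable is set but names no recognized category.
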
- `FATAL: Invalid CUTRACER_TRACE_FORMAT=X` → Valid string values: `text`, `zstd`, `ndjson`, `clp`. Valid numeric values: 0, 1, 2, 3
- `WARNING: Invalid CUTRACER_ZSTD_LEVEL=X. Using default=9` → Valid range: 1–22
- To debug NDJSON content, use `CUTRACER_TRACE_FORMAT=ndjson` (or numeric `2`) for uncompressed output
- Deprecated: `TRACE_FORMAT_NDJSON` is still accepted but prints a deprecation notice. Migrate to `CUTRACER_TRACE_FORMAT`
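
The format and compression-level rules above can be checked ahead of a run. This is a sketch under stated assumptions: both function names are hypothetical, string values are compared exactly as written (the tool may be more lenient about case), and only the `ndjson` = `2` pairing is taken from the text, so no other name-to-number mapping is encoded.

```python
VALID_FORMAT_NAMES = {"text", "zstd", "ndjson", "clp"}
VALID_FORMAT_NUMBERS = {"0", "1", "2", "3"}
DEFAULT_ZSTD_LEVEL = 9


def is_valid_trace_format(value):
    """True if value is one of the listed string or numeric trace formats."""
    value = value.strip()
    return value in VALID_FORMAT_NAMES or value in VALID_FORMAT_NUMBERS


def effective_zstd_level(value):
    """Return the level if it is an integer in 1..22, else the default of 9."""
    try:
        level = int(value)
    except ValueError:
        return DEFAULT_ZSTD_LEVEL
    return level if 1 <= level <= 22 else DEFAULT_ZSTD_LEVEL


if __name__ == "__main__":
    for candidate in ("ndjson", "2", "json"):
        print(candidate, is_valid_trace_format(candidate))
    for candidate in ("3", "0", "fast"):
        print(candidate, effective_zstd_level(candidate))
```

Falling back to 9 for an out-of-range level mirrors the warning text. An invalid format is reported as fatal, so the sketch only answers yes or no for it.
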