These are the exact harnesses behind ../docs/BENCHMARKS.md.
Run them on your board and submit your numbers (PR to ../RESULTS.md
or open a "Benchmark result" issue) so we build a cross-board baseline.
The GPU/Vulkan ones need the vendor GPU stack in place (../install.sh vendor).
# GPU — GLES FBO ALU-loop shader, Mpix/s
gcc glbench.c -o glbench -lEGL -lGLESv2 -lgbm -ldl
LD_LIBRARY_PATH=/usr/local/lib ./glbench /dev/dri/renderD128 <loop> <frames>
# <loop> = ALU iterations per pixel (try 4 16 64 256), <frames> e.g. 300
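The suggested <loop> values can be swept in one go. The following is a minimal sketch, not part of the repo: `sweep_loops` is a hypothetical helper name, and `LOOPS` and `FRAMES` are assumed environment knobs that default to the values suggested above. It appends `<loop> <frames>` to whatever command it is given, so it works for any harness that takes those two trailing arguments.

```shell
#!/bin/sh
# Sketch: run one harness once per <loop> value.
# sweep_loops, LOOPS and FRAMES are hypothetical names used for illustration.
sweep_loops() {
    frames="${FRAMES:-300}"
    for loop in ${LOOPS:-4 16 64 256}; do
        printf '== loop=%s frames=%s ==\n' "$loop" "$frames"
        # "$@" is the harness command line; <loop> and <frames> are appended
        "$@" "$loop" "$frames"
    done
}

# Dry run: prefixing with echo prints the commands instead of executing them
sweep_loops echo ./glbench /dev/dri/renderD128
```

For a real run, drop the `echo` and pass the library path through `env`, for example `sweep_loops env LD_LIBRARY_PATH=/usr/local/lib ./glbench /dev/dri/renderD128`.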
# CPU — same math, OpenMP 8-core, for the comparison baseline
gcc -O3 -fopenmp -march=native cpubench.c -o cpubench -lm
./cpubench <loop> <frames>

gcc vkprobe.c -o vkprobe -ldl
LD_LIBRARY_PATH=/usr/local/lib XDG_RUNTIME_DIR=/run/user/$(id -u) ./vkprobe
# expect: "PowerVR B-Series BXM-4-64 MC1", INTEGRATED_GPU, Vulkan 1.3, device-local MB

The dma-buf tests prove that the patched pvrsrvkm actually imports foreign dma-bufs and renders them.
# build lines are in each file header:
gcc dmabuf_render_test.c -o dmabuf_render_test $(pkg-config --cflags --libs gbm) -lvulkan -ldl
gcc dmabuf_foreign_test.c -o dmabuf_foreign_test $(pkg-config --cflags --libs libdrm gbm) -lvulkan
LD_LIBRARY_PATH=/usr/local/lib VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/img_icd.json ./dmabuf_render_test
LD_LIBRARY_PATH=/usr/local/lib VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/img_icd.json ./dmabuf_foreign_test
# expect: drmPrimeFDToHandle OK (not ENOSYS) + GPU readback matches (e.g. 65536/65536)
# foreign_test allocates from /dev/dma_heap/system to exercise the non-self-import path.

gcc enctest_h264.c -o enctest_h264 -lva -lva-drm   # adjust libs to your VAAPI setup
./enctest_h264

sysbench --test=cpu --cpu-max-prime=20000 --num-threads=1 run   # single-core
sysbench --test=cpu --cpu-max-prime=20000 --num-threads=8 run # all cores
sysbench --test=memory --memory-block-size=1M --num-threads=8 run
fio --name=r --rw=read --bs=1M --direct=1 --size=2G --filename=/tmp/fio.bin
fio --name=w --rw=randwrite --bs=4k --direct=1 --size=1G --filename=/tmp/fio.bin

GPU_BENCHMARK.md is the original raw GPU run for reference.
Methodology note: these are indicative single-board numbers; report your kernel,
DDK version (strings /usr/lib/libVK_IMG.so* | grep -m1 24.), board, and ambient temp.
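The fields requested in the methodology note can be collected with a small script. This is a sketch under assumptions: `bench_report` and `AMBIENT_TEMP` are hypothetical names, the board name is read from `/proc/device-tree/model` (which only exists on device-tree systems), and ambient temperature has to be measured by hand and passed in, since the board cannot report it.

```shell
#!/bin/sh
# Sketch: print the report fields asked for in the methodology note.
# bench_report and AMBIENT_TEMP are hypothetical names, not part of this repo.
bench_report() {
    ambient="${1:-not measured}"   # measured by hand, passed as an argument

    # Board name: assumes a device-tree system; falls back to "unknown"
    board="unknown"
    if [ -r /proc/device-tree/model ]; then
        board=$(tr -d '\0' < /proc/device-tree/model)
    fi

    # DDK version: the same strings | grep probe quoted in the note
    ddk=$(strings /usr/lib/libVK_IMG.so* 2>/dev/null | grep -m1 24. || true)

    printf 'kernel: %s\n' "$(uname -r)"
    printf 'ddk: %s\n' "${ddk:-unknown}"
    printf 'board: %s\n' "$board"
    printf 'ambient_temp: %s\n' "$ambient"
}

bench_report "${AMBIENT_TEMP:-not measured}"
```

The output can be pasted at the top of a result submission. State separately whether the numbers come from stock or patched components.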
x86_64-linux-gnu-gcc -O2 -static uatomic.c -o uatomic_x86 # cross-compile the x86 bench
FEXInterpreter ./uatomic_x86 30000000 0 # aligned (baseline)
FEXInterpreter ./uatomic_x86 30000000 2 # unaligned (<16B)
FEXInterpreter ./uatomic_x86 30000000 14 # split-lock (crosses 16B)
# STOCK upstream FEX on A733: aligned ~154 Mops, unaligned ~0.70 Mops (≈190x slower, ≈1430 ns/op):
# it takes a SIGBUS trap per op (no FEAT_LSE2/uscat). Config knobs do NOT change this.
# A local FEX codegen patch (Arm64.cpp backpatch fix) brings unaligned up to ~61 Mops (~2.5x slower than aligned).
# So your number depends on which FEX you run; report stock vs patched.

x86_64-linux-gnu-gcc -O2 -static x87l.c -o x87l_x86   # fldl/faddl (64-bit on x87 stack)
# toggle via config file (env FEX_ vars are overridden by ~/.fex-emu/Config.json);
# note: the loop overwrites ~/.fex-emu/Config.json, so back up your own config first:
for r in 0 1; do
  printf '{"Config":{"X87ReducedPrecision":"%s"}}' "$r" > ~/.fex-emu/Config.json
  FEXInterpreter ./x87l_x86 20000000
done
# X87RP=0 ~75 ns/iter -> X87RP=1 ~4 ns/iter = ~18x (only for 64-bit-on-x87; not true 80-bit long double)
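The figures above mix Mops, ns/op and "x" factors, so converting your own numbers the same way before submitting keeps results comparable. A minimal sketch using awk: `mops_to_ns` and `ratio` are hypothetical helper names, and the inputs are the rounded values quoted above, not new measurements.

```shell
#!/bin/sh
# Sketch: unit conversions for the quoted figures.
# mops_to_ns and ratio are hypothetical helper names used for illustration.

# Mops (million ops per second) -> ns per op: 1000 / Mops
mops_to_ns() {
    awk -v m="$1" 'BEGIN { printf "%.2f\n", 1000 / m }'
}

# Factor between two measurements: a / b
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

mops_to_ns 0.70   # stock unaligned         -> 1428.57 ns/op
mops_to_ns 61     # patched unaligned       -> 16.39 ns/op
ratio 154 61      # aligned vs patched      -> 2.52
ratio 75 4        # X87RP=0 vs X87RP=1      -> 18.75
```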