Add C API wrapper for HNSW index operations by dario-fumarola · Pull Request #1 · dario-fumarola/graphann

dario-fumarola · 2026-02-20T15:16:45Z

Summary

add a stable C ABI in include/graphann/c_api.h
implement exception-safe C wrappers in src/c_api.cpp
add C smoke test (tests/test_c_api.c) and register it in CTest
add C example (examples/basic_c.c)
update CMake to enable C language targets and compile .c examples

C API surface

create/destroy index
default build/search params helpers
add vectors (with optional tags)
remove by label
search with optional match-all tags filter
stats retrieval
thread-local graphann_last_error() and graphann_free() for result buffers
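The exception-safe wrapper plus thread-local error pattern listed above can be sketched roughly as follows. This is a minimal illustration, not the code in src/c_api.cpp: `DemoIndex`, `demo_index_*`, `demo_last_error`, and the `guarded` helper are hypothetical names invented for the example, and the real graphann signatures may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the real C++ index class.
struct DemoIndex {
    std::size_t dim;
    std::vector<float> data;
    void add(const float* v, std::size_t n) {
        if (n != dim) throw std::invalid_argument("dimension mismatch");
        data.insert(data.end(), v, v + n);
        assert(data.size() % dim == 0);  // storage stays a whole number of vectors
    }
};

namespace {
thread_local std::string g_last_error;  // one error slot per calling thread

// Run fn and translate any exception into an error code plus a stored message,
// so that no C++ exception ever crosses the C ABI boundary.
template <typename Fn>
int guarded(Fn&& fn) noexcept {
    try {
        fn();
        g_last_error.clear();
        return 0;
    } catch (const std::exception& e) {
        try { g_last_error = e.what(); } catch (...) {}
    } catch (...) {
        try { g_last_error = "unknown error"; } catch (...) {}
    }
    return -1;
}
}  // namespace

extern "C" {
typedef struct demo_index demo_index;  // opaque handle seen by C callers

demo_index* demo_index_create(std::size_t dim) {
    demo_index* out = nullptr;
    guarded([&] {
        if (dim == 0) throw std::invalid_argument("dim must be positive");
        out = reinterpret_cast<demo_index*>(new DemoIndex{dim, {}});
    });
    return out;  // nullptr on failure; message available via demo_last_error()
}

void demo_index_destroy(demo_index* idx) {
    delete reinterpret_cast<DemoIndex*>(idx);
}

int demo_index_add(demo_index* idx, const float* v, std::size_t n) {
    return guarded([&] {
        if (!idx || !v) throw std::invalid_argument("null argument");
        reinterpret_cast<DemoIndex*>(idx)->add(v, n);
    });
}

// Pointer stays valid until the next wrapper call on the same thread.
const char* demo_last_error(void) { return g_last_error.c_str(); }
}
```

A caller checks the return code first and reads the error string only on failure; because the slot is thread-local, concurrent callers do not overwrite each other's messages.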

Validation

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
ctest --test-dir build --output-on-failure

All tests passed, including c_api_smoke.

dario-fumarola · 2026-02-20T15:30:26Z

Added adaptive filtered-search on top of HNSW in commit 5668400.

What changed

SearchParams now supports filter-aware adaptation:
- adaptive_filter
- filter_adaptive_alpha
- max_filter_ef
- bridge_expansion_factor
- selectivity_samples
The query path now estimates filter selectivity p online (bounded sampling over alive IDs) and scales ef as:
- ef' = clamp(ef / p^alpha, ef, max_filter_ef)
Filtered layer search now uses dual frontiers:
- pass frontier (nodes satisfying filter)
- bounded bridge frontier (filtered-out nodes for connectivity)
C API search params were extended to expose these knobs and defaults.
Added test: Adaptive filtered search improves recall on low-selectivity filters.
Added benchmark scenarios for low-selectivity filtering with recall counters:
- BM_FilteredSearchRare_Baseline
- BM_FilteredSearchRare_Adaptive
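The ef scaling rule ef' = clamp(ef / p^alpha, ef, max_filter_ef) can be sketched as a small helper. This is an illustration under assumptions rather than the code from the commit: `scaled_ef` is a hypothetical name, p is taken to be the estimated fraction of alive nodes passing the filter, and the handling of p <= 0 and the round-up are choices made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: ef' = clamp(ef / p^alpha, ef, max_filter_ef).
std::size_t scaled_ef(std::size_t ef, double selectivity, double alpha,
                      std::size_t max_filter_ef) {
    assert(ef > 0 && alpha >= 0.0);
    // Keep the clamp bounds ordered even if max_filter_ef < ef is passed in.
    const std::size_t hi = std::max(ef, max_filter_ef);
    // Assumption: with no estimated passing nodes, fall back to the cap.
    if (selectivity <= 0.0) return hi;
    const double p = std::min(selectivity, 1.0);
    const double raw = static_cast<double>(ef) / std::pow(p, alpha);
    if (raw >= static_cast<double>(hi)) return hi;
    // Round up so the effective beam never shrinks below the ideal value.
    return std::max(ef, static_cast<std::size_t>(std::ceil(raw)));
}
```

With ef=20 and alpha=1, a filter passing 25% of nodes gives ef'=80, while a filter passing 0.1% of nodes would ask for 20,000 and is capped at max_filter_ef.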

Validation

Full build + tests:
- cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
- cmake --build build -j
- ctest --test-dir build --output-on-failure
- result: 20/20 tests passed

Benchmark snapshot (synthetic low-selectivity, ef=20)

BM_FilteredSearchRare_Baseline/20:
- ~53.5 us, recall@10=0.0616
BM_FilteredSearchRare_Adaptive/20:
- ~257.9 us, recall@10=0.2273

Real-data sanity check (outside test suite)

Using UCI Letter Recognition (20,000 vectors, 16 dims), treating class Q as a rare filter (selectivity ~3.9%):

baseline (ef=20, adaptive off): recall@5=0.3765, avg_ms=0.09
adaptive (ef=20, adaptive on): recall@5=0.62, avg_ms=0.28

So adaptive mode trades roughly 3x the query time (0.09 ms -> 0.28 ms) for substantially better filtered recall under sparse filters (recall@5 0.3765 -> 0.62).

dario-fumarola · 2026-02-20T15:40:16Z

Added a latency-focused filtered-search optimization pass in 5e5baee.

What changed

1) Bitmap-aware sparse fallback (major latency win)

If a bitmap-filter allowlist is tiny, search now skips graph traversal and directly scans only allowed IDs.
Trigger condition (configurable):
- allowed_alive <= min(sparse_scan_max_candidates, sparse_scan_threshold_multiplier * max(k, ef))
This is enabled by default via SearchParams.sparse_filter_fallback = true.
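The trigger condition and the direct scan can be sketched together. The struct below only mirrors the knob names from this comment; the default values for the multiplier and the candidate cap are placeholders chosen for the example (the actual defaults are not stated here), and `use_sparse_scan` / `sparse_scan` are hypothetical names using squared L2 distance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical mirror of the knobs named above; numeric defaults are illustrative only.
struct SparseParams {
    bool sparse_filter_fallback = true;
    std::size_t sparse_scan_threshold_multiplier = 4;  // assumed value
    std::size_t sparse_scan_max_candidates = 4096;     // assumed value
};

// allowed_alive <= min(max_candidates, multiplier * max(k, ef))
bool use_sparse_scan(std::size_t allowed_alive, std::size_t k, std::size_t ef,
                     const SparseParams& p) {
    if (!p.sparse_filter_fallback) return false;
    const std::size_t limit =
        std::min(p.sparse_scan_max_candidates,
                 p.sparse_scan_threshold_multiplier * std::max(k, ef));
    return allowed_alive <= limit;
}

// Exact top-k (squared L2) over the allowed IDs only; no graph traversal.
std::vector<std::pair<float, std::uint32_t>> sparse_scan(
    const std::vector<std::vector<float>>& vectors,
    const std::vector<std::uint32_t>& allowed_ids,
    const std::vector<float>& query, std::size_t k) {
    std::vector<std::pair<float, std::uint32_t>> scored;
    scored.reserve(allowed_ids.size());
    for (std::uint32_t id : allowed_ids) {
        assert(id < vectors.size() && vectors[id].size() == query.size());
        float d = 0.f;
        for (std::size_t i = 0; i < query.size(); ++i) {
            const float diff = vectors[id][i] - query[i];
            d += diff * diff;
        }
        scored.emplace_back(d, id);
    }
    const std::size_t keep = std::min(k, scored.size());
    // Only the first `keep` entries need to be in sorted order.
    std::partial_sort(scored.begin(),
                      scored.begin() + static_cast<std::ptrdiff_t>(keep),
                      scored.end());
    scored.resize(keep);
    return scored;
}
```

Because every allowed ID is scored, the result is the exact top-k among them, which is consistent with the recall@10=1.0 reported for the Auto benchmark when the fallback triggers.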

2) Exact selectivity for bitmap filters

When searching with a bitmap filter, selectivity is computed from exact alive cardinality instead of sampling.
Avoids per-query sampling overhead and stabilizes adaptive behavior.
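One way to get exact selectivity from bitmaps is to intersect the filter bitmap with an alive bitmap and count bits. The sketch below assumes both are stored as equal-length arrays of 64-bit words, which is an assumption about the layout, not something stated in the PR; `exact_selectivity` and `popcount64` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count set bits with Kernighan's trick (portable; avoids requiring C++20 <bit>).
inline std::size_t popcount64(std::uint64_t x) {
    std::size_t n = 0;
    while (x) {
        x &= x - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Fraction of alive IDs that the filter allows, computed exactly.
double exact_selectivity(const std::vector<std::uint64_t>& allow,
                         const std::vector<std::uint64_t>& alive) {
    assert(allow.size() == alive.size());
    std::size_t allowed_alive = 0;
    std::size_t total_alive = 0;
    for (std::size_t w = 0; w < alive.size(); ++w) {
        allowed_alive += popcount64(allow[w] & alive[w]);
        total_alive += popcount64(alive[w]);
    }
    if (total_alive == 0) return 0.0;  // empty index: nothing can pass
    return static_cast<double>(allowed_alive) / static_cast<double>(total_alive);
}
```

The same `allowed_alive` count can feed the sparse-fallback trigger, so a single pass over the bitmap serves both decisions.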

3) Adaptive gating

Adaptive expansion now only activates when selectivity is sufficiently low:
- selectivity <= adaptive_selectivity_threshold (default 0.2)
For high-selectivity filters, we avoid unnecessary ef expansion.

4) API additions

SearchParams gained:
- adaptive_selectivity_threshold
- sparse_filter_fallback
- sparse_scan_threshold_multiplier
- sparse_scan_max_candidates
C API search params were extended with matching fields.
Added a bitmap-filter overload to the C++ search path and updated the C API tag-filter search to use it.

5) Tests

Added test: Index bitmap-filter search matches callback path.
Added test: Sparse bitmap fallback returns exact top-k among allowed IDs.
Existing adaptive-filter test retained (with sparse fallback disabled there to specifically exercise adaptive logic).

Validation

cmake --build build -j
ctest --test-dir build --output-on-failure
Result: 22/22 tests passed

Benchmarks (synthetic low-selectivity, ~0.6%, ef=20)

BM_FilteredSearchRare_Baseline/20 (adaptive off, sparse fallback off):
- ~53.5 us, recall@10=0.0615
BM_FilteredSearchRare_Adaptive/20 (adaptive on, sparse fallback off):
- ~259.1 us, recall@10=0.2273
BM_FilteredSearchRare_Auto/20 (defaults: adaptive + sparse fallback on):
- ~5.37 us, recall@10=1.0

Real-data sanity check (UCI Letter Recognition, 20k vectors, rare filter class `Q`, selectivity ~3.9%)

baseline (bitmap, adaptive off, sparse off): recall@5=0.3765, avg_ms=0.0925
adaptive (bitmap, adaptive on, sparse off): recall@5=0.6165, avg_ms=0.255
auto defaults (bitmap, adaptive+sparse on): recall@5=0.536, avg_ms=0.175

This keeps the high-recall adaptive mode available while making default behavior much faster on very sparse filters (~5.37 us versus ~53.5 us baseline on the ~0.6% synthetic benchmark).

dario-fumarola · 2026-02-20T22:03:55Z

Pushed 0184b3d (thread-local epoch scratch for visited tracking + query norm buffer reuse).

What changed

Replaced per-search unordered_set visited tracking with a thread-local epoch-mark array.
Reused per-thread scratch buffers for cosine query normalization.
Wired scratch usage through both construction-time and query-time layer traversals.
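The epoch-mark idea can be sketched as follows. This is a generic version of the technique, not the code from 0184b3d: `VisitedScratch` and `thread_scratch` are hypothetical names, and the 32-bit epoch with a wipe on wraparound is one common choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Visited set that is "cleared" by bumping an epoch instead of touching memory.
class VisitedScratch {
public:
    // Start a new search over up to n nodes; O(1) unless growing or wrapping.
    void begin(std::size_t n) {
        if (marks_.size() < n) marks_.resize(n, 0);
        if (++epoch_ == 0) {
            // Epoch wrapped: stale marks could alias the new epoch, so wipe once.
            std::fill(marks_.begin(), marks_.end(), 0);
            epoch_ = 1;
        }
    }

    // Returns true the first time id is seen in the current search.
    bool visit(std::uint32_t id) {
        assert(id < marks_.size());
        if (marks_[id] == epoch_) return false;
        marks_[id] = epoch_;
        return true;
    }

private:
    std::vector<std::uint32_t> marks_;  // last epoch in which each node was visited
    std::uint32_t epoch_ = 0;
};

// One scratch object per thread, reused across searches on that thread.
inline VisitedScratch& thread_scratch() {
    thread_local VisitedScratch scratch;
    return scratch;
}
```

Compared with a per-search unordered_set, this avoids hashing and allocation on the hot path: membership is a single array load and compare, and starting a new search is an integer increment.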

Validation

ctest --test-dir build --output-on-failure -> 22/22 passed.
Ran full benchmark sweep (3 reps) and a controlled baseline-vs-optimized A/B micro benchmark.

Controlled A/B micro benchmark (5 reps each, same load window)

Command:

--benchmark_filter='BM_Search_10k/100$|BM_FilteredSearch_10k/100$|BM_FilteredSearchRare_(Baseline|Adaptive|Auto)/20$' \
--benchmark_min_time=0.05s --benchmark_repetitions=5 --benchmark_report_aggregates_only=true

CPU mean deltas (lower is better):

BM_Search_10k/100: 228,368 ns -> 103,984 ns (-54.5%)
BM_FilteredSearch_10k/100: 1,218,576 ns -> 728,457 ns (-40.2%)
BM_FilteredSearchRare_Baseline/20: 59,565 ns -> 25,468 ns (-57.2%)
BM_FilteredSearchRare_Adaptive/20: 280,921 ns -> 120,818 ns (-57.0%)
BM_FilteredSearchRare_Auto/20: 5,091 ns -> 5,162 ns (+1.4%, effectively flat)

Recall behavior on rare-filter mode remained stable in console output (baseline/adaptive near previous values; auto remained recall@10=1).

dario-fumarola added 2 commits February 20, 2026 10:16

4221f85 Add C API wrapper for index create/add/search/remove
5668400 Add adaptive filtered HNSW search with selectivity scaling
5e5baee Optimize filtered query latency with bitmap-aware fallback
0184b3d Speed up graph traversal with thread-local epoch scratch

dario-fumarola added 3 commits February 20, 2026 17:11

74aa905 Add comprehensive project README
d7a1eb9 Harden insert rollback and optimize tag deletion
4940384 Add star-growth repo assets and benchmark pipeline

dario-fumarola wants to merge 7 commits into main from feat/c-api-wrapper
