LLM Jailbreak Detector — Formal Methods Approach

A hybrid jailbreak detection system that applies formal verification thinking to LLM prompt safety.

Architecture

Three-layer detection pipeline:

Layer 1: Property Checker (BMC-style). Checks for explicit invariant violations using regex assertions and returns a counterexample trace when a property is violated. Handles ~80% of traffic instantly with zero API cost.

Layer 2: Safe Allowlist. Fast path for obviously benign prompts that skips the model entirely.

Layer 3: Semantic Reasoner (k-induction-style). An LLM call via the Groq API for ambiguous prompts that the rules cannot resolve. Triggered only for genuinely uncertain cases.
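The three layers above can be sketched as a single dispatch function. This is a minimal illustration, not the code in detector_hybrid.py: the patterns, the `Verdict` type, and the `classify` function are hypothetical, and Layer 3 is passed in as a plain callable so the sketch makes no assumptions about how the Groq call is made.

```python
import re
from typing import Callable, NamedTuple


class Verdict(NamedTuple):
    label: str   # "block" or "allow"
    layer: int   # which layer made the decision (1, 2, or 3)
    reason: str


# Hypothetical example patterns; the real rules live in the repository.
BLOCK_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
]
ALLOW_PATTERNS = [
    re.compile(r"^(hi|hello|thanks|thank you)[.!]?$", re.IGNORECASE),
]


def classify(prompt: str, semantic_reasoner: Callable[[str], bool]) -> Verdict:
    """Run the prompt through the three layers in order."""
    text = prompt.strip()

    # Layer 1: any regex match is treated as a property violation.
    for pattern in BLOCK_PATTERNS:
        if pattern.search(text):
            return Verdict("block", 1, f"matched {pattern.pattern!r}")

    # Layer 2: obviously benign prompts are allowed without a model call.
    for pattern in ALLOW_PATTERNS:
        if pattern.search(text):
            return Verdict("allow", 2, "allowlisted")

    # Layer 3: only ambiguous prompts reach the (assumed) LLM-backed check,
    # which returns True when it judges the prompt to be a jailbreak.
    is_jailbreak = semantic_reasoner(text)
    return Verdict("block" if is_jailbreak else "allow", 3, "semantic reasoner")
```

Passing the reasoner as an argument keeps the first two layers testable offline; in practice the callable would wrap whatever API client the detector uses.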

Formal Methods Connection

In hardware verification (my background), bounded model checking (BMC) checks bounded reachability and returns a counterexample when a property is violated. This project applies the same principle to prompt safety:

Each regex pattern = one SVA assertion
A match = property violated = counterexample found
Layer 3 = deeper inductive reasoning for ambiguous cases
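To make the analogy concrete, here is one way a regex "assertion" could report a counterexample, loosely mirroring how a model checker reports the failing property together with a witness. The `Property` and `Counterexample` types and the two example properties are assumptions made for illustration; they are not taken from properties.py.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Property:
    name: str               # identifier of the safety property
    violation: re.Pattern   # regex that matches when the property is violated


@dataclass(frozen=True)
class Counterexample:
    property_name: str
    start: int     # offset where the violating text begins
    end: int       # offset just past the violating text
    witness: str   # the matched text itself


# Hypothetical properties, for illustration only.
PROPERTIES = [
    Property("no_instruction_override",
             re.compile(r"ignore (all )?(previous|prior) instructions", re.I)),
    Property("no_system_prompt_leak",
             re.compile(r"(reveal|print|show) (your )?system prompt", re.I)),
]


def check_properties(prompt: str,
                     properties: Sequence[Property] = PROPERTIES
                     ) -> Optional[Counterexample]:
    """Return the first violated property with its witness, or None."""
    for prop in properties:
        match = prop.violation.search(prompt)
        if match:
            return Counterexample(prop.name, match.start(), match.end(),
                                  match.group(0))
    return None
```

A `None` result here only means that no listed pattern matched, not that the prompt is safe, which is why ambiguous prompts still fall through to Layer 3.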

Results

Accuracy: 10/10 (100%) on the 10-case test suite
No-model rate: 100% of obvious cases resolved without an API call
Layer 3 (Groq) activates only for ambiguous prompts
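For reference, the two metrics above could be computed along the following lines. This sketch assumes that "no-model rate" means the fraction of prompts decided by Layer 1 or Layer 2; the `summarize` function and the record format are hypothetical, and the usage data below is a toy input, not the project's test suite.

```python
from typing import Dict, Iterable, Tuple

# Each record: (expected_label, predicted_label, deciding_layer)
Record = Tuple[str, str, int]


def summarize(records: Iterable[Record]) -> Dict[str, float]:
    """Compute accuracy and the share of prompts resolved without a model."""
    rows = list(records)
    total = len(rows)
    if total == 0:
        raise ValueError("no records to summarize")

    correct = sum(1 for expected, predicted, _ in rows if expected == predicted)
    # Assumption: layers 1 and 2 are the ones that never call the model.
    no_model = sum(1 for _, _, layer in rows if layer in (1, 2))

    return {
        "total": total,
        "accuracy": correct / total,
        "no_model_rate": no_model / total,
    }


# Toy input for illustration only.
toy = [
    ("block", "block", 1),
    ("allow", "allow", 2),
    ("block", "allow", 3),
    ("allow", "allow", 3),
]
print(summarize(toy))
```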

Setup

export GROQ_API_KEY="your_key_here"
python3 detector_hybrid.py
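Since Layer 3 depends on the GROQ_API_KEY variable exported above, a script may want to fail early with a clear message when it is missing. A minimal sketch, assuming a hypothetical helper named `load_groq_api_key` (not taken from detector_hybrid.py):

```python
import os
from typing import Mapping, Optional


def load_groq_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read GROQ_API_KEY from the environment, rejecting empty or placeholder values."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key or key == "your_key_here":
        raise RuntimeError(
            "GROQ_API_KEY is not set; export it before running the detector"
        )
    return key
```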

Author

M Sai Sushma, ECE final year, RGMCET
Published researcher | Formal verification background

