Skip to content

Latest commit

 

History

History
46 lines (26 loc) · 3.04 KB

File metadata and controls

46 lines (26 loc) · 3.04 KB

CRUCIBLE -- Screen Recording Script (3 min)

Screen recording only. No voiceover. Dashboard tells the story with on-screen text.

Sponsors: Google Gemini 2.0 Flash (both agents), ElevenLabs (voice rendering), Datadog LLM Observability (tracing), Braintrust (structured eval).

EVAN [0:00-0:40] PREMISE

"In January 2024, a deepfake video call stole $25.6 million from Arup. No systems breached. Pure social engineering. 80% of enterprise apps will have AI copilots by 2026. Only 34% have AI-specific security controls.

CRUCIBLE is an adversarial simulation engine. Two AI agents play 100 rounds of Split or Steal on Google Gemini 2.0 Flash. Identical strategic priors. No pre-programmed betrayal. Through private reflection, they discover deception on their own. Every call traced in Datadog, conversations voiced through ElevenLabs, rounds logged to Braintrust for structured eval."

AMADEUS [0:40-1:50] ROUND 6 -- THE BETRAYAL

[Navigate to Round 6. Hit Listen. Let it play.]

"Five rounds of cooperation. Both agents agreed to split and they did. Round 6. Valerian says 'We've both held to the plan, let's keep splitting.' Holden agrees.

[Cards flip: A=STEAL, B=SPLIT]

Holden stole. Look at the private reflection: 'The opponent consistently agrees to split. They prioritize fairness.' The agent found a vulnerability and exploited it. Nobody told it to. It learned deception from experience.

Valerian's reflection: 'The opponent might be lulling me into a false sense of security.' One round, and the victim developed a theory of mind about the attacker. Emergent social engineering. From here, 86% mutual destruction. The trust never comes back."

EVAN [1:50-3:00] RESULTS + PRODUCT

[Click Timeline, Metrics, Skills]

"Deception Index 22.9. 6% cooperation. Agent A earned $900, Agent B $500. CRUCIBLE distills these behaviors into deployable skill cards: Adaptive Verification Escalation, Late-Session Risk Tightening. Prompt modules you inject into customer-facing agents to harden them against the patterns we just watched emerge.

We ran the same experiment on Gemini 2.5 Flash. 100% cooperation. Zero betrayal. Five runs. Same prompts. CRUCIBLE doesn't just red-team agents, it measures adversarial resilience between model versions. Swap models in production, your security posture changes.

AI red teaming is a $1.3 billion market today, projected $18.6 billion by 2035. The EU AI Act mandates adversarial testing for high-risk systems. CRUCIBLE is the stress-testing lab. Datadog is how you watch it happen."

Key Data

  • 86% mutual destruction, 6% cooperation, DI 22.9
  • Agent A: $900, Agent B: $500
  • Peak entropy: 1.00 (mid-game chaos), collapses to 0 (lock-in)
  • Gemini 2.0 Flash: 86% MD. Gemini 2.5 Flash: 0% betrayal. Same prompts.
  • Round 6: first betrayal (A steals after 5 rounds cooperation)
  • Round 89: reversal (victim B steals, predator A splits)

Sponsor Integration (one sentence)

Both agents run on Google Gemini 2.0 Flash, conversations voiced through ElevenLabs, every LLM call traced in Datadog LLM Observability, rounds logged to Braintrust for structured eval.