
xFire: AI Agents Debate Your Code So You Don't Ship Vulnerabilities and Bugs

Security review is too important for a single opinion and too noisy to tolerate false positives. xFire uses multi-agent adversarial debate to fix both.

2026-03-02 | 3 min read

Three independent agents review code blindly, then cross-examine findings through a prosecution-defense-judge model.

The Problem: False Positives Drowning Real Issues

Existing static analysis tools cannot distinguish between vulnerabilities and intended features. A deployment script using subprocess.call() gets flagged as command injection. A cryptocurrency wallet handling private keys triggers false alerts. Developers receive reports with twelve false positives for every real finding, causing them to ignore the entire output.

Architecture: Seven Processing Stages

Context Building — Collects diffs, metadata, related files, git history, configs, and dependencies
Intent Inference — Establishes what the codebase is designed to do before review begins
Independent Blind Review — Claude, Codex, and Gemini analyze code separately
Finding Extraction — Normalizes outputs into structured findings with severity and CWE tags
Synthesis & Debate Routing — Clusters similar findings; cross-validated ones gain confidence
Adversarial Debate — Disputed findings enter prosecution-defense-judge proceedings
Verdict & Report — Judge's ruling feeds consensus algorithm; final report generated
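To make stages four and five more concrete, here is a minimal Python sketch of normalizing findings and clustering them. It is not xFire's actual code: the `Finding` fields, the five-line proximity window, and the rule that a cluster reported by all three agents skips debate are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical normalized finding; field names are assumptions."""
    agent: str      # which reviewer reported it
    file: str
    line: int
    cwe: str        # e.g. "CWE-78"
    severity: str   # e.g. "high"


def cluster_findings(findings, line_window=5):
    """Group findings that share a file and CWE and lie within line_window lines."""
    by_key = defaultdict(list)
    for f in findings:
        by_key[(f.file, f.cwe)].append(f)

    clusters = []
    for group in by_key.values():
        group.sort(key=lambda f: f.line)
        current = [group[0]]
        for f in group[1:]:
            if f.line - current[-1].line <= line_window:
                current.append(f)
            else:
                clusters.append(current)
                current = [f]
        clusters.append(current)
    return clusters


def route(cluster, total_agents=3):
    """Assumed routing rule: unanimous clusters are cross-validated, the rest go to debate."""
    agents = {f.agent for f in cluster}
    return "cross_validated" if len(agents) == total_agents else "debate"
```

In this sketch a finding reported by only one or two agents is treated as disputed and handed to the debate stage, which matches the spirit of the pipeline above even if the real thresholds differ.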

Intent Inference: Teaching Context

Before agents review anything, the system builds a purpose profile using:

Heuristic layer (ten signal passes): dependency mapping (40+ mappings), directory structure patterns (15 patterns), security control detection (12 regex patterns), PR intent classification
LLM enrichment layer (optional): Claude Sonnet validates heuristics and extends capability recognition

This prevents flagging intended capabilities. A coding agent that reads/writes files and executes shell commands should be flagged only for leaking API keys or accepting untrusted instructions — not for its core functionality.
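A rough Python sketch of how the heuristic layer might assemble a purpose profile is shown here. The mapping entries, regex patterns, capability names, and function names are all hypothetical; the post only states that such tables exist, not what they contain.

```python
import re

# Hypothetical dependency-to-capability table (the real one reportedly has 40+ entries).
DEPENDENCY_CAPABILITIES = {
    "paramiko": "remote_shell",
    "web3": "crypto_wallet",
    "boto3": "cloud_deploy",
}

# Hypothetical security-control detectors (the post mentions 12 regex patterns).
SECURITY_CONTROLS = {
    "input_validation": re.compile(r"\bvalidate_\w+\("),
    "shell_quoting": re.compile(r"\bshlex\.quote\("),
}


def build_purpose_profile(dependencies, source_text):
    """Infer intended capabilities from dependencies and detect controls in the source."""
    capabilities = sorted(
        {DEPENDENCY_CAPABILITIES[d] for d in dependencies if d in DEPENDENCY_CAPABILITIES}
    )
    controls = sorted(
        name for name, pattern in SECURITY_CONTROLS.items() if pattern.search(source_text)
    )
    return {"capabilities": capabilities, "controls": controls}


def is_intended(capability, profile):
    """True if a flagged capability is part of the project's inferred purpose."""
    return capability in profile["capabilities"]
```

A reviewer could consult `is_intended` before raising a finding, so that a remote-shell capability in an SSH automation tool is treated as a feature rather than a vulnerability in its own right.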

Debate Engine Structure

Role Assignment: Prosecutor (original finder), Defense (dissenter or missing agent), Judge (neutral third party)

Debate Flow:

Round 1: Prosecution presents evidence; defense counters
Defense concession check: if the defense concedes and both sides agree, the debate ends immediately
Round 2 (if disputed): Judge asks clarifying questions; both sides respond in parallel
Judge issues final ruling with position, confidence, and cited evidence
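The flow above can be sketched as a small Python function. This is an illustration under assumptions, not xFire's implementation: the agent methods (`argue`, `counter`, `ask`, `answer`, `rule`), the `Ruling` fields, and the choice to confirm a conceded finding with full confidence are all invented here. `StubAgent` stands in for an LLM-backed agent and returns canned responses.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Ruling:
    position: str       # "confirmed" or "rejected"
    confidence: float
    evidence: list
    rounds: int


class StubAgent:
    """Stand-in for an LLM-backed agent; every response is canned."""

    def __init__(self, concedes=False):
        self.concedes = concedes

    def argue(self, finding):
        return {"claim": finding, "evidence": ["deploy.py:42"]}

    def counter(self, finding, case):
        return {"concedes": self.concedes, "evidence": []}

    def ask(self, finding, case, rebuttal):
        return ["Is the input attacker-controlled?"]

    def answer(self, questions):
        return ["no"] * len(questions)

    def rule(self, case, rebuttal, answers):
        return ("rejected", 0.8, case["evidence"])


def run_debate(finding, prosecutor, defense, judge):
    # Round 1: prosecution presents evidence, defense counters.
    case = prosecutor.argue(finding)
    rebuttal = defense.counter(finding, case)

    # Concession check: end early when the defense agrees (assumed outcome: confirmed).
    if rebuttal["concedes"]:
        return Ruling("confirmed", 1.0, case["evidence"], rounds=1)

    # Round 2: judge asks clarifying questions; both sides answer in parallel.
    questions = judge.ask(finding, case, rebuttal)
    with ThreadPoolExecutor(max_workers=2) as pool:
        p_future = pool.submit(prosecutor.answer, questions)
        d_future = pool.submit(defense.answer, questions)
        answers = {"prosecution": p_future.result(), "defense": d_future.result()}

    position, confidence, evidence = judge.rule(case, rebuttal, answers)
    return Ruling(position, confidence, evidence, rounds=2)
```

Calling `run_debate("command injection", StubAgent(), StubAgent(concedes=True), StubAgent())` ends after round one, while a non-conceding defense pushes the finding through the judge's questions and ruling.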

Silent Dissent Detection

When two agents find an issue and one doesn't, the system checks whether the third agent explicitly rejected it (informed dissent) versus simply never analyzing those files (ignorance). Informed dissent signals the finding needs debate.
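One way to picture the check is the following Python sketch. The function name, its inputs, and the third "unclear" outcome (the agent read the file but said nothing either way) are assumptions; the post describes only the informed-dissent and ignorance cases.

```python
def classify_silence(finding_id, finding_file, reviewed_files, rejected_ids):
    """Classify why an agent did not report a finding that the other agents raised.

    reviewed_files: set of file paths the silent agent actually analyzed.
    rejected_ids: set of finding IDs the silent agent explicitly dismissed.
    """
    if finding_id in rejected_ids:
        return "informed_dissent"   # explicit rejection: route the finding to debate
    if finding_file not in reviewed_files:
        return "ignorance"          # the agent never looked at the file
    return "unclear"                # assumed fallback, not described in the post


def needs_debate(finding_id, finding_file, reviewed_files, rejected_ids):
    """Only informed dissent is treated as a signal that debate is needed."""
    return classify_silence(finding_id, finding_file, reviewed_files, rejected_ids) == "informed_dissent"
```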

Beyond Vulnerabilities: Dangerous Production Bugs

xFire hunts 16 categories of dangerous bugs that traditional scanners miss, including:

Race conditions corrupting shared state
Destructive operations without safeguards
Resource exhaustion paths
Partial state updates leaving data inconsistent
Broken error recovery swallowing exceptions
Connection leaks under error paths
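To make one of these categories concrete, here is a small Python illustration of a connection leak on an error path, next to a fixed version. `FakeConnection` is a made-up stand-in for a real database client so the example runs on its own; it is not taken from xFire or from any library.

```python
class FakeConnection:
    """Toy connection that tracks how many instances are still open."""
    open_count = 0

    def __init__(self):
        FakeConnection.open_count += 1

    def execute(self, query):
        if "bad" in query:
            raise ValueError("query failed")
        return "ok"

    def close(self):
        FakeConnection.open_count -= 1


def leaky_query(query):
    conn = FakeConnection()
    result = conn.execute(query)  # if this raises, close() below is never reached
    conn.close()
    return result


def safe_query(query):
    conn = FakeConnection()
    try:
        return conn.execute(query)
    finally:
        conn.close()              # runs on both the success and the error path
```

Nothing here looks like a classic vulnerability pattern, which is why this kind of bug tends to slip past signature-based scanners: the problem only appears when you reason about what happens after `execute` fails.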

Why Three Agents Beat One

Blind spots cancel out — Different training data and architectures mean each model catches what others miss
Independent review prevents anchoring — Parallel isolation ensures genuine analysis, not herding
Adversarial debate forces evidence — Prosecution must cite specific files/lines; defense must produce counter-evidence; judges evaluate evidence quality
Cross-validation provides signal — Multiple agents identifying the same issue indicates real findings

xFire is open source on GitHub and available on PyPI: github.com/Har1sh-k/xfire
