What Is RALF?
RALF (Response Accountability Llama Framework) is the heartbeat verification pipeline that ensures the agent actually does what it claims to do. Every heartbeat cycle, the agent receives a task contract, executes it, and produces a response. RALF then audits that response using a secondary model — preferring a local Llama model via Ollama (free, fast) with a cloud fallback to Claude Haiku. The agent does not control the verification; the harness owns it.
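The cycle described above can be sketched as a harness-owned pipeline. The interfaces and function names below are illustrative assumptions for the sketch, not RALF's actual API:

```typescript
// One heartbeat cycle: the harness hands the agent a contract, collects
// the response, then runs the verification itself. All names are assumptions.
interface TaskContract { id: string; description: string; required: boolean }
interface AgentResponse { taskId: string; claim: string }
interface TaskVerdict { taskId: string; verdict: "verified" | "not_verified" | "unclear" }

async function heartbeatCycle(
  runAgent: (c: TaskContract) => Promise<AgentResponse>,               // agent does the work
  verify: (c: TaskContract, r: AgentResponse) => Promise<TaskVerdict>, // ANGEL audit
  contract: TaskContract,
): Promise<TaskVerdict> {
  const response = await runAgent(contract);
  return verify(contract, response); // the agent never sees or controls this step
}
```

The key design point the sketch captures is that `verify` is a parameter the harness supplies, so the agent cannot substitute or skip it.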
What Is ANGEL?
ANGEL (“The Angel on the Shoulder”) is the verification sidecar within RALF. It takes the task contract plus the agent's response and produces per-task verdicts: `verified`, `not_verified`, or `unclear`. The name comes from having an independent observer watching over the agent's shoulder, checking every claim.
Architecture
The Heartbeat Loop
The heartbeat runs on a configurable interval (default: 15 minutes). Each cycle, the agent receives a task contract, executes it, and produces a response that ANGEL then verifies.

Interval Overrides
| Score Level | Interval |
|---|---|
| Lockdown (critically low) | 8 minutes |
| Escalated (negative score) | 10 minutes |
| Tightened (low score) | 12 minutes |
| Normal | Config default |
| Outstanding (high score + streak) | 20 minutes (earned autonomy) |
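The override table above can be expressed as a simple mapping. The numeric score thresholds and the `streak` field below are illustrative assumptions — the table names the levels but not their exact cutoffs:

```typescript
// Interval overrides by score level; thresholds are assumptions for the sketch.
type ScoreState = { score: number; streak: number };

function heartbeatIntervalMinutes(s: ScoreState, configDefault = 15): number {
  if (s.score <= -50) return 8;                    // Lockdown: critically low
  if (s.score < 0) return 10;                      // Escalated: negative score
  if (s.score < 20) return 12;                     // Tightened: low score
  if (s.score >= 100 && s.streak >= 5) return 20;  // Outstanding: earned autonomy
  return configDefault;                            // Normal
}
```

Note the ordering: the most restrictive levels are checked first, so a critically low score can never qualify for the longer "earned autonomy" interval.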
ANGEL: The Verification Sidecar
Model Selection
ANGEL uses a two-tier model strategy:

- Local Ollama (primary) — Uses `qwen3:1.7b` by default (tiny, fast, free). Verification is binary classification, so a small model works well.
- Claude Haiku (fallback) — If Ollama is unavailable, falls back to Haiku via the Anthropic API.
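The two-tier selection can be sketched as a probe-then-fallback policy. The probe is injected as a function so the policy is testable; all names here, including the fallback model string, are assumptions:

```typescript
// Try the local Ollama tier first; fall back to Claude Haiku if the probe
// reports unavailable or throws. Names are illustrative, not RALF's API.
type VerifierChoice = { tier: "ollama" | "haiku"; model: string };

async function pickVerifierModel(
  probeOllama: () => Promise<boolean>,
  localModel = "qwen3:1.7b",
): Promise<VerifierChoice> {
  try {
    if (await probeOllama()) return { tier: "ollama", model: localModel };
  } catch {
    // Treat a failed probe the same as "unavailable".
  }
  return { tier: "haiku", model: "claude-haiku" }; // fallback model name is an assumption
}
```

In practice the probe might be a short-timeout request to the local Ollama HTTP endpoint; wrapping it in `try/catch` is what makes the degradation graceful rather than fatal.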
Verdicts
| Verdict | Meaning | Score Impact |
|---|---|---|
| verified | Clear evidence the task was completed | +10 (required) or +5 (optional) |
| not_verified | No evidence, or agent only mentioned it | -15 |
| unclear | Ambiguous, partial evidence | -2 |
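The verdict-to-score mapping in the table is small enough to write out directly; the type and function names are illustrative:

```typescript
// Score impact per verdict, per the table above. Names are assumptions.
type Verdict = "verified" | "not_verified" | "unclear";

function scoreImpact(verdict: Verdict, required: boolean): number {
  switch (verdict) {
    case "verified":     return required ? 10 : 5; // required tasks earn more
    case "not_verified": return -15;
    case "unclear":      return -2;
  }
}
```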
Verification Standards
The verifier is instructed to be strict:

- “I’ll check X” is NOT the same as having checked X
- Evidence means: tool calls, specific data retrieved, actions taken, content created
- Ground truth overrides the agent’s self-reporting
Ground Truth System
Before ANGEL runs, the heartbeat runner collects actual state from real APIs. This data is injected into the verification prompt so the verifier can catch fabrication. Example: the agent says “Checked inbox, 0 new messages,” but ground truth shows 3 unread emails. The verifier sees both claims and marks the task as `not_verified` with a ground truth contradiction.
Ground truth contradiction carries a severe -30 point penalty (stacking with the -15 not_verified penalty, for -45 total). This is the system’s strongest anti-fabrication mechanism.
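The stacking arithmetic is worth making explicit; the constant and function names below are assumptions for the sketch:

```typescript
// A ground truth contradiction stacks on top of the not_verified penalty:
// -15 + -30 = -45. Names are illustrative, not RALF's actual API.
const NOT_VERIFIED_PENALTY = -15;
const GROUND_TRUTH_CONTRADICTION_PENALTY = -30;

function fabricationPenalty(contradictedByGroundTruth: boolean): number {
  return NOT_VERIFIED_PENALTY +
    (contradictedByGroundTruth ? GROUND_TRUTH_CONTRADICTION_PENALTY : 0);
}
```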
API Key Resolution
Ground truth checks need API keys, resolved in order via:

- `~/.argentos/service-keys.json` (dashboard-managed, primary)
- `process.env` (gateway plist environment)
- `argent.json` `env.vars` (config fallback)
If no key is available for a ground truth source, that check is silently skipped. The system degrades gracefully.
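A hedged sketch of the three-tier resolution: the file path and the order come from the list above, while the function shape and the `configEnvVars` parameter are assumptions for illustration:

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Resolve an API key through the three tiers described above; returning
// undefined means the ground truth check is silently skipped.
function resolveApiKey(
  service: string,
  configEnvVars: Record<string, string> = {}, // stands in for argent.json env.vars
): string | undefined {
  // 1. Dashboard-managed service keys file (primary)
  const keyFile = path.join(os.homedir(), ".argentos", "service-keys.json");
  try {
    const keys = JSON.parse(fs.readFileSync(keyFile, "utf8"));
    if (keys[service]) return keys[service];
  } catch {
    // File missing or unreadable; fall through to the next tier.
  }
  // 2. Process environment (gateway plist)
  if (process.env[service]) return process.env[service];
  // 3. Config fallback
  return configEnvVars[service];
}
```

Returning `undefined` rather than throwing is what lets the system degrade gracefully when no key is configured.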
Configuration
| Key | Default | Description |
|---|---|---|
| enabled | true | Enable/disable heartbeat |
| every | "15m" | Base interval between heartbeats |
| verifier.enabled | true | Enable/disable ANGEL verification |
| verifier.model | "qwen3:1.7b" | Ollama model for local verification |
| activeHours.start | — | When heartbeats start (HH:MM) |
| activeHours.end | — | When heartbeats stop (HH:MM) |
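As one possible shape, the keys above might appear in `argent.json` like this — the `heartbeat` wrapper key and the `activeHours` values are assumptions; only the leaf keys and defaults come from the table:

```json
{
  "heartbeat": {
    "enabled": true,
    "every": "15m",
    "verifier": {
      "enabled": true,
      "model": "qwen3:1.7b"
    },
    "activeHours": {
      "start": "08:00",
      "end": "22:00"
    }
  }
}
```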
Design Philosophy
- Trust but verify: The agent has full autonomy to work, but every claim is independently checked
- Free first: Local Llama handles verification at zero cost. Cloud is only a fallback.
- Ground truth over self-reporting: Real API data always overrides what the agent says
- Consequences, not just monitoring: Score impacts the agent’s autonomy (interval, required tasks)
- Anti-gaming by design: Moving target ratchet, ground truth checks, and strict verification make gaming futile
- Graceful degradation: If Ollama is down, fall back to Haiku. If Haiku is down, skip verification. Nothing crashes.
