What Is RALF?

RALF (Response Accountability Llama Framework) is the heartbeat verification pipeline that ensures the agent actually does what it claims to do. Every heartbeat cycle, the agent receives a task contract, executes it, and produces a response. RALF then audits that response using a secondary model — preferring a local Llama model via Ollama (free, fast) with a cloud fallback to Claude Haiku.
The agent does not control the verification. The harness owns it.

What Is ANGEL?

ANGEL (“The Angel on the Shoulder”) is the verification sidecar within RALF. It takes the task contract + agent response and produces per-task verdicts: verified, not_verified, or unclear. The name comes from having an independent observer watching over the agent’s shoulder, checking every claim.

Architecture

The Heartbeat Loop

The heartbeat runs on a configurable interval (default: 15 minutes). Each cycle:
1. Load contract: Parse HEARTBEAT.md into structured tasks
2. Initialize progress: Carry over retry state, reset verified/skipped tasks
3. Check score state: Load accountability score, determine penalty/reward levels
4. Build prompt: Inject contract tasks, retry feedback, and score section
5. Agent executes: Main agent processes the heartbeat, calls tools, produces response
6. Collect ground truth: Query real APIs (email, social, etc.) for actual system state
7. Run ANGEL: Send (contract + response + ground truth) to verification sidecar
8. Apply verdicts: Update progress tracker, record score, detect contradictions
9. Persist: Save progress and score state to disk
10. Schedule next: Score may override the interval (shorter for poor performance, longer for outstanding)
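The ten steps above can be sketched as a single pipeline function. This is a minimal, hypothetical TypeScript sketch with the agent, verifier, and storage injected as dependencies; every name here (`loadContract`, `runAngel`, `CycleDeps`, etc.) is an illustrative stand-in, not the real RALF API.

```typescript
type Verdict = "verified" | "not_verified" | "unclear";

interface Task { id: string; required: boolean; }

// Stand-ins for the real subsystems; each maps to a numbered step above.
interface CycleDeps {
  loadContract(): Task[];                        // 1. parse HEARTBEAT.md
  runAgent(tasks: Task[]): string;               // 5. agent produces response
  collectGroundTruth(): Record<string, unknown>; // 6. query real APIs
  runAngel(
    tasks: Task[],
    response: string,
    truth: Record<string, unknown>,
  ): Map<string, Verdict>;                       // 7. verification sidecar
  applyVerdicts(v: Map<string, Verdict>): number;// 8. returns updated score
  persist(score: number): void;                  // 9. save state to disk
  nextInterval(score: number): number;           // 10. score may override interval
}

// One heartbeat cycle: contract in, next interval (minutes) out.
function runHeartbeatCycle(deps: CycleDeps): number {
  const tasks = deps.loadContract();
  const response = deps.runAgent(tasks);
  const truth = deps.collectGroundTruth();
  const verdicts = deps.runAngel(tasks, response, truth);
  const score = deps.applyVerdicts(verdicts);
  deps.persist(score);
  return deps.nextInterval(score);
}
```

The harness owning this function, rather than the agent, is what makes the verification tamper-proof: the agent only ever sees step 5.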

Interval Overrides

| Score Level | Interval |
| --- | --- |
| Lockdown (critically low) | 8 minutes |
| Escalated (negative score) | 10 minutes |
| Tightened (low score) | 12 minutes |
| Normal | Config default |
| Outstanding (high score + streak) | 20 minutes (earned autonomy) |
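The override table maps directly to a lookup. A hypothetical sketch, assuming the score level has already been derived from the accountability score and streak (the level names and function signature are illustrative):

```typescript
type ScoreLevel =
  | "lockdown"     // critically low score
  | "escalated"    // negative score
  | "tightened"    // low score
  | "normal"
  | "outstanding"; // high score + streak

// Returns the heartbeat interval in minutes for a given score level.
// "normal" defers to the configured base interval (default 15m).
function intervalMinutes(level: ScoreLevel, configDefault = 15): number {
  switch (level) {
    case "lockdown":    return 8;
    case "escalated":   return 10;
    case "tightened":   return 12;
    case "outstanding": return 20; // earned autonomy
    default:            return configDefault;
  }
}
```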

ANGEL: The Verification Sidecar

Model Selection

ANGEL uses a two-tier model strategy:
  1. Local Ollama (primary) — Uses qwen3:1.7b by default (tiny, fast, free). Verification is a narrow classification task (three possible verdicts), so a small model works well.
  2. Claude Haiku (fallback) — If Ollama is unavailable, falls back to Haiku via the Anthropic API.
The key insight: verification doesn’t need a powerful model. It’s checking “did the agent do what it said?” — a classification task, not a creative one.
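The two-tier selection reduces to a simple fallback. A hypothetical sketch: the availability flag stands in for a real probe (e.g. pinging Ollama's local HTTP endpoint), and `"claude-haiku"` is a placeholder, not the exact Anthropic model identifier.

```typescript
interface VerifierModel {
  provider: "ollama" | "anthropic";
  model: string;
}

// Prefer the free local model; fall back to the cloud only when
// Ollama is unreachable.
function selectVerifier(
  ollamaAvailable: boolean,
  localModel = "qwen3:1.7b",
): VerifierModel {
  return ollamaAvailable
    ? { provider: "ollama", model: localModel }
    : { provider: "anthropic", model: "claude-haiku" }; // placeholder id
}
```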

Verdicts

| Verdict | Meaning | Score Impact |
| --- | --- | --- |
| verified | Clear evidence the task was completed | +10 (required) or +5 (optional) |
| not_verified | No evidence, or agent only mentioned it | -15 |
| unclear | Ambiguous, partial evidence | -2 |
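The score impacts in the table can be expressed as one function. A minimal sketch, assuming the task's required/optional flag comes from the contract (the function name is illustrative):

```typescript
type Verdict = "verified" | "not_verified" | "unclear";

// Score delta for a single task verdict, per the table above.
function scoreDelta(verdict: Verdict, required: boolean): number {
  switch (verdict) {
    case "verified":     return required ? 10 : 5;
    case "not_verified": return -15;
    case "unclear":      return -2;
  }
}
```

Note the asymmetry: a failed required task (-15) costs more than a verified one earns (+10), so the agent cannot break even by completing half its contract.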

Verification Standards

The verifier is instructed to be strict:
  • “I’ll check X” is NOT the same as having checked X
  • Evidence means: tool calls, specific data retrieved, actions taken, content created
  • Ground truth overrides the agent’s self-reporting

Ground Truth System

Before ANGEL runs, the heartbeat runner collects actual state from real APIs. This data is injected into the verification prompt so the verifier can catch fabrication. Example: The agent says “Checked inbox, 0 new messages.” Ground truth shows 3 unread emails. The verifier sees both claims and marks the task as not_verified with a ground truth contradiction.
Ground truth contradiction carries a severe -30 point penalty (stacking with the -15 not_verified penalty, for -45 total). This is the system’s strongest anti-fabrication mechanism.
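The stacking arithmetic is worth pinning down. A hypothetical sketch of the combined penalty for a failed task (function name illustrative):

```typescript
// Penalty for a not_verified task: -15 base, plus -30 more when the
// response contradicts collected ground truth (-45 total).
function failurePenalty(contradictsGroundTruth: boolean): number {
  const notVerified = -15;
  const contradiction = contradictsGroundTruth ? -30 : 0;
  return notVerified + contradiction;
}

failurePenalty(false); // → -15 (no evidence, but no contradiction)
failurePenalty(true);  // → -45 (fabrication caught by ground truth)
```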

API Key Resolution

Ground truth checks need API keys, resolved via:
  1. ~/.argentos/service-keys.json (dashboard-managed, primary)
  2. process.env (gateway plist environment)
  3. argent.json env.vars (config fallback)
If no key is available for a ground truth source, that check is silently skipped. The system degrades gracefully.
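The three-step resolution order is a straightforward coalescing chain. A minimal sketch, with the three sources passed in as plain objects (the function and parameter names are illustrative, not the real resolver):

```typescript
// Resolve an API key for a ground truth source, in priority order:
// 1. dashboard-managed ~/.argentos/service-keys.json
// 2. process.env (gateway plist environment)
// 3. argent.json env.vars
// Returns undefined when no source has the key, so the caller can
// silently skip that ground truth check.
function resolveApiKey(
  service: string,
  serviceKeys: Record<string, string>,
  env: Record<string, string | undefined>,
  configVars: Record<string, string>,
): string | undefined {
  return serviceKeys[service] ?? env[service] ?? configVars[service];
}
```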

Configuration

```json
{
  "agents": {
    "defaults": {
      "heartbeat": {
        "enabled": true,
        "every": "15m",
        "activeHours": {
          "start": "07:00",
          "end": "23:00",
          "timezone": "America/Chicago"
        },
        "verifier": {
          "enabled": true,
          "model": "qwen3:1.7b"
        }
      }
    }
  }
}
```
| Key | Default | Description |
| --- | --- | --- |
| enabled | true | Enable/disable heartbeat |
| every | "15m" | Base interval between heartbeats |
| verifier.enabled | true | Enable/disable ANGEL verification |
| verifier.model | "qwen3:1.7b" | Ollama model for local verification |
| activeHours.start | | When heartbeats start (HH:MM) |
| activeHours.end | | When heartbeats stop (HH:MM) |
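Durations like `"15m"` need to be turned into milliseconds before scheduling. A hypothetical parser sketch, assuming the shorthand supports seconds, minutes, and hours (the real config may accept other forms):

```typescript
// Parse an interval string like "15m" into milliseconds.
// Accepts <number><s|m|h>; throws on anything else.
function parseIntervalMs(every: string): number {
  const match = /^(\d+)([smh])$/.exec(every);
  if (!match) throw new Error(`unrecognized interval: ${every}`);
  const n = Number(match[1]);
  const unit = { s: 1_000, m: 60_000, h: 3_600_000 }[match[2] as "s" | "m" | "h"];
  return n * unit;
}

parseIntervalMs("15m"); // → 900000 ms (the default heartbeat interval)
```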

Design Philosophy

  1. Trust but verify: The agent has full autonomy to work, but every claim is independently checked
  2. Free first: Local Llama handles verification at zero cost. Cloud is only a fallback.
  3. Ground truth over self-reporting: Real API data always overrides what the agent says
  4. Consequences, not just monitoring: Score impacts the agent’s autonomy (interval, required tasks)
  5. Anti-gaming by design: Moving target ratchet, ground truth checks, and strict verification make gaming futile
  6. Graceful degradation: If Ollama is down, fall back to Haiku. If Haiku is down, skip verification. Nothing crashes.