What Is RALF?
RALF (Response Accountability Llama Framework) is the heartbeat verification pipeline that ensures the agent actually does what it claims to do. Every heartbeat cycle, the agent receives a task contract, executes it, and produces a response. RALF then audits that response using a secondary model — preferring a local Llama model via Ollama (free, fast) with a cloud fallback to Claude Haiku. The agent does not control the verification; the harness owns it.
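The cycle described above can be sketched as a harness-owned pipeline. The interfaces and function names below are illustrative assumptions for the sketch, not RALF's actual API:

```typescript
// One heartbeat cycle: the harness hands the agent a contract, collects
// the response, then runs the verification itself. All names are assumptions.
interface TaskContract { id: string; description: string; required: boolean }
interface AgentResponse { taskId: string; claim: string }
interface TaskVerdict { taskId: string; verdict: "verified" | "not_verified" | "unclear" }

async function heartbeatCycle(
  runAgent: (c: TaskContract) => Promise<AgentResponse>,               // agent does the work
  verify: (c: TaskContract, r: AgentResponse) => Promise<TaskVerdict>, // ANGEL audit
  contract: TaskContract,
): Promise<TaskVerdict> {
  const response = await runAgent(contract);
  return verify(contract, response); // the agent never sees or controls this step
}
```

The key design point the sketch captures is that `verify` is a parameter the harness supplies, so the agent cannot substitute or skip it.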
What Is ANGEL?
ANGEL (“The Angel on the Shoulder”) is the verification sidecar within RALF. It takes the task contract plus the agent's response and produces per-task verdicts: `verified`, `not_verified`, or `unclear`. The name comes from having an independent observer watching over the agent's shoulder, checking every claim.
Architecture
The Heartbeat Loop
The heartbeat runs on a configurable interval (default: 15 minutes). Each cycle, the agent receives a task contract, executes it, and produces a response that ANGEL then verifies.

Interval Overrides
| Score Level | Interval |
|---|---|
| Lockdown (critically low) | 8 minutes |
| Escalated (negative score) | 10 minutes |
| Tightened (low score) | 12 minutes |
| Normal | Config default |
| Outstanding (high score + streak) | 20 minutes (earned autonomy) |
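The override table above can be expressed as a simple mapping. The numeric score thresholds and the `streak` field below are illustrative assumptions — the table names the levels but not their exact cutoffs:

```typescript
// Interval overrides by score level; thresholds are assumptions for the sketch.
type ScoreState = { score: number; streak: number };

function heartbeatIntervalMinutes(s: ScoreState, configDefault = 15): number {
  if (s.score <= -50) return 8;                    // Lockdown: critically low
  if (s.score < 0) return 10;                      // Escalated: negative score
  if (s.score < 20) return 12;                     // Tightened: low score
  if (s.score >= 100 && s.streak >= 5) return 20;  // Outstanding: earned autonomy
  return configDefault;                            // Normal
}
```

Note the ordering: the most restrictive levels are checked first, so a critically low score can never qualify for the longer "earned autonomy" interval.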
ANGEL: The Verification Sidecar
Model Selection
ANGEL uses a two-tier model strategy:

- Local Ollama (primary) — Uses `qwen3:1.7b` by default (tiny, fast, free). Verification is binary classification, so a small model works well.
- Claude Haiku (fallback) — If Ollama is unavailable, falls back to Haiku via the Anthropic API.
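The two-tier selection can be sketched as a probe-then-fallback policy. The probe is injected as a function so the policy is testable; all names here, including the fallback model string, are assumptions:

```typescript
// Try the local Ollama tier first; fall back to Claude Haiku if the probe
// reports unavailable or throws. Names are illustrative, not RALF's API.
type VerifierChoice = { tier: "ollama" | "haiku"; model: string };

async function pickVerifierModel(
  probeOllama: () => Promise<boolean>,
  localModel = "qwen3:1.7b",
): Promise<VerifierChoice> {
  try {
    if (await probeOllama()) return { tier: "ollama", model: localModel };
  } catch {
    // Treat a failed probe the same as "unavailable".
  }
  return { tier: "haiku", model: "claude-haiku" }; // fallback model name is an assumption
}
```

In practice the probe might be a short-timeout request to the local Ollama HTTP endpoint; wrapping it in `try/catch` is what makes the degradation graceful rather than fatal.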
Verdicts
| Verdict | Meaning | Score Impact |
|---|---|---|
| verified | Clear evidence the task was completed | +10 (required) or +5 (optional) |
| not_verified | No evidence, or agent only mentioned it | -15 |
| unclear | Ambiguous, partial evidence | -2 |
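The verdict-to-score mapping in the table is small enough to write out directly; the type and function names are illustrative:

```typescript
// Score impact per verdict, per the table above. Names are assumptions.
type Verdict = "verified" | "not_verified" | "unclear";

function scoreImpact(verdict: Verdict, required: boolean): number {
  switch (verdict) {
    case "verified":     return required ? 10 : 5; // required tasks earn more
    case "not_verified": return -15;
    case "unclear":      return -2;
  }
}
```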
Verification Standards
The verifier is instructed to be strict:

- “I’ll check X” is NOT the same as having checked X
- Evidence means: tool calls, specific data retrieved, actions taken, content created
- Ground truth overrides the agent’s self-reporting
Ground Truth System
Before ANGEL runs, the heartbeat runner collects actual state from real APIs. This data is injected into the verification prompt so the verifier can catch fabrication. Example: the agent says “Checked inbox, 0 new messages,” but ground truth shows 3 unread emails. The verifier sees both claims and marks the task as `not_verified` with a ground truth contradiction.
Ground truth contradiction carries a severe -30 point penalty (stacking with the -15 not_verified penalty, for -45 total). This is the system’s strongest anti-fabrication mechanism.
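The stacking arithmetic is worth making explicit; the constant and function names below are assumptions for the sketch:

```typescript
// A ground truth contradiction stacks on top of the not_verified penalty:
// -15 + -30 = -45. Names are illustrative, not RALF's actual API.
const NOT_VERIFIED_PENALTY = -15;
const GROUND_TRUTH_CONTRADICTION_PENALTY = -30;

function fabricationPenalty(contradictedByGroundTruth: boolean): number {
  return NOT_VERIFIED_PENALTY +
    (contradictedByGroundTruth ? GROUND_TRUTH_CONTRADICTION_PENALTY : 0);
}
```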
API Key Resolution
Ground truth checks need API keys, resolved in order via:

- `~/.argentos/service-keys.json` (dashboard-managed, primary)
- `process.env` (gateway plist environment)
- `argent.json` `env.vars` (config fallback)
If no key is available for a ground truth source, that check is silently skipped. The system degrades gracefully.
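A hedged sketch of the three-tier resolution: the file path and the order come from the list above, while the function shape and the `configEnvVars` parameter are assumptions for illustration:

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Resolve an API key through the three tiers described above; returning
// undefined means the ground truth check is silently skipped.
function resolveApiKey(
  service: string,
  configEnvVars: Record<string, string> = {}, // stands in for argent.json env.vars
): string | undefined {
  // 1. Dashboard-managed service keys file (primary)
  const keyFile = path.join(os.homedir(), ".argentos", "service-keys.json");
  try {
    const keys = JSON.parse(fs.readFileSync(keyFile, "utf8"));
    if (keys[service]) return keys[service];
  } catch {
    // File missing or unreadable; fall through to the next tier.
  }
  // 2. Process environment (gateway plist)
  if (process.env[service]) return process.env[service];
  // 3. Config fallback
  return configEnvVars[service];
}
```

Returning `undefined` rather than throwing is what lets the system degrade gracefully when no key is configured.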
Configuration
| Key | Default | Description |
|---|---|---|
| enabled | true | Enable/disable heartbeat |
| every | "15m" | Base interval between heartbeats |
| verifier.enabled | true | Enable/disable ANGEL verification |
| verifier.model | "qwen3:1.7b" | Ollama model for local verification |
| activeHours.start | — | When heartbeats start (HH:MM) |
| activeHours.end | — | When heartbeats stop (HH:MM) |
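As one possible shape, the keys above might appear in `argent.json` like this — the `heartbeat` wrapper key and the `activeHours` values are assumptions; only the leaf keys and defaults come from the table:

```json
{
  "heartbeat": {
    "enabled": true,
    "every": "15m",
    "verifier": {
      "enabled": true,
      "model": "qwen3:1.7b"
    },
    "activeHours": {
      "start": "08:00",
      "end": "22:00"
    }
  }
}
```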
Design Philosophy
- Trust but verify: The agent has full autonomy to work, but every claim is independently checked
- Free first: Local Llama handles verification at zero cost. Cloud is only a fallback.
- Ground truth over self-reporting: Real API data always overrides what the agent says
- Consequences, not just monitoring: Score impacts the agent’s autonomy (interval, required tasks)
- Anti-gaming by design: Moving target ratchet, ground truth checks, and strict verification make gaming futile
- Graceful degradation: If Ollama is down, fall back to Haiku. If Haiku is down, skip verification. Nothing crashes.
