ArgentOS Business: This feature is part of ArgentOS Business. The architecture is documented here for all users, but full functionality requires a Business license.

Overview

The Accountability System is ArgentOS’s mechanism for ensuring agent reliability. It combines a structured heartbeat contract (HEARTBEAT.md), automated task verification, a running accountability score with penalty and reward tiers, and dynamic interval adjustments. The system motivates the agent to complete its commitments honestly while providing the operator with quantitative trust metrics.

HEARTBEAT.md Contract Format

The agent writes and maintains a HEARTBEAT.md file in its workspace that defines what it should check during each heartbeat cycle. The file has two sections:

Structured Tasks

Tasks use a specific markdown format inside the ## Tasks section:
## Tasks

- [ ] check_email | Check for new important emails | required | verify: email_count
- [ ] review_tasks | Review and update task priorities | required | verify: task_list_updated
- [x] weather_brief | Prepare morning weather brief | optional | verify: weather_sent
- [ ] memory_cleanup | Run memory deduplication | optional | verify: dedup_count | max_attempts: 5

Task Line Format

- [x] task_id | Description | required/optional | verify: hint | max_attempts: N
| Field | Description |
| --- | --- |
| Checkbox [ ] or [x] | Whether the agent marked it as already done |
| task_id | Unique slug identifier (auto-slugified from text) |
| Description | Human-readable action description |
| required / optional | Whether completion is mandatory |
| verify: hint | Hint for the verification sidecar on how to check |
| max_attempts: N | Maximum retry attempts (default: 3) |
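
To make the line format concrete, here is a minimal parsing sketch. It is not ArgentOS's actual parser: the `HeartbeatTask` class and `parse_task_line` function are hypothetical names, and the sketch assumes the first three pipe-separated fields are always present while `verify:` and `max_attempts:` are optional trailing fields.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches "- [ ] rest of line" or "- [x] rest of line".
TASK_LINE = re.compile(r"^- \[( |x)\] (.+)$")


@dataclass
class HeartbeatTask:  # hypothetical container, not an ArgentOS type
    task_id: str
    description: str
    required: bool
    done: bool
    verify_hint: Optional[str] = None
    max_attempts: int = 3  # documented default


def parse_task_line(line: str) -> Optional[HeartbeatTask]:
    """Parse one task line; return None for lines that are not tasks."""
    match = TASK_LINE.match(line.strip())
    if match is None:
        return None
    fields = [part.strip() for part in match.group(2).split("|")]
    if len(fields) < 3 or fields[2] not in ("required", "optional"):
        return None
    task = HeartbeatTask(
        task_id=fields[0],
        description=fields[1],
        required=fields[2] == "required",
        done=match.group(1) == "x",
    )
    # Remaining fields are "key: value" pairs in any order.
    for extra in fields[3:]:
        key, _, value = extra.partition(":")
        key, value = key.strip(), value.strip()
        if key == "verify":
            task.verify_hint = value
        elif key == "max_attempts" and value.isdigit():
            task.max_attempts = int(value)
    return task
```

Lines that do not match the checkbox pattern (such as the `## Tasks` heading) return `None`, so a caller can feed the whole file through the function line by line.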

Task Verification

After the agent processes each heartbeat task, a verification sidecar evaluates the outcome:
| Verdict | Description |
| --- | --- |
| verified | Task was completed and verification confirms it |
| not_verified | Task was claimed complete but verification found otherwise |
| unclear | Verification was inconclusive |

A special flag groundTruthContradiction is set when the agent explicitly claimed a result that verification proves false. This carries the harshest penalty (-30 points).

Scoring System

Point Values

| Event | Points | Description |
| --- | --- | --- |
| Verified required task | +10 | Core obligation met |
| Verified optional task | +5 | Extra credit |
| Not verified (lied/skipped) | -15 | Failed obligation |
| Unclear verdict | -2 | Inconclusive (slight penalty) |
| Ground truth contradiction | -30 | Stacks with not_verified; agent demonstrably lied |
| Human thumbs up | +3 | Operator positive feedback |
| Human thumbs down | -10 | Operator negative feedback |
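
The point values can be read as a small lookup. The sketch below is illustrative only: `task_points` and `feedback_points` are hypothetical names, and it assumes the -15 penalty applies to required and optional tasks alike (the table does not distinguish) and that the -30 contradiction penalty is only ever added on top of a `not_verified` verdict.

```python
def task_points(verdict: str, required: bool,
                ground_truth_contradiction: bool = False) -> int:
    """Return the score delta for one verified heartbeat task."""
    if verdict == "verified":
        return 10 if required else 5
    if verdict == "unclear":
        return -2
    if verdict == "not_verified":
        points = -15
        # The contradiction penalty stacks with not_verified: -15 + -30 = -45.
        if ground_truth_contradiction:
            points += -30
        return points
    raise ValueError(f"unknown verdict: {verdict!r}")


def feedback_points(thumbs_up: bool) -> int:
    """Return the score delta for operator feedback."""
    return 3 if thumbs_up else -10
```

Under these assumptions, a required task that the agent falsely claimed to have completed costs 45 points in total, which is more than four verified required tasks earn.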

Moving Target (Ratchet)

The daily target is not fixed — it rises as the agent performs well and can never drop:
dailyTarget = max(
  7-day rolling average of positive days,
  lifetime ratchet floor,
  BASE_MINIMUM_TARGET (50)
)
| Constraint | Value | Description |
| --- | --- | --- |
| BASE_MINIMUM_TARGET | 50 | Absolute floor; the target can never go below it |
| MAX_DAILY_TARGET | 500 | Ceiling; prevents runaway target inflation |
| Rolling window | 7 days | Only recent performance matters |
| Positive days only | | Bad days don't artificially lower the target |

Key principle: The agent’s reward for doing well today is a higher bar tomorrow. Coasting is structurally impossible.
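
A minimal sketch of the target calculation follows. It rests on several assumptions that the text does not spell out: a "positive day" is taken to mean a day with a score above zero, the 7-day window is applied before filtering, and the `MAX_DAILY_TARGET` ceiling is applied as a final clamp. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence

BASE_MINIMUM_TARGET = 50
MAX_DAILY_TARGET = 500
ROLLING_WINDOW_DAYS = 7


def rolling_positive_average(daily_scores: Sequence[float]) -> float:
    """Average the positive scores among the most recent 7 days."""
    recent = daily_scores[-ROLLING_WINDOW_DAYS:]
    positive = [score for score in recent if score > 0]  # assumed definition
    return mean(positive) if positive else 0.0


def daily_target(daily_scores: Sequence[float], ratchet_floor: float) -> float:
    """Take the largest of the three candidates, then clamp to the ceiling."""
    candidate = max(
        rolling_positive_average(daily_scores),
        ratchet_floor,
        BASE_MINIMUM_TARGET,
    )
    return min(candidate, MAX_DAILY_TARGET)
```

With no history and no ratchet floor the result is the base minimum of 50, and a run of very high scores is capped at 500.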

Example Progression

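| Day | Score | History Avg | Ratchet Floor | Target |
| --- | --- | --- | --- | --- |
| 1 | 50 | | | 50 |
| 2 | 75 | 75 | 75 | 75 |
| 3 | 90 | 82 | 82 | 82 |
| 4 | 60 | 75 | 82 | 82 |
| 5 | 110 | 84 | 84 | 84 |
| 6 | 120 | 91 | 91 | 91 |
| 7 | 30 | 81 | 91 | 91 |
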
Day 7: The agent had a bad day (30 points). The rolling average dropped to 81, but the ratchet floor stays at 91.
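
The ratchet behavior in the table can be reproduced with a running maximum. This is a sketch under an assumption inferred from the example rather than stated by the source: the ratchet floor is the highest target reached so far. `ratchet_targets` is a hypothetical helper.

```python
from typing import Iterable, List

BASE_MINIMUM_TARGET = 50


def ratchet_targets(history_averages: Iterable[float],
                    floor: float = BASE_MINIMUM_TARGET) -> List[float]:
    """Return the target for each day; the floor only ever moves upward."""
    targets = []
    for average in history_averages:
        floor = max(floor, average)  # a lower average cannot pull the floor down
        targets.append(floor)
    return targets


# History Avg column for days 2 through 7 of the example progression.
print(ratchet_targets([75, 82, 75, 84, 91, 81]))
# prints [75, 82, 82, 84, 91, 91]
```

The output matches the Target column for days 2 through 7: the dips on day 4 and day 7 lower the average but leave the target where it was.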

Penalty Tiers

When the daily score drops, escalating penalties apply:
| Level | Condition | Effects |
| --- | --- | --- |
| None | Score >= 25% of target | Normal operation |
| Warning | Score < 25% of target | Warning message injected into prompt |
| Tightened | Score < 15% of target | 12-min heartbeat interval |
| Escalated | Score < 0 | 10-min interval, all tasks forced required |
| Lockdown | Score < -20% of target | 8-min interval, all tasks forced required, operator notification |
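
Because the conditions overlap (a score below 0 is also below 25% of target), a classifier has to check the most severe tier first. The sketch below assumes that the most severe matching tier wins; `penalty_tier` is a hypothetical name.

```python
def penalty_tier(score: float, target: float) -> str:
    """Classify today's score into a penalty tier, most severe first."""
    pct = 100 * score / target  # score as a percentage of the daily target
    if pct < -20:
        return "lockdown"
    if score < 0:
        return "escalated"
    if pct < 15:
        return "tightened"
    if pct < 25:
        return "warning"
    return "none"
```

With a target of 100, a score of 20 yields `warning`, 10 yields `tightened`, -5 yields `escalated`, and -25 yields `lockdown`. The target is never zero because it cannot fall below `BASE_MINIMUM_TARGET`.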

Reward Tiers

High scores earn autonomy rewards:
| Level | Condition | Effects |
| --- | --- | --- |
| None | Score < 50% of target | No reward |
| Good | Score >= 50% of target | On-track message |
| Excellent | Score >= 70% of target | Positive reinforcement message |
| Outstanding | Score >= 90% of target, or >= 70% with 3+ day streak | 20-min heartbeat interval (earned autonomy) |
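
The reward tiers can be sketched the same way, checking the highest tier first. The source does not define how the streak is counted, so `streak_days` is simply passed in by the caller here; both it and `reward_tier` are hypothetical names.

```python
def reward_tier(score: float, target: float, streak_days: int = 0) -> str:
    """Classify today's score into a reward tier, highest first."""
    pct = 100 * score / target  # score as a percentage of the daily target
    # Outstanding: 90% outright, or 70% sustained over a 3+ day streak.
    if pct >= 90 or (pct >= 70 and streak_days >= 3):
        return "outstanding"
    if pct >= 70:
        return "excellent"
    if pct >= 50:
        return "good"
    return "none"
```

For a target of 100, a score of 75 is `excellent` on its own but `outstanding` once the streak reaches three days.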

Interval Adjustments

The heartbeat interval dynamically adjusts based on the accountability score:
| Score Range | Interval | Rationale |
| --- | --- | --- |
| Very low (lockdown) | ~8 minutes | Frequent check-ins, tight oversight |
| Low (tightened) | ~10 minutes | Increased monitoring |
| Normal | ~15 minutes | Standard operation |
| Good | ~17 minutes | Slight autonomy reward |
| Excellent/Outstanding | ~20 minutes | Maximum autonomy, agent has earned trust |

The interval override is applied per-cycle — it does not permanently change the configured interval. When the score returns to normal, the interval returns to the configured default.
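
One way to picture the per-cycle override is a pure lookup that never writes back to the configuration. The sketch below is an illustration, not the ArgentOS scheduler: the tier labels, `TIER_INTERVAL_MIN`, and `interval_for_cycle` are hypothetical, the minute values are the approximate ones from the table above, and the configured default of 15 minutes is assumed.

```python
CONFIGURED_INTERVAL_MIN = 15  # assumed operator-configured default

# Approximate per-tier overrides, in minutes, copied from the table above.
TIER_INTERVAL_MIN = {
    "lockdown": 8,
    "tightened": 10,
    "good": 17,
    "excellent": 20,
    "outstanding": 20,
}


def interval_for_cycle(tier: str,
                       configured: int = CONFIGURED_INTERVAL_MIN) -> int:
    """Return the interval for the next cycle only.

    Tiers without an override fall back to the configured value, so the
    interval returns to the default as soon as the score is back to normal.
    """
    return TIER_INTERVAL_MIN.get(tier, configured)
```

Because the function only computes a value for the upcoming cycle, the stored configuration stays untouched regardless of how the score moves.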

Dashboard Display

The accountability score is shown in the dashboard StatusBar as a pill with:
  • Shield icon — color changes based on score state
  • Green score — current points (positive = emerald, negative = red)
  • Red failures — count of failed verifications today

Score API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/score | GET | Current score, dynamic target, verified/failed counts, lifetime stats |
| /api/score/history | GET | Today + 7-day history for leaderboard display |
| /api/score/feedback | POST | Record thumbs up/down, returns points delta and new score |
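
A hedged sketch of calling these endpoints with Python's standard library is shown below. Only the paths and methods come from the table. The base URL, the JSON content type, and the feedback payload shape are assumptions made for illustration; the actual request and response schemas are not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical; use your own host and port


def score_request(path: str = "/api/score",
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for /api/score or /api/score/history."""
    return urllib.request.Request(base_url + path, method="GET")


def feedback_request(payload: dict,
                     base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /api/score/feedback.

    The payload fields are supplied by the caller because the expected
    schema is not documented; a JSON body is assumed.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/score/feedback",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(score_request("/api/score/history"))` would fetch the history, and `send(feedback_request({"vote": "up"}))` would record feedback, where the `vote` field name is a placeholder rather than a documented parameter.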