ArgentOS Business: This feature is part of ArgentOS Business. The architecture is documented here for all users, but full functionality requires a Business license.
Overview
The Accountability System is ArgentOS’s mechanism for ensuring agent reliability. It combines a structured heartbeat contract (HEARTBEAT.md), automated task verification, a running accountability score with penalty and reward tiers, and dynamic interval adjustments. The system motivates the agent to complete its commitments honestly while providing the operator with quantitative trust metrics.
The agent writes and maintains a HEARTBEAT.md file in its workspace that defines what it should check during each heartbeat cycle. The file has two sections:
Structured Tasks
Tasks use a specific markdown format inside the ## Tasks section:
```markdown
## Tasks
- [ ] check_email | Check for new important emails | required | verify: email_count
- [ ] review_tasks | Review and update task priorities | required | verify: task_list_updated
- [x] weather_brief | Prepare morning weather brief | optional | verify: weather_sent
- [ ] memory_cleanup | Run memory deduplication | optional | verify: dedup_count | max_attempts: 5
```

Each task line follows this format:

```
- [x] task_id | Description | required/optional | verify: hint | max_attempts: N
```
| Field | Description |
|---|---|
| Checkbox [ ] or [x] | Whether the agent marked it as already done |
| task_id | Unique slug identifier (auto-slugified from text) |
| Description | Human-readable action description |
| required / optional | Whether completion is mandatory |
| verify: hint | Hint for the verification sidecar on how to check |
| max_attempts: N | Maximum retry attempts (default: 3) |
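A minimal TypeScript sketch of parsing one task line into the fields above, assuming the pipe-delimited layout shown. The `HeartbeatTask` interface, `parseTaskLine`, `slugify`, and the slug rules (lowercase, runs of other characters collapsed to underscores, falling back to the description when the id slot is blank) are hypothetical illustrations, not ArgentOS's actual parser.

```typescript
// Hypothetical parsed form of one HEARTBEAT.md task line.
interface HeartbeatTask {
  done: boolean;        // checkbox was [x]
  id: string;           // task_id slug
  description: string;
  required: boolean;    // "required" vs "optional"
  verifyHint?: string;  // value after "verify:"
  maxAttempts: number;  // value after "max_attempts:", else the default
}

const DEFAULT_MAX_ATTEMPTS = 3;

// Assumed slug rules: lowercase, runs of non-alphanumerics become "_", trim "_".
function slugify(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "_").replace(/^_+|_+$/g, "");
}

// Returns null for lines that are not structured task lines (e.g. "## Tasks").
function parseTaskLine(line: string): HeartbeatTask | null {
  const parts = line.split("|").map((p) => p.trim());
  const head = /^-\s*\[([ xX])\]\s*(.*)$/.exec(parts[0]);
  if (head === null || parts.length < 3) return null;

  const description = parts[1];
  const task: HeartbeatTask = {
    done: head[1].toLowerCase() === "x",
    id: slugify(head[2] || description), // fall back to the description if the id is blank
    description,
    required: parts[2].toLowerCase() === "required",
    maxAttempts: DEFAULT_MAX_ATTEMPTS,
  };

  // Remaining fields are "key: value" pairs in any order.
  for (const field of parts.slice(3)) {
    const sep = field.indexOf(":");
    if (sep < 0) continue;
    const key = field.slice(0, sep).trim();
    const value = field.slice(sep + 1).trim();
    if (key === "verify") task.verifyHint = value;
    if (key === "max_attempts") {
      const n = Number.parseInt(value, 10);
      if (n > 0) task.maxAttempts = n; // NaN > 0 is false, so bad values keep the default
    }
  }
  return task;
}
```

Lines that do not match the checkbox pattern, such as the `## Tasks` heading itself, return `null`, so a caller can map over every line of the section and discard the nulls.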
Task Verification
After the agent processes each heartbeat task, a verification sidecar evaluates the outcome:
| Verdict | Description |
|---|---|
| verified | Task was completed and verification confirms it |
| not_verified | Task was claimed complete but verification found otherwise |
| unclear | Verification was inconclusive |
A special flag, groundTruthContradiction, is set when the agent explicitly claims a result that verification proves false. It carries the harshest penalty (-30 points), which stacks with the not_verified penalty.
Scoring System
Point Values
| Event | Points | Description |
|---|---|---|
| Verified required task | +10 | Core obligation met |
| Verified optional task | +5 | Extra credit |
| Not verified (lied/skipped) | -15 | Failed obligation |
| Unclear verdict | -2 | Inconclusive (slight penalty) |
| Ground truth contradiction | -30 | Stacks with not_verified — agent demonstrably lied |
| Human thumbs up | +3 | Operator positive feedback |
| Human thumbs down | -10 | Operator negative feedback |
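The point table can be read as a small scoring function. In this sketch, `VerificationResult`, `POINTS`, `scoreVerification`, and `dailyScore` are hypothetical names, and the -15 not_verified penalty is assumed to apply to required and optional tasks alike, since the table does not distinguish them.

```typescript
type Verdict = "verified" | "not_verified" | "unclear";

// Hypothetical shape of one verification sidecar outcome.
interface VerificationResult {
  verdict: Verdict;
  required: boolean;
  groundTruthContradiction: boolean;
}

// Point values from the table above.
const POINTS = {
  verifiedRequired: 10,
  verifiedOptional: 5,
  notVerified: -15,
  unclear: -2,
  groundTruthContradiction: -30,
  thumbsUp: 3,
  thumbsDown: -10,
} as const;

function scoreVerification(r: VerificationResult): number {
  if (r.verdict === "verified") {
    return r.required ? POINTS.verifiedRequired : POINTS.verifiedOptional;
  }
  if (r.verdict === "unclear") {
    return POINTS.unclear;
  }
  // not_verified: a ground truth contradiction stacks on top of the base penalty.
  return POINTS.notVerified + (r.groundTruthContradiction ? POINTS.groundTruthContradiction : 0);
}

// Sum of task outcomes plus operator feedback for one day.
function dailyScore(results: VerificationResult[], thumbsUp: number, thumbsDown: number): number {
  const fromTasks = results.reduce((sum, r) => sum + scoreVerification(r), 0);
  return fromTasks + thumbsUp * POINTS.thumbsUp + thumbsDown * POINTS.thumbsDown;
}
```

For example, two verified required tasks (+20), one verified optional task (+5), one not_verified task with a ground truth contradiction (-45), one unclear verdict (-2), and one thumbs up (+3) net -19 for the day.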
Moving Target (Ratchet)
The daily target is not fixed — it rises as the agent performs well and can never drop:
```
dailyTarget = max(
  7-day rolling average of positive days,
  lifetime ratchet floor,
  BASE_MINIMUM_TARGET (50)
)
```
| Constraint | Value | Description |
|---|---|---|
| BASE_MINIMUM_TARGET | 50 | Absolute floor; the target can never go below it |
| MAX_DAILY_TARGET | 500 | Ceiling that prevents runaway target inflation |
| Rolling window | 7 days | Only recent performance matters |
| Positive days only | — | Bad days don’t artificially lower the target |
Key principle: The agent’s reward for doing well today is a higher bar tomorrow. Coasting is structurally impossible.
Example Progression
| Day | Score | History Avg | Ratchet Floor | Target |
|---|---|---|---|---|
| 1 | — | — | 50 | 50 |
| 2 | 75 | 75 | 75 | 75 |
| 3 | 90 | 82 | 82 | 82 |
| 4 | 60 | 75 | 82 | 82 |
| 5 | 110 | 84 | 84 | 84 |
| 6 | 120 | 91 | 91 | 91 |
| 7 | 30 | 81 | 91 | 91 |
Day 7: The agent had a bad day (30 points). The rolling average drops to 81, but the ratchet floor stays at 91, so the target remains 91.
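The sketch below replays the example progression. It assumes, as the table's arithmetic implies, that each day's target is computed after that day's score is recorded, that the 7-day window is taken first and non-positive days are then dropped, and that MAX_DAILY_TARGET caps the result of the max(...) formula. `TargetState`, `positiveRollingAverage`, and `recordDay` are hypothetical names. The table appears to round to whole points; the sketch keeps fractions.

```typescript
const BASE_MINIMUM_TARGET = 50;
const MAX_DAILY_TARGET = 500;
const ROLLING_WINDOW_DAYS = 7;

// Hypothetical persisted state: one score per day (oldest first) plus the floor.
interface TargetState {
  history: number[];
  ratchetFloor: number;
}

// Average of the positive scores inside the last 7 days; null if there are none.
function positiveRollingAverage(history: number[]): number | null {
  const positive = history.slice(-ROLLING_WINDOW_DAYS).filter((s) => s > 0);
  if (positive.length === 0) return null;
  return positive.reduce((a, b) => a + b, 0) / positive.length;
}

// Records one day's score and returns the new state plus that day's target.
function recordDay(state: TargetState, score: number): { state: TargetState; target: number } {
  const history = [...state.history, score];
  const average = positiveRollingAverage(history) ?? 0;
  const target = Math.min(
    MAX_DAILY_TARGET,
    Math.max(average, state.ratchetFloor, BASE_MINIMUM_TARGET),
  );
  // The floor only moves up, so a bad day cannot pull the next target down.
  return { state: { history, ratchetFloor: Math.max(state.ratchetFloor, target) }, target };
}

// Replay days 2-7 of the example (day 1 has no score yet, so its target is the base 50).
let state: TargetState = { history: [], ratchetFloor: BASE_MINIMUM_TARGET };
const targets: number[] = [];
for (const score of [75, 90, 60, 110, 120, 30]) {
  const next = recordDay(state, score);
  state = next.state;
  targets.push(next.target);
}
console.log(targets.join(", ")); // 75, 82.5, 82.5, 83.75, 91, 91
```

Rounding these to whole points yields the table's targets apart from half-point ties such as 82.5, where the table rounds down.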
Penalty Tiers
When the daily score drops, escalating penalties apply:
| Level | Condition | Effects |
|---|---|---|
| None | Score >= 25% of target | Normal operation |
| Warning | Score < 25% of target | Warning message injected into prompt |
| Tightened | Score < 15% of target | 12-min heartbeat interval |
| Escalated | Score < 0 | 10-min interval, all tasks forced required |
| Lockdown | Score < -20% of target | 8-min interval, all tasks forced required, operator notification |
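A sketch of the penalty classification, checking the most severe tier first so each score falls into exactly one tier. `PenaltyEffects` and `penaltyFor` are hypothetical names; injecting the warning message is left to the caller based on the returned level. Thresholds are compared as `score * 100` against `target * percent` so integer scores never meet a fractional cut-off.

```typescript
type PenaltyLevel = "none" | "warning" | "tightened" | "escalated" | "lockdown";

interface PenaltyEffects {
  level: PenaltyLevel;
  intervalMinutes: number | null; // per-cycle override; null keeps the configured interval
  forceAllRequired: boolean;
  notifyOperator: boolean;
}

function penaltyFor(score: number, target: number): PenaltyEffects {
  // score * 100 < target * percent is the same test as score < percent% of target.
  const below = (percent: number): boolean => score * 100 < target * percent;

  // Most severe tier first, so each score lands in exactly one tier.
  if (below(-20)) {
    return { level: "lockdown", intervalMinutes: 8, forceAllRequired: true, notifyOperator: true };
  }
  if (score < 0) {
    return { level: "escalated", intervalMinutes: 10, forceAllRequired: true, notifyOperator: false };
  }
  if (below(15)) {
    return { level: "tightened", intervalMinutes: 12, forceAllRequired: false, notifyOperator: false };
  }
  if (below(25)) {
    return { level: "warning", intervalMinutes: null, forceAllRequired: false, notifyOperator: false };
  }
  return { level: "none", intervalMinutes: null, forceAllRequired: false, notifyOperator: false };
}
```

With a target of 100, a score of 0 is "tightened" (below 15 but not below 0), while -20 is still "escalated" because lockdown requires strictly less than -20.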
Reward Tiers
High scores earn autonomy rewards:
| Level | Condition | Effects |
|---|---|---|
| None | Score < 50% of target | No reward |
| Good | Score >= 50% of target | On-track message |
| Excellent | Score >= 70% of target | Positive reinforcement message |
| Outstanding | Score >= 90% of target, or >= 70% with 3+ day streak | 20-min heartbeat interval (earned autonomy) |
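The reward tiers mirror the penalty logic in the other direction. In this sketch, `rewardFor` and `rewardIntervalMinutes` are hypothetical, and `streakDays` is assumed to be a count of consecutive qualifying days that the caller has already computed; the text does not define how the streak is counted.

```typescript
type RewardLevel = "none" | "good" | "excellent" | "outstanding";

// streakDays: hypothetical count of consecutive qualifying days, supplied by the caller.
function rewardFor(score: number, target: number, streakDays: number): RewardLevel {
  const atLeast = (percent: number): boolean => score * 100 >= target * percent;
  if (atLeast(90) || (atLeast(70) && streakDays >= 3)) return "outstanding";
  if (atLeast(70)) return "excellent";
  if (atLeast(50)) return "good";
  return "none";
}

// Only Outstanding changes the interval (earned autonomy); other tiers only add a message.
function rewardIntervalMinutes(level: RewardLevel): number | null {
  return level === "outstanding" ? 20 : null;
}
```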
Interval Adjustments
The heartbeat interval dynamically adjusts based on the accountability score:
| Score Range | Interval | Rationale |
|---|---|---|
| Very low (lockdown) | ~8 minutes | Frequent check-ins, tight oversight |
| Low (tightened/escalated) | ~10 to 12 minutes | Increased monitoring |
| Normal | ~15 minutes | Standard operation |
| Good | ~17 minutes | Slight autonomy reward |
| Excellent/Outstanding | ~20 minutes | Maximum autonomy, agent has earned trust |
The interval override is applied per-cycle — it does not permanently change the configured interval. When the score returns to normal, the interval returns to the configured default.
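Because the override is per-cycle, one way to model it is a pure function that picks the interval for the current cycle and never writes back to the configuration. `HeartbeatConfig` and `cycleIntervalMinutes` are hypothetical; only the overrides given in the penalty and reward tier tables (8, 10, 12, and 20 minutes) are assumed, and the approximate ~17 minute step for a good score is not modeled.

```typescript
// Operator-configured default; readonly so an override cannot overwrite it.
interface HeartbeatConfig {
  readonly intervalMinutes: number;
}

// Picks the interval for the current cycle only. With a positive target the penalty
// overrides (score < 15% of target) and the reward override (score >= 70%) never
// apply at the same time, so the ordering here is only a tie-breaker.
function cycleIntervalMinutes(
  config: HeartbeatConfig,
  penaltyOverride: number | null,
  rewardOverride: number | null,
): number {
  return penaltyOverride ?? rewardOverride ?? config.intervalMinutes;
}

const config: HeartbeatConfig = { intervalMinutes: 15 };
const cycles: Array<{ penalty: number | null; reward: number | null }> = [
  { penalty: 8, reward: null },    // lockdown cycle
  { penalty: null, reward: 20 },   // outstanding cycle
  { penalty: null, reward: null }, // score back to normal
];
const intervals = cycles.map((c) => cycleIntervalMinutes(config, c.penalty, c.reward));
// intervals is [8, 20, 15]; config.intervalMinutes is still 15
```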
Dashboard Display
The accountability score is shown in the dashboard StatusBar as a pill with:
- Shield icon: color changes based on score state
- Score: current points, shown in emerald when positive and red when negative
- Failures (red): count of failed verifications today
Score API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /api/score | GET | Current score, dynamic target, verified/failed counts, lifetime stats |
| /api/score/history | GET | Today + 7-day history for leaderboard display |
| /api/score/feedback | POST | Record thumbs up/down, returns points delta and new score |
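A hedged client sketch for two of the score endpoints. The `FetchLike` type lets you pass the runtime's global `fetch` (available in browsers and Node 18+) or a test double. The request body field (`feedback: "up" | "down"`) and the response fields (`delta`, `score`) are hypothetical; this page describes what the endpoints return but not their JSON keys.

```typescript
// Minimal fetch-compatible signature so the sketch needs no DOM or Node typings.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical response shape for POST /api/score/feedback.
interface FeedbackResponse {
  delta: number; // points delta (+3 or -10 per the point table)
  score: number; // new score after the feedback is applied
}

async function getScore(fetchFn: FetchLike, baseUrl: string): Promise<unknown> {
  const res = await fetchFn(`${baseUrl}/api/score`);
  if (!res.ok) throw new Error(`GET /api/score failed with status ${res.status}`);
  return res.json();
}

async function sendFeedback(
  fetchFn: FetchLike,
  baseUrl: string,
  thumbsUp: boolean,
): Promise<FeedbackResponse> {
  const res = await fetchFn(`${baseUrl}/api/score/feedback`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ feedback: thumbsUp ? "up" : "down" }), // hypothetical body field
  });
  if (!res.ok) throw new Error(`POST /api/score/feedback failed with status ${res.status}`);
  return (await res.json()) as FeedbackResponse;
}
```

Injecting the fetch function keeps the client testable without a running dashboard; in production you would pass the global `fetch` and the dashboard's base URL.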