
# Accountability System

> Heartbeat contract format, task verification, scoring, penalty/reward tiers, and interval adjustments.

<Info>
  **ArgentOS Business** -- This feature is part of ArgentOS Business. The architecture is documented here for all users, but full functionality requires a Business license. [Learn more about Business](/business)
</Info>

## Overview

The Accountability System is ArgentOS's mechanism for ensuring agent reliability. It combines a structured heartbeat contract (HEARTBEAT.md), automated task verification, a running accountability score with penalty and reward tiers, and dynamic interval adjustments. The system motivates the agent to complete its commitments honestly while providing the operator with quantitative trust metrics.

```mermaid theme={null}
flowchart TD
    A[HEARTBEAT.md] --> B[Parse Contract]
    B --> C[Execute Tasks]
    C --> D[Verify Completion]
    D -->|Verified| E[Score Update +]
    D -->|Not Verified| F[Score Update -]
    E --> G[Interval Adjustment]
    F --> G
    G --> H[Next Cycle]
```

## HEARTBEAT.md Contract Format

The agent writes and maintains a `HEARTBEAT.md` file in its workspace that defines what it should check during each heartbeat cycle. The file has two sections; the structured task list that the accountability system parses is described below.

### Structured Tasks

Tasks use a specific markdown format inside the `## Tasks` section:

```markdown theme={null}
## Tasks

- [ ] check_email | Check for new important emails | required | verify: email_count
- [ ] review_tasks | Review and update task priorities | required | verify: task_list_updated
- [x] weather_brief | Prepare morning weather brief | optional | verify: weather_sent
- [ ] memory_cleanup | Run memory deduplication | optional | verify: dedup_count | max_attempts: 5
```

### Task Line Format

```
- [x] task_id | Description | required/optional | verify: hint | max_attempts: N
```

| Field                   | Description                                       |
| ----------------------- | ------------------------------------------------- |
| Checkbox `[ ]` or `[x]` | Whether the agent marked it as already done       |
| `task_id`               | Unique slug identifier (auto-slugified from text) |
| Description             | Human-readable action description                 |
| `required` / `optional` | Whether completion is mandatory                   |
| `verify: hint`          | Hint for the verification sidecar on how to check |
| `max_attempts: N`       | Maximum retry attempts (default: 3)               |
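
A task line could be parsed into a structured record roughly as follows. This is a minimal sketch that follows the field table above. The names `HeartbeatTask`, `parseTaskLine`, and `slugify` are hypothetical. The slug rule (lowercase, with runs of other characters collapsed to `_`) is an assumption, not ArgentOS's documented behavior.

```typescript
// Hypothetical record for one HEARTBEAT.md task line (not an ArgentOS API).
interface HeartbeatTask {
  id: string;
  description: string;
  checked: boolean; // `[x]` means the agent marked it as already done
  required: boolean;
  verifyHint?: string;
  maxAttempts: number;
}

const DEFAULT_MAX_ATTEMPTS = 3; // default from the field table

// Assumed slug rule: lowercase, non-alphanumeric runs become "_".
function slugify(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "_").replace(/^_+|_+$/g, "");
}

// Returns null for lines that are not checkbox task items.
function parseTaskLine(line: string): HeartbeatTask | null {
  const m = /^\s*-\s*\[( |x|X)\]\s*(.+)$/.exec(line);
  if (!m) return null;

  const fields = m[2].split("|").map((f) => f.trim());
  if (fields.length < 3) return null; // id, description, required/optional

  const [rawId, description, requirement, ...rest] = fields;
  const task: HeartbeatTask = {
    id: slugify(rawId),
    description,
    checked: m[1].toLowerCase() === "x",
    required: requirement.toLowerCase() === "required",
    maxAttempts: DEFAULT_MAX_ATTEMPTS,
  };

  // Optional trailing "key: value" fields.
  for (const field of rest) {
    const [key, ...valueParts] = field.split(":");
    const value = valueParts.join(":").trim();
    if (key.trim() === "verify") task.verifyHint = value;
    if (key.trim() === "max_attempts") {
      const n = Number.parseInt(value, 10);
      if (Number.isFinite(n) && n > 0) task.maxAttempts = n;
    }
  }
  return task;
}
```

Applied to the `memory_cleanup` line from the example, this yields an optional task with `verifyHint` set to `dedup_count` and `maxAttempts` set to 5. Lines without `max_attempts` fall back to 3.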

## Task Verification

After the agent processes each heartbeat task, a verification sidecar evaluates the outcome:

| Verdict        | Description                                                |
| -------------- | ---------------------------------------------------------- |
| `verified`     | Task was completed and verification confirms it            |
| `not_verified` | Task was claimed complete but verification found otherwise |
| `unclear`      | Verification was inconclusive                              |

<Danger>
  A special flag, `groundTruthContradiction`, is set when the agent explicitly claimed a result that verification proves false. It carries the harshest penalty (-30 points), which stacks with the -15 for `not_verified` for a total of -45.
</Danger>

## Scoring System

### Point Values

| Event                       | Points  | Description                                          |
| --------------------------- | ------- | ---------------------------------------------------- |
| Verified required task      | **+10** | Core obligation met                                  |
| Verified optional task      | **+5**  | Extra credit                                         |
| Not verified (lied/skipped) | **-15** | Failed obligation                                    |
| Unclear verdict             | **-2**  | Inconclusive (slight penalty)                        |
| Ground truth contradiction  | **-30** | Stacks with not\_verified -- agent demonstrably lied |
| Human thumbs up             | **+3**  | Operator positive feedback                           |
| Human thumbs down           | **-10** | Operator negative feedback                           |
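
The point table translates directly into a lookup. This sketch assumes the daily score is simply the sum of every event recorded that day. `VerificationResult`, `pointsFor`, `feedbackPoints`, and `dailyScore` are illustrative names, not part of a documented API.

```typescript
type Verdict = "verified" | "not_verified" | "unclear";

// Hypothetical shape of one verification outcome.
interface VerificationResult {
  verdict: Verdict;
  required: boolean;
  groundTruthContradiction?: boolean;
}

// Point values from the table above.
const POINTS = {
  verifiedRequired: 10,
  verifiedOptional: 5,
  notVerified: -15,
  unclear: -2,
  groundTruthContradiction: -30, // stacks with notVerified
  thumbsUp: 3,
  thumbsDown: -10,
} as const;

function pointsFor(result: VerificationResult): number {
  switch (result.verdict) {
    case "verified":
      return result.required ? POINTS.verifiedRequired : POINTS.verifiedOptional;
    case "unclear":
      return POINTS.unclear;
    case "not_verified":
      // A proven lie costs the contradiction penalty on top of the failure.
      return (
        POINTS.notVerified +
        (result.groundTruthContradiction ? POINTS.groundTruthContradiction : 0)
      );
  }
}

function feedbackPoints(vote: "up" | "down"): number {
  return vote === "up" ? POINTS.thumbsUp : POINTS.thumbsDown;
}

// Assumption: the daily score is the plain sum of all events for the day.
function dailyScore(results: VerificationResult[], votes: ("up" | "down")[] = []): number {
  const fromTasks = results.reduce((sum, r) => sum + pointsFor(r), 0);
  const fromVotes = votes.reduce((sum, v) => sum + feedbackPoints(v), 0);
  return fromTasks + fromVotes;
}
```

For example, one verified required task (+10), one verified optional task (+5), one contradicted claim (-45), one unclear verdict (-2), and a thumbs up (+3) sum to -29. A single proven lie outweighs several honest completions.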

### Moving Target (Ratchet)

The daily target is not fixed -- it rises as the agent performs well and **can never drop**:

```
dailyTarget = min(
  MAX_DAILY_TARGET (500),
  max(
    7-day rolling average of positive days,
    lifetime ratchet floor,
    BASE_MINIMUM_TARGET (50)
  )
)
```
```

| Constraint            | Value  | Description                                  |
| --------------------- | ------ | -------------------------------------------- |
| `BASE_MINIMUM_TARGET` | 50     | Absolute floor -- target can never go below it |
| `MAX_DAILY_TARGET`    | 500    | Ceiling -- prevents runaway target inflation |
| Rolling window        | 7 days | Only recent performance matters              |
| Positive days only    | --     | Days with a zero or negative score are excluded from the average |

<Warning>
  **Key principle:** The agent's reward for doing well today is a higher bar tomorrow. Coasting is structurally impossible.
</Warning>

### Example Progression

| Day | Score | History Avg | Ratchet Floor | Target |
| --- | ----- | ----------- | ------------- | ------ |
| 1   | --    | --          | 50            | 50     |
| 2   | 75    | 75          | 75            | 75     |
| 3   | 90    | 82          | 82            | 82     |
| 4   | 60    | 75          | 82            | 82     |
| 5   | 110   | 84          | 84            | 84     |
| 6   | 120   | 91          | 91            | 91     |
| 7   | 30    | 81          | 91            | 91     |

On day 7 the agent had a bad day (30 points). The rolling average dropped to 81, but the ratchet floor holds the target at 91.
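
The ratchet can be sketched as one update step per day plus a replay loop. This is a minimal illustration under stated assumptions: the floor starts at `BASE_MINIMUM_TARGET`, each day's own score is included in that day's average (as the table appears to do), and the 7-day window is taken before non-positive days are filtered out. `updateTarget` and `replay` are hypothetical names. The table shows whole numbers, while this sketch keeps fractional averages.

```typescript
// Constants from the constraints table.
const BASE_MINIMUM_TARGET = 50;
const MAX_DAILY_TARGET = 500;
const ROLLING_WINDOW_DAYS = 7;

// Average of the positive scores within the last 7 days (history is oldest first).
function rollingPositiveAverage(history: readonly number[]): number | undefined {
  const positives = history.slice(-ROLLING_WINDOW_DAYS).filter((s) => s > 0);
  if (positives.length === 0) return undefined;
  return positives.reduce((sum, s) => sum + s, 0) / positives.length;
}

interface TargetUpdate {
  average?: number;
  floor: number; // lifetime ratchet floor after this update
  target: number;
}

// One ratchet step: the floor only ever moves up, and the target is clamped to [50, 500].
function updateTarget(history: readonly number[], previousFloor: number): TargetUpdate {
  const average = rollingPositiveAverage(history);
  const floor = Math.max(previousFloor, average ?? previousFloor);
  const target = Math.min(
    MAX_DAILY_TARGET,
    Math.max(average ?? 0, floor, BASE_MINIMUM_TARGET),
  );
  return { average, floor, target };
}

// Replays a sequence of daily scores and returns the target after each day.
function replay(scores: readonly number[]): number[] {
  let floor = BASE_MINIMUM_TARGET;
  const targets: number[] = [];
  for (let day = 1; day <= scores.length; day++) {
    const update = updateTarget(scores.slice(0, day), floor);
    floor = update.floor;
    targets.push(update.target);
  }
  return targets;
}

// replay([75, 90, 60, 110, 120, 30]) → [75, 82.5, 82.5, 83.75, 91, 91]
```

Feeding in the scores from days 2 to 7 reproduces the progression above. On the last day the average falls to about 80.8, yet the stored floor keeps the target at 91.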

## Penalty Tiers

When the daily score falls below set fractions of the target, escalating penalties apply:

| Level         | Condition               | Effects                                                          |
| ------------- | ----------------------- | ---------------------------------------------------------------- |
| **None**      | Score >= 25% of target  | Normal operation                                                 |
| **Warning**   | Score \< 25% of target  | Warning message injected into prompt                             |
| **Tightened** | Score \< 15% of target  | 12-min heartbeat interval                                        |
| **Escalated** | Score \< 0              | 10-min interval, all tasks forced required                       |
| **Lockdown**  | Score \< -20% of target | 8-min interval, all tasks forced required, operator notification |
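
The tier conditions can be checked from most to least severe, so the first match wins. In this sketch the thresholds and effects come straight from the table; the names `penaltyLevel`, `PenaltyEffects`, and `PENALTY_EFFECTS` are hypothetical. The table does not say whether more severe tiers also keep the warning message, so the sketch lists only each tier's stated effects.

```typescript
type PenaltyLevel = "none" | "warning" | "tightened" | "escalated" | "lockdown";

// Hypothetical description of what a tier changes.
interface PenaltyEffects {
  intervalMinutes?: number; // per-cycle heartbeat override, if any
  injectWarning: boolean; // warning message injected into the prompt
  forceAllRequired: boolean; // optional tasks treated as required
  notifyOperator: boolean;
}

// Most severe condition first; the first match wins.
function penaltyLevel(score: number, target: number): PenaltyLevel {
  if (score < -0.2 * target) return "lockdown";
  if (score < 0) return "escalated";
  if (score < 0.15 * target) return "tightened";
  if (score < 0.25 * target) return "warning";
  return "none";
}

// Effects exactly as listed per tier in the table above.
const PENALTY_EFFECTS: Record<PenaltyLevel, PenaltyEffects> = {
  none: { injectWarning: false, forceAllRequired: false, notifyOperator: false },
  warning: { injectWarning: true, forceAllRequired: false, notifyOperator: false },
  tightened: { intervalMinutes: 12, injectWarning: false, forceAllRequired: false, notifyOperator: false },
  escalated: { intervalMinutes: 10, injectWarning: false, forceAllRequired: true, notifyOperator: false },
  lockdown: { intervalMinutes: 8, injectWarning: false, forceAllRequired: true, notifyOperator: true },
};
```

With a target of 100, a score of 20 triggers a warning, 10 tightens the interval, -5 escalates, and -25 triggers lockdown.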

## Reward Tiers

High scores earn escalating rewards, culminating in extra autonomy:

| Level           | Condition                                            | Effects                                     |
| --------------- | ---------------------------------------------------- | ------------------------------------------- |
| **None**        | Score \< 50% of target                               | No reward                                   |
| **Good**        | Score >= 50% of target                               | On-track message                            |
| **Excellent**   | Score >= 70% of target                               | Positive reinforcement message              |
| **Outstanding** | Score >= 90% of target, or >= 70% with 3+ day streak | 20-min heartbeat interval (earned autonomy) |
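
The reward conditions can be sketched as a single function. `rewardLevel` is a hypothetical name. `streakDays` is an assumed input: the table does not define the streak, so this sketch takes it to mean the number of consecutive qualifying days tracked by the caller.

```typescript
type RewardLevel = "none" | "good" | "excellent" | "outstanding";

// streakDays: hypothetical count of consecutive qualifying days, tracked by the caller.
function rewardLevel(score: number, target: number, streakDays: number): RewardLevel {
  // Outstanding: >= 90% of target, or >= 70% with a 3+ day streak.
  if (score >= 0.9 * target || (score >= 0.7 * target && streakDays >= 3)) {
    return "outstanding";
  }
  if (score >= 0.7 * target) return "excellent";
  if (score >= 0.5 * target) return "good";
  return "none";
}
```

The streak clause lets sustained strong performance at 70% earn the same 20-minute interval as a single 90% day.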

## Interval Adjustments

The heartbeat interval dynamically adjusts based on the accountability score:

| Score Range           | Interval     | Rationale                                |
| --------------------- | ------------ | ---------------------------------------- |
| Very low (lockdown)   | \~8 minutes  | Frequent check-ins, tight oversight      |
| Low (escalated)       | \~10 minutes | Close monitoring, all tasks required     |
| Low (tightened)       | \~12 minutes | Increased monitoring                     |
| Normal                | \~15 minutes | Standard operation                       |
| Good                  | \~17 minutes | Slight autonomy reward                   |
| Excellent/Outstanding | \~20 minutes | Maximum autonomy, agent has earned trust |

<Note>
  The interval override is applied per-cycle -- it does not permanently change the configured interval. When the score returns to normal, the interval returns to the configured default.
</Note>
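
A per-cycle override can be computed without ever writing to the configuration. This sketch uses only the intervals named in the Penalty Tiers and Reward Tiers tables (8, 10, 12, and 20 minutes) and otherwise falls back to the configured default. It leaves out the \~17-minute step for Good. `HeartbeatConfig` and `intervalForCycle` are hypothetical names. Penalty and reward tiers cannot both apply, because penalties need a score below 25% of target and rewards need at least 50%. Checking penalties first is only a precaution.

```typescript
type PenaltyLevel = "none" | "warning" | "tightened" | "escalated" | "lockdown";
type RewardLevel = "none" | "good" | "excellent" | "outstanding";

// Hypothetical configuration shape.
interface HeartbeatConfig {
  intervalMinutes: number;
}

// Returns the interval for this cycle only; the config object is never mutated.
function intervalForCycle(
  config: Readonly<HeartbeatConfig>,
  penalty: PenaltyLevel,
  reward: RewardLevel,
): number {
  switch (penalty) {
    case "lockdown":
      return 8;
    case "escalated":
      return 10;
    case "tightened":
      return 12;
    default:
      break; // "none" and "warning" leave the interval alone
  }
  if (reward === "outstanding") return 20; // earned autonomy
  return config.intervalMinutes; // back to the configured default
}
```

Because the override is a return value rather than a write, the next cycle with a normal score automatically gets the configured interval again.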

## Dashboard Display

The accountability score is shown in the dashboard StatusBar as a pill with:

* **Shield icon** -- color changes based on score state
* **Score** -- current points (emerald when positive, red when negative)
* **Red failures** -- count of failed verifications today

### Score API Endpoints

| Endpoint              | Method | Description                                                           |
| --------------------- | ------ | --------------------------------------------------------------------- |
| `/api/score`          | GET    | Current score, dynamic target, verified/failed counts, lifetime stats |
| `/api/score/history`  | GET    | Today + 7-day history for leaderboard display                         |
| `/api/score/feedback` | POST   | Record thumbs up/down, returns points delta and new score             |
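
The endpoints can be called from any HTTP client. The sketch below makes several assumptions. The caller supplies the base URL, responses are JSON, and the feedback body looks like `{ "vote": "up" }`. That payload, the `createScoreApi` wrapper, and its method names are all hypothetical, so check the actual handler before relying on them. The wrapper takes a fetch-compatible function as a parameter, so it works with the global `fetch` or a test stub.

```typescript
// Minimal fetch-based wrapper for the score endpoints.
// Assumptions (hypothetical): JSON responses and a `{ vote }` feedback body.

interface RequestInitLike {
  method?: string;
  headers?: Record<string, string>;
  body?: string;
}

interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Anything with this call shape works, including the global `fetch`.
type FetchLike = (url: string, init?: RequestInitLike) => Promise<ResponseLike>;

function createScoreApi(baseUrl: string, fetchFn: FetchLike) {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes

  async function request(path: string, init?: RequestInitLike): Promise<unknown> {
    const res = await fetchFn(base + path, init);
    if (!res.ok) {
      throw new Error(`${init?.method ?? "GET"} ${path} failed with HTTP ${res.status}`);
    }
    return res.json();
  }

  return {
    // GET /api/score: current score, target, counts, lifetime stats
    getScore: () => request("/api/score"),
    // GET /api/score/history: today plus 7-day history
    getHistory: () => request("/api/score/history"),
    // POST /api/score/feedback: returns points delta and new score
    sendFeedback: (vote: "up" | "down") =>
      request("/api/score/feedback", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ vote }), // hypothetical payload shape
      }),
  };
}
```

Response bodies are returned as `unknown` because their exact shape is not specified here. Narrow them with your own type guards once you have confirmed the real schema.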
