# Memory Profile Collapse

> Automated deduplication of operational profile snapshots in the MemU memory system.

## Overview

Profile Collapse is a memory hygiene system that automatically detects and merges duplicate operational profile snapshots in MemU. As the agent operates, heartbeat cycles, status checks, and system monitoring generate many `profile`-type memory items that contain ephemeral operational data — metrics, counts, timestamps, uptime reports. These accumulate rapidly and pollute memory recall with near-identical entries.

Profile Collapse identifies these duplicates through a three-phase grouping strategy (exact match, token match, fuzzy match) and merges them into canonical entries, transferring reinforcement counts to preserve signal strength.

```mermaid theme={null}
flowchart LR
  A["All Profile Items"] --> B["Operational Filter"]
  B --> C["Signature Generation"]
  C --> D["Phase 1: Exact Match"]
  C --> E["Phase 2: Token Match"]
  C --> F["Phase 3: Fuzzy Match"]
  D --> G["Pick Canonical"]
  E --> G
  F --> G
  G --> H["Delete Duplicates + Transfer Reinforcement"]
```

## What Gets Collapsed

Not all profile items are candidates for collapse. The system targets **operational profile snapshots** — items whose summaries contain operational keywords combined with numeric or datetime tokens.

### Operational Detection

A profile item is classified as operational if its summary matches both:

1. **Operational keywords**: status, snapshot, health, metric, count, queue, uptime, latency, ticket, alert, cron, heartbeat, service, gateway, dashboard, api, provider, model
2. **Ephemeral tokens**: numeric values (`42`, `99.5%`) or datetime stamps (`2026-03-15`)

<Tabs>
  <Tab title="Operational (will collapse)">
    * "Gateway health: 3 agents connected, latency 45ms, uptime 99.2%"
    * "Heartbeat status 2026-03-15: 12 tasks verified, 2 failed, score 85"
    * "Discord channel metrics: 47 messages today, 3 alerts pending"
  </Tab>

  <Tab title="Not operational (preserved)">
    * "User prefers dark mode for all dashboards"
    * "Project callscrub.io uses Next.js 14 with Prisma"
  </Tab>
</Tabs>
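The two-part test above can be sketched as a small predicate. This is an illustrative approximation, not the shipped implementation: the keyword list is copied from this page, while the function name `isOperationalSummary` and both regular expressions are assumptions.

```typescript
// Hypothetical sketch. Keywords are taken from the list above; the
// regular expressions are assumptions, not the shipped patterns.
const OPERATIONAL_KEYWORDS = [
  "status", "snapshot", "health", "metric", "count", "queue", "uptime",
  "latency", "ticket", "alert", "cron", "heartbeat", "service", "gateway",
  "dashboard", "api", "provider", "model",
];

// Match a keyword at the start of a word so plurals ("metrics") still count.
const KEYWORD_RE = new RegExp(`\\b(?:${OPERATIONAL_KEYWORDS.join("|")})`, "i");

// Any digit covers plain numbers, percentages, and ISO-style dates.
const EPHEMERAL_RE = /\d/;

function isOperationalSummary(summary: string): boolean {
  // Both conditions must hold: a keyword AND an ephemeral token.
  return KEYWORD_RE.test(summary) && EPHEMERAL_RE.test(summary);
}
```

Under these assumptions, "User prefers dark mode for all dashboards" is preserved because it contains a keyword but no ephemeral token, and "Project callscrub.io uses Next.js 14 with Prisma" is preserved because it contains a number but no keyword.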

## Three-Phase Grouping

### Phase 1: Exact Signature Match

The system normalizes operational summaries into signatures by replacing ephemeral tokens with placeholders:

| Token Type            | Replacement  | Example                                     |
| --------------------- | ------------ | ------------------------------------------- |
| Dates/timestamps      | `<datetime>` | `2026-03-15T14:30:00Z` becomes `<datetime>` |
| UUIDs and hex strings | `<id>`       | `a8f2b3c4-...` becomes `<id>`               |
| Request/run IDs       | `<id>`       | `run-abc123` becomes `<id>`                 |
| Numbers               | `<num>`      | `42`, `99.5%` become `<num>`                |

Two items with identical signatures after normalization are exact duplicates. Example:

```
Original: "Gateway health: 3 agents, latency 45ms, 2026-03-15"
Signature: "gateway health <num> agent latency <num> ms <datetime>"

Original: "Gateway health: 5 agents, latency 30ms, 2026-03-16"
Signature: "gateway health <num> agent latency <num> ms <datetime>"

→ Same signature → exact duplicate group
```
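A minimal sketch of the placeholder substitution, assuming simple regular expressions for each token type in the table. The function name `toSignature` and every pattern are hypothetical. The sketch also skips the plural stripping visible in the example signature, so it keeps `agents` where the example shows `agent`.

```typescript
// Hypothetical sketch of signature normalization. The patterns are
// assumptions that approximate the token types in the table above.
function toSignature(summary: string): string {
  return summary
    .toLowerCase()
    // UUIDs
    .replace(/\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/g, " <id> ")
    // Prefixed request/run IDs such as run-abc123
    .replace(/\b(?:run|req|request)-[a-z0-9]+\b/g, " <id> ")
    // ISO dates with an optional time part
    .replace(/\b\d{4}-\d{2}-\d{2}(?:[t ]\d{2}:\d{2}(?::\d{2})?(?:\.\d+)?(?:z|[+-]\d{2}:?\d{2})?)?/g, " <datetime> ")
    // Long hex strings
    .replace(/\b[0-9a-f]{12,}\b/g, " <id> ")
    // Remaining numbers, with optional decimals and percent sign
    .replace(/\d+(?:\.\d+)?%?/g, " <num> ")
    // Drop punctuation but keep the placeholder brackets
    .replace(/[^a-z0-9<>\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

const first = toSignature("Gateway health: 3 agents, latency 45ms, 2026-03-15");
const second = toSignature("Gateway health: 5 agents, latency 30ms, 2026-03-16");
// first === second, so the two items fall into one exact-duplicate group.
```

The replacement order matters in this sketch: identifiers and datetimes are substituted before bare numbers, otherwise the digits inside a date would be turned into several `<num>` placeholders.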

### Phase 2: Token Key Match

Items not caught by exact matching are further grouped using semantic token keys. The system:

1. Generates the normalized signature
2. Strips stopwords (`the`, `is`, `are`, `was`, `this`, `that`, etc.)
3. Strips placeholder tokens (`<id>`, `<num>`, `<datetime>`)
4. Applies basic stemming (plural stripping: `services` becomes `service`)
5. Sorts remaining tokens alphabetically
6. Joins as a token key

<Info>
  Items with identical token keys (and at least 3 semantic tokens) are token duplicates.
</Info>
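The token key steps could look roughly like the following. This is a sketch under stated assumptions: the stopword set extends the examples listed above with a few guesses, the `stem` rule is a deliberately naive plural stripper, and the names `toTokenKey` and `stem` are hypothetical.

```typescript
// Hypothetical sketch of token key generation from a normalized signature.
// The stopword list beyond the examples on this page is an assumption.
const STOPWORDS = new Set(["the", "is", "are", "was", "this", "that", "a", "an", "of", "and"]);
const PLACEHOLDERS = new Set(["<id>", "<num>", "<datetime>"]);

// Naive plural stripping: "services" -> "service"; leaves "status" and "class" alone.
function stem(token: string): string {
  const plural =
    token.length > 3 && token.endsWith("s") && !token.endsWith("ss") && !token.endsWith("us");
  return plural ? token.slice(0, -1) : token;
}

// Returns null when fewer than 3 semantic tokens remain.
function toTokenKey(signature: string): string | null {
  const tokens = signature
    .split(/\s+/)
    .filter((t) => t.length > 0 && !STOPWORDS.has(t) && !PLACEHOLDERS.has(t))
    .map(stem)
    .sort();
  return tokens.length >= 3 ? tokens.join(" ") : null;
}

const keyA = toTokenKey("the gateway services are healthy <num> alerts <datetime>");
const keyB = toTokenKey("<num> alert gateway service healthy");
// keyA and keyB are both "alert gateway healthy service".
```

Because the tokens are sorted, word order no longer matters, which is what lets this phase catch items that exact signature matching misses.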

### Phase 3: Fuzzy Match (Optional)

When enabled, items not caught by exact or token matching are clustered using Jaccard similarity on their semantic token sets:

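* **Similarity threshold**: 0.78 (Jaccard index: shared tokens divided by the total number of distinct tokens across both items)
* **Minimum intersection**: 4 shared tokens
* **Clustering**: Greedy assignment to the nearest cluster, with items processed in order of creation time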

<Tip>
  Fuzzy matching catches paraphrased duplicates where the same operational data is described with slightly different wording.
</Tip>
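The sketch below shows one way the thresholds above could combine with a greedy pass. It assumes each item is compared against the oldest member of each existing cluster; the source does not say how a cluster is represented, so that choice, along with the `FuzzyItem` and `Cluster` types and the function names, is hypothetical.

```typescript
// Hypothetical sketch of Phase 3. Comparing against the cluster's first
// (oldest) member is an assumption, not documented behavior.
interface FuzzyItem {
  id: string;
  createdAt: number; // epoch milliseconds
  tokens: Set<string>;
}

interface Cluster {
  seed: FuzzyItem;
  members: FuzzyItem[];
}

// Jaccard index = |A ∩ B| / |A ∪ B|
function jaccard(a: Set<string>, b: Set<string>): { score: number; shared: number } {
  let shared = 0;
  a.forEach((token) => {
    if (b.has(token)) shared += 1;
  });
  const union = a.size + b.size - shared;
  return { score: union === 0 ? 0 : shared / union, shared };
}

function clusterFuzzy(items: FuzzyItem[], threshold = 0.78, minShared = 4): FuzzyItem[][] {
  const clusters: Cluster[] = [];
  // Process oldest items first so they become cluster seeds.
  const ordered = items.slice().sort((x, y) => x.createdAt - y.createdAt);
  for (const item of ordered) {
    let best: Cluster | null = null;
    let bestScore = 0;
    for (const cluster of clusters) {
      const { score, shared } = jaccard(item.tokens, cluster.seed.tokens);
      if (score >= threshold && shared >= minShared && score > bestScore) {
        best = cluster;
        bestScore = score;
      }
    }
    if (best) best.members.push(item);
    else clusters.push({ seed: item, members: [item] });
  }
  return clusters.map((c) => c.members);
}
```

For example, two items that share 5 tokens out of 6 distinct tokens score about 0.83 and are clustered together, while items sharing only 3 tokens are kept apart regardless of score because of the minimum intersection.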

## Canonical Selection

When a group of duplicates is found, one item is selected as the canonical (keeper) entry. Selection priority:

1. **Highest significance** — `core` > `important` > `noteworthy` > `routine`
2. **Highest reinforcement count** — more reinforced = more important
3. **Oldest creation date** — preserve the first observation
4. **Lexicographic ID** — deterministic tiebreaker

The canonical item survives; all other items in the group are deleted.
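The selection priority maps naturally onto a chained comparator. The sketch below assumes a simplified item shape: the `Candidate` interface, its field names, and the numeric ranks are hypothetical, while the ordering itself follows the list above.

```typescript
// Hypothetical sketch of canonical selection. Field names are assumptions.
type Significance = "core" | "important" | "noteworthy" | "routine";

const SIGNIFICANCE_RANK: Record<Significance, number> = {
  core: 3,
  important: 2,
  noteworthy: 1,
  routine: 0,
};

interface Candidate {
  id: string;
  significance: Significance;
  reinforcementCount: number;
  createdAt: number; // epoch milliseconds
}

// Negative result means `a` is the better canonical candidate.
function compareCanonical(a: Candidate, b: Candidate): number {
  return (
    SIGNIFICANCE_RANK[b.significance] - SIGNIFICANCE_RANK[a.significance] || // higher significance first
    b.reinforcementCount - a.reinforcementCount || // then higher reinforcement
    a.createdAt - b.createdAt || // then older item
    (a.id < b.id ? -1 : a.id > b.id ? 1 : 0) // then lexicographic ID
  );
}

function pickCanonical(group: Candidate[]): Candidate {
  if (group.length === 0) throw new Error("cannot pick from an empty group");
  return group.reduce((best, item) => (compareCanonical(item, best) < 0 ? item : best));
}
```

Because the final tiebreaker is a total order on IDs, the result does not depend on the order in which the group members were listed.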

## Reinforcement Transfer

When duplicates are removed, their reinforcement counts are transferred to the canonical item. Each deleted duplicate contributes at least 1 reinforcement (or its actual count if higher). This preserves the signal that the observation was seen multiple times.

```
Group: 5 items (1 canonical + 4 duplicates)
- Canonical: reinforcement_count = 2
- Duplicate A: reinforcement_count = 3
- Duplicate B: reinforcement_count = 1
- Duplicate C: reinforcement_count = 1
- Duplicate D: reinforcement_count = 0 (counts as 1)

After collapse:
- Canonical: reinforcement_count = 2 + 3 + 1 + 1 + 1 = 8
- Duplicates A-D: deleted
```
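The arithmetic in the example above reduces to a sum with a floor of 1 per duplicate. A minimal sketch, with the hypothetical helper name `mergedReinforcement`:

```typescript
// Hypothetical helper: each deleted duplicate contributes max(1, its count).
function mergedReinforcement(canonicalCount: number, duplicateCounts: number[]): number {
  return duplicateCounts.reduce((total, count) => total + Math.max(1, count), canonicalCount);
}

// Canonical count 2, duplicates A-D with counts 3, 1, 1, 0.
const merged = mergedReinforcement(2, [3, 1, 1, 0]); // 8
```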

## Running Profile Collapse

<Tabs>
  <Tab title="Dry Run (Default)">
    By default, collapse runs in dry-run mode — it reports what would happen without deleting anything:

    ```typescript theme={null}
    import { collapseOperationalProfileSnapshots } from "./memory/hygiene/profile-collapse.ts";

    const report = collapseOperationalProfileSnapshots(store, { dryRun: true });
    console.log(report);
    // {
    //   dryRun: true,
    //   scannedProfiles: 1200,
    //   operationalProfiles: 340,
    //   duplicateGroups: 45,
    //   duplicatesFound: 180,
    //   samples: [...]
    // }
    ```
  </Tab>

  <Tab title="Live Run">
    ```typescript theme={null}
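    import { collapseOperationalProfileSnapshots } from "./memory/hygiene/profile-collapse.ts";

    // `store` is an existing MemuStore instance; dryRun: false performs the deletions.
    const report = collapseOperationalProfileSnapshots(store, { dryRun: false });
    // report.duplicatesRemoved = 180
    // report.reinforcementsApplied = 220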
    ```
  </Tab>
</Tabs>

### Options

| Option            | Type      | Default | Description                         |
| ----------------- | --------- | ------- | ----------------------------------- |
| `dryRun`          | `boolean` | `true`  | Report only, no deletions           |
| `enableFuzzy`     | `boolean` | `false` | Enable Phase 3 fuzzy matching       |
| `maxSampleGroups` | `number`  | `20`    | Max sample groups in report         |
| `batchSize`       | `number`  | `500`   | Pagination size for profile listing |

## Collapse Report

The function returns a detailed report:

```typescript theme={null}
interface ProfileCollapseReport {
  dryRun: boolean;
  scannedProfiles: number;         // Total profile items scanned
  operationalProfiles: number;     // Items matching operational pattern
  uniqueSignatures: number;        // Distinct signatures found
  duplicateGroups: number;         // Groups with 2+ members
  duplicatesFound: number;         // Total items to remove
  groupsCollapsed: number;         // Groups actually collapsed (live run)
  duplicatesRemoved: number;       // Items actually deleted (live run)
  reinforcementsApplied: number;   // Total reinforcements transferred
  exactDuplicateGroups: number;    // Phase 1 groups
  tokenDuplicateGroups: number;    // Phase 2 groups
  fuzzyDuplicateGroups: number;    // Phase 3 groups
  samples: ProfileCollapseGroupSample[];  // Example groups for review
}
```

## Foreign Key Safety

<Note>
  If deleting a duplicate item fails because of a foreign key constraint (from junction tables such as `item_categories`, `category_items`, or `item_entities`), the system deletes the referencing junction rows first and then retries the deletion.
</Note>
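The cleanup-and-retry pattern can be sketched as follows. Everything about the store here is an assumption: the `ItemStore` interface, its method names, and the way a foreign key failure is detected from the error message are hypothetical and do not describe the actual MemuStore API. Only the junction table names come from this page.

```typescript
// Hypothetical store interface, used only for illustration.
interface ItemStore {
  deleteItem(id: string): void; // assumed to throw on a foreign key violation
  deleteJunctionRows(table: string, itemId: string): void;
}

const JUNCTION_TABLES = ["item_categories", "category_items", "item_entities"];

// Assumption: the database driver reports FK failures in the error message.
function isForeignKeyError(err: unknown): boolean {
  return err instanceof Error && /foreign key/i.test(err.message);
}

function deleteWithFkCleanup(store: ItemStore, id: string): void {
  try {
    store.deleteItem(id);
  } catch (err) {
    // Anything other than an FK failure is not ours to handle.
    if (!isForeignKeyError(err)) throw err;
    for (const table of JUNCTION_TABLES) {
      store.deleteJunctionRows(table, id);
    }
    store.deleteItem(id); // single retry; a second failure propagates
  }
}
```

Retrying once keeps the happy path to a single delete and only pays for junction cleanup when a constraint actually blocks the deletion.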

## Integration Points

Profile Collapse can be triggered:

1. **Manually** via the memory hygiene tool
2. **Periodically** as part of scheduled memory maintenance
3. **On demand** when memory item counts exceed thresholds

## Key Files

| File                                     | LOC | Description                                                |
| ---------------------------------------- | --- | ---------------------------------------------------------- |
| `src/memory/hygiene/profile-collapse.ts` | 469 | Core collapse implementation                               |
| `src/memory/memu-store.ts`               | —   | MemuStore (provides item listing, deletion, reinforcement) |
| `src/memory/memu-types.ts`               | —   | MemoryItem and Significance types                          |
