Overview

Profile Collapse is a memory hygiene system that automatically detects and merges duplicate operational profile snapshots in MemU. As the agent operates, heartbeat cycles, status checks, and system monitoring generate many profile-type memory items that contain ephemeral operational data — metrics, counts, timestamps, uptime reports. These accumulate rapidly and pollute memory recall with near-identical entries. Profile Collapse identifies these duplicates through a three-phase grouping strategy (exact match, token match, fuzzy match) and merges them into canonical entries, transferring reinforcement counts to preserve signal strength.

What Gets Collapsed

Not all profile items are candidates for collapse. The system targets operational profile snapshots — items whose summaries contain operational keywords combined with numeric or datetime tokens.

Operational Detection

A profile item is classified as operational if its summary matches both:
  1. Operational keywords: status, snapshot, health, metric, count, queue, uptime, latency, ticket, alert, cron, heartbeat, service, gateway, dashboard, api, provider, model
  2. Ephemeral tokens: numeric values (42, 99.5%) or datetime stamps (2026-03-15)
Examples of summaries that qualify:
  • “Gateway health: 3 agents connected, latency 45ms, uptime 99.2%”
  • “Heartbeat status 2026-03-15: 12 tasks verified, 2 failed, score 85”
  • “Discord channel metrics: 47 messages today, 3 alerts pending”
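The two-part check above can be sketched as a small predicate. The keyword list comes from the table of operational keywords; the digit test is a simplification that treats any digit as an ephemeral numeric or datetime token:

```typescript
// Sketch of the operational-detection heuristic described above.
// The regexes are illustrative, not the exact source implementation.
const OPERATIONAL_KEYWORDS =
  /\b(status|snapshot|health|metric|count|queue|uptime|latency|ticket|alert|cron|heartbeat|service|gateway|dashboard|api|provider|model)\b/i;

// Simplified: any digit counts as a numeric or datetime token.
const EPHEMERAL_TOKEN = /\d/;

function isOperationalProfile(summary: string): boolean {
  return OPERATIONAL_KEYWORDS.test(summary) && EPHEMERAL_TOKEN.test(summary);
}
```

A summary must satisfy both tests: "User prefers dark mode" has no operational keyword and no digits, so it is never a collapse candidate.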

Three-Phase Grouping

Phase 1: Exact Signature Match

The system normalizes operational summaries into signatures by replacing ephemeral tokens with placeholders:
| Token Type | Replacement | Example |
|---|---|---|
| Dates/timestamps | `<datetime>` | `2026-03-15T14:30:00Z` becomes `<datetime>` |
| UUIDs and hex strings | `<id>` | `a8f2b3c4-...` becomes `<id>` |
| Request/run IDs | `<id>` | `run-abc123` becomes `<id>` |
| Numbers | `<num>` | `42`, `99.5%` become `<num>` |
Two items with identical signatures after normalization are exact duplicates. Example:
Original: "Gateway health: 3 agents, latency 45ms, 2026-03-15"
Signature: "gateway health <num> agent latency <num> ms <datetime>"

Original: "Gateway health: 5 agents, latency 30ms, 2026-03-16"
Signature: "gateway health <num> agent latency <num> ms <datetime>"

→ Same signature → exact duplicate group
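A minimal normalization sketch, following the replacement table above. The exact regexes (and the `run-`/`req-` ID prefixes) are assumptions; order matters so that datetimes and IDs are replaced before bare numbers:

```typescript
// Illustrative signature normalization for operational summaries.
function toSignature(summary: string): string {
  return summary
    .toLowerCase()
    .replace(/\d{4}-\d{2}-\d{2}(t[\d:.]+z?)?/g, "<datetime>") // dates/timestamps
    .replace(/\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f-]{12,}\b/g, "<id>") // UUIDs
    .replace(/\b(?:run|req)-[a-z0-9]+\b/g, "<id>") // run/request ids (assumed prefixes)
    .replace(/\d+(\.\d+)?%?/g, "<num>") // numbers and percentages
    .replace(/[^\w<>]+/g, " ") // collapse punctuation to spaces
    .trim();
}
```

With this sketch, the two "Gateway health" summaries above normalize to the same signature even though their counts, latencies, and dates differ.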

Phase 2: Token Key Match

Items not caught by exact matching are further grouped using semantic token keys. The system:
  1. Generates the normalized signature
  2. Strips stopwords (the, is, are, was, this, that, etc.)
  3. Strips placeholder tokens (<id>, <num>, <datetime>)
  4. Applies basic stemming (plural stripping: services becomes service)
  5. Sorts remaining tokens alphabetically
  6. Joins as a token key
Items with identical token keys (and at least 3 semantic tokens) are token duplicates.
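Steps 1–6 can be sketched as follows. The stopword list and the plural-stripping rule here are simplified assumptions; the real implementation may differ:

```typescript
// Illustrative token-key derivation from a normalized signature.
const STOPWORDS = new Set(["the", "is", "are", "was", "this", "that", "a", "an", "of", "and"]);
const PLACEHOLDERS = new Set(["<id>", "<num>", "<datetime>"]);

function tokenKey(signature: string): string | null {
  const tokens = signature
    .split(/\s+/)
    .filter((t) => t && !STOPWORDS.has(t) && !PLACEHOLDERS.has(t))
    .map((t) => t.replace(/s$/, "")); // naive plural stripping: services -> service
  if (tokens.length < 3) return null; // too few semantic tokens to group safely
  return [...new Set(tokens)].sort().join(" ");
}
```

The `null` return enforces the minimum of 3 semantic tokens: a signature that is almost all placeholders carries too little meaning to group on.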

Phase 3: Fuzzy Match (Optional)

When enabled, items not caught by exact or token matching are clustered using Jaccard similarity on their semantic token sets:
  • Similarity threshold: 0.78 (78% token overlap)
  • Minimum intersection: 4 shared tokens
  • Clustering: Greedy nearest-cluster assignment, sorted by creation time
Fuzzy matching catches paraphrased duplicates where the same operational data is described with slightly different wording.
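The similarity test uses the two thresholds quoted above: at least 0.78 Jaccard overlap and at least 4 shared tokens. A sketch of that pairwise check (the greedy clustering loop is omitted):

```typescript
// Jaccard similarity over two semantic token sets: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  for (const t of a) if (b.has(t)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Pairwise fuzzy-duplicate test with the documented thresholds.
function isFuzzyDuplicate(a: Set<string>, b: Set<string>): boolean {
  let shared = 0;
  for (const t of a) if (b.has(t)) shared++;
  return shared >= 4 && jaccard(a, b) >= 0.78;
}
```

The minimum-intersection guard prevents two short token sets from matching on high overlap alone.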

Canonical Selection

When a group of duplicates is found, one item is selected as the canonical (keeper) entry. Selection priority:
  1. Highest significance — core > important > noteworthy > routine
  2. Highest reinforcement count — more reinforced = more important
  3. Oldest creation date — preserve the first observation
  4. Lexicographic ID — deterministic tiebreaker
The canonical item survives; all other items in the group are deleted.
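The four-level priority can be expressed as a comparator. The `Item` shape below is an assumption for illustration, not the exact `MemoryItem` type from memu-types.ts:

```typescript
type Significance = "core" | "important" | "noteworthy" | "routine";

const SIGNIFICANCE_RANK: Record<Significance, number> = {
  core: 3, important: 2, noteworthy: 1, routine: 0,
};

interface Item {
  id: string;
  significance: Significance;
  reinforcementCount: number;
  createdAt: string; // ISO timestamp
}

// Sort the group by the four-level priority; the first element is the keeper.
function pickCanonical(group: Item[]): Item {
  return [...group].sort((a, b) =>
    SIGNIFICANCE_RANK[b.significance] - SIGNIFICANCE_RANK[a.significance] || // 1. highest significance
    b.reinforcementCount - a.reinforcementCount ||                           // 2. most reinforced
    a.createdAt.localeCompare(b.createdAt) ||                                // 3. oldest first
    a.id.localeCompare(b.id)                                                 // 4. deterministic tiebreak
  )[0];
}
```

Because every level of the comparator is deterministic, repeated runs over the same group always pick the same canonical item.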

Reinforcement Transfer

When duplicates are removed, their reinforcement counts are transferred to the canonical item. Each deleted duplicate contributes at least 1 reinforcement (or its actual count if higher). This preserves the signal that the observation was seen multiple times.
Group: 5 items (1 canonical + 4 duplicates)
- Canonical: reinforcement_count = 2
- Duplicate A: reinforcement_count = 3
- Duplicate B: reinforcement_count = 1
- Duplicate C: reinforcement_count = 1
- Duplicate D: reinforcement_count = 0 (counts as 1)

After collapse:
- Canonical: reinforcement_count = 2 + 3 + 1 + 1 + 1 = 8
- Duplicates A-D: deleted
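The transfer rule (each removed duplicate contributes at least 1, or its own count if higher) reduces to a one-line fold:

```typescript
// Each deleted duplicate contributes max(1, its own count) to the canonical.
function transferReinforcement(canonical: number, duplicateCounts: number[]): number {
  return duplicateCounts.reduce((total, c) => total + Math.max(1, c), canonical);
}
```

Applied to the worked example above: `transferReinforcement(2, [3, 1, 1, 0])` yields 8, with duplicate D's zero count floored to 1.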

Running Profile Collapse

By default, collapse runs in dry-run mode — it reports what would happen without deleting anything:
import { collapseOperationalProfileSnapshots } from "./memory/hygiene/profile-collapse.ts";

const report = collapseOperationalProfileSnapshots(store, { dryRun: true });
console.log(report);
// {
//   dryRun: true,
//   scannedProfiles: 1200,
//   operationalProfiles: 340,
//   duplicateGroups: 45,
//   duplicatesFound: 180,
//   samples: [...]
// }

Options

| Option | Type | Default | Description |
|---|---|---|---|
| dryRun | boolean | true | Report only, no deletions |
| enableFuzzy | boolean | false | Enable Phase 3 fuzzy matching |
| maxSampleGroups | number | 20 | Max sample groups in report |
| batchSize | number | 500 | Pagination size for profile listing |

Collapse Report

The function returns a detailed report:
interface ProfileCollapseReport {
  dryRun: boolean;
  scannedProfiles: number;         // Total profile items scanned
  operationalProfiles: number;     // Items matching operational pattern
  uniqueSignatures: number;        // Distinct signatures found
  duplicateGroups: number;         // Groups with 2+ members
  duplicatesFound: number;         // Total items to remove
  groupsCollapsed: number;         // Groups actually collapsed (live run)
  duplicatesRemoved: number;       // Items actually deleted (live run)
  reinforcementsApplied: number;   // Total reinforcements transferred
  exactDuplicateGroups: number;    // Phase 1 groups
  tokenDuplicateGroups: number;    // Phase 2 groups
  fuzzyDuplicateGroups: number;    // Phase 3 groups
  samples: ProfileCollapseGroupSample[];  // Example groups for review
}

Foreign Key Safety

When deleting duplicate items, the system handles foreign key constraints gracefully. If a delete fails due to FK constraints (junction tables like item_categories, category_items, item_entities), it cleans up the junction rows first and retries the deletion.
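A sketch of that delete-then-retry pattern. The store interface, SQL, and the assumption that every junction table keys on `item_id` are illustrative, not the actual MemuStore API:

```typescript
// FK-safe deletion: try the delete, and on a constraint failure clear the
// junction rows first, then retry once.
interface Db {
  run(sql: string, id: string): void;
}

const JUNCTION_TABLES = ["item_categories", "category_items", "item_entities"];

function deleteItemSafely(db: Db, itemId: string): void {
  try {
    db.run("DELETE FROM memory_items WHERE id = ?", itemId);
  } catch {
    // FK constraint hit: remove referencing junction rows, then retry.
    for (const table of JUNCTION_TABLES) {
      db.run(`DELETE FROM ${table} WHERE item_id = ?`, itemId);
    }
    db.run("DELETE FROM memory_items WHERE id = ?", itemId);
  }
}
```

Cleaning up junction rows eagerly for every delete would also work, but the optimistic try-first approach keeps the common case (no FK references) to a single statement.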

Integration Points

Profile collapse can be triggered:
  1. Manually via the memory hygiene tool
  2. Periodically as part of scheduled memory maintenance
  3. On demand when memory item counts exceed thresholds

Key Files

| File | LOC | Description |
|---|---|---|
| src/memory/hygiene/profile-collapse.ts | 469 | Core collapse implementation |
| src/memory/memu-store.ts | | MemuStore (provides item listing, deletion, reinforcement) |
| src/memory/memu-types.ts | | MemoryItem and Significance types |