Self-Improving System (SIS)

How ArgentOS learns from its own behavior — lessons, patterns, and feedback loops.

Part of ArgentOS — https://argentos.ai

Overview

The SIS layer enables Argent to:

Observe — Track outcomes of actions
Evaluate — Assess what worked vs. what didn't
Learn — Extract lessons and patterns
Apply — Use lessons to improve future decisions

┌──────────────────────────────────────────────────────────────────────────────┐
│                        SELF-IMPROVING SYSTEM (SIS)                           │
│                                                                              │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                         FEEDBACK LOOP                                │   │
│   │                                                                      │   │
│   │    ACTION ──► OUTCOME ──► EVALUATION ──► LESSON ──► MEMORY BANK     │   │
│   │       ▲                                                    │         │   │
│   │       └────────────────── RETRIEVAL ◄─────────────────────┘         │   │
│   │                                                                      │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                       MEMORY BANKS                                   │   │
│   │                                                                      │   │
│   │   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐             │   │
│   │   │   LESSONS    │  │   PATTERNS   │  │  PREFERENCES │             │   │
│   │   │   LEARNED    │  │   DETECTED   │  │   INFERRED   │             │   │
│   │   │              │  │              │  │              │             │   │
│   │   │ • Mistakes   │  │ • Time-based │  │ • User likes │             │   │
│   │   │ • Successes  │  │ • Tool combos│  │ • User hates │             │   │
│   │   │ • Workarounds│  │ • Failure    │  │ • Work style │             │   │
│   │   │ • Best paths │  │   patterns   │  │ • Comm style │             │   │
│   │   └──────────────┘  └──────────────┘  └──────────────┘             │   │
│   │                                                                      │   │
│   │   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐             │   │
│   │   │    TOOL      │  │   CONTEXT    │  │   MODEL      │             │   │
│   │   │   KNOWLEDGE  │  │   AWARENESS  │  │   FEEDBACK   │             │   │
│   │   │              │  │              │  │              │             │   │
│   │   │ • API quirks │  │ • Project    │  │ • Which model│             │   │
│   │   │ • Rate limits│  │   specific   │  │   worked best│             │   │
│   │   │ • Edge cases │  │ • User-      │  │ • Complexity │             │   │
│   │   │ • Workarounds│  │   specific   │  │   estimates  │             │   │
│   │   └──────────────┘  └──────────────┘  └──────────────┘             │   │
│   │                                                                      │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

Memory Bank Types

1. Lessons Learned

Things discovered through experience that should be remembered.

interface Lesson {
  id: string;
  type: 'mistake' | 'success' | 'workaround' | 'discovery';

  // What happened
  context: string;       // "Trying to send WhatsApp message to group"
  action: string;        // "Used sendMessage with group JID"
  outcome: string;       // "Failed - group JID format was wrong"

  // What was learned
  lesson: string;        // "WhatsApp group JIDs must end with @g.us"
  correction?: string;   // "Always append @g.us to group IDs"

  // Metadata
  confidence: number;    // 0-1, increases with repeated validation
  occurrences: number;   // How many times this came up
  lastSeen: number;      // Timestamp

  // Retrieval
  tags: string[];        // ['whatsapp', 'groups', 'jid', 'format']
  relatedTools: string[]; // ['whatsapp_send', 'message']
}

Examples:

MISTAKE: "When user says 'remind me tomorrow', I should ask what time,
          not assume 9am. User prefers afternoon reminders."

SUCCESS: "Using bullet points instead of paragraphs for task summaries
          gets better user engagement."

WORKAROUND: "ElevenLabs API sometimes returns 429. Wait 2 seconds and
             retry up to 3 times before falling back to system TTS."

DISCOVERY: "User's calendar has recurring 'Focus Time' blocks - don't
            schedule interruptions during these."

2. Patterns Detected

Recurring behaviors and correlations.

interface Pattern {
  id: string;
  type: 'temporal' | 'sequential' | 'failure' | 'success' | 'preference';

  // Pattern definition
  description: string;
  triggers: string[];     // What conditions activate this pattern
  frequency: number;      // How often it occurs

  // Evidence
  observations: string[]; // IDs of supporting observations
  confidence: number;

  // Application
  recommendation?: string; // What to do when pattern detected
}

Examples:

TEMPORAL: "User typically asks for weather between 7-8am.
           Proactively check at 6:55am."

SEQUENTIAL: "After 'check email' task, user often asks 'respond to X'.
             Prepare draft responses proactively."

FAILURE: "API calls to silver-prices.com fail on weekends.
          Use backup source (metals-api.com) on Sat/Sun."

3. Tool Knowledge

Accumulated knowledge about how tools actually work in practice.

interface ToolKnowledge {
  toolName: string;

  // Learned constraints
  rateLimits?: {
    requests: number;
    period: string;
    discovered: number; // timestamp
  };

  // Edge cases
  edgeCases: Array<{
    condition: string;
    behavior: string;
    workaround?: string;
  }>;

  // Best practices
  bestPractices: string[];

  // Failure modes
  commonErrors: Array<{
    error: string;
    cause: string;
    solution: string;
  }>;
}

4. Model Feedback

Learning which models work best for which tasks.

interface ModelFeedback {
  taskType: string;        // 'code_review', 'creative_writing', etc.

  // Performance by model
  modelPerformance: Record<string, {
    attempts: number;
    successes: number;
    avgLatency: number;
    avgTokens: number;
    userSatisfaction: number; // inferred from reactions
  }>;

  // Learned routing
  recommendedModel: string;
  complexityAdjustment: number; // Adjust base complexity score
}

Feedback Loop Implementation

1. Observation Collection

// After every action, record the outcome
interface ActionOutcome {
  actionId: string;
  action: {
    type: string;
    tool?: string;
    input: unknown;
  };
  outcome: {
    success: boolean;
    result?: unknown;
    error?: string;
    latency: number;
  };
  context: {
    model: string;
    taskId?: string;
    sessionKey: string;
  };
  userFeedback?: {
    explicit?: 'positive' | 'negative' | 'neutral';
    implicit?: 'continued' | 'abandoned' | 'corrected';
  };
}

2. Evaluation Engine

class EvaluationEngine {
  // Analyze outcome and determine if lesson should be extracted
  async evaluate(outcome: ActionOutcome): Promise<EvaluationResult> {
    const signals: EvaluationSignal[] = [];

    // Check for explicit failure
    if (!outcome.outcome.success) {
      signals.push({
        type: 'failure',
        severity: this.assessSeverity(outcome.outcome.error),
        suggestion: await this.suggestCorrection(outcome)
      });
    }

    // Check for user correction
    if (outcome.userFeedback?.implicit === 'corrected') {
      signals.push({
        type: 'correction',
        severity: 'medium',
        suggestion: 'User had to correct - analyze what was wrong'
      });
    }

    // Check for repeated pattern
    const similar = await this.findSimilarOutcomes(outcome);
    if (similar.length >= 3) {
      signals.push({
        type: 'pattern',
        severity: 'low',
        suggestion: 'Recurring situation - consider extracting pattern'
      });
    }

    return { signals, shouldLearn: signals.length > 0 };
  }
}

3. Lesson Extraction

class LessonExtractor {
  // Use a fast model to extract lessons from outcomes
  async extractLesson(
    outcome: ActionOutcome,
    evaluation: EvaluationResult
  ): Promise<Lesson | null> {

    const prompt = `
Analyze this action outcome and extract a lesson if applicable.

ACTION: ${JSON.stringify(outcome.action)}
OUTCOME: ${JSON.stringify(outcome.outcome)}
EVALUATION: ${JSON.stringify(evaluation)}

If there's a useful lesson to remember for future similar situations,
respond with a JSON lesson object. If not, respond with null.

Focus on:
- What went wrong (or right) and why
- How to handle this better next time
- Any workarounds or best practices discovered
`;

    // Use local model for this (cheap, fast)
    const result = await this.models.run('local', prompt, {});
    return this.parseLesson(result);
  }
}

4. Memory Bank Storage

// Lessons stored in SQLite with FTS5 for search
const LESSONS_SCHEMA = `
CREATE TABLE IF NOT EXISTS lessons (
  id TEXT PRIMARY KEY,
  type TEXT NOT NULL,
  context TEXT NOT NULL,
  action TEXT NOT NULL,
  outcome TEXT NOT NULL,
  lesson TEXT NOT NULL,
  correction TEXT,
  confidence REAL DEFAULT 0.5,
  occurrences INTEGER DEFAULT 1,
  last_seen INTEGER NOT NULL,
  tags TEXT, -- JSON array
  related_tools TEXT, -- JSON array
  created_at INTEGER NOT NULL
);

CREATE VIRTUAL TABLE IF NOT EXISTS lessons_fts USING fts5(
  context, action, outcome, lesson, correction, tags,
  content='lessons',
  content_rowid='rowid'
);
`;

5. Retrieval and Application

class LessonRetriever {
  // Before taking an action, check for relevant lessons
  async getRelevantLessons(context: {
    action: string;
    tool?: string;
    userMessage?: string;
  }): Promise<Lesson[]> {

    // Search by tool
    const toolLessons = context.tool
      ? await this.db.search({ relatedTools: context.tool })
      : [];

    // Search by semantic similarity
    const semanticLessons = await this.db.ftsSearch(
      `${context.action} ${context.userMessage || ''}`
    );

    // Combine and rank by relevance + confidence
    return this.rankLessons([...toolLessons, ...semanticLessons]);
  }

  // Inject lessons into prompt
  formatForPrompt(lessons: Lesson[]): string {
    if (lessons.length === 0) return '';

    return `
## Lessons from Past Experience

${lessons.map(l => `- **${l.type.toUpperCase()}**: ${l.lesson}${
  l.correction ? `\n  → Correction: ${l.correction}` : ''
}`).join('\n')}

Apply these lessons to avoid repeating mistakes.
`;
  }
}

Integration with Always-On Loop

// In the always-on loop, inject lessons before processing
async handleMessage(event: AgentEvent): Promise<void> {
  // ... existing code ...

  // 2.5 Get relevant lessons (NEW)
  const lessons = await this.deps.sis.getRelevantLessons({
    action: 'respond_to_message',
    userMessage: payload.text,
  });
  const lessonContext = this.deps.sis.formatForPrompt(lessons);

  // 3. Get memory context
  const memoryContext = await this.deps.memory.getContext({ maxTokens: 2000 });

  // 4. Build prompt with lessons
  const prompt = `${memoryContext}${lessonContext}${taskContext}\n\nUser: ${payload.text}`;

  // ... rest of processing ...

  // 8. After response, record outcome for learning (NEW)
  await this.deps.sis.recordOutcome({
    actionId: generateId(),
    action: { type: 'message_response', input: payload.text },
    outcome: { success: true, result: response.text, latency: elapsed },
    context: { model: tier, sessionKey: payload.sessionKey },
  });
}

Heartbeat Learning Review

During low-activity periods, Argent reviews and consolidates lessons:

async handleHeartbeat(event: AgentEvent): Promise<void> {
  // ... existing task handling ...

  // If no urgent tasks, do learning maintenance
  if (pendingTasks.length === 0) {
    await this.deps.sis.maintenanceCycle();
  }
}

// In SIS module
async maintenanceCycle(): Promise<void> {
  // 1. Review recent outcomes for patterns
  const recentOutcomes = await this.getRecentOutcomes({ hours: 24 });
  const patterns = await this.detectPatterns(recentOutcomes);

  for (const pattern of patterns) {
    await this.storePattern(pattern);
  }

  // 2. Consolidate similar lessons
  const duplicates = await this.findDuplicateLessons();
  for (const group of duplicates) {
    await this.consolidateLessons(group);
  }

  // 3. Decay old, unvalidated lessons
  await this.decayOldLessons({ olderThan: days(30), minConfidence: 0.3 });

  // 4. Promote frequently-validated lessons
  await this.promoteValidatedLessons({ minOccurrences: 5 });
}

User Feedback Integration

// Explicit feedback (user says "that was wrong" or "perfect!")
async handleUserFeedback(
  feedback: 'positive' | 'negative',
  recentAction: ActionOutcome
): Promise<void> {
  // Update outcome with feedback
  await this.updateOutcome(recentAction.actionId, {
    userFeedback: { explicit: feedback }
  });

  // If negative, trigger immediate lesson extraction
  if (feedback === 'negative') {
    const lesson = await this.extractLesson(recentAction, {
      signals: [{ type: 'user_negative', severity: 'high' }],
      shouldLearn: true
    });

    if (lesson) {
      await this.storeLesson(lesson);

      // Optionally confirm with user
      await this.confirmLesson(lesson);
    }
  }

  // If positive, boost confidence in recent lessons used
  if (feedback === 'positive') {
    const usedLessons = await this.getLessonsUsedInAction(recentAction);
    for (const lesson of usedLessons) {
      await this.boostConfidence(lesson.id, 0.1);
    }
  }
}

Example Lessons Database

-- Example lessons that Argent might accumulate:

INSERT INTO lessons VALUES (
  'les_001',
  'mistake',
  'User asked to check silver prices',
  'Called silver-api.com on Sunday',
  'API returned 503 - service unavailable on weekends',
  'Silver price APIs often have weekend maintenance. Use cached data or backup source.',
  'Check day of week before calling silver-api.com. Use metals-api.com as fallback.',
  0.9,
  7,
  1707145200,
  '["silver", "api", "weekend", "fallback"]',
  '["fetch_silver_price", "web_fetch"]',
  1706886000
);

INSERT INTO lessons VALUES (
  'les_002',
  'success',
  'User asked for meeting summary',
  'Used bullet points with action items highlighted',
  'User responded positively and shared with team',
  'Meeting summaries work best with: 1) Key decisions, 2) Action items with owners, 3) Next steps',
  NULL,
  0.85,
  12,
  1707145200,
  '["meetings", "summary", "format", "action-items"]',
  '["calendar", "message"]',
  1706540000
);

INSERT INTO lessons VALUES (
  'les_003',
  'workaround',
  'Sending long message to Telegram',
  'Message exceeded 4096 character limit',
  'Message was truncated, user missed important info',
  'Telegram has 4096 char limit. Split long messages or use file attachment.',
  'Check message length before sending. If > 4000 chars, split into multiple messages or attach as file.',
  0.95,
  15,
  1707145200,
  '["telegram", "message", "length", "limit"]',
  '["telegram_send", "message"]',
  1706280000
);

Configuration

{
  "sis": {
    "enabled": true,
    "lessonExtraction": {
      "model": "local",           // Use local model for extraction
      "minConfidenceToStore": 0.3,
      "maxLessonsPerDay": 50
    },
    "retrieval": {
      "maxLessonsInPrompt": 5,
      "minConfidenceToUse": 0.5,
      "recencyBias": 0.2          // Prefer recent lessons
    },
    "maintenance": {
      "consolidationInterval": "6h",
      "decayAfterDays": 30,
      "decayRate": 0.1
    },
    "feedback": {
      "askForConfirmation": false, // Don't pester user
      "trackImplicitFeedback": true
    }
  }
}

Summary

The SIS layer makes Argent a learning system that:

Remembers mistakes and doesn't repeat them
Recognizes patterns in user behavior and external systems
Accumulates tool knowledge from experience
Improves model routing based on actual performance
Self-maintains through consolidation and decay

This transforms Argent from a stateless assistant into a growing intelligence that becomes more effective over time.

SIS Architecture designed 2026-02-05

Self-Improving System (SIS)

On this page