Health Checks

Gateway health monitoring and zombie process reaping.

Overview

The gateway runs periodic health checks to ensure the system is operating correctly. This includes monitoring service health, connection status, and cleaning up orphaned processes that can accumulate during normal operation.

Health Check Timer

Every 60 seconds, the health check performs the following tasks:

  1. Channel connection status checks
  2. Memory database accessibility
  3. Zombie process reaping
  4. Disk space monitoring (optional)
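
As a rough illustration of the cycle above, the following shell sketch runs a list of check functions and reports which ones failed. The names run_health_checks and health_check_loop, and the check names in the closing comment, are hypothetical; the gateway's actual implementation is not shown in this document.

```shell
#!/bin/sh
# Sketch only: function names here are illustrative, not ArgentOS APIs.

HEALTH_INTERVAL_SECONDS=60   # interval stated above

# Run each named check function in order and print ok/FAIL per check.
# Returns 0 if every check passed, 1 otherwise.
run_health_checks() {
  failed=0
  for check in "$@"; do
    if "$check"; then
      echo "ok   $check"
    else
      echo "FAIL $check"
      failed=1
    fi
  done
  return "$failed"
}

# Repeat the checks forever, sleeping between rounds (defined, not invoked).
health_check_loop() {
  while true; do
    run_health_checks "$@" || echo "health check round had failures" >&2
    sleep "$HEALTH_INTERVAL_SECONDS"
  done
}

# Example wiring (all four check names are hypothetical):
#   health_check_loop check_channels check_memory_db reap_zombies check_disk_space
```

A failing check does not stop the round, so one broken subsystem cannot hide the status of the others.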

Zombie Process Reaper

During normal operation, the gateway spawns claude --stream-json subprocesses for agent interactions. If these processes crash or time out without cleanup, they are left behind as orphaned "zombie" processes that consume system resources.

The reaper:

  1. Scans for claude processes with the stream-json flag
  2. Checks their age using POSIX etime format
  3. Kills any that are older than 5 minutes

# What the reaper does internally (macOS-compatible)
ps -eo pid,etime,command | grep "stream-json"

The reaper greps only for "stream-json". It must NEVER grep broadly for "claude", because that would kill active Claude Code sessions running on the same machine.
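
To make the matching rule concrete, here is a small sketch that reduces ps output to the PID and elapsed time of stream-json processes. The function name stream_json_candidates and the sample process list are made up for illustration. The [s] bracket in the pattern is a common trick that stops the filter from matching its own command line when it reads live ps output.

```shell
#!/bin/sh
# Sketch only: stream_json_candidates is an illustrative name.

# Read `ps -eo pid,etime,command` output on stdin and print "PID ETIME"
# for every line whose command mentions stream-json.
stream_json_candidates() {
  awk '/[s]tream-json/ { print $1, $2 }'
}

# Made-up sample of ps output: one agent subprocess, one interactive
# Claude Code session, and a filter process.
sample='  PID     ELAPSED COMMAND
  101       07:12 claude --stream-json
  102    01:03:44 claude
  103       00:01 grep [s]tream-json'

printf '%s\n' "$sample" | stream_json_candidates
# prints: 101 07:12
# The plain "claude" session (PID 102) is left alone.

# Against the live process table:
#   ps -eo pid,etime,command | stream_json_candidates
```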

etime Parsing

The reaper parses the POSIX etime format ([[dd-]hh:]mm:ss), which works on macOS as well as Linux. It does not use etimes (elapsed time in seconds), which is available on Linux but not on macOS.
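
The conversion from etime to seconds can be sketched in a few lines of shell and awk. This is not the gateway's actual parser: etime_to_seconds, is_older_than_limit and REAP_AFTER_SECONDS are hypothetical names, and the 300-second limit is taken from the 5-minute rule described earlier.

```shell
#!/bin/sh
# Sketch only: names are illustrative, not the gateway's real parser.

REAP_AFTER_SECONDS=300   # the 5-minute limit described above

# Convert a POSIX etime string ([[dd-]hh:]mm:ss) to whole seconds.
# Prints the result; exits non-zero if the input has the wrong shape.
etime_to_seconds() {
  printf '%s\n' "$1" | awk '{
    if ($0 !~ /^ *(([0-9]+-)?[0-9]+:)?[0-9]+:[0-9]+ *$/) exit 1
    n = split($0, p, /[-:]/)
    if (n == 2)      print p[1] * 60 + p[2]
    else if (n == 3) print p[1] * 3600 + p[2] * 60 + p[3]
    else             print p[1] * 86400 + p[2] * 3600 + p[3] * 60 + p[4]
  }'
}

# Succeed if the given etime is strictly older than the limit.
# Returns 2 if the etime could not be parsed.
is_older_than_limit() {
  age=$(etime_to_seconds "$1") || return 2
  [ "$age" -gt "$REAP_AFTER_SECONDS" ]
}

etime_to_seconds 07:12        # prints 432
etime_to_seconds 1-02:03:04   # prints 93784
```

Doing the arithmetic in awk avoids a common shell pitfall: values such as 08 or 09 are treated as invalid octal numbers inside $(( )).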

Checking Health

Via CLI

argent gateway status

Output includes:

  • Gateway process status (running/stopped)
  • RPC probe result (ok/failed)
  • Listening port
  • Uptime
  • Connected channels

Via Dashboard

The dashboard header shows a connection indicator:

  • Green: Connected and healthy
  • Yellow: Connected with warnings
  • Red: Disconnected or unhealthy

Common Health Issues

RPC Probe Fails

If the gateway is running but not responding to RPC calls:

  • Check the gateway logs: argent gateway logs
  • Verify the port is not blocked by a firewall
  • Check for native module ABI mismatches (see Configuration)

High Zombie Count

If you see many zombie processes accumulating:

  • The reaper should handle these automatically
  • If they persist, check if the 5-minute timeout is appropriate for your workload
  • Manual cleanup: argent gateway restart

Memory Database Locked

If the health check reports a locked database:

  • Check for other processes accessing ~/.argentos/memory.db
  • The WAL (Write-Ahead Logging) mode should prevent most lock issues
  • Restart the gateway if the lock persists
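
One way to carry out the first step is to ask lsof which processes have the database open. The helper below is a sketch under assumptions: db_holders is a hypothetical name, lsof must be installed, and the -wal and -shm side files are the ones SQLite creates in WAL mode (the text mentions WAL but does not name the database engine).

```shell
#!/bin/sh
# Sketch only: db_holders is an illustrative helper, not an argent command.

# List processes that have the database, or its WAL side files, open.
# Exit status: 2 if the database file is missing, 3 if lsof is unavailable,
# otherwise whatever lsof returns (read the listing, not the status).
db_holders() {
  db=$1
  if [ ! -e "$db" ]; then
    echo "not found: $db" >&2
    return 2
  fi
  if ! command -v lsof >/dev/null 2>&1; then
    echo "lsof is not installed" >&2
    return 3
  fi
  lsof "$db" "$db-wal" "$db-shm" 2>/dev/null
}

# Usage:
#   db_holders "$HOME/.argentos/memory.db"
```

Any process in the listing other than the gateway itself is a candidate for holding the lock.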