Model Failover
Automatic failover with exponential backoff, cooldowns, and profile rotation.
Overview
ArgentOS is designed to stay online even when individual API providers or accounts hit limits. The failover system automatically rotates between auth profiles and providers, using exponential backoff to avoid hammering rate-limited services.
Failover Chain
When an API call fails, ArgentOS follows this chain:
Primary Profile → Next Profile (same provider) → Alternative Provider → ErrorExample
- Call to
anthropic:titaniumfails (429 rate limit) anthropic:titaniumenters cooldown- Retry with
anthropic:webdevtoday - If that also fails, try
anthropic:semfreak - If all Anthropic profiles are in cooldown, try MiniMax or Z.AI (if configured for the current tier)
- If no providers are available, return an error to the user
Exponential Backoff
When a profile hits a rate limit, it enters an exponential backoff cooldown:
| Failure Count | Cooldown Duration |
|---|---|
| 1st | 30 seconds |
| 2nd | 1 minute |
| 3rd | 2 minutes |
| 4th | 4 minutes |
| 5th+ | 8 minutes (max) |
The cooldown resets after a successful call from the profile.
Cooldown Management
Viewing Cooldown Status
argent statusThe status command shows which profiles are active, which are in cooldown, and when they will be available again.
Manual Reset
argent model reset-cooldownsClears all cooldown timers, making all profiles immediately available.
Error Types
Different errors trigger different behaviors:
| Error | Action |
|---|---|
| 429 Rate Limit | Cooldown + rotate to next profile |
| 529 Overloaded | Cooldown + rotate to next profile |
| 401 Unauthorized | Disable profile (requires manual fix) |
| Network error | Short cooldown + retry |
| Context overflow | Skip auth profile rotation (not a provider issue) |
| 500 Server Error | Short cooldown + retry once |
Context overflow errors specifically do not trigger profile rotation since the issue is with the request, not the provider.
Provider-Specific Notes
Anthropic Max Subscriptions
Max subscriptions have weekly quotas that reset on a rolling basis. When one account's quota is exhausted, failover to another subscription keeps the agent running.
MiniMax Limitations
MiniMax cannot accept tool call history with Anthropic-format IDs. Failover to MiniMax only works for:
- Fresh sessions with no tool history
- Non-tool conversations
The model router accounts for this and avoids routing mid-session tool conversations to MiniMax.
Local Models (Ollama)
Local models via Ollama have no rate limits or costs but are limited in capability. The LOCAL tier is always available as an ultimate fallback for simple queries.