ClawRouter v0.5+ includes intelligent routing features that work automatically.
- Response Cache
- Agentic Auto-Detection
- Tool Detection
- Context-Length-Aware Routing
- Model Aliases
- Free Tier Fallback
- Session Persistence
- Cost Tracking with /stats
ClawRouter includes LLM response caching inspired by LiteLLM's caching system. Identical requests return cached responses, saving both cost and latency.
How it works:
Request: "What is 2+2?"
First call: → API ($0.001) → Cache response
Second call: → Cache HIT → Return instantly ($0)
Features:
| Feature | Default | Description |
|---|---|---|
| TTL | 10 minutes | Responses expire after TTL |
| Max size | 200 entries | LRU eviction when full |
| Item limit | 1MB | Large responses skipped |
| Auto-enabled | Yes | No config needed |
Cache key generation:
The cache key is a SHA-256 hash of the request body (model + messages + params), with normalization:
- Message timestamps stripped (OpenClaw injects
[Mon 2024-01-15 10:30 UTC]) - Keys sorted for consistent hashing
- Stream mode, user, and request_id fields excluded
Bypass cache:
// Via header
fetch("/v1/chat/completions", {
headers: { "Cache-Control": "no-cache" }
})
// Via body
{
"model": "blockrun/auto",
"cache": false, // or "no_cache": true
"messages": [...]
}Check cache stats:
curl http://localhost:8402/cacheResponse:
{
"size": 42,
"maxSize": 200,
"hits": 156,
"misses": 89,
"evictions": 3,
"hitRate": "63.7%"
}Configuration:
Response caching is enabled by default with sensible defaults. For advanced tuning, the cache can be configured programmatically:
import { ResponseCache } from "@blockrun/clawrouter";
const cache = new ResponseCache({
maxSize: 500, // Max cached responses
defaultTTL: 300, // 5 minutes
maxItemSize: 2_097_152, // 2MB max per item
enabled: true,
});ClawRouter automatically detects multi-step agentic tasks and routes to models optimized for autonomous execution:
"what is 2+2" → gemini-flash (standard)
"build the project then run tests" → kimi-k2.5 (auto-agentic)
"fix the bug and make sure it works" → kimi-k2.5 (auto-agentic)
How it works:
- Detects agentic keywords: file ops ("read", "edit"), execution ("run", "test", "deploy"), iteration ("fix", "debug", "verify")
- Threshold: 2+ signals triggers auto-switch to agentic tiers
- No config needed — works automatically
Agentic tier models (optimized for multi-step autonomy):
| Tier | Agentic Model | Why |
|---|---|---|
| SIMPLE | claude-haiku-4.5 | Fast + reliable tool use |
| MEDIUM | kimi-k2.5 | 200+ tool chains, 76% cheaper |
| COMPLEX | claude-sonnet-4.6 | Best balance for complex tasks |
| REASONING | kimi-k2.5 | Extended reasoning + execution |
You can also force agentic mode via config:
# openclaw.yaml
plugins:
- id: "@blockrun/clawrouter"
config:
routing:
overrides:
agenticMode: true # Always use agentic tiersWhen your request includes a tools array (function calling), ClawRouter automatically switches to agentic tiers:
// Request with tools → auto-agentic mode
{
model: "blockrun/auto",
messages: [{ role: "user", content: "Check the weather" }],
tools: [{ type: "function", function: { name: "get_weather", ... } }]
}
// → Routes to claude-haiku-4.5 (excellent tool use)
// → Instead of gemini-flash (may produce malformed tool calls)Why this matters: Some models (like deepseek-reasoner) are optimized for chain-of-thought reasoning but can generate malformed tool calls. Tool detection ensures requests with functions go to models proven to handle tool use correctly.
ClawRouter automatically filters out models that can't handle your context size:
150K token request:
Full chain: [grok-4-fast (131K), deepseek (128K), kimi (262K), gemini (1M)]
Filtered: [kimi (262K), gemini (1M)]
→ Skips models that would fail with "context too long" errors
This prevents wasted API calls and faster fallback to capable models.
Use short aliases instead of full model paths:
/model free # gpt-oss-120b (FREE!)
/model br-sonnet # anthropic/claude-sonnet-4.6
/model br-opus # anthropic/claude-opus-4
/model br-haiku # anthropic/claude-haiku-4.5
/model gpt # openai/gpt-4o
/model gpt5 # openai/gpt-5.2
/model deepseek # deepseek/deepseek-chat
/model reasoner # deepseek/deepseek-reasoner
/model kimi # moonshot/kimi-k2.5
/model gemini # google/gemini-2.5-pro
/model flash # google/gemini-2.5-flash
/model grok # xai/grok-3
/model grok-fast # xai/grok-4-fast-reasoningAll aliases work with /model blockrun/xxx or just /model xxx.
When your wallet balance hits $0, ClawRouter automatically falls back to the free model (gpt-oss-120b):
Wallet: $0.00
Request: "Help me write a function"
→ Routes to gpt-oss-120b (FREE)
→ No "insufficient funds" error
→ Keep building while you top up
You'll never get blocked by an empty wallet — the free tier keeps you running.
For multi-turn conversations, ClawRouter pins the model to prevent mid-task switching:
Turn 1: "Build a React component" → claude-sonnet-4.6
Turn 2: "Add dark mode support" → claude-sonnet-4.6 (pinned)
Turn 3: "Now add tests" → claude-sonnet-4.6 (pinned)
Sessions are identified by conversation ID and persist for 1 hour of inactivity.
Track your savings in real-time:
# In any OpenClaw conversation
/statsOutput:
+============================================================+
| ClawRouter Usage Statistics |
+============================================================+
| Period: last 7 days |
| Total Requests: 442 |
| Total Cost: $1.73 |
| Baseline Cost (Opus): $20.13 |
| Total Saved: $18.40 (91.4%) |
+------------------------------------------------------------+
| Routing by Tier: |
| SIMPLE =========== 55.0% (243) |
| MEDIUM ====== 30.8% (136) |
| COMPLEX = 7.2% (32) |
| REASONING = 7.0% (31) |
+============================================================+
Stats are stored locally at ~/.openclaw/blockrun/logs/ and aggregated on demand.