Skip to content

Latest commit

 

History

History
257 lines (190 loc) · 7.22 KB

File metadata and controls

257 lines (190 loc) · 7.22 KB

Advanced Features

ClawRouter v0.5+ includes intelligent routing features that work automatically.

Table of Contents


Response Cache

ClawRouter includes LLM response caching inspired by LiteLLM's caching system. Identical requests return cached responses, saving both cost and latency.

How it works:

Request: "What is 2+2?"
  First call:  → API ($0.001) → Cache response
  Second call: → Cache HIT → Return instantly ($0)

Features:

Feature Default Description
TTL 10 minutes Responses expire after TTL
Max size 200 entries LRU eviction when full
Item limit 1MB Large responses skipped
Auto-enabled Yes No config needed

Cache key generation:

The cache key is a SHA-256 hash of the request body (model + messages + params), with normalization:

  • Message timestamps stripped (OpenClaw injects [Mon 2024-01-15 10:30 UTC])
  • Keys sorted for consistent hashing
  • Stream mode, user, and request_id fields excluded

Bypass cache:

// Via header
fetch("/v1/chat/completions", {
  headers: { "Cache-Control": "no-cache" }
})

// Via body
{
  "model": "blockrun/auto",
  "cache": false,  // or "no_cache": true
  "messages": [...]
}

Check cache stats:

curl http://localhost:8402/cache

Response:

{
  "size": 42,
  "maxSize": 200,
  "hits": 156,
  "misses": 89,
  "evictions": 3,
  "hitRate": "63.7%"
}

Configuration:

Response caching is enabled by default with sensible defaults. For advanced tuning, the cache can be configured programmatically:

import { ResponseCache } from "@blockrun/clawrouter";

const cache = new ResponseCache({
  maxSize: 500, // Max cached responses
  defaultTTL: 300, // 5 minutes
  maxItemSize: 2_097_152, // 2MB max per item
  enabled: true,
});

Agentic Auto-Detection

ClawRouter automatically detects multi-step agentic tasks and routes to models optimized for autonomous execution:

"what is 2+2"                    → gemini-flash (standard)
"build the project then run tests" → kimi-k2.5 (auto-agentic)
"fix the bug and make sure it works" → kimi-k2.5 (auto-agentic)

How it works:

  • Detects agentic keywords: file ops ("read", "edit"), execution ("run", "test", "deploy"), iteration ("fix", "debug", "verify")
  • Threshold: 2+ signals triggers auto-switch to agentic tiers
  • No config needed — works automatically

Agentic tier models (optimized for multi-step autonomy):

Tier Agentic Model Why
SIMPLE claude-haiku-4.5 Fast + reliable tool use
MEDIUM kimi-k2.5 200+ tool chains, 76% cheaper
COMPLEX claude-sonnet-4.6 Best balance for complex tasks
REASONING kimi-k2.5 Extended reasoning + execution

Force Agentic Mode

You can also force agentic mode via config:

# openclaw.yaml
plugins:
  - id: "@blockrun/clawrouter"
    config:
      routing:
        overrides:
          agenticMode: true # Always use agentic tiers

Tool Detection

When your request includes a tools array (function calling), ClawRouter automatically switches to agentic tiers:

// Request with tools → auto-agentic mode
{
  model: "blockrun/auto",
  messages: [{ role: "user", content: "Check the weather" }],
  tools: [{ type: "function", function: { name: "get_weather", ... } }]
}
// → Routes to claude-haiku-4.5 (excellent tool use)
// → Instead of gemini-flash (may produce malformed tool calls)

Why this matters: Some models (like deepseek-reasoner) are optimized for chain-of-thought reasoning but can generate malformed tool calls. Tool detection ensures requests with functions go to models proven to handle tool use correctly.


Context-Length-Aware Routing

ClawRouter automatically filters out models that can't handle your context size:

150K token request:
  Full chain: [grok-4-fast (131K), deepseek (128K), kimi (262K), gemini (1M)]
  Filtered:   [kimi (262K), gemini (1M)]
  → Skips models that would fail with "context too long" errors

This prevents wasted API calls and faster fallback to capable models.


Model Aliases

Use short aliases instead of full model paths:

/model free      # gpt-oss-120b (FREE!)
/model br-sonnet # anthropic/claude-sonnet-4.6
/model br-opus   # anthropic/claude-opus-4
/model br-haiku  # anthropic/claude-haiku-4.5
/model gpt       # openai/gpt-4o
/model gpt5      # openai/gpt-5.2
/model deepseek  # deepseek/deepseek-chat
/model reasoner  # deepseek/deepseek-reasoner
/model kimi      # moonshot/kimi-k2.5
/model gemini    # google/gemini-2.5-pro
/model flash     # google/gemini-2.5-flash
/model grok      # xai/grok-3
/model grok-fast # xai/grok-4-fast-reasoning

All aliases work with /model blockrun/xxx or just /model xxx.


Free Tier Fallback

When your wallet balance hits $0, ClawRouter automatically falls back to the free model (gpt-oss-120b):

Wallet: $0.00
Request: "Help me write a function"
→ Routes to gpt-oss-120b (FREE)
→ No "insufficient funds" error
→ Keep building while you top up

You'll never get blocked by an empty wallet — the free tier keeps you running.


Session Persistence

For multi-turn conversations, ClawRouter pins the model to prevent mid-task switching:

Turn 1: "Build a React component" → claude-sonnet-4.6
Turn 2: "Add dark mode support"   → claude-sonnet-4.6 (pinned)
Turn 3: "Now add tests"           → claude-sonnet-4.6 (pinned)

Sessions are identified by conversation ID and persist for 1 hour of inactivity.


Cost Tracking with /stats

Track your savings in real-time:

# In any OpenClaw conversation
/stats

Output:

+============================================================+
|              ClawRouter Usage Statistics                   |
+============================================================+
|  Period: last 7 days                                      |
|  Total Requests: 442                                      |
|  Total Cost: $1.73                                       |
|  Baseline Cost (Opus): $20.13                            |
|  Total Saved: $18.40 (91.4%)                             |
+------------------------------------------------------------+
|  Routing by Tier:                                          |
|    SIMPLE     ===========           55.0% (243)            |
|    MEDIUM     ======                30.8% (136)            |
|    COMPLEX    =                      7.2% (32)             |
|    REASONING  =                      7.0% (31)             |
+============================================================+

Stats are stored locally at ~/.openclaw/blockrun/logs/ and aggregated on demand.