INTEGRATION GUIDE

CacheCore + OpenClaw

Cut LLM costs for multi-agent OpenClaw workflows with one config change. No code modifications required for basic integration.

~10 min read · Beginner · OpenClaw · Caching · Multi-Agent

Why This Integration Works Well

OpenClaw deploys persistent AI agents that do repeatable work: triage emails, run compliance checks, update CRMs, classify documents. Each task is stateless and structurally identical — the same SOUL.md, the same tools, the same type of question. This is exactly the workload CacheCore is built for.

When ten OpenClaw agents run the same compliance check in parallel, only the first one calls the LLM. The rest get the cached result — instantly, at zero cost.

Cost on cache hit: zero tokens consumed
Latency on cache hit: <10 ms (L1 exact match)
Config change needed: 1 line, no code changes

How It Works

CacheCore is an OpenAI-compatible proxy. OpenClaw already supports configuring a custom base_url for its LLM provider. Pointing that URL at CacheCore is the entire integration.

From OpenClaw's perspective, it's talking to OpenAI. From CacheCore's perspective, it's receiving standard API calls from a tenant. The cache namespace is derived from your tenant identity, permissions, system prompt (SOUL.md content), toolset fingerprint, and policy version — so agents sharing the same role and the same SOUL.md automatically share the same cache namespace.

Shared namespace by design. Two OpenClaw agents with identical SOUL.md files and the same tenant token produce the same namespace hash. A cache hit from agent A is immediately available to agent B — even if they're running in parallel on different tasks. If you want cache isolation between agent roles, give them different SOUL.md content — the namespaces will be separate automatically.
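The exact hashing scheme CacheCore uses is not specified here, so the sketch below is an illustrative assumption: it shows how a namespace hash could be derived from tenant identity, SOUL.md content, toolset fingerprint, and policy version, and why identical roles collapse to the same namespace while a different SOUL.md splits them.

```python
# Illustrative sketch only; the real CacheCore hash construction and field
# names are assumptions, not documented behavior.
import hashlib
import json

def namespace_hash(tenant: str, soul_md: str, tool_names: list,
                   policy_version: str) -> str:
    fingerprint = json.dumps({
        "tenant": tenant,
        "soul": hashlib.sha256(soul_md.encode()).hexdigest(),
        "tools": sorted(tool_names),  # order-independent toolset fingerprint
        "policy": policy_version,
    }, sort_keys=True)
    return hashlib.sha256(fingerprint.encode()).hexdigest()[:16]

# Same role, same tenant → same namespace (tool order doesn't matter).
a = namespace_hash("acme", "You are a compliance checker.", ["search", "crm"], "v3")
b = namespace_hash("acme", "You are a compliance checker.", ["crm", "search"], "v3")
# Different SOUL.md → automatically isolated namespace.
c = namespace_hash("acme", "You are an email triager.", ["search", "crm"], "v3")
assert a == b and a != c
```
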

Basic Integration

Open your OpenClaw config file (~/.openclaw/openclaw.json) and update the provider settings:

Before
"agent": {
  "model": "openai/gpt-5.4-mini"
}
After
"agent": {
  "model": "openai/gpt-5.4-mini",
  "baseURL": "https://api.cachecore.it/v1",
  "apiKey": "cc_live_a7f3bc12.eyJ..."
}

Get your CacheCore token from the portal under Projects → API Tokens. Restart the OpenClaw gateway and you're done. Field names above are as documented at docs.openclaw.ai — verify against your installed version.

That's the full integration for structured agent tasks. Tool calls, document classification, compliance checks, CRM updates — all cached automatically.

The Conversational Turn Problem

OpenClaw includes full conversation history in every message payload. If your agent is handling back-and-forth dialogue, the request body changes with every turn — which means a different cache key and no hits.
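A minimal stdlib sketch makes this concrete. Assuming the L1 key is a hash of the full request body (an assumption about CacheCore's exact-match layer, not documented behavior), appending even one turn produces a different key and therefore a MISS:

```python
# Why growing conversation history defeats exact-match caching:
# every appended turn changes the request body, hence the hash.
import hashlib
import json

def l1_key(messages):
    # Assumed key derivation: hash of the serialized request body.
    body = json.dumps({"model": "gpt-5.4-mini", "messages": messages},
                      sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()

turn1 = [{"role": "user", "content": "Is force majeure covered?"}]
turn2 = turn1 + [{"role": "assistant", "content": "Yes, clause 12."},
                 {"role": "user", "content": "What about clause 14?"}]

assert l1_key(turn1) != l1_key(turn2)  # different body → different key → MISS
```
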

A long conversation thread could also produce a false L2 semantic hit against a cached response from a different context, since the growing history distorts the embedding away from the actual user intent.

This only affects conversational agents with growing history (e.g. a WhatsApp assistant). Stateless task agents — the most common OpenClaw workload — are not affected.

The best fix is architectural: keep your agents stateless at task boundaries. Instead of accumulating history across user turns, reconstruct context from a knowledge base or structured state on each invocation. This is consistent with how OpenClaw is designed to work for batch automation and matches the request shape that CacheCore caches most effectively.
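The stateless pattern above can be sketched as follows. The function name and prompt shape are illustrative, not an OpenClaw API: the point is that the request body is a pure function of a structured task record, so identical tasks always produce identical bodies and identical cache keys.

```python
# Hypothetical sketch of the stateless-at-task-boundaries pattern:
# no accumulated history, prompt rebuilt from structured state each call.
def build_request(task: dict) -> dict:
    return {
        "model": "gpt-5.4-mini",
        "messages": [
            {"role": "system", "content": "You are a compliance checker."},
            {"role": "user",
             "content": f"Check contract {task['contract_id']} "
                        f"for clause: {task['clause']}"},
        ],
    }

r1 = build_request({"contract_id": "C-17", "clause": "force majeure"})
r2 = build_request({"contract_id": "C-17", "clause": "force majeure"})
assert r1 == r2  # identical tasks → identical bodies → L1 cache hit
```
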

For agents that genuinely require multi-turn dialogue, accept that those requests will not benefit from caching. CacheCore will proxy them to OpenAI normally and record a MISS in the X-Cache response header.

Multi-Agent Workflows

This is where CacheCore delivers the most value in OpenClaw setups. When you run multiple agents in parallel via sessions_send, agents with the same SOUL.md share a cache namespace automatically.

Agent A runs first

Checks contract #1 for force majeure. Cache miss — forwarded to OpenAI, result stored under the shared namespace.

Agent B asks semantically the same question

Checks contract #2: "Is force majeure covered?" L2 recognises it as equivalent. Cache hit — no LLM call.

Agents C through N get the same benefit

Every agent sharing the same role and SOUL.md hits the same namespace. Internal batch tests show 60%+ hit rates.
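The fan-out pattern above can be simulated with a toy in-process cache. This is not the CacheCore implementation, just a sketch of the economics: N agents ask the same question in the same namespace, and only the first triggers a model call.

```python
# Toy simulation of shared-namespace fan-out; illustrative only.
import hashlib

cache = {}
llm_calls = 0

def ask(namespace: str, question: str) -> str:
    global llm_calls
    key = hashlib.sha256(f"{namespace}:{question}".encode()).hexdigest()
    if key not in cache:          # L1 miss → forward to the model
        llm_calls += 1
        cache[key] = f"answer({question})"
    return cache[key]             # subsequent agents read the stored result

for _ in range(10):               # ten agents, same role, same question
    ask("compliance-v3", "Is force majeure covered?")

assert llm_calls == 1             # only agent A paid for tokens
```
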

Verify It's Working

After restarting the gateway, send two identical requests from your OpenClaw agent (or trigger the same task twice). Check the X-Cache response header in OpenClaw's verbose output:

Response headers — openclaw gateway --verbose
# First request — cache miss, forwarded to OpenAI
X-Cache: MISS

# Second identical request — L1 exact cache hit
X-Cache: HIT_L1
X-Cache-Age: 0

# Semantically equivalent request — L2 semantic hit
X-Cache: HIT_L2
X-Cache-Similarity: 0.94

You can also view hit rates and token savings in real time from the CacheCore portal dashboard.
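For scripting or monitoring hooks, the header values shown above can be interpreted with a small helper. This function is illustrative (the header names come from the examples above; the classification logic is this guide's, not a CacheCore SDK):

```python
# Illustrative helper for interpreting X-Cache response headers.
def classify(headers: dict) -> str:
    value = headers.get("X-Cache", "MISS")
    if value == "HIT_L1":
        return "exact hit"
    if value == "HIT_L2":
        sim = float(headers.get("X-Cache-Similarity", "0"))
        return f"semantic hit ({sim:.2f})"
    return "miss"

assert classify({"X-Cache": "HIT_L2", "X-Cache-Similarity": "0.94"}) \
       == "semantic hit (0.94)"
```
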


Summary

Point OpenClaw's provider baseURL at CacheCore, add your tenant token, and restart the gateway: stateless agent tasks are cached automatically, and agents sharing a SOUL.md share a namespace. Keep agents stateless at task boundaries so identical and semantically equivalent requests produce hits, and confirm behavior with the X-Cache response header.

Questions or issues? Open a discussion on GitHub or reach out at hello@cachecore.it.

