docs: sync CLI and prompt-caching reference with code

Vincent Koc
2026-04-05 09:30:31 +01:00
parent 455c642acb
commit bca6faf11d
3 changed files with 46 additions and 5 deletions


@@ -1319,7 +1319,7 @@ List and manage [background task](/automation/tasks) runs across agents.
- `tasks notify <id>` — change notification policy for a task run
- `tasks cancel <id>` — cancel a running task
- `tasks audit` — surface operational issues (stale, lost, delivery failures)
-- `tasks maintenance` — preview or apply tasks and TaskFlow cleanup/reconciliation (ACP/subagent child sessions, active cron jobs, live CLI runs)
+- `tasks maintenance [--apply] [--json]` — preview or apply tasks and TaskFlow cleanup/reconciliation (ACP/subagent child sessions, active cron jobs, live CLI runs)
- `tasks flow list` — list active and recent Task Flow flows
- `tasks flow show <lookup>` — inspect a flow by id or lookup key
- `tasks flow cancel <lookup>` — cancel a running flow and its active tasks


@@ -112,6 +112,13 @@ Per-agent heartbeat is supported at `agents.list[].heartbeat`.
- OpenAI returns useful tracing and rate-limit headers such as `x-request-id`, `openai-processing-ms`, and `x-ratelimit-*`, but cache-hit accounting should come from the usage payload, not from headers.
- In practice, OpenAI often behaves like an initial-prefix cache rather than the Anthropic-style moving-breakpoint reuse of the full history. Stable long-prefix text turns can land near a `4864` cached-token plateau in current live probes, while tool-heavy or MCP-style transcripts often plateau near `4608` cached tokens even on exact repeats.
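The "usage payload, not headers" point can be sketched as a small normalizer. This is an illustrative helper (the function name and output keys follow this doc's `cacheRead` accounting; the payload shape follows OpenAI's published chat-completions usage object), not OpenClaw's actual implementation:

```python
def normalize_openai_usage(usage: dict) -> dict:
    """Map an OpenAI chat-completions usage payload onto cacheRead/input/output
    accounting. Illustrative sketch; cache hits come from
    `prompt_tokens_details.cached_tokens`, never from response headers."""
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    return {
        "cacheRead": cached,                                  # discounted cached prefix
        "input": usage.get("prompt_tokens", 0) - cached,      # uncached prompt remainder
        "output": usage.get("completion_tokens", 0),
    }

# Payload shaped like a cache-hit response near the plateau described
# above (numbers illustrative, not a live probe result).
usage = {
    "prompt_tokens": 5000,
    "completion_tokens": 120,
    "prompt_tokens_details": {"cached_tokens": 4864},
}
print(normalize_openai_usage(usage))  # → {'cacheRead': 4864, 'input': 136, 'output': 120}
```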
### Anthropic Vertex
- Anthropic models on Vertex AI (`anthropic-vertex/*`) support `cacheRetention` the same way as direct Anthropic.
- `cacheRetention: "long"` maps to the real 1-hour prompt-cache TTL on Vertex AI endpoints.
- Default cache retention for `anthropic-vertex` matches direct Anthropic defaults.
- Vertex requests are routed through boundary-aware cache shaping so cache reuse stays aligned with what providers actually receive.
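A hypothetical config sketch for the setting above. Only `cacheRetention` and the `anthropic-vertex/` model-ref prefix come from this page; the surrounding key layout and the model name are illustrative:

```json5
{
  // Model name illustrative; any anthropic-vertex/* ref behaves the same.
  "model": "anthropic-vertex/<model>",
  // "long" maps to the real 1-hour prompt-cache TTL on Vertex AI endpoints.
  "cacheRetention": "long"
}
```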
### Amazon Bedrock
- Anthropic Claude model refs (`amazon-bedrock/*anthropic.claude*`) support explicit `cacheRetention` pass-through.
@@ -136,12 +143,16 @@ If the provider does not support this cache mode, `cacheRetention` has no effect
- Direct Gemini transport (`api: "google-generative-ai"`) reports cache hits
through upstream `cachedContentTokenCount`; OpenClaw maps that to `cacheRead`.
-- If you already have a Gemini cached-content handle, you can pass it through as
+- When `cacheRetention` is set on a direct Gemini model, OpenClaw automatically
+  creates, reuses, and refreshes `cachedContents` resources for system prompts
+  on Google AI Studio runs. This means you no longer need to pre-create a
+  cached-content handle manually.
+- You can still pass a pre-existing Gemini cached-content handle through as
+  `params.cachedContent` (or legacy `params.cached_content`) on the configured
+  model.
-- This is separate from Anthropic/OpenAI prompt-prefix caching. OpenClaw is
-  forwarding a provider-native cached-content reference, not synthesizing cache
-  markers.
+- This is separate from Anthropic/OpenAI prompt-prefix caching. For Gemini,
+  OpenClaw manages a provider-native `cachedContents` resource rather than
+  injecting cache markers into the request.
### Gemini CLI JSON usage
@@ -152,6 +163,35 @@ If the provider does not support this cache mode, `cacheRetention` has no effect
- This is usage normalization only. It does not mean OpenClaw is creating
Anthropic/OpenAI-style prompt-cache markers for Gemini CLI.
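The Gemini-side normalization described above can be sketched the same way. The field names follow Gemini's `usageMetadata` shape; the helper name and output keys are illustrative, not OpenClaw's actual code:

```python
def normalize_gemini_usage(usage_metadata: dict) -> dict:
    """Usage normalization only: Gemini's `cachedContentTokenCount` is
    mapped to `cacheRead`. No prompt-cache markers are created."""
    cached = usage_metadata.get("cachedContentTokenCount", 0)
    return {
        "cacheRead": cached,
        "input": usage_metadata.get("promptTokenCount", 0) - cached,
        "output": usage_metadata.get("candidatesTokenCount", 0),
    }

# Illustrative usageMetadata from a cached-content run.
print(normalize_gemini_usage({
    "promptTokenCount": 2048,
    "candidatesTokenCount": 64,
    "cachedContentTokenCount": 1500,
}))  # → {'cacheRead': 1500, 'input': 548, 'output': 64}
```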
## System-prompt cache boundary
OpenClaw splits the system prompt into a **stable prefix** and a **volatile
suffix** separated by an internal cache-prefix boundary. Content above the
boundary (tool definitions, skills metadata, workspace files, and other
relatively static context) is ordered so it stays byte-identical across turns.
Content below the boundary (for example `HEARTBEAT.md`, runtime timestamps, and
other per-turn metadata) is allowed to change without invalidating the cached
prefix.
Key design choices:
- Stable workspace project-context files are ordered before `HEARTBEAT.md` so
heartbeat churn does not bust the stable prefix.
- The boundary is applied across Anthropic-family, OpenAI-family, Google, and
CLI transport shaping so all supported providers benefit from the same prefix
stability.
- Codex Responses and Anthropic Vertex requests are routed through
boundary-aware cache shaping so cache reuse stays aligned with what providers
actually receive.
- System-prompt fingerprints are normalized (whitespace, line endings,
hook-added context, runtime capability ordering) so semantically unchanged
prompts share KV/cache across turns.
If you see unexpected `cacheWrite` spikes after a config or workspace change,
check whether the change lands above or below the cache boundary. Moving
volatile content below the boundary (or stabilizing it) often resolves the
issue.
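The boundary split and fingerprint normalization above can be sketched as follows. All helper names and the boundary marker are hypothetical; OpenClaw's actual marker, ordering, and hashing are internal details:

```python
import hashlib

# Hypothetical boundary marker; OpenClaw's internal marker differs.
CACHE_BOUNDARY = "\n<!-- cache-prefix boundary -->\n"

def split_system_prompt(stable_parts: list[str], volatile_parts: list[str]) -> str:
    """Stable content (tool defs, skills metadata, workspace files) goes
    above the boundary; per-turn content (HEARTBEAT.md, timestamps) below."""
    return "\n".join(stable_parts) + CACHE_BOUNDARY + "\n".join(volatile_parts)

def stable_prefix_fingerprint(prompt: str) -> str:
    """Fingerprint only the stable prefix, after normalizing line endings
    and trailing whitespace, so semantically unchanged prompts hash
    identically across turns."""
    prefix = prompt.split(CACHE_BOUNDARY, 1)[0]
    normalized = "\n".join(
        line.rstrip() for line in prefix.replace("\r\n", "\n").split("\n")
    )
    return hashlib.sha256(normalized.encode()).hexdigest()

# Two turns: the heartbeat churns below the boundary, the stable prefix
# does not change, so the fingerprint (and cached prefix) survives.
turn1 = split_system_prompt(["tool defs", "PROJECT.md"], ["HEARTBEAT.md @ 09:30"])
turn2 = split_system_prompt(["tool defs", "PROJECT.md"], ["HEARTBEAT.md @ 09:31"])
assert stable_prefix_fingerprint(turn1) == stable_prefix_fingerprint(turn2)
```

A change above the boundary (say, a new workspace file) would change the fingerprint and show up as a `cacheWrite` spike, which is exactly the diagnostic the paragraph above describes.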
## OpenClaw cache-stability guards
OpenClaw also keeps several cache-sensitive payload shapes deterministic before


@@ -122,6 +122,7 @@ Text + native (when enabled):
- `/model <name>` (alias: `/models`; or `/<alias>` from `agents.defaults.models.*.alias`)
- `/queue <mode>` (plus options like `debounce:2s cap:25 drop:summarize`; send `/queue` to see current settings)
- `/bash <command>` (host-only; alias for `! <command>`; requires `commands.bash: true` + `tools.elevated` allowlists)
- `/dreaming [off|core|rem|deep|status|help]` (toggle dreaming mode or show status; see [Dreaming](/concepts/memory-dreaming))
Text-only: