diff --git a/docs/cli/index.md b/docs/cli/index.md
index 2d07a7218ea..3ddda417b92 100644
--- a/docs/cli/index.md
+++ b/docs/cli/index.md
@@ -1319,7 +1319,7 @@ List and manage [background task](/automation/tasks) runs across agents.
 - `tasks notify ` — change notification policy for a task run
 - `tasks cancel ` — cancel a running task
 - `tasks audit` — surface operational issues (stale, lost, delivery failures)
-- `tasks maintenance` — preview or apply tasks and TaskFlow cleanup/reconciliation (ACP/subagent child sessions, active cron jobs, live CLI runs)
+- `tasks maintenance [--apply] [--json]` — preview or apply tasks and TaskFlow cleanup/reconciliation (ACP/subagent child sessions, active cron jobs, live CLI runs)
 - `tasks flow list` — list active and recent Task Flow flows
 - `tasks flow show ` — inspect a flow by id or lookup key
 - `tasks flow cancel ` — cancel a running flow and its active tasks
diff --git a/docs/reference/prompt-caching.md b/docs/reference/prompt-caching.md
index dbfb5125009..55a9f5e2195 100644
--- a/docs/reference/prompt-caching.md
+++ b/docs/reference/prompt-caching.md
@@ -112,6 +112,13 @@ Per-agent heartbeat is supported at `agents.list[].heartbeat`.
 - OpenAI returns useful tracing and rate-limit headers such as `x-request-id`, `openai-processing-ms`, and `x-ratelimit-*`, but cache-hit accounting should come from the usage payload, not from headers.
 - In practice, OpenAI often behaves like an initial-prefix cache rather than Anthropic-style moving full-history reuse. Stable long-prefix text turns can land near a `4864` cached-token plateau in current live probes, while tool-heavy or MCP-style transcripts often plateau near `4608` cached tokens even on exact repeats.
 
+### Anthropic Vertex
+
+- Anthropic models on Vertex AI (`anthropic-vertex/*`) support `cacheRetention` the same way as direct Anthropic.
+- `cacheRetention: "long"` maps to the real 1-hour prompt-cache TTL on Vertex AI endpoints.
+- Default cache retention for `anthropic-vertex` matches direct Anthropic defaults.
+- Vertex requests are routed through boundary-aware cache shaping so cache reuse stays aligned with what providers actually receive.
+
 ### Amazon Bedrock
 
 - Anthropic Claude model refs (`amazon-bedrock/*anthropic.claude*`) support explicit `cacheRetention` pass-through.
@@ -136,12 +143,16 @@ If the provider does not support this cache mode, `cacheRetention` has no effect
 
 - Direct Gemini transport (`api: "google-generative-ai"`) reports cache hits
   through upstream `cachedContentTokenCount`; OpenClaw maps that to
   `cacheRead`.
-- If you already have a Gemini cached-content handle, you can pass it through as
+- When `cacheRetention` is set on a direct Gemini model, OpenClaw automatically
+  creates, reuses, and refreshes `cachedContents` resources for system prompts
+  on Google AI Studio runs. This means you no longer need to pre-create a
+  cached-content handle manually.
+- You can still pass a pre-existing Gemini cached-content handle through as
   `params.cachedContent` (or legacy `params.cached_content`) on the configured
   model.
-- This is separate from Anthropic/OpenAI prompt-prefix caching. OpenClaw is
-  forwarding a provider-native cached-content reference, not synthesizing cache
-  markers.
+- This is separate from Anthropic/OpenAI prompt-prefix caching. For Gemini,
+  OpenClaw manages a provider-native `cachedContents` resource rather than
+  injecting cache markers into the request.
 
 ### Gemini CLI JSON usage
@@ -152,6 +163,35 @@ If the provider does not support this cache mode, `cacheRetention` has no effect
 - This is usage normalization only. It does not mean OpenClaw is creating
   Anthropic/OpenAI-style prompt-cache markers for Gemini CLI.
 
+## System-prompt cache boundary
+
+OpenClaw splits the system prompt into a **stable prefix** and a **volatile
+suffix** separated by an internal cache-prefix boundary. Content above the
+boundary (tool definitions, skills metadata, workspace files, and other
+relatively static context) is ordered so it stays byte-identical across turns.
+Content below the boundary (for example `HEARTBEAT.md`, runtime timestamps, and
+other per-turn metadata) is allowed to change without invalidating the cached
+prefix.
+
+Key design choices:
+
+- Stable workspace project-context files are ordered before `HEARTBEAT.md` so
+  heartbeat churn does not bust the stable prefix.
+- The boundary is applied across Anthropic-family, OpenAI-family, Google, and
+  CLI transport shaping so all supported providers benefit from the same prefix
+  stability.
+- Codex Responses and Anthropic Vertex requests are routed through
+  boundary-aware cache shaping so cache reuse stays aligned with what providers
+  actually receive.
+- System-prompt fingerprints are normalized (whitespace, line endings,
+  hook-added context, runtime capability ordering) so semantically unchanged
+  prompts share KV/cache across turns.
+
+If you see unexpected `cacheWrite` spikes after a config or workspace change,
+check whether the change lands above or below the cache boundary. Moving
+volatile content below the boundary (or stabilizing it) often resolves the
+issue.
+
 ## OpenClaw cache-stability guards
 
 OpenClaw also keeps several cache-sensitive payload shapes deterministic before
diff --git a/docs/tools/slash-commands.md b/docs/tools/slash-commands.md
index aad8d9973e1..d430a5eb8ac 100644
--- a/docs/tools/slash-commands.md
+++ b/docs/tools/slash-commands.md
@@ -122,6 +122,7 @@ Text + native (when enabled):
 - `/model ` (alias: `/models`; or `/` from `agents.defaults.models.*.alias`)
 - `/queue ` (plus options like `debounce:2s cap:25 drop:summarize`; send `/queue` to see current settings)
 - `/bash ` (host-only; alias for `! `; requires `commands.bash: true` + `tools.elevated` allowlists)
+- `/dreaming [off|core|rem|deep|status|help]` (toggle dreaming mode or show status; see [Dreaming](/concepts/memory-dreaming))
 
 Text-only:
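The `cacheRetention` pass-through documented in the prompt-caching changes above could be expressed with a config fragment along these lines. This is a sketch, not a verified config: the file shape and the model IDs are assumptions; only `cacheRetention`, `params.cachedContent` (with legacy `params.cached_content`), and the `agents.defaults.models.*` path come from the docs themselves.

```json5
{
  agents: {
    defaults: {
      models: {
        // Anthropic on Vertex AI: "long" opts into the 1-hour prompt-cache TTL.
        // (Model ID is a placeholder.)
        "anthropic-vertex/claude-sonnet": { cacheRetention: "long" },
        // Direct Gemini: cacheRetention drives automatic cachedContents
        // management; a pre-existing handle can still be passed through.
        "google/gemini-pro": {
          cacheRetention: "long",
          params: { cachedContent: "cachedContents/example-handle" },
        },
      },
    },
  },
}
```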