node-llama-cpp defaults contextSize to "auto", which sizes the context to the
model's trained limit. On large embedding models like Qwen3-Embedding-8B
(trained context 40,960), that inflates gateway VRAM from ~8.8 GB to ~32 GB
and causes OOM on single-GPU hosts where the gateway shares the card with an
LLM runtime.
Expose memorySearch.local.contextSize in openclaw.json (number | "auto"),
defaulting to 4096, which comfortably covers typical memory-search chunks
(128–512 tokens) while keeping non-weight VRAM bounded.
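A sketch of the new knob in openclaw.json (nesting as named above; the surrounding structure is assumed):

```json
{
  "memorySearch": {
    "local": {
      "contextSize": 4096
    }
  }
}
```

Setting it to "auto" opts back into node-llama-cpp's trained-context sizing on hosts with VRAM to spare.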
Closes #69667.
* Refine plugin debug plumbing
* Tighten plugin debug handling
* Reduce active memory overhead
* Abort active memory sidecar on timeout
* Rename active memory blocking subagent wording
* Fix active memory cache and recall selection
* Preserve active memory session scope
* Sanitize recalled context before retrieval
* Add active memory changelog entry
* Harden active memory debug and transcript handling
* Add active memory policy config
* Raise active memory timeout default
* Keep usage footer on primary reply
* Clear stale active memory status lines
* Match legacy active memory status prefixes
* Preserve numeric active memory bullets
* Reuse canonical session keys for active memory
* Let active memory subagent decide relevance
* Refine active memory plugin summary flow
* Fix active memory main-session DM detection
* Trim active memory summaries at word boundaries
* Add active memory prompt styles
* Fix active memory stale status cleanup
* Rename active memory subagent wording
* Add active memory prompt and thinking overrides
* Remove active memory legacy status compat
* Resolve active memory session id status
* Add active memory session toggle
* Add active memory global toggle
* Fix active memory toggle state handling
* Harden active memory transcript persistence
* Fix active memory chat type gating
* Scope active memory transcripts by agent
* Show plugin debug before replies