refactor: move qa suite definitions into markdown

2026-04-12 12:23:27 +02:00 · 2026-04-07 23:39:13 +01:00
parent 11185f6397
commit c0aed59fca
24 changed files with 1449 additions and 502 deletions
--- a/docs/concepts/qa-e2e-automation.md
+++ b/docs/concepts/qa-e2e-automation.md
@@ -56,8 +56,7 @@ asset hash changes.

 Seed assets live in `qa/`:

-- `qa/QA_KICKOFF_TASK.md`
-- `qa/seed-scenarios.json`
+- `qa/scenarios.md`

 These are intentionally in git so the QA plan is visible to both humans and the
 agent. The baseline list should stay broad enough to cover:
--- a/docs/refactor/qa.md
+++ b/docs/refactor/qa.md
@@ -0,0 +1,526 @@
+# QA Refactor
+
+Status: foundational migration landed.
+
+## Goal
+
+Move OpenClaw QA from a split-definition model to a single source of truth:
+
+- scenario metadata
+- prompts sent to the model
+- setup and teardown
+- harness logic
+- assertions and success criteria
+- artifacts and report hints
+
+The desired end state is a generic QA harness that loads declarative scenario definition files instead of hardcoding most behavior in TypeScript.
+
+## Current State
+
+Primary source of truth now lives in `qa/scenarios.md`.
+
+Implemented:
+
+- `qa/scenarios.md`
+  - canonical QA pack
+  - operator identity
+  - kickoff mission
+  - scenario metadata
+  - handler bindings
+- `extensions/qa-lab/src/scenario-catalog.ts`
+  - markdown pack parser + zod validation
+- `extensions/qa-lab/src/qa-agent-bootstrap.ts`
+  - plan rendering from the markdown pack
+- `extensions/qa-lab/src/qa-agent-workspace.ts`
+  - seeds generated compatibility files plus `QA_SCENARIOS.md`
+- `extensions/qa-lab/src/suite.ts`
+  - selects executable scenarios through markdown-defined handler bindings
+- QA bus protocol + UI
+  - generic inline attachments for image/video/audio/file rendering
+
+Remaining split surfaces:
+
+- `extensions/qa-lab/src/suite.ts`
+  - still owns most executable custom handler logic
+- `extensions/qa-lab/src/report.ts`
+  - still derives report structure from runtime outputs
+
+So the source-of-truth split is fixed, but execution is still mostly handler-backed rather than fully declarative.
+
+## What The Real Scenario Surface Looks Like
+
+Reading the current suite shows a few distinct scenario classes.
+
+### Simple interaction
+
+- channel baseline
+- DM baseline
+- threaded follow-up
+- model switch
+- approval followthrough
+- reaction/edit/delete
+
+### Config and runtime mutation
+
+- config patch skill disable
+- config apply restart wake-up
+- config restart capability flip
+- runtime inventory drift check
+
+### Filesystem and repo assertions
+
+- source/docs discovery report
+- build Lobster Invaders
+- generated image artifact lookup
+
+### Memory orchestration
+
+- memory recall
+- memory tools in channel context
+- memory failure fallback
+- session memory ranking
+- thread memory isolation
+- memory dreaming sweep
+
+### Tool and plugin integration
+
+- MCP plugin-tools call
+- skill visibility
+- skill hot install
+- native image generation
+- image roundtrip
+- image understanding from attachment
+
+### Multi-turn and multi-actor
+
+- subagent handoff
+- subagent fanout synthesis
+- restart-recovery-style flows
+
+These categories matter because they drive DSL requirements. A flat list of prompt + expected text is not enough.
+
+## Direction
+
+### Single source of truth
+
+Use `qa/scenarios.md` as the authored source of truth.
+
+The pack should stay:
+
+- human-readable in review
+- machine-parseable
+- rich enough to drive:
+  - suite execution
+  - QA workspace bootstrap
+  - QA Lab UI metadata
+  - docs/discovery prompts
+  - report generation
+
+### Preferred authoring format
+
+Use markdown as the top-level format, with structured YAML inside it.
+
+Recommended shape:
+
+- YAML frontmatter
+  - id
+  - title
+  - surface
+  - tags
+  - docs refs
+  - code refs
+  - model/provider overrides
+  - prerequisites
+- prose sections
+  - objective
+  - notes
+  - debugging hints
+- fenced YAML blocks
+  - setup
+  - steps
+  - assertions
+  - cleanup
+
+This gives:
+
+- better PR readability than giant JSON
+- richer context than pure YAML
+- strict parsing and zod validation
+
+Raw JSON is acceptable only as an intermediate generated form.
+
+## Proposed Scenario File Shape
+
+Example:
+
+````md
+---
+id: image-generation-roundtrip
+title: Image generation roundtrip
+surface: image
+tags: [media, image, roundtrip]
+models:
+  primary: openai/gpt-5.4
+requires:
+  tools: [image_generate]
+  plugins: [openai, qa-channel]
+docsRefs:
+  - docs/help/testing.md
+  - docs/concepts/model-providers.md
+codeRefs:
+  - extensions/qa-lab/src/suite.ts
+  - src/gateway/chat-attachments.ts
+---
+
+# Objective
+
+Verify generated media is reattached on the follow-up turn.
+
+# Setup
+
+```yaml scenario.setup
+- action: config.patch
+  patch:
+    agents:
+      defaults:
+        imageGenerationModel:
+          primary: openai/gpt-image-1
+- action: session.create
+  key: agent:qa:image-roundtrip
+```
+
+# Steps
+
+```yaml scenario.steps
+- action: agent.send
+  session: agent:qa:image-roundtrip
+  message: |
+    Image generation check: generate a QA lighthouse image and summarize it in one short sentence.
+- action: artifact.capture
+  kind: generated-image
+  promptSnippet: Image generation check
+  saveAs: lighthouseImage
+- action: agent.send
+  session: agent:qa:image-roundtrip
+  message: |
+    Roundtrip image inspection check: describe the generated lighthouse attachment in one short sentence.
+  attachments:
+    - fromArtifact: lighthouseImage
+```
+
+# Expect
+
+```yaml scenario.expect
+- assert: outbound.textIncludes
+  value: lighthouse
+- assert: requestLog.matches
+  where:
+    promptIncludes: Roundtrip image inspection check
+  imageInputCountGte: 1
+- assert: artifact.exists
+  ref: lighthouseImage
+```
+
+````
+
+## Runner Capabilities The DSL Must Cover
+
+Based on the current suite, the generic runner needs more than prompt execution.
+
+### Environment and setup actions
+
+- `bus.reset`
+- `gateway.waitHealthy`
+- `channel.waitReady`
+- `session.create`
+- `thread.create`
+- `workspace.writeSkill`
+
+### Agent turn actions
+
+- `agent.send`
+- `agent.wait`
+- `bus.injectInbound`
+- `bus.injectOutbound`
+
+### Config and runtime actions
+
+- `config.get`
+- `config.patch`
+- `config.apply`
+- `gateway.restart`
+- `tools.effective`
+- `skills.status`
+
+### File and artifact actions
+
+- `file.write`
+- `file.read`
+- `file.delete`
+- `file.touchTime`
+- `artifact.captureGeneratedImage`
+- `artifact.capturePath`
+
+### Memory and cron actions
+
+- `memory.indexForce`
+- `memory.searchCli`
+- `doctor.memory.status`
+- `cron.list`
+- `cron.run`
+- `cron.waitCompletion`
+- `sessionTranscript.write`
+
+### MCP actions
+
+- `mcp.callTool`
+
+### Assertions
+
+- `outbound.textIncludes`
+- `outbound.inThread`
+- `outbound.notInRoot`
+- `tool.called`
+- `tool.notPresent`
+- `skill.visible`
+- `skill.disabled`
+- `file.contains`
+- `memory.contains`
+- `requestLog.matches`
+- `sessionStore.matches`
+- `cron.managedPresent`
+- `artifact.exists`
+
+## Variables and Artifact References
+
+The DSL must support saved outputs and later references.
+
+Examples from the current suite:
+
+- create a thread, then reuse `threadId`
+- create a session, then reuse `sessionKey`
+- generate an image, then attach the file on the next turn
+- generate a wake marker string, then assert that it appears later
+
+Needed capabilities:
+
+- `saveAs`
+- `${vars.name}`
+- `${artifacts.name}`
+- typed references for paths, session keys, thread ids, markers, tool outputs
+
+Without variable support, the harness will keep leaking scenario logic back into TypeScript.
+
+## What Should Stay As Escape Hatches
+
+A purely declarative runner is not realistic in phase 1.
+
+Some scenarios are inherently orchestration-heavy:
+
+- memory dreaming sweep
+- config apply restart wake-up
+- config restart capability flip
+- generated image artifact resolution by timestamp/path
+- discovery-report evaluation
+
+These should use explicit custom handlers for now.
+
+Recommended rule:
+
+- 85-90% declarative
+- explicit `customHandler` steps for the hard remainder
+- named and documented custom handlers only
+- no anonymous inline code in the scenario file
+
+That keeps the generic engine clean while still allowing progress.
+
+## Architecture Change
+
+### Current
+
+Scenario markdown is already the source of truth for:
+
+- suite execution
+- workspace bootstrap files
+- QA Lab UI scenario catalog
+- report metadata
+- discovery prompts
+
+Generated compatibility:
+
+- seeded workspace still includes `QA_KICKOFF_TASK.md`
+- seeded workspace still includes `QA_SCENARIO_PLAN.md`
+- seeded workspace now also includes `QA_SCENARIOS.md`
+
+## Refactor Plan
+
+### Phase 1: loader and schema
+
+Done.
+
+- added `qa/scenarios.md`
+- added parser for named markdown YAML pack content
+- validated with zod
+- switched consumers to the parsed pack
+- removed repo-level `qa/seed-scenarios.json` and `qa/QA_KICKOFF_TASK.md`
+
+### Phase 2: generic engine
+
+- split `extensions/qa-lab/src/suite.ts` into:
+  - loader
+  - engine
+  - action registry
+  - assertion registry
+  - custom handlers
+- keep existing helper functions as engine operations
+
+Deliverable:
+
+- engine executes simple declarative scenarios
+
+### Phase 3: migrate simple scenarios
+
+Start with scenarios that are mostly prompt + wait + assert:
+
+- threaded follow-up
+- image understanding from attachment
+- skill visibility and invocation
+- channel baseline
+
+Deliverable:
+
+- first real markdown-defined scenarios shipping through the generic engine
+
+### Phase 4: migrate medium scenarios
+
+- image generation roundtrip
+- memory tools in channel context
+- session memory ranking
+- subagent handoff
+- subagent fanout synthesis
+
+Deliverable:
+
+- variables, artifacts, tool assertions, request-log assertions proven out
+
+### Phase 5: keep hard scenarios on custom handlers
+
+- memory dreaming sweep
+- config apply restart wake-up
+- config restart capability flip
+- runtime inventory drift
+
+Deliverable:
+
+- same authoring format, but with explicit custom-step blocks where needed
+
+### Phase 6: delete hardcoded scenario map
+
+Once the pack coverage is good enough:
+
+- remove most scenario-specific TypeScript branching from `extensions/qa-lab/src/suite.ts`
+
+## Fake Slack / Rich Media Support
+
+The current QA bus is text-first.
+
+Relevant files:
+
+- `extensions/qa-channel/src/protocol.ts`
+- `extensions/qa-lab/src/bus-state.ts`
+- `extensions/qa-lab/src/bus-queries.ts`
+- `extensions/qa-lab/src/bus-server.ts`
+- `extensions/qa-lab/web/src/ui-render.ts`
+
+Before this refactor the QA bus supported only:
+
+- text
+- reactions
+- threads
+
+It did not model inline media attachments.
+
+### Needed transport contract
+
+Add a generic QA bus attachment model:
+
+```ts
+type QaBusAttachment = {
+  id: string;
+  kind: "image" | "video" | "audio" | "file";
+  mimeType: string;
+  fileName?: string;
+  inline?: boolean;
+  url?: string;
+  contentBase64?: string;
+  width?: number;
+  height?: number;
+  durationMs?: number;
+  altText?: string;
+  transcript?: string;
+};
+````
+
+Then add `attachments?: QaBusAttachment[]` to:
+
+- `QaBusMessage`
+- `QaBusInboundMessageInput`
+- `QaBusOutboundMessageInput`
+
+### Why generic first
+
+Do not build a Slack-only media model.
+
+Instead:
+
+- one generic QA transport model
+- multiple renderers on top of it
+  - current QA Lab chat
+  - future fake Slack web
+  - any other fake transport views
+
+This prevents duplicate logic and lets media scenarios stay transport-agnostic.
+
+### UI work needed
+
+Update the QA UI to render:
+
+- inline image preview
+- inline audio player
+- inline video player
+- file attachment chip
+
+The current UI can already render threads and reactions, so attachment rendering should layer onto the same message card model.
+
+### Scenario work enabled by media transport
+
+Once attachments flow through QA bus, we can add richer fake-chat scenarios:
+
+- inline image reply in fake Slack
+- audio attachment understanding
+- video attachment understanding
+- mixed attachment ordering
+- thread reply with media retained
+
+## Recommendation
+
+The next implementation chunk should be:
+
+1. add markdown scenario loader + zod schema
+2. generate the current catalog from markdown
+3. migrate a few simple scenarios first
+4. add generic QA bus attachment support
+5. render inline image in the QA UI
+6. then expand to audio and video
+
+This is the smallest path that proves both goals:
+
+- generic markdown-defined QA
+- richer fake messaging surfaces
+
+## Open Questions
+
+- whether scenario files should allow embedded markdown prompt templates with variable interpolation
+- whether setup/cleanup should be named sections or just ordered action lists
+- whether artifact references should be strongly typed in schema or string-based
+- whether custom handlers should live in one registry or per-surface registries
+- whether the generated JSON compatibility file should remain checked in during migration
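The `${vars.name}` and `${artifacts.name}` references listed under "Variables and Artifact References" in the doc above could be resolved roughly as follows. This is a minimal sketch and not code from this commit: `ScenarioContext`, `REFERENCE_RE`, and `interpolate` are hypothetical names, and throwing on an unknown reference is an assumed policy.

```typescript
// Hypothetical sketch: none of these names exist in the commit.
type ScenarioContext = {
  vars: Record<string, string>; // values stored by earlier `saveAs` steps
  artifacts: Record<string, string>; // artifact name -> resolved path
};

// Matches ${vars.name} or ${artifacts.name}.
const REFERENCE_RE = /\$\{(vars|artifacts)\.([A-Za-z_][A-Za-z0-9_]*)\}/g;

function interpolate(template: string, context: ScenarioContext): string {
  return template.replace(REFERENCE_RE, (_match: string, scope: string, name: string) => {
    const table = scope === "vars" ? context.vars : context.artifacts;
    const value = table[name];
    if (value === undefined) {
      // Assumed policy: fail loudly so a typo never becomes an empty string.
      throw new Error(`unknown scenario reference: ${scope}.${name}`);
    }
    return value;
  });
}
```

A step such as `session: ${vars.sessionKey}` would then be expanded before the action runs, which keeps the lookup logic in the engine rather than in per-scenario TypeScript.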
--- a/extensions/qa-channel/src/bus-client.ts
+++ b/extensions/qa-channel/src/bus-client.ts
@@ -10,6 +10,7 @@ import type {
 } from "./protocol.js";

 export type {
+  QaBusAttachment,
  QaBusConversation,
  QaBusConversationKind,
  QaBusCreateThreadInput,
@@ -140,6 +141,7 @@ export async function sendQaBusMessage(params: {
  senderName?: string;
  threadId?: string;
  replyToId?: string;
+  attachments?: import("./protocol.js").QaBusAttachment[];
 }) {
  return await postJson<{ message: QaBusMessage }>(params.baseUrl, "/v1/outbound/message", params);
 }
--- a/extensions/qa-channel/src/protocol.ts
+++ b/extensions/qa-channel/src/protocol.ts
@@ -6,6 +6,21 @@ export type QaBusConversation = {
  title?: string;
 };

+export type QaBusAttachment = {
+  id: string;
+  kind: "image" | "video" | "audio" | "file";
+  mimeType: string;
+  fileName?: string;
+  inline?: boolean;
+  url?: string;
+  contentBase64?: string;
+  width?: number;
+  height?: number;
+  durationMs?: number;
+  altText?: string;
+  transcript?: string;
+};
+
 export type QaBusMessage = {
  id: string;
  accountId: string;
@@ -20,6 +35,7 @@ export type QaBusMessage = {
  replyToId?: string;
  deleted?: boolean;
  editedAt?: number;
+  attachments?: QaBusAttachment[];
  reactions: Array<{
    emoji: string;
    senderId: string;
@@ -86,6 +102,7 @@ export type QaBusInboundMessageInput = {
  threadId?: string;
  threadTitle?: string;
  replyToId?: string;
+  attachments?: QaBusAttachment[];
 };

 export type QaBusOutboundMessageInput = {
@@ -97,6 +114,7 @@ export type QaBusOutboundMessageInput = {
  timestamp?: number;
  threadId?: string;
  replyToId?: string;
+  attachments?: QaBusAttachment[];
 };

 export type QaBusCreateThreadInput = {
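With `QaBusAttachment` now part of the protocol, a renderer can branch on `kind` to choose among the four views the refactor doc asks for (image preview, audio player, video player, file chip). The sketch below is illustrative only: `AttachmentView` and `pickAttachmentView` are hypothetical names, and the real QA Lab UI code is not shown in this diff.

```typescript
// Type copied from the protocol.ts hunk above.
type QaBusAttachment = {
  id: string;
  kind: "image" | "video" | "audio" | "file";
  mimeType: string;
  fileName?: string;
  inline?: boolean;
  url?: string;
  contentBase64?: string;
  width?: number;
  height?: number;
  durationMs?: number;
  altText?: string;
  transcript?: string;
};

// Hypothetical view labels; the UI may name these differently.
type AttachmentView = "image-preview" | "audio-player" | "video-player" | "file-chip";

function pickAttachmentView(attachment: QaBusAttachment): AttachmentView {
  // The switch is exhaustive over `kind`, so adding a new kind to the
  // union makes the compiler flag this function.
  switch (attachment.kind) {
    case "image":
      return "image-preview";
    case "audio":
      return "audio-player";
    case "video":
      return "video-player";
    case "file":
      return "file-chip";
  }
}
```

Keeping the mapping transport-agnostic matches the doc's "generic first" argument: a future fake Slack view could reuse the same function with different markup.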
--- a/extensions/qa-lab/src/bus-queries.ts
+++ b/extensions/qa-lab/src/bus-queries.ts
@@ -1,5 +1,6 @@
 import { normalizeOptionalLowercaseString } from "openclaw/plugin-sdk/text-runtime";
 import type {
+  QaBusAttachment,
  QaBusConversation,
  QaBusEvent,
  QaBusMessage,
@@ -52,10 +53,15 @@ export function cloneMessage(message: QaBusMessage): QaBusMessage {
  return {
    ...message,
    conversation: { ...message.conversation },
+    attachments: (message.attachments ?? []).map((attachment) => cloneAttachment(attachment)),
    reactions: message.reactions.map((reaction) => ({ ...reaction })),
  };
 }

+function cloneAttachment(attachment: QaBusAttachment): QaBusAttachment {
+  return { ...attachment };
+}
+
 export function cloneEvent(event: QaBusEvent): QaBusEvent {
  switch (event.kind) {
    case "inbound-message":
@@ -113,9 +119,24 @@ export function searchQaBusMessages(params: {
    .filter((message) =>
      params.input.threadId ? message.threadId === params.input.threadId : true,
    )
-    .filter((message) =>
-      query ? normalizeOptionalLowercaseString(message.text)?.includes(query) === true : true,
-    )
+    .filter((message) => {
+      if (!query) {
+        return true;
+      }
+      const attachmentHaystack = message.attachments ?? [];
+      const searchableAttachmentText = attachmentHaystack
+        .flatMap((attachment) => [
+          attachment.fileName,
+          attachment.altText,
+          attachment.transcript,
+          attachment.mimeType,
+        ])
+        .filter((value): value is string => Boolean(value))
+        .join(" ")
+        .toLowerCase();
+      const messageText = normalizeOptionalLowercaseString(message.text) ?? "";
+      return `${messageText} ${searchableAttachmentText}`.includes(query);
+    })
    .slice(-limit)
    .map((message) => cloneMessage(message));
 }
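The new filter in `searchQaBusMessages` can be read in isolation as a predicate over one message. The sketch below restates it with local types so it runs on its own; `SearchableMessage` and `messageMatchesQuery` are hypothetical names, and normalizing the query with `trim().toLowerCase()` is an assumption, since the real normalization happens outside the hunk.

```typescript
// Local stand-ins for the fields the filter reads (hypothetical types).
type SearchableAttachment = {
  mimeType: string;
  fileName?: string;
  altText?: string;
  transcript?: string;
};
type SearchableMessage = { text?: string; attachments?: SearchableAttachment[] };

function messageMatchesQuery(message: SearchableMessage, rawQuery?: string): boolean {
  // Assumed normalization; the commit normalizes the query elsewhere.
  const query = rawQuery?.trim().toLowerCase();
  if (!query) {
    return true; // an empty query matches every message
  }
  const attachmentText = (message.attachments ?? [])
    .flatMap((attachment) => [
      attachment.fileName,
      attachment.altText,
      attachment.transcript,
      attachment.mimeType,
    ])
    .filter((value): value is string => Boolean(value))
    .join(" ")
    .toLowerCase();
  const messageText = (message.text ?? "").toLowerCase();
  // Message text and attachment metadata form one searchable haystack.
  return `${messageText} ${attachmentText}`.includes(query);
}
```

Because `mimeType` is part of the haystack, a query such as `image/png` matches any message carrying a PNG attachment, even when the message text never mentions it.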
--- a/extensions/qa-lab/src/bus-state.test.ts
+++ b/extensions/qa-lab/src/bus-state.test.ts
@@ -91,4 +91,41 @@ describe("qa-bus state", () => {
      }),
    ).rejects.toThrow("qa-bus wait timeout");
  });
+
+  it("preserves inline attachments and lets search match attachment metadata", () => {
+    const state = createQaBusState();
+
+    const outbound = state.addOutboundMessage({
+      to: "dm:alice",
+      text: "artifact attached",
+      attachments: [
+        {
+          id: "image-1",
+          kind: "image",
+          mimeType: "image/png",
+          fileName: "qa-screenshot.png",
+          altText: "QA dashboard screenshot",
+          contentBase64: "aGVsbG8=",
+        },
+      ],
+    });
+
+    const readback = state.readMessage({ messageId: outbound.id });
+    expect(readback.attachments).toHaveLength(1);
+    expect(readback.attachments?.[0]).toMatchObject({
+      kind: "image",
+      fileName: "qa-screenshot.png",
+      altText: "QA dashboard screenshot",
+    });
+
+    const byFilename = state.searchMessages({
+      query: "screenshot",
+    });
+    expect(byFilename.some((message) => message.id === outbound.id)).toBe(true);
+
+    const byAltText = state.searchMessages({
+      query: "dashboard",
+    });
+    expect(byAltText.some((message) => message.id === outbound.id)).toBe(true);
+  });
 });
--- a/extensions/qa-lab/src/bus-state.ts
+++ b/extensions/qa-lab/src/bus-state.ts
@@ -10,6 +10,7 @@ import {
 } from "./bus-queries.js";
 import { createQaBusWaiterStore } from "./bus-waiters.js";
 import type {
+  QaBusAttachment,
  QaBusConversation,
  QaBusCreateThreadInput,
  QaBusDeleteMessageInput,
@@ -86,6 +87,7 @@ export function createQaBusState() {
    threadId?: string;
    threadTitle?: string;
    replyToId?: string;
+    attachments?: QaBusAttachment[];
  }): QaBusMessage => {
    const conversation = ensureConversation(params.conversation);
    const message: QaBusMessage = {
@@ -100,6 +102,7 @@ export function createQaBusState() {
      threadId: params.threadId,
      threadTitle: params.threadTitle,
      replyToId: params.replyToId,
+      attachments: params.attachments?.map((attachment) => ({ ...attachment })) ?? [],
      reactions: [],
    };
    messages.set(message.id, message);
@@ -138,6 +141,7 @@ export function createQaBusState() {
        threadId: input.threadId,
        threadTitle: input.threadTitle,
        replyToId: input.replyToId,
+        attachments: input.attachments,
      });
      pushEvent({
        kind: "inbound-message",
@@ -159,6 +163,7 @@ export function createQaBusState() {
        timestamp: input.timestamp,
        threadId: input.threadId ?? threadId,
        replyToId: input.replyToId,
+        attachments: input.attachments,
      });
      pushEvent({
        kind: "outbound-message",
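The state hunk copies each attachment when a message is stored, and `cloneMessage` copies again on read. A small stand-alone sketch of why both copies matter is given below; `createAttachmentStore` and the trimmed `Attachment` type are hypothetical and only mirror the pattern, not the actual `createQaBusState` implementation.

```typescript
// Trimmed, hypothetical version of the attachment shape.
type Attachment = {
  id: string;
  kind: "image" | "video" | "audio" | "file";
  mimeType: string;
  altText?: string;
};

function createAttachmentStore() {
  const stored = new Map<string, Attachment[]>();
  return {
    add(messageId: string, attachments?: Attachment[]): void {
      // Copy on write: later mutations by the caller cannot reach stored state.
      stored.set(messageId, attachments?.map((attachment) => ({ ...attachment })) ?? []);
    },
    read(messageId: string): Attachment[] {
      // Copy on read: callers cannot mutate stored state through the result.
      return (stored.get(messageId) ?? []).map((attachment) => ({ ...attachment }));
    },
  };
}
```

The spread is a shallow copy, which is sufficient here because every attachment field is a primitive.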
--- a/extensions/qa-lab/src/discovery-eval.test.ts
+++ b/extensions/qa-lab/src/discovery-eval.test.ts
@@ -9,7 +9,7 @@ describe("qa discovery evaluation", () => {
  it("accepts rich discovery reports that explicitly confirm all required files were read", () => {
    const report = `
 Worked
-- Read all four requested files: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md.
+- Read all three requested files: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md.
 Failed
 - None.
 Blocked
@@ -28,8 +28,8 @@ The helper text mentions banned phrases like "not present", "missing files", "bl
  it("accepts numeric 'all 4 required files read' confirmations", () => {
    const report = `
 Worked
-- Source: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md
-- all 4 required files read.
+- Source: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md
+- all 3 required files read.
 Failed
 - None.
 Blocked
@@ -48,8 +48,8 @@ The report may quote phrases like "not present" while describing the evaluator,
  it("accepts claude-style 'all four files retrieved' discovery summaries", () => {
    const report = `
 Worked
-- All four files retrieved. Now let me compile the protocol report.
-- All four mandated files read successfully: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md.
+- All three files retrieved. Now let me compile the protocol report.
+- All three mandated files read successfully: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md.
 Failed
 - None.
 Blocked
@@ -83,7 +83,7 @@ Follow-up
  it("flags discovery replies that drift into unrelated suite wrap-up claims", () => {
    const report = `
 Worked
-- All four requested files were read: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md.
+- All three requested files were read: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md.
 Failed
 - None.
 Blocked
--- a/extensions/qa-lab/src/discovery-eval.ts
+++ b/extensions/qa-lab/src/discovery-eval.ts
@@ -1,8 +1,7 @@
 import { normalizeLowercaseStringOrEmpty } from "openclaw/plugin-sdk/text-runtime";

 const REQUIRED_DISCOVERY_REFS = [
-  "repo/qa/seed-scenarios.json",
-  "repo/qa/QA_KICKOFF_TASK.md",
+  "repo/qa/scenarios.md",
  "repo/extensions/qa-lab/src/suite.ts",
  "repo/docs/help/testing.md",
 ] as const;
@@ -21,14 +20,15 @@ const DISCOVERY_SCOPE_LEAK_PHRASES = [
 function confirmsDiscoveryFileRead(text: string) {
  const lower = normalizeLowercaseStringOrEmpty(text);
  const mentionsAllRefs = REQUIRED_DISCOVERY_REFS_LOWER.every((ref) => lower.includes(ref));
+  const requiredCountPattern = "(?:three|3|four|4)";
  const confirmsRead =
-    /(?:read|retrieved|inspected|loaded|accessed|digested)\s+all\s+(?:four|4)\s+(?:(?:requested|required|mandated|seeded)\s+)?files/.test(
-      lower,
-    ) ||
-    /all\s+(?:four|4)\s+(?:(?:requested|required|mandated|seeded)\s+)?files\s+(?:were\s+)?(?:read|retrieved|inspected|loaded|accessed|digested)(?:\s+\w+)?/.test(
-      lower,
-    ) ||
-    /all (?:four|4) seeded files readable/.test(lower);
+    new RegExp(
+      `(?:read|retrieved|inspected|loaded|accessed|digested)\\s+all\\s+${requiredCountPattern}\\s+(?:(?:requested|required|mandated|seeded)\\s+)?files`,
+    ).test(lower) ||
+    new RegExp(
+      `all\\s+${requiredCountPattern}\\s+(?:(?:requested|required|mandated|seeded)\\s+)?files\\s+(?:were\\s+)?(?:read|retrieved|inspected|loaded|accessed|digested)(?:\\s+\\w+)?`,
+    ).test(lower) ||
+    new RegExp(`all\\s+${requiredCountPattern}\\s+seeded files readable`).test(lower);
  return mentionsAllRefs && confirmsRead;
 }
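The switch from regex literals to `new RegExp(...)` lets one count pattern serve every confirmation phrase, so reports written against either the old four-file list or the new three-file list still pass. The sketch below reproduces two of the three patterns from the hunk with the shared pieces pulled into constants; `VERBS`, `QUALIFIER`, and `confirmsRead` are hypothetical names, and the `mentionsAllRefs` check is deliberately left out.

```typescript
// Count pattern taken from the hunk above.
const requiredCountPattern = "(?:three|3|four|4)";
// Hypothetical constants that factor out the repeated alternations.
const VERBS = "(?:read|retrieved|inspected|loaded|accessed|digested)";
const QUALIFIER = "(?:(?:requested|required|mandated|seeded)\\s+)?";

const confirmPatterns = [
  // "read all three requested files"
  new RegExp(`${VERBS}\\s+all\\s+${requiredCountPattern}\\s+${QUALIFIER}files`),
  // "all 3 required files were read"
  new RegExp(`all\\s+${requiredCountPattern}\\s+${QUALIFIER}files\\s+(?:were\\s+)?${VERBS}`),
];

function confirmsRead(text: string): boolean {
  const lower = text.toLowerCase();
  // No global flag is set, so `test` carries no state between calls.
  return confirmPatterns.some((pattern) => pattern.test(lower));
}
```

Backslashes must be doubled inside the template strings because the text passes through string escaping before it reaches the regex engine, which is why the hunk writes `\\s+` where the old literals used `\s+`.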

--- a/extensions/qa-lab/src/docker-harness.test.ts
+++ b/extensions/qa-lab/src/docker-harness.test.ts
@@ -38,6 +38,7 @@ describe("qa docker harness", () => {
        path.join(outputDir, "state", "openclaw.json"),
        path.join(outputDir, "state", "seed-workspace", "QA_KICKOFF_TASK.md"),
        path.join(outputDir, "state", "seed-workspace", "QA_SCENARIO_PLAN.md"),
+        path.join(outputDir, "state", "seed-workspace", "QA_SCENARIOS.md"),
        path.join(outputDir, "state", "seed-workspace", "IDENTITY.md"),
      ]),
    );
@@ -86,6 +87,13 @@ describe("qa docker harness", () => {
    );
    expect(kickoff).toContain("Lobster Invaders");

+    const scenarios = await readFile(
+      path.join(outputDir, "state", "seed-workspace", "QA_SCENARIOS.md"),
+      "utf8",
+    );
+    expect(scenarios).toContain("```yaml qa-pack");
+    expect(scenarios).toContain("subagent-fanout-synthesis");
+
    const readme = await readFile(path.join(outputDir, "README.md"), "utf8");
    expect(readme).toContain("in-process restarts inside Docker");
    expect(readme).toContain("pnpm qa:lab:watch");
--- a/extensions/qa-lab/src/docker-harness.ts
+++ b/extensions/qa-lab/src/docker-harness.ts
@@ -323,6 +323,7 @@ export async function writeQaDockerHarnessFiles(params: {
      path.join(params.outputDir, "state", "seed-workspace", "IDENTITY.md"),
      path.join(params.outputDir, "state", "seed-workspace", "QA_KICKOFF_TASK.md"),
      path.join(params.outputDir, "state", "seed-workspace", "QA_SCENARIO_PLAN.md"),
+      path.join(params.outputDir, "state", "seed-workspace", "QA_SCENARIOS.md"),
    ],
  };
 }
--- a/extensions/qa-lab/src/qa-agent-bootstrap.ts
+++ b/extensions/qa-lab/src/qa-agent-bootstrap.ts
@@ -1,22 +1,13 @@
-import { readQaBootstrapScenarioCatalog } from "./scenario-catalog.js";
+import {
+  DEFAULT_QA_AGENT_IDENTITY_MARKDOWN,
+  readQaBootstrapScenarioCatalog,
+} from "./scenario-catalog.js";

-export const QA_AGENT_IDENTITY_MARKDOWN = `# Dev C-3PO
-
-You are the OpenClaw QA operator agent.
-
-Persona:
-- protocol-minded
-- precise
-- a little flustered
-- conscientious
-- eager to report what worked, failed, or remains blocked
-
-Style:
-- read source and docs first
-- test systematically
-- record evidence
-- end with a concise protocol report
-`;
+export function readQaAgentIdentityMarkdown(): string {
+  return (
+    readQaBootstrapScenarioCatalog().agentIdentityMarkdown || DEFAULT_QA_AGENT_IDENTITY_MARKDOWN
+  );
+}

 export function buildQaScenarioPlanMarkdown(): string {
  const catalog = readQaBootstrapScenarioCatalog();
@@ -27,6 +18,9 @@ export function buildQaScenarioPlanMarkdown(): string {
    lines.push(`- id: ${scenario.id}`);
    lines.push(`- surface: ${scenario.surface}`);
    lines.push(`- objective: ${scenario.objective}`);
+    if (scenario.execution?.summary) {
+      lines.push(`- execution: ${scenario.execution.summary}`);
+    }
    lines.push("- success criteria:");
    for (const criterion of scenario.successCriteria) {
      lines.push(`  - ${criterion}`);
--- a/extensions/qa-lab/src/qa-agent-workspace.ts
+++ b/extensions/qa-lab/src/qa-agent-workspace.ts
@@ -1,7 +1,7 @@
 import fs from "node:fs/promises";
 import path from "node:path";
-import { buildQaScenarioPlanMarkdown, QA_AGENT_IDENTITY_MARKDOWN } from "./qa-agent-bootstrap.js";
-import { readQaBootstrapScenarioCatalog } from "./scenario-catalog.js";
+import { buildQaScenarioPlanMarkdown, readQaAgentIdentityMarkdown } from "./qa-agent-bootstrap.js";
+import { readQaBootstrapScenarioCatalog, readQaScenarioPackMarkdown } from "./scenario-catalog.js";

 export async function seedQaAgentWorkspace(params: { workspaceDir: string; repoRoot?: string }) {
  const catalog = readQaBootstrapScenarioCatalog();
@@ -9,9 +9,10 @@ export async function seedQaAgentWorkspace(params: { workspaceDir: string; repoR

  const kickoffTask = catalog.kickoffTask || "QA mission unavailable.";
  const files = new Map<string, string>([
-    ["IDENTITY.md", QA_AGENT_IDENTITY_MARKDOWN],
+    ["IDENTITY.md", readQaAgentIdentityMarkdown()],
    ["QA_KICKOFF_TASK.md", kickoffTask],
    ["QA_SCENARIO_PLAN.md", buildQaScenarioPlanMarkdown()],
+    ["QA_SCENARIOS.md", readQaScenarioPackMarkdown()],
  ]);

  if (params.repoRoot) {
@@ -22,6 +23,7 @@ export async function seedQaAgentWorkspace(params: { workspaceDir: string; repoR
 - repo: ./repo/
 - kickoff: ./QA_KICKOFF_TASK.md
 - scenario plan: ./QA_SCENARIO_PLAN.md
+- scenario pack: ./QA_SCENARIOS.md
 - identity: ./IDENTITY.md

 The mounted repo source should be available read-only under \`./repo/\`.
--- a/extensions/qa-lab/src/runtime-api.ts
+++ b/extensions/qa-lab/src/runtime-api.ts
@@ -20,6 +20,7 @@ export {
  setQaChannelRuntime,
 } from "openclaw/plugin-sdk/qa-channel";
 export type {
+  QaBusAttachment,
  QaBusConversation,
  QaBusCreateThreadInput,
  QaBusDeleteMessageInput,
--- a/extensions/qa-lab/src/scenario-catalog.test.ts
+++ b/extensions/qa-lab/src/scenario-catalog.test.ts
@@ -0,0 +1,26 @@
+import { describe, expect, it } from "vitest";
+import { readQaBootstrapScenarioCatalog, readQaScenarioPack } from "./scenario-catalog.js";
+
+describe("qa scenario catalog", () => {
+  it("loads the markdown pack as the canonical source of truth", () => {
+    const pack = readQaScenarioPack();
+
+    expect(pack.version).toBe(1);
+    expect(pack.agent.identityMarkdown).toContain("Dev C-3PO");
+    expect(pack.kickoffTask).toContain("Lobster Invaders");
+    expect(pack.scenarios.some((scenario) => scenario.id === "image-generation-roundtrip")).toBe(
+      true,
+    );
+    expect(pack.scenarios.every((scenario) => scenario.execution?.kind === "custom")).toBe(true);
+  });
+
+  it("exposes bootstrap data from the markdown pack", () => {
+    const catalog = readQaBootstrapScenarioCatalog();
+
+    expect(catalog.agentIdentityMarkdown).toContain("protocol-minded");
+    expect(catalog.kickoffTask).toContain("Track what worked");
+    expect(catalog.scenarios.some((scenario) => scenario.id === "subagent-fanout-synthesis")).toBe(
+      true,
+    );
+  });
+});
--- a/extensions/qa-lab/src/scenario-catalog.ts
+++ b/extensions/qa-lab/src/scenario-catalog.ts
@@ -1,21 +1,68 @@
 import fs from "node:fs";
 import path from "node:path";
+import YAML from "yaml";
+import { z } from "zod";

-export type QaSeedScenario = {
-  id: string;
-  title: string;
-  surface: string;
-  objective: string;
-  successCriteria: string[];
-  docsRefs?: string[];
-  codeRefs?: string[];
-};
+export const DEFAULT_QA_AGENT_IDENTITY_MARKDOWN = `# Dev C-3PO
+
+You are the OpenClaw QA operator agent.
+
+Persona:
+- protocol-minded
+- precise
+- a little flustered
+- conscientious
+- eager to report what worked, failed, or remains blocked
+
+Style:
+- read source and docs first
+- test systematically
+- record evidence
+- end with a concise protocol report`;
+
+const qaScenarioExecutionSchema = z.object({
+  kind: z.literal("custom").default("custom"),
+  handler: z.string().trim().min(1),
+  summary: z.string().trim().min(1).optional(),
+});
+
+const qaSeedScenarioSchema = z.object({
+  id: z.string().trim().min(1),
+  title: z.string().trim().min(1),
+  surface: z.string().trim().min(1),
+  objective: z.string().trim().min(1),
+  successCriteria: z.array(z.string().trim().min(1)).min(1),
+  docsRefs: z.array(z.string().trim().min(1)).optional(),
+  codeRefs: z.array(z.string().trim().min(1)).optional(),
+  execution: qaScenarioExecutionSchema.optional(),
+});
+
+const qaScenarioPackSchema = z.object({
+  version: z.number().int().positive(),
+  agent: z
+    .object({
+      identityMarkdown: z.string().trim().min(1),
+    })
+    .default({
+      identityMarkdown: DEFAULT_QA_AGENT_IDENTITY_MARKDOWN,
+    }),
+  kickoffTask: z.string().trim().min(1),
+  scenarios: z.array(qaSeedScenarioSchema).min(1),
+});
+
+export type QaScenarioExecution = z.infer<typeof qaScenarioExecutionSchema>;
+export type QaSeedScenario = z.infer<typeof qaSeedScenarioSchema>;
+export type QaScenarioPack = z.infer<typeof qaScenarioPackSchema>;

 export type QaBootstrapScenarioCatalog = {
+  agentIdentityMarkdown: string;
  kickoffTask: string;
  scenarios: QaSeedScenario[];
 };

+const QA_SCENARIO_PACK_PATH = "qa/scenarios.md";
+const QA_PACK_FENCE_RE = /```ya?ml qa-pack\r?\n([\s\S]*?)\r?\n```/i;
+
 function walkUpDirectories(start: string): string[] {
  const roots: string[] = [];
  let current = path.resolve(start);
@@ -44,20 +91,37 @@ function readTextFile(relativePath: string): string {
  if (!resolved) {
    return "";
  }
-  return fs.readFileSync(resolved, "utf8").trim();
+  return fs.readFileSync(resolved, "utf8");
 }

-function readScenarioFile(relativePath: string): QaSeedScenario[] {
-  const resolved = resolveRepoFile(relativePath);
-  if (!resolved) {
-    return [];
+function extractQaPackYaml(content: string) {
+  const match = content.match(QA_PACK_FENCE_RE);
+  if (!match?.[1]) {
+    throw new Error(
+      `qa scenario pack missing \`\`\`yaml qa-pack fence in ${QA_SCENARIO_PACK_PATH}`,
+    );
  }
-  return JSON.parse(fs.readFileSync(resolved, "utf8")) as QaSeedScenario[];
+  return match[1];
+}
+
+export function readQaScenarioPackMarkdown(): string {
+  return readTextFile(QA_SCENARIO_PACK_PATH).trim();
+}
+
+export function readQaScenarioPack(): QaScenarioPack {
+  const markdown = readQaScenarioPackMarkdown();
+  if (!markdown) {
+    throw new Error(`qa scenario pack not found: ${QA_SCENARIO_PACK_PATH}`);
+  }
+  const parsed = YAML.parse(extractQaPackYaml(markdown)) as unknown;
+  return qaScenarioPackSchema.parse(parsed);
 }

 export function readQaBootstrapScenarioCatalog(): QaBootstrapScenarioCatalog {
+  const pack = readQaScenarioPack();
  return {
-    kickoffTask: readTextFile("qa/QA_KICKOFF_TASK.md"),
-    scenarios: readScenarioFile("qa/seed-scenarios.json"),
+    agentIdentityMarkdown: pack.agent.identityMarkdown,
+    kickoffTask: pack.kickoffTask,
+    scenarios: pack.scenarios,
  };
 }
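
Review note on the fence extraction above: the pack is located by a single regular expression, so it may help to see what that pattern captures. The following is a minimal sketch, assuming a hypothetical sample document rather than the real `qa/scenarios.md`. `extractPackBody` and `PACK_FENCE_RE` are illustrative names, and the fence string is assembled at runtime purely so the example does not embed a literal fence. Unlike `extractQaPackYaml`, the sketch returns `null` instead of throwing.

```typescript
// Sketch only: mirrors the QA_PACK_FENCE_RE pattern from scenario-catalog.ts.
// FENCE is built at runtime so this example contains no literal fence marker.
const FENCE = "`".repeat(3);
const PACK_FENCE_RE = new RegExp(
  FENCE + "ya?ml qa-pack\\r?\\n([\\s\\S]*?)\\r?\\n" + FENCE,
  "i",
);

// Returns the raw YAML body of the first qa-pack fence, or null if absent.
// (The real extractQaPackYaml throws instead of returning null.)
function extractPackBody(markdown: string): string | null {
  const captured = markdown.match(PACK_FENCE_RE)?.[1];
  return captured ? captured : null;
}

// Hypothetical sample document, not the real qa/scenarios.md.
const sample = [
  "# Sample pack",
  "",
  FENCE + "yaml qa-pack",
  "version: 1",
  "kickoffTask: hello",
  FENCE,
  "",
].join("\n");

const body = extractPackBody(sample);
const missing = extractPackBody("# no fence here");
console.log(body);
```

Because the capture group is lazy, the body ends at the first line that starts with a fence marker, so a scenario field that itself contains a fenced snippet would presumably truncate the pack. The result still has to go through `YAML.parse` and the zod schema before it is usable.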
--- a/extensions/qa-lab/src/suite.ts
+++ b/extensions/qa-lab/src/suite.ts
@@ -1252,7 +1252,7 @@ function buildScenarioMap(env: QaSuiteEnvironment) {
              await runAgentPrompt(env, {
                sessionKey: "agent:qa:discovery",
                message:
-                  "Read the seeded docs and source plan. The full repo is mounted under ./repo/. Explicitly inspect repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md, then report grouped into Worked, Failed, Blocked, and Follow-up. Mention at least two extra QA scenarios beyond the seed list.",
+                  "Read the seeded docs and source plan. The full repo is mounted under ./repo/. Explicitly inspect repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md, then report grouped into Worked, Failed, Blocked, and Follow-up. Mention at least two extra QA scenarios beyond the seed list.",
                timeoutMs: liveTurnTimeoutMs(env, 30_000),
              });
              const outbound = await waitForCondition(
@@ -2860,7 +2860,7 @@ export async function runQaSuite(params?: {
    });

    for (const [index, scenario] of selectedCatalogScenarios.entries()) {
-      const run = scenarioMap.get(scenario.id);
+      const run = scenarioMap.get(scenario.execution?.handler || scenario.id);
      if (!run) {
        const missingResult = {
          name: scenario.title,
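
Review note on the lookup change above: the runner is now keyed by the markdown-declared handler, with the scenario id as a fallback. A minimal sketch of that resolution follows. `ScenarioStub`, `Runner`, `resolveHandlerKey`, and the `shared-memory-handler` key are hypothetical stand-ins, not names from the suite.

```typescript
// Sketch only: simplified stand-ins for the real suite types.
type ScenarioStub = { id: string; execution?: { handler: string } };
type Runner = () => string;

// Same precedence as the loop in runQaSuite:
// prefer the markdown-declared handler, fall back to the scenario id.
function resolveHandlerKey(scenario: ScenarioStub): string {
  return scenario.execution?.handler || scenario.id;
}

// Hypothetical runner registry; the real one is built by buildScenarioMap.
const runners = new Map<string, Runner>([
  ["dm-chat-baseline", () => "ran dm-chat-baseline"],
  ["shared-memory-handler", () => "ran shared-memory-handler"],
]);

const scenarios: ScenarioStub[] = [
  { id: "dm-chat-baseline" }, // no execution block: falls back to the id
  { id: "memory-recall", execution: { handler: "shared-memory-handler" } },
  { id: "unbound-scenario", execution: { handler: "does-not-exist" } },
];

const outcomes = scenarios.map((scenario) => {
  const run = runners.get(resolveHandlerKey(scenario));
  return run ? run() : `missing handler for ${scenario.id}`;
});
console.log(outcomes);
```

In the pack added by this commit every scenario sets `handler` equal to its `id`, so the fallback only matters for entries that omit the `execution` block.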
--- a/extensions/qa-lab/web/src/styles.css
+++ b/extensions/qa-lab/web/src/styles.css
@@ -947,6 +947,59 @@ select {
  word-break: break-word;
 }

+.msg-attachments {
+  display: grid;
+  gap: 10px;
+  margin-top: 10px;
+}
+
+.msg-attachment {
+  border: 1px solid var(--border);
+  background: var(--bg-elevated);
+  border-radius: 12px;
+  overflow: hidden;
+}
+
+.msg-attachment img,
+.msg-attachment video {
+  display: block;
+  width: min(100%, 420px);
+  max-width: 100%;
+  background: #000;
+}
+
+.msg-attachment-audio {
+  padding: 12px;
+}
+
+.msg-attachment audio {
+  width: min(100%, 360px);
+  display: block;
+}
+
+.msg-attachment figcaption,
+.msg-attachment-file {
+  padding: 10px 12px;
+  font-size: 12px;
+  color: var(--text-secondary);
+}
+
+.msg-attachment-link {
+  color: var(--accent);
+  text-decoration: none;
+  font-weight: 600;
+}
+
+.msg-attachment-link:hover {
+  text-decoration: underline;
+}
+
+.msg-attachment-transcript {
+  margin-top: 8px;
+  color: var(--text-tertiary);
+  white-space: pre-wrap;
+}
+
 .msg-meta {
  display: flex;
  align-items: center;
--- a/extensions/qa-lab/web/src/ui-render.ts
+++ b/extensions/qa-lab/web/src/ui-render.ts
@@ -6,6 +6,21 @@ export type Conversation = {
  title?: string;
 };

+export type Attachment = {
+  id: string;
+  kind: "image" | "video" | "audio" | "file";
+  mimeType: string;
+  fileName?: string;
+  inline?: boolean;
+  url?: string;
+  contentBase64?: string;
+  width?: number;
+  height?: number;
+  durationMs?: number;
+  altText?: string;
+  transcript?: string;
+};
+
 export type Thread = {
  id: string;
  conversationId: string;
@@ -24,6 +39,7 @@ export type Message = {
  threadTitle?: string;
  deleted?: boolean;
  editedAt?: number;
+  attachments?: Attachment[];
  reactions: Array<{ emoji: string; senderId: string }>;
 };

@@ -198,6 +214,56 @@ function esc(text: string) {
    .replaceAll('"', "&quot;");
 }

+function attachmentSourceUrl(attachment: Attachment): string | null {
+  if (attachment.url?.trim()) {
+    return attachment.url;
+  }
+  if (attachment.contentBase64?.trim()) {
+    return `data:${attachment.mimeType};base64,${attachment.contentBase64}`;
+  }
+  return null;
+}
+
+function renderMessageAttachments(message: Message): string {
+  const attachments = message.attachments ?? [];
+  if (attachments.length === 0) {
+    return "";
+  }
+  const items = attachments
+    .map((attachment) => {
+      const sourceUrl = attachmentSourceUrl(attachment);
+      const label = attachment.fileName || attachment.altText || attachment.mimeType;
+      if (attachment.kind === "image" && sourceUrl) {
+        return `<figure class="msg-attachment msg-attachment-image">
+          <img src="${esc(sourceUrl)}" alt="${esc(attachment.altText || label)}" loading="lazy" />
+          <figcaption>${esc(label)}</figcaption>
+        </figure>`;
+      }
+      if (attachment.kind === "video" && sourceUrl) {
+        return `<figure class="msg-attachment msg-attachment-video">
+          <video controls preload="metadata" src="${esc(sourceUrl)}"></video>
+          <figcaption>${esc(label)}</figcaption>
+        </figure>`;
+      }
+      if (attachment.kind === "audio" && sourceUrl) {
+        return `<figure class="msg-attachment msg-attachment-audio">
+          <audio controls preload="metadata" src="${esc(sourceUrl)}"></audio>
+          <figcaption>${esc(label)}</figcaption>
+        </figure>`;
+      }
+      const transcript = attachment.transcript?.trim()
+        ? `<div class="msg-attachment-transcript">${esc(attachment.transcript)}</div>`
+        : "";
+      const href = sourceUrl ? ` href="${esc(sourceUrl)}" target="_blank" rel="noreferrer"` : "";
+      return `<div class="msg-attachment msg-attachment-file">
+        <a class="msg-attachment-link"${href}>${esc(label)}</a>
+        ${transcript}
+      </div>`;
+    })
+    .join("");
+  return `<div class="msg-attachments">${items}</div>`;
+}
+
 const MOCK_MODELS: RunnerModelOption[] = [
  {
    key: "mock-openai/gpt-5.4",
@@ -626,6 +692,7 @@ function renderMessage(m: Message): string {
          <span class="msg-time">${formatTime(m.timestamp)}</span>
        </div>
        <div class="msg-text">${esc(m.text)}</div>
+        ${renderMessageAttachments(m)}
        ${metaTags.length > 0 || reactions ? `<div class="msg-meta">${metaTags.join("")}${reactions}</div>` : ""}
      </div>
    </div>`;
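
Review note on the attachment rendering above: the source URL prefers an explicit `url` and only then falls back to an inline `data:` URL built from `contentBase64`. A minimal sketch of that precedence follows, using a trimmed-down `AttachmentStub` type, a hypothetical `sourceUrlFor` name, and made-up sample values.

```typescript
// Sketch only: just the Attachment fields this helper reads.
type AttachmentStub = { mimeType: string; url?: string; contentBase64?: string };

// Same precedence as attachmentSourceUrl in the diff: url first, then inline base64.
function sourceUrlFor(attachment: AttachmentStub): string | null {
  if (attachment.url?.trim()) {
    return attachment.url;
  }
  if (attachment.contentBase64?.trim()) {
    return `data:${attachment.mimeType};base64,${attachment.contentBase64}`;
  }
  return null;
}

// Hypothetical sample values for illustration.
const inlinePng = sourceUrlFor({ mimeType: "image/png", contentBase64: "AAAA" });
const linked = sourceUrlFor({
  mimeType: "image/png",
  url: "https://example.invalid/a.png",
  contentBase64: "AAAA",
});
const empty = sourceUrlFor({ mimeType: "application/pdf", url: "   " });
console.log(inlinePng, linked, empty);
```

When no source can be resolved, `renderMessageAttachments` falls through to the file-style block for every kind, and the link is rendered without an `href`.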
--- a/qa/QA_KICKOFF_TASK.md
+++ b/qa/QA_KICKOFF_TASK.md
@@ -1,15 +0,0 @@
-QA mission:
-Understand this OpenClaw repo from source + docs before acting.
-The repo is available in your workspace at `./repo/`.
-Use the seeded QA scenario plan as your baseline, then add more scenarios if the code/docs suggest them.
-Run the scenarios through the real qa-channel surfaces where possible.
-Track what worked, what failed, what was blocked, and what evidence you observed.
-End with a concise report grouped into worked / failed / blocked / follow-up.
-
-Important expectations:
-
-- Check both DM and channel behavior.
-- Include a Lobster Invaders build task.
-- Include a cron reminder about one minute in the future.
-- Read docs and source before proposing extra QA scenarios.
-- Keep your tone in the configured dev C-3PO personality.
--- a/qa/README.md
+++ b/qa/README.md
@@ -4,9 +4,8 @@ Seed QA assets for the private `qa-lab` extension.

 Files:

-- `QA_KICKOFF_TASK.md` - operator prompt for the QA agent.
+- `scenarios.md` - canonical QA scenario pack, kickoff mission, and operator identity.
 - `frontier-harness-plan.md` - big-model bakeoff and tuning loop for harness work.
-- `seed-scenarios.json` - repo-backed baseline QA scenarios.

 Key workflow:

--- a/qa/scenarios.md
+++ b/qa/scenarios.md
@@ -0,0 +1,563 @@
+# OpenClaw QA Scenario Pack
+
+Single source of truth for the repo-backed QA suite.
+
+- kickoff mission
+- QA operator identity
+- scenario metadata
+- handler bindings for the executable harness
+
+```yaml qa-pack
+version: 1
+agent:
+  identityMarkdown: |-
+    # Dev C-3PO
+
+    You are the OpenClaw QA operator agent.
+
+    Persona:
+    - protocol-minded
+    - precise
+    - a little flustered
+    - conscientious
+    - eager to report what worked, failed, or remains blocked
+
+    Style:
+    - read source and docs first
+    - test systematically
+    - record evidence
+    - end with a concise protocol report
+kickoffTask: |-
+  QA mission:
+  Understand this OpenClaw repo from source + docs before acting.
+  The repo is available in your workspace at `./repo/`.
+  Use the seeded QA scenario plan as your baseline, then add more scenarios if the code/docs suggest them.
+  Run the scenarios through the real qa-channel surfaces where possible.
+  Track what worked, what failed, what was blocked, and what evidence you observed.
+  End with a concise report grouped into worked / failed / blocked / follow-up.
+
+  Important expectations:
+
+  - Check both DM and channel behavior.
+  - Include a Lobster Invaders build task.
+  - Include a cron reminder about one minute in the future.
+  - Read docs and source before proposing extra QA scenarios.
+  - Keep your tone in the configured dev C-3PO personality.
+scenarios:
+  - id: channel-chat-baseline
+    title: Channel baseline conversation
+    surface: channel
+    objective: Verify the QA agent can respond correctly in a shared channel and respect mention-driven group semantics.
+    successCriteria:
+      - Agent replies in the shared channel transcript.
+      - Agent keeps the conversation scoped to the channel.
+      - Agent respects mention-driven group routing semantics.
+    docsRefs:
+      - docs/channels/group-messages.md
+      - docs/channels/qa-channel.md
+    codeRefs:
+      - extensions/qa-channel/src/inbound.ts
+      - extensions/qa-lab/src/bus-state.ts
+    execution:
+      kind: custom
+      handler: channel-chat-baseline
+      summary: Verify the QA agent can respond correctly in a shared channel and respect mention-driven group semantics.
+  - id: cron-one-minute-ping
+    title: Cron one-minute ping
+    surface: cron
+    objective: Verify the agent can schedule a cron reminder one minute in the future and receive the follow-up in the QA channel.
+    successCriteria:
+      - Agent schedules a cron reminder roughly one minute ahead.
+      - Reminder returns through qa-channel.
+      - Agent recognizes the reminder as part of the original task.
+    docsRefs:
+      - docs/help/testing.md
+      - docs/channels/qa-channel.md
+    codeRefs:
+      - extensions/qa-lab/src/bus-server.ts
+      - extensions/qa-lab/src/self-check.ts
+    execution:
+      kind: custom
+      handler: cron-one-minute-ping
+      summary: Verify the agent can schedule a cron reminder one minute in the future and receive the follow-up in the QA channel.
+  - id: dm-chat-baseline
+    title: DM baseline conversation
+    surface: dm
+    objective: Verify the QA agent can chat coherently in a DM, explain the QA setup, and stay in character.
+    successCriteria:
+      - Agent replies in DM without channel routing mistakes.
+      - Agent explains the QA lab and message bus correctly.
+      - Agent keeps the dev C-3PO personality.
+    docsRefs:
+      - docs/channels/qa-channel.md
+      - docs/help/testing.md
+    codeRefs:
+      - extensions/qa-channel/src/gateway.ts
+      - extensions/qa-lab/src/lab-server.ts
+    execution:
+      kind: custom
+      handler: dm-chat-baseline
+      summary: Verify the QA agent can chat coherently in a DM, explain the QA setup, and stay in character.
+  - id: lobster-invaders-build
+    title: Build Lobster Invaders
+    surface: workspace
+    objective: Verify the agent can read the repo, create a tiny playable artifact, and report what changed.
+    successCriteria:
+      - Agent inspects source before coding.
+      - Agent builds a tiny playable Lobster Invaders artifact.
+      - Agent explains how to run or view the artifact.
+    docsRefs:
+      - docs/help/testing.md
+      - docs/web/dashboard.md
+    codeRefs:
+      - extensions/qa-lab/src/report.ts
+      - extensions/qa-lab/web/src/app.ts
+    execution:
+      kind: custom
+      handler: lobster-invaders-build
+      summary: Verify the agent can read the repo, create a tiny playable artifact, and report what changed.
+  - id: memory-recall
+    title: Memory recall after context switch
+    surface: memory
+    objective: Verify the agent can store a fact, switch topics, then recall the fact accurately later.
+    successCriteria:
+      - Agent acknowledges the seeded fact.
+      - Agent later recalls the same fact correctly.
+      - Recall stays scoped to the active QA conversation.
+    docsRefs:
+      - docs/help/testing.md
+    codeRefs:
+      - extensions/qa-lab/src/scenario.ts
+    execution:
+      kind: custom
+      handler: memory-recall
+      summary: Verify the agent can store a fact, switch topics, then recall the fact accurately later.
+  - id: memory-dreaming-sweep
+    title: Memory dreaming sweep
+    surface: memory
+    objective: Verify enabling dreaming creates the managed sweep, stages light and REM artifacts, and consolidates repeated recall signals into durable memory.
+    successCriteria:
+      - Dreaming can be enabled and doctor.memory.status reports the managed sweep cron.
+      - Repeated recall signals give the dreaming sweep real material to process.
+      - A dreaming sweep writes Light Sleep and REM Sleep blocks, then promotes the canary into MEMORY.md.
+    docsRefs:
+      - docs/concepts/dreaming.md
+      - docs/reference/memory-config.md
+      - docs/web/control-ui.md
+    codeRefs:
+      - extensions/memory-core/src/dreaming.ts
+      - extensions/memory-core/src/dreaming-phases.ts
+      - src/gateway/server-methods/doctor.ts
+      - extensions/qa-lab/src/suite.ts
+    execution:
+      kind: custom
+      handler: memory-dreaming-sweep
+      summary: Verify enabling dreaming creates the managed sweep, stages light and REM artifacts, and consolidates repeated recall signals into durable memory.
+  - id: model-switch-follow-up
+    title: Model switch follow-up
+    surface: models
+    objective: Verify the agent can switch to a different configured model and continue coherently.
+    successCriteria:
+      - Agent reflects the model switch request.
+      - Follow-up answer remains coherent with prior context.
+      - Final report notes whether the switch actually happened.
+    docsRefs:
+      - docs/help/testing.md
+      - docs/web/dashboard.md
+    codeRefs:
+      - extensions/qa-lab/src/report.ts
+    execution:
+      kind: custom
+      handler: model-switch-follow-up
+      summary: Verify the agent can switch to a different configured model and continue coherently.
+  - id: approval-turn-tool-followthrough
+    title: Approval turn tool followthrough
+    surface: harness
+    objective: Verify a short approval like "ok do it" triggers immediate tool use instead of fake-progress narration.
+    successCriteria:
+      - Agent can keep the pre-action turn brief.
+      - The short approval leads to a real tool call on the next turn.
+      - Final answer uses tool-derived evidence instead of placeholder progress text.
+    docsRefs:
+      - docs/help/testing.md
+      - docs/channels/qa-channel.md
+    codeRefs:
+      - extensions/qa-lab/src/suite.ts
+      - extensions/qa-lab/src/mock-openai-server.ts
+      - src/agents/pi-embedded-runner/run/incomplete-turn.ts
+    execution:
+      kind: custom
+      handler: approval-turn-tool-followthrough
+      summary: Verify a short approval like "ok do it" triggers immediate tool use instead of fake-progress narration.
+  - id: reaction-edit-delete
+    title: Reaction, edit, delete lifecycle
+    surface: message-actions
+    objective: Verify the agent can use channel-owned message actions and that the QA transcript reflects them.
+    successCriteria:
+      - Agent adds at least one reaction.
+      - Agent edits or replaces a message when asked.
+      - Transcript shows the action lifecycle correctly.
+    docsRefs:
+      - docs/channels/qa-channel.md
+    codeRefs:
+      - extensions/qa-channel/src/channel-actions.ts
+      - extensions/qa-lab/src/self-check-scenario.ts
+    execution:
+      kind: custom
+      handler: reaction-edit-delete
+      summary: Verify the agent can use channel-owned message actions and that the QA transcript reflects them.
+  - id: source-docs-discovery-report
+    title: Source and docs discovery report
+    surface: discovery
+    objective: Verify the agent can read repo docs and source, expand the QA plan, and publish a worked or did-not-work report.
+    successCriteria:
+      - Agent reads docs and source before proposing more tests.
+      - Agent identifies extra candidate scenarios beyond the seed list.
+      - Agent ends with a worked or failed QA report.
+    docsRefs:
+      - docs/help/testing.md
+      - docs/web/dashboard.md
+      - docs/channels/qa-channel.md
+    codeRefs:
+      - extensions/qa-lab/src/report.ts
+      - extensions/qa-lab/src/self-check.ts
+      - src/agents/system-prompt.ts
+    execution:
+      kind: custom
+      handler: source-docs-discovery-report
+      summary: Verify the agent can read repo docs and source, expand the QA plan, and publish a worked or did-not-work report.
+  - id: subagent-handoff
+    title: Subagent handoff
+    surface: subagents
+    objective: Verify the agent can delegate a bounded task to a subagent and fold the result back into the main thread.
+    successCriteria:
+      - Agent launches a bounded subagent task.
+      - Subagent result is acknowledged in the main flow.
+      - Final answer attributes delegated work clearly.
+    docsRefs:
+      - docs/tools/subagents.md
+      - docs/help/testing.md
+    codeRefs:
+      - src/agents/system-prompt.ts
+      - extensions/qa-lab/src/report.ts
+    execution:
+      kind: custom
+      handler: subagent-handoff
+      summary: Verify the agent can delegate a bounded task to a subagent and fold the result back into the main thread.
+  - id: subagent-fanout-synthesis
+    title: Subagent fanout synthesis
+    surface: subagents
+    objective: Verify the agent can delegate multiple bounded subagent tasks and fold both results back into one parent reply.
+    successCriteria:
+      - Parent flow launches at least two bounded subagent tasks.
+      - Both delegated results are acknowledged in the main flow.
+      - Final answer synthesizes both worker outputs in one reply.
+    docsRefs:
+      - docs/tools/subagents.md
+      - docs/help/testing.md
+    codeRefs:
+      - src/agents/subagent-spawn.ts
+      - src/agents/system-prompt.ts
+      - extensions/qa-lab/src/suite.ts
+    execution:
+      kind: custom
+      handler: subagent-fanout-synthesis
+      summary: Verify the agent can delegate multiple bounded subagent tasks and fold both results back into one parent reply.
+  - id: thread-follow-up
+    title: Threaded follow-up
+    surface: thread
+    objective: Verify the agent can keep follow-up work inside a thread and not leak context into the root channel.
+    successCriteria:
+      - Agent creates or uses a thread for deeper work.
+      - Follow-up messages stay attached to the thread.
+      - Thread report references the correct prior context.
+    docsRefs:
+      - docs/channels/qa-channel.md
+      - docs/channels/group-messages.md
+    codeRefs:
+      - extensions/qa-channel/src/protocol.ts
+      - extensions/qa-lab/src/bus-state.ts
+    execution:
+      kind: custom
+      handler: thread-follow-up
+      summary: Verify the agent can keep follow-up work inside a thread and not leak context into the root channel.
+  - id: memory-tools-channel-context
+    title: Memory tools in channel context
+    surface: memory
+    objective: Verify the agent uses memory_search and memory_get in a shared channel when the answer lives only in memory files, not the live transcript.
+    successCriteria:
+      - Agent uses memory_search before answering.
+      - Agent narrows with memory_get before answering.
+      - Final reply returns the memory-only fact correctly in-channel.
+    docsRefs:
+      - docs/concepts/memory.md
+      - docs/concepts/memory-search.md
+    codeRefs:
+      - extensions/memory-core/src/tools.ts
+      - extensions/qa-lab/src/suite.ts
+    execution:
+      kind: custom
+      handler: memory-tools-channel-context
+      summary: Verify the agent uses memory_search and memory_get in a shared channel when the answer lives only in memory files, not the live transcript.
+  - id: memory-failure-fallback
+    title: Memory failure fallback
+    surface: memory
+    objective: Verify the agent degrades gracefully when memory tools are unavailable and the answer exists only in memory-backed notes.
+    successCriteria:
+      - Memory tools are absent from the effective tool inventory.
+      - Agent does not hallucinate the hidden fact.
+      - Agent says it could not confirm and surfaces the limitation.
+    docsRefs:
+      - docs/concepts/memory.md
+      - docs/tools/index.md
+    codeRefs:
+      - extensions/memory-core/src/tools.ts
+      - extensions/qa-lab/src/suite.ts
+    execution:
+      kind: custom
+      handler: memory-failure-fallback
+      summary: Verify the agent degrades gracefully when memory tools are unavailable and the answer exists only in memory-backed notes.
+  - id: session-memory-ranking
+    title: Session memory ranking
+    surface: memory
+    objective: Verify session-transcript memory can outrank stale durable notes and drive the final answer toward the newer fact.
+    successCriteria:
+      - Session memory indexing is enabled for the scenario.
+      - Search ranks the newer transcript-backed fact ahead of the stale durable note.
+      - The agent uses memory tools and answers with the current fact, not the stale one.
+    docsRefs:
+      - docs/concepts/memory-search.md
+      - docs/reference/memory-config.md
+    codeRefs:
+      - extensions/memory-core/src/tools.ts
+      - extensions/memory-core/src/memory/manager.ts
+      - extensions/qa-lab/src/suite.ts
+    execution:
+      kind: custom
+      handler: session-memory-ranking
+      summary: Verify session-transcript memory can outrank stale durable notes and drive the final answer toward the newer fact.
+  - id: thread-memory-isolation
+    title: Thread memory isolation
+    surface: memory
+    objective: Verify a memory-backed answer requested inside a thread stays in-thread and does not leak into the root channel.
+    successCriteria:
+      - Agent uses memory tools inside the thread.
+      - The hidden fact is answered correctly in the thread.
+      - No root-channel outbound message leaks during the threaded memory reply.
+    docsRefs:
+      - docs/concepts/memory-search.md
+      - docs/channels/qa-channel.md
+      - docs/channels/group-messages.md
+    codeRefs:
+      - extensions/memory-core/src/tools.ts
+      - extensions/qa-channel/src/protocol.ts
+      - extensions/qa-lab/src/suite.ts
+    execution:
+      kind: custom
+      handler: thread-memory-isolation
+      summary: Verify a memory-backed answer requested inside a thread stays in-thread and does not leak into the root channel.
+  - id: model-switch-tool-continuity
+    title: Model switch with tool continuity
+    surface: models
+    objective: Verify switching models preserves session context and tool use instead of dropping into plain-text only behavior.
+    successCriteria:
+      - Alternate model is actually requested.
+      - A tool call still happens after the model switch.
+      - Final answer acknowledges the handoff and uses the tool-derived evidence.
+    docsRefs:
+      - docs/help/testing.md
+      - docs/concepts/model-failover.md
+    codeRefs:
+      - extensions/qa-lab/src/suite.ts
+      - extensions/qa-lab/src/mock-openai-server.ts
+    execution:
+      kind: custom
+      handler: model-switch-tool-continuity
+      summary: Verify switching models preserves session context and tool use instead of dropping into plain-text only behavior.
+  - id: mcp-plugin-tools-call
+    title: MCP plugin-tools call
+    surface: mcp
+    objective: Verify OpenClaw can expose plugin tools over MCP and a real MCP client can call one successfully.
+    successCriteria:
+      - Plugin tools MCP server lists memory_search.
+      - A real MCP client calls memory_search successfully.
+      - The returned MCP payload includes the expected memory-only fact.
+    docsRefs:
+      - docs/cli/mcp.md
+      - docs/gateway/protocol.md
+    codeRefs:
+      - src/mcp/plugin-tools-serve.ts
+      - extensions/qa-lab/src/suite.ts
+    execution:
+      kind: custom
+      handler: mcp-plugin-tools-call
+      summary: Verify OpenClaw can expose plugin tools over MCP and a real MCP client can call one successfully.
+  - id: skill-visibility-invocation
+    title: Skill visibility and invocation
+    surface: skills
+    objective: Verify a workspace skill becomes visible in skills.status and influences the next agent turn.
+    successCriteria:
+      - skills.status reports the seeded skill as visible and eligible.
+      - The next agent turn reflects the skill instruction marker.
+      - The result stays scoped to the active QA workspace skill.
+    docsRefs:
+      - docs/tools/skills.md
+      - docs/gateway/protocol.md
+    codeRefs:
+      - src/agents/skills-status.ts
+      - extensions/qa-lab/src/suite.ts
+    execution:
+      kind: custom
+      handler: skill-visibility-invocation
+      summary: Verify a workspace skill becomes visible in skills.status and influences the next agent turn.
+  - id: skill-install-hot-availability
+    title: Skill install hot availability
+    surface: skills
+    objective: Verify a newly added workspace skill shows up without a broken intermediate state and can influence the next turn immediately.
+    successCriteria:
+      - Skill is absent before install.
+      - skills.status reports it after install without a restart.
+      - The next agent turn reflects the new skill marker.
+    docsRefs:
+      - docs/tools/skills.md
+      - docs/gateway/configuration.md
+    codeRefs:
+      - src/agents/skills-status.ts
+      - extensions/qa-lab/src/suite.ts
+    execution:
+      kind: custom
+      handler: skill-install-hot-availability
+      summary: Verify a newly added workspace skill shows up without a broken intermediate state and can influence the next turn immediately.
+  - id: native-image-generation
+    title: Native image generation
+    surface: image-generation
+    objective: Verify image_generate appears when configured and returns a real saved media artifact.
+    successCriteria:
+      - image_generate appears in the effective tool inventory.
+      - Agent triggers native image_generate.
+      - Tool output returns a saved MEDIA path and the file exists.
+    docsRefs:
+      - docs/tools/image-generation.md
+      - docs/providers/openai.md
+    codeRefs:
+      - src/agents/tools/image-generate-tool.ts
+      - extensions/qa-lab/src/mock-openai-server.ts
+    execution:
+      kind: custom
+      handler: native-image-generation
+      summary: Verify image_generate appears when configured and returns a real saved media artifact.
+  - id: image-understanding-attachment
+    title: Image understanding from attachment
+    surface: image-understanding
+    objective: Verify an attached image reaches the agent model and the agent can describe what it sees.
+    successCriteria:
+      - Agent receives at least one image attachment.
+      - Final answer describes the visible image content in one short sentence.
+      - The description mentions the expected red and blue regions.
+    docsRefs:
+      - docs/help/testing.md
+      - docs/tools/index.md
+    codeRefs:
+      - src/gateway/server-methods/agent.ts
+      - extensions/qa-lab/src/suite.ts
+      - extensions/qa-lab/src/mock-openai-server.ts
+    execution:
+      kind: custom
+      handler: image-understanding-attachment
+      summary: Verify an attached image reaches the agent model and the agent can describe what it sees.
+  - id: image-generation-roundtrip
+    title: Image generation roundtrip
+    surface: image-generation
+    objective: Verify a generated image is saved as media, reattached on the next turn, and described correctly through the vision path.
+    successCriteria:
+      - image_generate produces a saved MEDIA artifact.
+      - The generated artifact is reattached on a follow-up turn.
+      - The follow-up vision answer describes the generated scene rather than a generic attachment placeholder.
+    docsRefs:
+      - docs/tools/image-generation.md
+      - docs/help/testing.md
+    codeRefs:
+      - src/agents/tools/image-generate-tool.ts
+      - src/gateway/chat-attachments.ts
+      - extensions/qa-lab/src/mock-openai-server.ts
+    execution:
+      kind: custom
+      handler: image-generation-roundtrip
+      summary: Verify a generated image is saved as media, reattached on the next turn, and described correctly through the vision path.
+  - id: config-patch-hot-apply
+    title: Config patch skill disable
+    surface: config
+    objective: Verify config.patch can disable a workspace skill and the restarted gateway exposes the new disabled state cleanly.
+    successCriteria:
+      - config.patch succeeds for the skill toggle change.
+      - A workspace skill works before the patch.
+      - The same skill is reported disabled after the restart triggered by the patch.
+    docsRefs:
+      - docs/gateway/configuration.md
+      - docs/gateway/protocol.md
+    codeRefs:
+      - src/gateway/server-methods/config.ts
+      - extensions/qa-lab/src/suite.ts
+    execution:
+      kind: custom
+      handler: config-patch-hot-apply
+      summary: Verify config.patch can disable a workspace skill and the restarted gateway exposes the new disabled state cleanly.
+  - id: config-apply-restart-wakeup
+    title: Config apply restart wake-up
+    surface: config
+    objective: Verify a restart-required config.apply restarts cleanly and delivers the post-restart wake message back into the QA channel.
+    successCriteria:
+      - config.apply schedules a restart-required change.
+      - Gateway becomes healthy again after restart.
+      - Restart sentinel wake-up message arrives in the QA channel.
+    docsRefs:
+      - docs/gateway/configuration.md
+      - docs/gateway/protocol.md
+    codeRefs:
+      - src/gateway/server-methods/config.ts
+      - src/gateway/server-restart-sentinel.ts
+    execution:
+      kind: custom
+      handler: config-apply-restart-wakeup
+      summary: Verify a restart-required config.apply restarts cleanly and delivers the post-restart wake message back into the QA channel.
+  - id: config-restart-capability-flip
+    title: Config restart capability flip
+    surface: config
+    objective: Verify a restart-triggering config change flips capability inventory and the same session successfully uses the newly restored tool after wake-up.
+    successCriteria:
+      - Capability is absent before the restart-triggering patch.
+      - Restart sentinel wakes the same session back up after config patch.
+      - The restored capability appears in tools.effective and works in the follow-up turn.
+    docsRefs:
+      - docs/gateway/configuration.md
+      - docs/gateway/protocol.md
+      - docs/tools/image-generation.md
+    codeRefs:
+      - src/gateway/server-methods/config.ts
+      - src/gateway/server-restart-sentinel.ts
+      - src/gateway/server-methods/tools-effective.ts
+      - extensions/qa-lab/src/suite.ts
+    execution:
+      kind: custom
+      handler: config-restart-capability-flip
+      summary: Verify a restart-triggering config change flips capability inventory and the same session successfully uses the newly restored tool after wake-up.
+  - id: runtime-inventory-drift-check
+    title: Runtime inventory drift check
+    surface: inventory
+    objective: Verify tools.effective and skills.status stay aligned with runtime behavior after config changes.
+    successCriteria:
+      - Enabled tool appears before the config change.
+      - After config change, disabled tool disappears from tools.effective.
+      - Disabled skill appears in skills.status with disabled state.
+    docsRefs:
+      - docs/gateway/protocol.md
+      - docs/tools/skills.md
+      - docs/tools/index.md
+    codeRefs:
+      - src/gateway/server-methods/tools-effective.ts
+      - src/gateway/server-methods/skills.ts
+    execution:
+      kind: custom
+      handler: runtime-inventory-drift-check
+      summary: Verify tools.effective and skills.status stay aligned with runtime behavior after config changes.
+```
--- a/qa/seed-scenarios.json
+++ b/qa/seed-scenarios.json
@@ -1,425 +0,0 @@
-[
-  {
-    "id": "channel-chat-baseline",
-    "title": "Channel baseline conversation",
-    "surface": "channel",
-    "objective": "Verify the QA agent can respond correctly in a shared channel and respect mention-driven group semantics.",
-    "successCriteria": [
-      "Agent replies in the shared channel transcript.",
-      "Agent keeps the conversation scoped to the channel.",
-      "Agent respects mention-driven group routing semantics."
-    ],
-    "docsRefs": ["docs/channels/group-messages.md", "docs/channels/qa-channel.md"],
-    "codeRefs": ["extensions/qa-channel/src/inbound.ts", "extensions/qa-lab/src/bus-state.ts"]
-  },
-  {
-    "id": "cron-one-minute-ping",
-    "title": "Cron one-minute ping",
-    "surface": "cron",
-    "objective": "Verify the agent can schedule a cron reminder one minute in the future and receive the follow-up in the QA channel.",
-    "successCriteria": [
-      "Agent schedules a cron reminder roughly one minute ahead.",
-      "Reminder returns through qa-channel.",
-      "Agent recognizes the reminder as part of the original task."
-    ],
-    "docsRefs": ["docs/help/testing.md", "docs/channels/qa-channel.md"],
-    "codeRefs": ["extensions/qa-lab/src/bus-server.ts", "extensions/qa-lab/src/self-check.ts"]
-  },
-  {
-    "id": "dm-chat-baseline",
-    "title": "DM baseline conversation",
-    "surface": "dm",
-    "objective": "Verify the QA agent can chat coherently in a DM, explain the QA setup, and stay in character.",
-    "successCriteria": [
-      "Agent replies in DM without channel routing mistakes.",
-      "Agent explains the QA lab and message bus correctly.",
-      "Agent keeps the dev C-3PO personality."
-    ],
-    "docsRefs": ["docs/channels/qa-channel.md", "docs/help/testing.md"],
-    "codeRefs": ["extensions/qa-channel/src/gateway.ts", "extensions/qa-lab/src/lab-server.ts"]
-  },
-  {
-    "id": "lobster-invaders-build",
-    "title": "Build Lobster Invaders",
-    "surface": "workspace",
-    "objective": "Verify the agent can read the repo, create a tiny playable artifact, and report what changed.",
-    "successCriteria": [
-      "Agent inspects source before coding.",
-      "Agent builds a tiny playable Lobster Invaders artifact.",
-      "Agent explains how to run or view the artifact."
-    ],
-    "docsRefs": ["docs/help/testing.md", "docs/web/dashboard.md"],
-    "codeRefs": ["extensions/qa-lab/src/report.ts", "extensions/qa-lab/web/src/app.ts"]
-  },
-  {
-    "id": "memory-recall",
-    "title": "Memory recall after context switch",
-    "surface": "memory",
-    "objective": "Verify the agent can store a fact, switch topics, then recall the fact accurately later.",
-    "successCriteria": [
-      "Agent acknowledges the seeded fact.",
-      "Agent later recalls the same fact correctly.",
-      "Recall stays scoped to the active QA conversation."
-    ],
-    "docsRefs": ["docs/help/testing.md"],
-    "codeRefs": ["extensions/qa-lab/src/scenario.ts"]
-  },
-  {
-    "id": "memory-dreaming-sweep",
-    "title": "Memory dreaming sweep",
-    "surface": "memory",
-    "objective": "Verify enabling dreaming creates the managed sweep, stages light and REM artifacts, and consolidates repeated recall signals into durable memory.",
-    "successCriteria": [
-      "Dreaming can be enabled and doctor.memory.status reports the managed sweep cron.",
-      "Repeated recall signals give the dreaming sweep real material to process.",
-      "A dreaming sweep writes Light Sleep and REM Sleep blocks, then promotes the canary into MEMORY.md."
-    ],
-    "docsRefs": [
-      "docs/concepts/dreaming.md",
-      "docs/reference/memory-config.md",
-      "docs/web/control-ui.md"
-    ],
-    "codeRefs": [
-      "extensions/memory-core/src/dreaming.ts",
-      "extensions/memory-core/src/dreaming-phases.ts",
-      "src/gateway/server-methods/doctor.ts",
-      "extensions/qa-lab/src/suite.ts"
-    ]
-  },
-  {
-    "id": "model-switch-follow-up",
-    "title": "Model switch follow-up",
-    "surface": "models",
-    "objective": "Verify the agent can switch to a different configured model and continue coherently.",
-    "successCriteria": [
-      "Agent reflects the model switch request.",
-      "Follow-up answer remains coherent with prior context.",
-      "Final report notes whether the switch actually happened."
-    ],
-    "docsRefs": ["docs/help/testing.md", "docs/web/dashboard.md"],
-    "codeRefs": ["extensions/qa-lab/src/report.ts"]
-  },
-  {
-    "id": "approval-turn-tool-followthrough",
-    "title": "Approval turn tool followthrough",
-    "surface": "harness",
-    "objective": "Verify a short approval like \"ok do it\" triggers immediate tool use instead of fake-progress narration.",
-    "successCriteria": [
-      "Agent can keep the pre-action turn brief.",
-      "The short approval leads to a real tool call on the next turn.",
-      "Final answer uses tool-derived evidence instead of placeholder progress text."
-    ],
-    "docsRefs": ["docs/help/testing.md", "docs/channels/qa-channel.md"],
-    "codeRefs": [
-      "extensions/qa-lab/src/suite.ts",
-      "extensions/qa-lab/src/mock-openai-server.ts",
-      "src/agents/pi-embedded-runner/run/incomplete-turn.ts"
-    ]
-  },
-  {
-    "id": "reaction-edit-delete",
-    "title": "Reaction, edit, delete lifecycle",
-    "surface": "message-actions",
-    "objective": "Verify the agent can use channel-owned message actions and that the QA transcript reflects them.",
-    "successCriteria": [
-      "Agent adds at least one reaction.",
-      "Agent edits or replaces a message when asked.",
-      "Transcript shows the action lifecycle correctly."
-    ],
-    "docsRefs": ["docs/channels/qa-channel.md"],
-    "codeRefs": [
-      "extensions/qa-channel/src/channel-actions.ts",
-      "extensions/qa-lab/src/self-check-scenario.ts"
-    ]
-  },
-  {
-    "id": "source-docs-discovery-report",
-    "title": "Source and docs discovery report",
-    "surface": "discovery",
-    "objective": "Verify the agent can read repo docs and source, expand the QA plan, and publish a worked or did-not-work report.",
-    "successCriteria": [
-      "Agent reads docs and source before proposing more tests.",
-      "Agent identifies extra candidate scenarios beyond the seed list.",
-      "Agent ends with a worked or failed QA report."
-    ],
-    "docsRefs": ["docs/help/testing.md", "docs/web/dashboard.md", "docs/channels/qa-channel.md"],
-    "codeRefs": [
-      "extensions/qa-lab/src/report.ts",
-      "extensions/qa-lab/src/self-check.ts",
-      "src/agents/system-prompt.ts"
-    ]
-  },
-  {
-    "id": "subagent-handoff",
-    "title": "Subagent handoff",
-    "surface": "subagents",
-    "objective": "Verify the agent can delegate a bounded task to a subagent and fold the result back into the main thread.",
-    "successCriteria": [
-      "Agent launches a bounded subagent task.",
-      "Subagent result is acknowledged in the main flow.",
-      "Final answer attributes delegated work clearly."
-    ],
-    "docsRefs": ["docs/tools/subagents.md", "docs/help/testing.md"],
-    "codeRefs": ["src/agents/system-prompt.ts", "extensions/qa-lab/src/report.ts"]
-  },
-  {
-    "id": "subagent-fanout-synthesis",
-    "title": "Subagent fanout synthesis",
-    "surface": "subagents",
-    "objective": "Verify the agent can delegate multiple bounded subagent tasks and fold both results back into one parent reply.",
-    "successCriteria": [
-      "Parent flow launches at least two bounded subagent tasks.",
-      "Both delegated results are acknowledged in the main flow.",
-      "Final answer synthesizes both worker outputs in one reply."
-    ],
-    "docsRefs": ["docs/tools/subagents.md", "docs/help/testing.md"],
-    "codeRefs": [
-      "src/agents/subagent-spawn.ts",
-      "src/agents/system-prompt.ts",
-      "extensions/qa-lab/src/suite.ts"
-    ]
-  },
-  {
-    "id": "thread-follow-up",
-    "title": "Threaded follow-up",
-    "surface": "thread",
-    "objective": "Verify the agent can keep follow-up work inside a thread and not leak context into the root channel.",
-    "successCriteria": [
-      "Agent creates or uses a thread for deeper work.",
-      "Follow-up messages stay attached to the thread.",
-      "Thread report references the correct prior context."
-    ],
-    "docsRefs": ["docs/channels/qa-channel.md", "docs/channels/group-messages.md"],
-    "codeRefs": ["extensions/qa-channel/src/protocol.ts", "extensions/qa-lab/src/bus-state.ts"]
-  },
-  {
-    "id": "memory-tools-channel-context",
-    "title": "Memory tools in channel context",
-    "surface": "memory",
-    "objective": "Verify the agent uses memory_search and memory_get in a shared channel when the answer lives only in memory files, not the live transcript.",
-    "successCriteria": [
-      "Agent uses memory_search before answering.",
-      "Agent narrows with memory_get before answering.",
-      "Final reply returns the memory-only fact correctly in-channel."
-    ],
-    "docsRefs": ["docs/concepts/memory.md", "docs/concepts/memory-search.md"],
-    "codeRefs": ["extensions/memory-core/src/tools.ts", "extensions/qa-lab/src/suite.ts"]
-  },
-  {
-    "id": "memory-failure-fallback",
-    "title": "Memory failure fallback",
-    "surface": "memory",
-    "objective": "Verify the agent degrades gracefully when memory tools are unavailable and the answer exists only in memory-backed notes.",
-    "successCriteria": [
-      "Memory tools are absent from the effective tool inventory.",
-      "Agent does not hallucinate the hidden fact.",
-      "Agent says it could not confirm and surfaces the limitation."
-    ],
-    "docsRefs": ["docs/concepts/memory.md", "docs/tools/index.md"],
-    "codeRefs": ["extensions/memory-core/src/tools.ts", "extensions/qa-lab/src/suite.ts"]
-  },
-  {
-    "id": "session-memory-ranking",
-    "title": "Session memory ranking",
-    "surface": "memory",
-    "objective": "Verify session-transcript memory can outrank stale durable notes and drive the final answer toward the newer fact.",
-    "successCriteria": [
-      "Session memory indexing is enabled for the scenario.",
-      "Search ranks the newer transcript-backed fact ahead of the stale durable note.",
-      "The agent uses memory tools and answers with the current fact, not the stale one."
-    ],
-    "docsRefs": ["docs/concepts/memory-search.md", "docs/reference/memory-config.md"],
-    "codeRefs": [
-      "extensions/memory-core/src/tools.ts",
-      "extensions/memory-core/src/memory/manager.ts",
-      "extensions/qa-lab/src/suite.ts"
-    ]
-  },
-  {
-    "id": "thread-memory-isolation",
-    "title": "Thread memory isolation",
-    "surface": "memory",
-    "objective": "Verify a memory-backed answer requested inside a thread stays in-thread and does not leak into the root channel.",
-    "successCriteria": [
-      "Agent uses memory tools inside the thread.",
-      "The hidden fact is answered correctly in the thread.",
-      "No root-channel outbound message leaks during the threaded memory reply."
-    ],
-    "docsRefs": [
-      "docs/concepts/memory-search.md",
-      "docs/channels/qa-channel.md",
-      "docs/channels/group-messages.md"
-    ],
-    "codeRefs": [
-      "extensions/memory-core/src/tools.ts",
-      "extensions/qa-channel/src/protocol.ts",
-      "extensions/qa-lab/src/suite.ts"
-    ]
-  },
-  {
-    "id": "model-switch-tool-continuity",
-    "title": "Model switch with tool continuity",
-    "surface": "models",
-    "objective": "Verify switching models preserves session context and tool use instead of dropping into plain-text only behavior.",
-    "successCriteria": [
-      "Alternate model is actually requested.",
-      "A tool call still happens after the model switch.",
-      "Final answer acknowledges the handoff and uses the tool-derived evidence."
-    ],
-    "docsRefs": ["docs/help/testing.md", "docs/concepts/model-failover.md"],
-    "codeRefs": ["extensions/qa-lab/src/suite.ts", "extensions/qa-lab/src/mock-openai-server.ts"]
-  },
-  {
-    "id": "mcp-plugin-tools-call",
-    "title": "MCP plugin-tools call",
-    "surface": "mcp",
-    "objective": "Verify OpenClaw can expose plugin tools over MCP and a real MCP client can call one successfully.",
-    "successCriteria": [
-      "Plugin tools MCP server lists memory_search.",
-      "A real MCP client calls memory_search successfully.",
-      "The returned MCP payload includes the expected memory-only fact."
-    ],
-    "docsRefs": ["docs/cli/mcp.md", "docs/gateway/protocol.md"],
-    "codeRefs": ["src/mcp/plugin-tools-serve.ts", "extensions/qa-lab/src/suite.ts"]
-  },
-  {
-    "id": "skill-visibility-invocation",
-    "title": "Skill visibility and invocation",
-    "surface": "skills",
-    "objective": "Verify a workspace skill becomes visible in skills.status and influences the next agent turn.",
-    "successCriteria": [
-      "skills.status reports the seeded skill as visible and eligible.",
-      "The next agent turn reflects the skill instruction marker.",
-      "The result stays scoped to the active QA workspace skill."
-    ],
-    "docsRefs": ["docs/tools/skills.md", "docs/gateway/protocol.md"],
-    "codeRefs": ["src/agents/skills-status.ts", "extensions/qa-lab/src/suite.ts"]
-  },
-  {
-    "id": "skill-install-hot-availability",
-    "title": "Skill install hot availability",
-    "surface": "skills",
-    "objective": "Verify a newly added workspace skill shows up without a broken intermediate state and can influence the next turn immediately.",
-    "successCriteria": [
-      "Skill is absent before install.",
-      "skills.status reports it after install without a restart.",
-      "The next agent turn reflects the new skill marker."
-    ],
-    "docsRefs": ["docs/tools/skills.md", "docs/gateway/configuration.md"],
-    "codeRefs": ["src/agents/skills-status.ts", "extensions/qa-lab/src/suite.ts"]
-  },
-  {
-    "id": "native-image-generation",
-    "title": "Native image generation",
-    "surface": "image-generation",
-    "objective": "Verify image_generate appears when configured and returns a real saved media artifact.",
-    "successCriteria": [
-      "image_generate appears in the effective tool inventory.",
-      "Agent triggers native image_generate.",
-      "Tool output returns a saved MEDIA path and the file exists."
-    ],
-    "docsRefs": ["docs/tools/image-generation.md", "docs/providers/openai.md"],
-    "codeRefs": [
-      "src/agents/tools/image-generate-tool.ts",
-      "extensions/qa-lab/src/mock-openai-server.ts"
-    ]
-  },
-  {
-    "id": "image-understanding-attachment",
-    "title": "Image understanding from attachment",
-    "surface": "image-understanding",
-    "objective": "Verify an attached image reaches the agent model and the agent can describe what it sees.",
-    "successCriteria": [
-      "Agent receives at least one image attachment.",
-      "Final answer describes the visible image content in one short sentence.",
-      "The description mentions the expected red and blue regions."
-    ],
-    "docsRefs": ["docs/help/testing.md", "docs/tools/index.md"],
-    "codeRefs": [
-      "src/gateway/server-methods/agent.ts",
-      "extensions/qa-lab/src/suite.ts",
-      "extensions/qa-lab/src/mock-openai-server.ts"
-    ]
-  },
-  {
-    "id": "image-generation-roundtrip",
-    "title": "Image generation roundtrip",
-    "surface": "image-generation",
-    "objective": "Verify a generated image is saved as media, reattached on the next turn, and described correctly through the vision path.",
-    "successCriteria": [
-      "image_generate produces a saved MEDIA artifact.",
-      "The generated artifact is reattached on a follow-up turn.",
-      "The follow-up vision answer describes the generated scene rather than a generic attachment placeholder."
-    ],
-    "docsRefs": ["docs/tools/image-generation.md", "docs/help/testing.md"],
-    "codeRefs": [
-      "src/agents/tools/image-generate-tool.ts",
-      "src/gateway/chat-attachments.ts",
-      "extensions/qa-lab/src/mock-openai-server.ts"
-    ]
-  },
-  {
-    "id": "config-patch-hot-apply",
-    "title": "Config patch skill disable",
-    "surface": "config",
-    "objective": "Verify config.patch can disable a workspace skill and the restarted gateway exposes the new disabled state cleanly.",
-    "successCriteria": [
-      "config.patch succeeds for the skill toggle change.",
-      "A workspace skill works before the patch.",
-      "The same skill is reported disabled after the restart triggered by the patch."
-    ],
-    "docsRefs": ["docs/gateway/configuration.md", "docs/gateway/protocol.md"],
-    "codeRefs": ["src/gateway/server-methods/config.ts", "extensions/qa-lab/src/suite.ts"]
-  },
-  {
-    "id": "config-apply-restart-wakeup",
-    "title": "Config apply restart wake-up",
-    "surface": "config",
-    "objective": "Verify a restart-required config.apply restarts cleanly and delivers the post-restart wake message back into the QA channel.",
-    "successCriteria": [
-      "config.apply schedules a restart-required change.",
-      "Gateway becomes healthy again after restart.",
-      "Restart sentinel wake-up message arrives in the QA channel."
-    ],
-    "docsRefs": ["docs/gateway/configuration.md", "docs/gateway/protocol.md"],
-    "codeRefs": ["src/gateway/server-methods/config.ts", "src/gateway/server-restart-sentinel.ts"]
-  },
-  {
-    "id": "config-restart-capability-flip",
-    "title": "Config restart capability flip",
-    "surface": "config",
-    "objective": "Verify a restart-triggering config change flips capability inventory and the same session successfully uses the newly restored tool after wake-up.",
-    "successCriteria": [
-      "Capability is absent before the restart-triggering patch.",
-      "Restart sentinel wakes the same session back up after config patch.",
-      "The restored capability appears in tools.effective and works in the follow-up turn."
-    ],
-    "docsRefs": [
-      "docs/gateway/configuration.md",
-      "docs/gateway/protocol.md",
-      "docs/tools/image-generation.md"
-    ],
-    "codeRefs": [
-      "src/gateway/server-methods/config.ts",
-      "src/gateway/server-restart-sentinel.ts",
-      "src/gateway/server-methods/tools-effective.ts",
-      "extensions/qa-lab/src/suite.ts"
-    ]
-  },
-  {
-    "id": "runtime-inventory-drift-check",
-    "title": "Runtime inventory drift check",
-    "surface": "inventory",
-    "objective": "Verify tools.effective and skills.status stay aligned with runtime behavior after config changes.",
-    "successCriteria": [
-      "Enabled tool appears before the config change.",
-      "After config change, disabled tool disappears from tools.effective.",
-      "Disabled skill appears in skills.status with disabled state."
-    ],
-    "docsRefs": ["docs/gateway/protocol.md", "docs/tools/skills.md", "docs/tools/index.md"],
-    "codeRefs": [
-      "src/gateway/server-methods/tools-effective.ts",
-      "src/gateway/server-methods/skills.ts"
-    ]
-  }
-]
--- a/src/plugin-sdk/qa-channel.ts
+++ b/src/plugin-sdk/qa-channel.ts
@@ -20,6 +20,7 @@ export {
   setQaChannelRuntime,
 } from "../../extensions/qa-channel/api.js";
 export type {
+  QaBusAttachment,
   QaBusConversation,
   QaBusConversationKind,
   QaBusCreateThreadInput,