diff --git a/docs/concepts/qa-e2e-automation.md b/docs/concepts/qa-e2e-automation.md index 0c4d7173d19..42511cdefab 100644 --- a/docs/concepts/qa-e2e-automation.md +++ b/docs/concepts/qa-e2e-automation.md @@ -56,8 +56,7 @@ asset hash changes. Seed assets live in `qa/`: -- `qa/QA_KICKOFF_TASK.md` -- `qa/seed-scenarios.json` +- `qa/scenarios.md` These are intentionally in git so the QA plan is visible to both humans and the agent. The baseline list should stay broad enough to cover: diff --git a/docs/refactor/qa.md b/docs/refactor/qa.md new file mode 100644 index 00000000000..4b48c82e6fa --- /dev/null +++ b/docs/refactor/qa.md @@ -0,0 +1,526 @@ +# QA Refactor + +Status: foundational migration landed. + +## Goal + +Move OpenClaw QA from a split-definition model to a single source of truth: + +- scenario metadata +- prompts sent to the model +- setup and teardown +- harness logic +- assertions and success criteria +- artifacts and report hints + +The desired end state is a generic QA harness that loads powerful scenario definition files instead of hardcoding most behavior in TypeScript. + +## Current State + +Primary source of truth now lives in `qa/scenarios.md`. 
+ +Implemented: + +- `qa/scenarios.md` + - canonical QA pack + - operator identity + - kickoff mission + - scenario metadata + - handler bindings +- `extensions/qa-lab/src/scenario-catalog.ts` + - markdown pack parser + zod validation +- `extensions/qa-lab/src/qa-agent-bootstrap.ts` + - plan rendering from the markdown pack +- `extensions/qa-lab/src/qa-agent-workspace.ts` + - seeds generated compatibility files plus `QA_SCENARIOS.md` +- `extensions/qa-lab/src/suite.ts` + - selects executable scenarios through markdown-defined handler bindings +- QA bus protocol + UI + - generic inline attachments for image/video/audio/file rendering + +Remaining split surfaces: + +- `extensions/qa-lab/src/suite.ts` + - still owns most executable custom handler logic +- `extensions/qa-lab/src/report.ts` + - still derives report structure from runtime outputs + +So the source-of-truth split is fixed, but execution is still mostly handler-backed rather than fully declarative. + +## What The Real Scenario Surface Looks Like + +Reading the current suite shows a few distinct scenario classes. 
+ +### Simple interaction + +- channel baseline +- DM baseline +- threaded follow-up +- model switch +- approval followthrough +- reaction/edit/delete + +### Config and runtime mutation + +- config patch skill disable +- config apply restart wake-up +- config restart capability flip +- runtime inventory drift check + +### Filesystem and repo assertions + +- source/docs discovery report +- build Lobster Invaders +- generated image artifact lookup + +### Memory orchestration + +- memory recall +- memory tools in channel context +- memory failure fallback +- session memory ranking +- thread memory isolation +- memory dreaming sweep + +### Tool and plugin integration + +- MCP plugin-tools call +- skill visibility +- skill hot install +- native image generation +- image roundtrip +- image understanding from attachment + +### Multi-turn and multi-actor + +- subagent handoff +- subagent fanout synthesis +- restart recovery style flows + +These categories matter because they drive DSL requirements. A flat list of prompt + expected text is not enough. + +## Direction + +### Single source of truth + +Use `qa/scenarios.md` as the authored source of truth. + +The pack should stay: + +- human-readable in review +- machine-parseable +- rich enough to drive: + - suite execution + - QA workspace bootstrap + - QA Lab UI metadata + - docs/discovery prompts + - report generation + +### Preferred authoring format + +Use markdown as the top-level format, with structured YAML inside it. + +Recommended shape: + +- YAML frontmatter + - id + - title + - surface + - tags + - docs refs + - code refs + - model/provider overrides + - prerequisites +- prose sections + - objective + - notes + - debugging hints +- fenced YAML blocks + - setup + - steps + - assertions + - cleanup + +This gives: + +- better PR readability than giant JSON +- richer context than pure YAML +- strict parsing and zod validation + +Raw JSON is acceptable only as an intermediate generated form. 
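The named fenced YAML blocks described above (for example `yaml scenario.setup`) could be collected from a scenario file roughly as follows. This is a minimal sketch, not the parser in `scenario-catalog.ts`: `extractNamedFences`, `NamedFence`, and the fence-name pattern are hypothetical names chosen for illustration, and the YAML body is returned as raw text rather than parsed.

```typescript
// Hypothetical helper (not part of the PR): collects fenced blocks whose
// info string is "yaml <name>" or "yml <name>", e.g. "yaml scenario.steps".
type NamedFence = { lang: string; name: string; body: string };

// Built with repeat() so this example never contains a literal fence line.
const FENCE = "`".repeat(3);

function extractNamedFences(markdown: string): NamedFence[] {
  const pattern = new RegExp(
    `^${FENCE}(ya?ml)[ \\t]+([\\w.-]+)[ \\t]*\\r?\\n([\\s\\S]*?)\\r?\\n${FENCE}[ \\t]*$`,
    "gm",
  );
  const blocks: NamedFence[] = [];
  for (const match of markdown.matchAll(pattern)) {
    // match[1] = language, match[2] = block name, match[3] = raw YAML text
    blocks.push({ lang: match[1], name: match[2], body: match[3] });
  }
  return blocks;
}

// Assumed sample input, loosely modeled on the authoring shape above.
const sample = [
  "# Setup",
  "",
  `${FENCE}yaml scenario.setup`,
  "- action: session.create",
  "  key: agent:qa:image-roundtrip",
  FENCE,
  "",
  "# Steps",
  "",
  `${FENCE}yaml scenario.steps`,
  "- action: agent.send",
  FENCE,
].join("\n");

const fences = extractNamedFences(sample);
console.log(fences.map((fence) => fence.name)); // [ 'scenario.setup', 'scenario.steps' ]
```

Each extracted body would then go through a YAML parser and schema validation; keeping extraction separate from validation makes it easier to report which named block failed.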
+ +## Proposed Scenario File Shape + +Example: + +````md +--- +id: image-generation-roundtrip +title: Image generation roundtrip +surface: image +tags: [media, image, roundtrip] +models: + primary: openai/gpt-5.4 +requires: + tools: [image_generate] + plugins: [openai, qa-channel] +docsRefs: + - docs/help/testing.md + - docs/concepts/model-providers.md +codeRefs: + - extensions/qa-lab/src/suite.ts + - src/gateway/chat-attachments.ts +--- + +# Objective + +Verify generated media is reattached on the follow-up turn. + +# Setup + +```yaml scenario.setup +- action: config.patch + patch: + agents: + defaults: + imageGenerationModel: + primary: openai/gpt-image-1 +- action: session.create + key: agent:qa:image-roundtrip +``` + +# Steps + +```yaml scenario.steps +- action: agent.send + session: agent:qa:image-roundtrip + message: | + Image generation check: generate a QA lighthouse image and summarize it in one short sentence. +- action: artifact.capture + kind: generated-image + promptSnippet: Image generation check + saveAs: lighthouseImage +- action: agent.send + session: agent:qa:image-roundtrip + message: | + Roundtrip image inspection check: describe the generated lighthouse attachment in one short sentence. + attachments: + - fromArtifact: lighthouseImage +``` + +# Expect + +```yaml scenario.expect +- assert: outbound.textIncludes + value: lighthouse +- assert: requestLog.matches + where: + promptIncludes: Roundtrip image inspection check + imageInputCountGte: 1 +- assert: artifact.exists + ref: lighthouseImage +``` +```` + +## Runner Capabilities The DSL Must Cover + +Based on the current suite, the generic runner needs more than prompt execution. 
+ +### Environment and setup actions + +- `bus.reset` +- `gateway.waitHealthy` +- `channel.waitReady` +- `session.create` +- `thread.create` +- `workspace.writeSkill` + +### Agent turn actions + +- `agent.send` +- `agent.wait` +- `bus.injectInbound` +- `bus.injectOutbound` + +### Config and runtime actions + +- `config.get` +- `config.patch` +- `config.apply` +- `gateway.restart` +- `tools.effective` +- `skills.status` + +### File and artifact actions + +- `file.write` +- `file.read` +- `file.delete` +- `file.touchTime` +- `artifact.captureGeneratedImage` +- `artifact.capturePath` + +### Memory and cron actions + +- `memory.indexForce` +- `memory.searchCli` +- `doctor.memory.status` +- `cron.list` +- `cron.run` +- `cron.waitCompletion` +- `sessionTranscript.write` + +### MCP actions + +- `mcp.callTool` + +### Assertions + +- `outbound.textIncludes` +- `outbound.inThread` +- `outbound.notInRoot` +- `tool.called` +- `tool.notPresent` +- `skill.visible` +- `skill.disabled` +- `file.contains` +- `memory.contains` +- `requestLog.matches` +- `sessionStore.matches` +- `cron.managedPresent` +- `artifact.exists` + +## Variables and Artifact References + +The DSL must support saved outputs and later references. + +Examples from the current suite: + +- create a thread, then reuse `threadId` +- create a session, then reuse `sessionKey` +- generate an image, then attach the file on the next turn +- generate a wake marker string, then assert that it appears later + +Needed capabilities: + +- `saveAs` +- `${vars.name}` +- `${artifacts.name}` +- typed references for paths, session keys, thread ids, markers, tool outputs + +Without variable support, the harness will keep leaking scenario logic back into TypeScript. + +## What Should Stay As Escape Hatches + +A fully pure declarative runner is not realistic in phase 1. 
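Returning to the `saveAs`, `${vars.name}`, and `${artifacts.name}` references listed under "Variables and Artifact References": a runner might resolve them with a small interpolation step before each action executes. The sketch below is an assumption about how that could look, not code from this PR; `ScenarioScope` and `interpolate` are hypothetical names, and values are treated as plain strings rather than the typed references the doc asks for.

```typescript
// Hypothetical scope populated by earlier steps through `saveAs`.
type ScenarioScope = {
  vars: Record<string, string>;
  artifacts: Record<string, string>;
};

// Matches ${vars.name} and ${artifacts.name}; the name grammar is an assumption.
const REFERENCE = /\$\{(vars|artifacts)\.([A-Za-z_][\w-]*)\}/g;

function interpolate(template: string, scope: ScenarioScope): string {
  return template.replace(
    REFERENCE,
    (_whole: string, ns: "vars" | "artifacts", key: string) => {
      const bucket = scope[ns];
      // Fail loudly on unknown names instead of silently sending "${...}" to the agent.
      if (!Object.prototype.hasOwnProperty.call(bucket, key)) {
        throw new Error(`unknown scenario reference: ${ns}.${key}`);
      }
      return bucket[key];
    },
  );
}

const scope: ScenarioScope = {
  vars: { threadId: "thread-42" },
  artifacts: { lighthouseImage: "/tmp/lighthouse.png" },
};

const resolved = interpolate(
  "reply in ${vars.threadId} with ${artifacts.lighthouseImage}",
  scope,
);
console.log(resolved); // reply in thread-42 with /tmp/lighthouse.png
```

Throwing on an unresolved reference is a deliberate choice in this sketch: a scenario that reads a variable before any step saved it is an authoring error, and it is cheaper to surface it before the agent turn than in a failed assertion afterwards.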
+ +Some scenarios are inherently orchestration-heavy: + +- memory dreaming sweep +- config apply restart wake-up +- config restart capability flip +- generated image artifact resolution by timestamp/path +- discovery-report evaluation + +These should use explicit custom handlers for now. + +Recommended rule: + +- 85-90% declarative +- explicit `customHandler` steps for the hard remainder +- named and documented custom handlers only +- no anonymous inline code in the scenario file + +That keeps the generic engine clean while still allowing progress. + +## Architecture Change + +### Current + +Scenario markdown already is the source of truth for: + +- suite execution +- workspace bootstrap files +- QA Lab UI scenario catalog +- report metadata +- discovery prompts + +Generated compatibility: + +- seeded workspace still includes `QA_KICKOFF_TASK.md` +- seeded workspace still includes `QA_SCENARIO_PLAN.md` +- seeded workspace now also includes `QA_SCENARIOS.md` + +## Refactor Plan + +### Phase 1: loader and schema + +Done. 
+ +- added `qa/scenarios.md` +- added parser for named markdown YAML pack content +- validated with zod +- switched consumers to the parsed pack +- removed repo-level `qa/seed-scenarios.json` and `qa/QA_KICKOFF_TASK.md` + +### Phase 2: generic engine + +- split `extensions/qa-lab/src/suite.ts` into: + - loader + - engine + - action registry + - assertion registry + - custom handlers +- keep existing helper functions as engine operations + +Deliverable: + +- engine executes simple declarative scenarios + +### Phase 3: migrate simple scenarios + +Start with scenarios that are mostly prompt + wait + assert: + +- threaded follow-up +- image understanding from attachment +- skill visibility and invocation +- channel baseline + +Deliverable: + +- first real markdown-defined scenarios shipping through the generic engine + +### Phase 4: migrate medium scenarios + +- image generation roundtrip +- memory tools in channel context +- session memory ranking +- subagent handoff +- subagent fanout synthesis + +Deliverable: + +- variables, artifacts, tool assertions, request-log assertions proven out + +### Phase 5: keep hard scenarios on custom handlers + +- memory dreaming sweep +- config apply restart wake-up +- config restart capability flip +- runtime inventory drift + +Deliverable: + +- same authoring format, but with explicit custom-step blocks where needed + +### Phase 6: delete hardcoded scenario map + +Once the pack coverage is good enough: + +- remove most scenario-specific TypeScript branching from `extensions/qa-lab/src/suite.ts` + +## Fake Slack / Rich Media Support + +The current QA bus is text-first. + +Relevant files: + +- `extensions/qa-channel/src/protocol.ts` +- `extensions/qa-lab/src/bus-state.ts` +- `extensions/qa-lab/src/bus-queries.ts` +- `extensions/qa-lab/src/bus-server.ts` +- `extensions/qa-lab/web/src/ui-render.ts` + +Today the QA bus supports: + +- text +- reactions +- threads + +It does not yet model inline media attachments. 
+ +### Needed transport contract + +Add a generic QA bus attachment model: + +```ts +type QaBusAttachment = { + id: string; + kind: "image" | "video" | "audio" | "file"; + mimeType: string; + fileName?: string; + inline?: boolean; + url?: string; + contentBase64?: string; + width?: number; + height?: number; + durationMs?: number; + altText?: string; + transcript?: string; +}; +```` + +Then add `attachments?: QaBusAttachment[]` to: + +- `QaBusMessage` +- `QaBusInboundMessageInput` +- `QaBusOutboundMessageInput` + +### Why generic first + +Do not build a Slack-only media model. + +Instead: + +- one generic QA transport model +- multiple renderers on top of it + - current QA Lab chat + - future fake Slack web + - any other fake transport views + +This prevents duplicate logic and lets media scenarios stay transport-agnostic. + +### UI work needed + +Update the QA UI to render: + +- inline image preview +- inline audio player +- inline video player +- file attachment chip + +The current UI can already render threads and reactions, so attachment rendering should layer onto the same message card model. + +### Scenario work enabled by media transport + +Once attachments flow through QA bus, we can add richer fake-chat scenarios: + +- inline image reply in fake Slack +- audio attachment understanding +- video attachment understanding +- mixed attachment ordering +- thread reply with media retained + +## Recommendation + +The next implementation chunk should be: + +1. add markdown scenario loader + zod schema +2. generate the current catalog from markdown +3. migrate a few simple scenarios first +4. add generic QA bus attachment support +5. render inline image in the QA UI +6. 
then expand to audio and video + +This is the smallest path that proves both goals: + +- generic markdown-defined QA +- richer fake messaging surfaces + +## Open Questions + +- whether scenario files should allow embedded markdown prompt templates with variable interpolation +- whether setup/cleanup should be named sections or just ordered action lists +- whether artifact references should be strongly typed in schema or string-based +- whether custom handlers should live in one registry or per-surface registries +- whether the generated JSON compatibility file should remain checked in during migration diff --git a/extensions/qa-channel/src/bus-client.ts b/extensions/qa-channel/src/bus-client.ts index ca109cee65a..bd79351962e 100644 --- a/extensions/qa-channel/src/bus-client.ts +++ b/extensions/qa-channel/src/bus-client.ts @@ -10,6 +10,7 @@ import type { } from "./protocol.js"; export type { + QaBusAttachment, QaBusConversation, QaBusConversationKind, QaBusCreateThreadInput, @@ -140,6 +141,7 @@ export async function sendQaBusMessage(params: { senderName?: string; threadId?: string; replyToId?: string; + attachments?: import("./protocol.js").QaBusAttachment[]; }) { return await postJson<{ message: QaBusMessage }>(params.baseUrl, "/v1/outbound/message", params); } diff --git a/extensions/qa-channel/src/protocol.ts b/extensions/qa-channel/src/protocol.ts index ce00a37d057..a6bc175be89 100644 --- a/extensions/qa-channel/src/protocol.ts +++ b/extensions/qa-channel/src/protocol.ts @@ -6,6 +6,21 @@ export type QaBusConversation = { title?: string; }; +export type QaBusAttachment = { + id: string; + kind: "image" | "video" | "audio" | "file"; + mimeType: string; + fileName?: string; + inline?: boolean; + url?: string; + contentBase64?: string; + width?: number; + height?: number; + durationMs?: number; + altText?: string; + transcript?: string; +}; + export type QaBusMessage = { id: string; accountId: string; @@ -20,6 +35,7 @@ export type QaBusMessage = { replyToId?: 
string; deleted?: boolean; editedAt?: number; + attachments?: QaBusAttachment[]; reactions: Array<{ emoji: string; senderId: string; @@ -86,6 +102,7 @@ export type QaBusInboundMessageInput = { threadId?: string; threadTitle?: string; replyToId?: string; + attachments?: QaBusAttachment[]; }; export type QaBusOutboundMessageInput = { @@ -97,6 +114,7 @@ export type QaBusOutboundMessageInput = { timestamp?: number; threadId?: string; replyToId?: string; + attachments?: QaBusAttachment[]; }; export type QaBusCreateThreadInput = { diff --git a/extensions/qa-lab/src/bus-queries.ts b/extensions/qa-lab/src/bus-queries.ts index bb7d0323cb3..38670ade771 100644 --- a/extensions/qa-lab/src/bus-queries.ts +++ b/extensions/qa-lab/src/bus-queries.ts @@ -1,5 +1,6 @@ import { normalizeOptionalLowercaseString } from "openclaw/plugin-sdk/text-runtime"; import type { + QaBusAttachment, QaBusConversation, QaBusEvent, QaBusMessage, @@ -52,10 +53,15 @@ export function cloneMessage(message: QaBusMessage): QaBusMessage { return { ...message, conversation: { ...message.conversation }, + attachments: (message.attachments ?? []).map((attachment) => cloneAttachment(attachment)), reactions: message.reactions.map((reaction) => ({ ...reaction })), }; } +function cloneAttachment(attachment: QaBusAttachment): QaBusAttachment { + return { ...attachment }; +} + export function cloneEvent(event: QaBusEvent): QaBusEvent { switch (event.kind) { case "inbound-message": @@ -113,9 +119,24 @@ export function searchQaBusMessages(params: { .filter((message) => params.input.threadId ? message.threadId === params.input.threadId : true, ) - .filter((message) => - query ? normalizeOptionalLowercaseString(message.text)?.includes(query) === true : true, - ) + .filter((message) => { + if (!query) { + return true; + } + const attachmentHaystack = message.attachments ?? 
[]; + const searchableAttachmentText = attachmentHaystack + .flatMap((attachment) => [ + attachment.fileName, + attachment.altText, + attachment.transcript, + attachment.mimeType, + ]) + .filter((value): value is string => Boolean(value)) + .join(" ") + .toLowerCase(); + const messageText = normalizeOptionalLowercaseString(message.text) ?? ""; + return `${messageText} ${searchableAttachmentText}`.includes(query); + }) .slice(-limit) .map((message) => cloneMessage(message)); } diff --git a/extensions/qa-lab/src/bus-state.test.ts b/extensions/qa-lab/src/bus-state.test.ts index 9acc89d9962..0060cc350d0 100644 --- a/extensions/qa-lab/src/bus-state.test.ts +++ b/extensions/qa-lab/src/bus-state.test.ts @@ -91,4 +91,41 @@ describe("qa-bus state", () => { }), ).rejects.toThrow("qa-bus wait timeout"); }); + + it("preserves inline attachments and lets search match attachment metadata", () => { + const state = createQaBusState(); + + const outbound = state.addOutboundMessage({ + to: "dm:alice", + text: "artifact attached", + attachments: [ + { + id: "image-1", + kind: "image", + mimeType: "image/png", + fileName: "qa-screenshot.png", + altText: "QA dashboard screenshot", + contentBase64: "aGVsbG8=", + }, + ], + }); + + const readback = state.readMessage({ messageId: outbound.id }); + expect(readback.attachments).toHaveLength(1); + expect(readback.attachments?.[0]).toMatchObject({ + kind: "image", + fileName: "qa-screenshot.png", + altText: "QA dashboard screenshot", + }); + + const byFilename = state.searchMessages({ + query: "screenshot", + }); + expect(byFilename.some((message) => message.id === outbound.id)).toBe(true); + + const byAltText = state.searchMessages({ + query: "dashboard", + }); + expect(byAltText.some((message) => message.id === outbound.id)).toBe(true); + }); }); diff --git a/extensions/qa-lab/src/bus-state.ts b/extensions/qa-lab/src/bus-state.ts index 6beddca6734..6a8dc67efaf 100644 --- a/extensions/qa-lab/src/bus-state.ts +++ 
b/extensions/qa-lab/src/bus-state.ts @@ -10,6 +10,7 @@ import { } from "./bus-queries.js"; import { createQaBusWaiterStore } from "./bus-waiters.js"; import type { + QaBusAttachment, QaBusConversation, QaBusCreateThreadInput, QaBusDeleteMessageInput, @@ -86,6 +87,7 @@ export function createQaBusState() { threadId?: string; threadTitle?: string; replyToId?: string; + attachments?: QaBusAttachment[]; }): QaBusMessage => { const conversation = ensureConversation(params.conversation); const message: QaBusMessage = { @@ -100,6 +102,7 @@ export function createQaBusState() { threadId: params.threadId, threadTitle: params.threadTitle, replyToId: params.replyToId, + attachments: params.attachments?.map((attachment) => ({ ...attachment })) ?? [], reactions: [], }; messages.set(message.id, message); @@ -138,6 +141,7 @@ export function createQaBusState() { threadId: input.threadId, threadTitle: input.threadTitle, replyToId: input.replyToId, + attachments: input.attachments, }); pushEvent({ kind: "inbound-message", @@ -159,6 +163,7 @@ export function createQaBusState() { timestamp: input.timestamp, threadId: input.threadId ?? threadId, replyToId: input.replyToId, + attachments: input.attachments, }); pushEvent({ kind: "outbound-message", diff --git a/extensions/qa-lab/src/discovery-eval.test.ts b/extensions/qa-lab/src/discovery-eval.test.ts index 981357e8358..badc4edd6e2 100644 --- a/extensions/qa-lab/src/discovery-eval.test.ts +++ b/extensions/qa-lab/src/discovery-eval.test.ts @@ -9,7 +9,7 @@ describe("qa discovery evaluation", () => { it("accepts rich discovery reports that explicitly confirm all required files were read", () => { const report = ` Worked -- Read all four requested files: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md. +- Read all three requested files: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md. Failed - None. 
Blocked @@ -28,8 +28,8 @@ The helper text mentions banned phrases like "not present", "missing files", "bl it("accepts numeric 'all 4 required files read' confirmations", () => { const report = ` Worked -- Source: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md -- all 4 required files read. +- Source: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md +- all 3 required files read. Failed - None. Blocked @@ -48,8 +48,8 @@ The report may quote phrases like "not present" while describing the evaluator, it("accepts claude-style 'all four files retrieved' discovery summaries", () => { const report = ` Worked -- All four files retrieved. Now let me compile the protocol report. -- All four mandated files read successfully: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md. +- All three files retrieved. Now let me compile the protocol report. +- All three mandated files read successfully: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md. Failed - None. Blocked @@ -83,7 +83,7 @@ Follow-up it("flags discovery replies that drift into unrelated suite wrap-up claims", () => { const report = ` Worked -- All four requested files were read: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md. +- All three requested files were read: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md. Failed - None. 
Blocked diff --git a/extensions/qa-lab/src/discovery-eval.ts b/extensions/qa-lab/src/discovery-eval.ts index b43f564f329..66961dc0118 100644 --- a/extensions/qa-lab/src/discovery-eval.ts +++ b/extensions/qa-lab/src/discovery-eval.ts @@ -1,8 +1,7 @@ import { normalizeLowercaseStringOrEmpty } from "openclaw/plugin-sdk/text-runtime"; const REQUIRED_DISCOVERY_REFS = [ - "repo/qa/seed-scenarios.json", - "repo/qa/QA_KICKOFF_TASK.md", + "repo/qa/scenarios.md", "repo/extensions/qa-lab/src/suite.ts", "repo/docs/help/testing.md", ] as const; @@ -21,14 +20,15 @@ const DISCOVERY_SCOPE_LEAK_PHRASES = [ function confirmsDiscoveryFileRead(text: string) { const lower = normalizeLowercaseStringOrEmpty(text); const mentionsAllRefs = REQUIRED_DISCOVERY_REFS_LOWER.every((ref) => lower.includes(ref)); + const requiredCountPattern = "(?:three|3|four|4)"; const confirmsRead = - /(?:read|retrieved|inspected|loaded|accessed|digested)\s+all\s+(?:four|4)\s+(?:(?:requested|required|mandated|seeded)\s+)?files/.test( - lower, - ) || - /all\s+(?:four|4)\s+(?:(?:requested|required|mandated|seeded)\s+)?files\s+(?:were\s+)?(?:read|retrieved|inspected|loaded|accessed|digested)(?:\s+\w+)?/.test( - lower, - ) || - /all (?:four|4) seeded files readable/.test(lower); + new RegExp( + `(?:read|retrieved|inspected|loaded|accessed|digested)\\s+all\\s+${requiredCountPattern}\\s+(?:(?:requested|required|mandated|seeded)\\s+)?files`, + ).test(lower) || + new RegExp( + `all\\s+${requiredCountPattern}\\s+(?:(?:requested|required|mandated|seeded)\\s+)?files\\s+(?:were\\s+)?(?:read|retrieved|inspected|loaded|accessed|digested)(?:\\s+\\w+)?`, + ).test(lower) || + new RegExp(`all\\s+${requiredCountPattern}\\s+seeded files readable`).test(lower); return mentionsAllRefs && confirmsRead; } diff --git a/extensions/qa-lab/src/docker-harness.test.ts b/extensions/qa-lab/src/docker-harness.test.ts index f74dad030fd..d5823cdc7f4 100644 --- a/extensions/qa-lab/src/docker-harness.test.ts +++ 
b/extensions/qa-lab/src/docker-harness.test.ts @@ -38,6 +38,7 @@ describe("qa docker harness", () => { path.join(outputDir, "state", "openclaw.json"), path.join(outputDir, "state", "seed-workspace", "QA_KICKOFF_TASK.md"), path.join(outputDir, "state", "seed-workspace", "QA_SCENARIO_PLAN.md"), + path.join(outputDir, "state", "seed-workspace", "QA_SCENARIOS.md"), path.join(outputDir, "state", "seed-workspace", "IDENTITY.md"), ]), ); @@ -86,6 +87,13 @@ describe("qa docker harness", () => { ); expect(kickoff).toContain("Lobster Invaders"); + const scenarios = await readFile( + path.join(outputDir, "state", "seed-workspace", "QA_SCENARIOS.md"), + "utf8", + ); + expect(scenarios).toContain("```yaml qa-pack"); + expect(scenarios).toContain("subagent-fanout-synthesis"); + const readme = await readFile(path.join(outputDir, "README.md"), "utf8"); expect(readme).toContain("in-process restarts inside Docker"); expect(readme).toContain("pnpm qa:lab:watch"); diff --git a/extensions/qa-lab/src/docker-harness.ts b/extensions/qa-lab/src/docker-harness.ts index c07fddd5273..bcb94aa4664 100644 --- a/extensions/qa-lab/src/docker-harness.ts +++ b/extensions/qa-lab/src/docker-harness.ts @@ -323,6 +323,7 @@ export async function writeQaDockerHarnessFiles(params: { path.join(params.outputDir, "state", "seed-workspace", "IDENTITY.md"), path.join(params.outputDir, "state", "seed-workspace", "QA_KICKOFF_TASK.md"), path.join(params.outputDir, "state", "seed-workspace", "QA_SCENARIO_PLAN.md"), + path.join(params.outputDir, "state", "seed-workspace", "QA_SCENARIOS.md"), ], }; } diff --git a/extensions/qa-lab/src/qa-agent-bootstrap.ts b/extensions/qa-lab/src/qa-agent-bootstrap.ts index ac3666c8774..682e8a8e3dc 100644 --- a/extensions/qa-lab/src/qa-agent-bootstrap.ts +++ b/extensions/qa-lab/src/qa-agent-bootstrap.ts @@ -1,22 +1,13 @@ -import { readQaBootstrapScenarioCatalog } from "./scenario-catalog.js"; +import { + DEFAULT_QA_AGENT_IDENTITY_MARKDOWN, + readQaBootstrapScenarioCatalog, +} from 
"./scenario-catalog.js"; -export const QA_AGENT_IDENTITY_MARKDOWN = `# Dev C-3PO - -You are the OpenClaw QA operator agent. - -Persona: -- protocol-minded -- precise -- a little flustered -- conscientious -- eager to report what worked, failed, or remains blocked - -Style: -- read source and docs first -- test systematically -- record evidence -- end with a concise protocol report -`; +export function readQaAgentIdentityMarkdown(): string { + return ( + readQaBootstrapScenarioCatalog().agentIdentityMarkdown || DEFAULT_QA_AGENT_IDENTITY_MARKDOWN + ); +} export function buildQaScenarioPlanMarkdown(): string { const catalog = readQaBootstrapScenarioCatalog(); @@ -27,6 +18,9 @@ export function buildQaScenarioPlanMarkdown(): string { lines.push(`- id: ${scenario.id}`); lines.push(`- surface: ${scenario.surface}`); lines.push(`- objective: ${scenario.objective}`); + if (scenario.execution?.summary) { + lines.push(`- execution: ${scenario.execution.summary}`); + } lines.push("- success criteria:"); for (const criterion of scenario.successCriteria) { lines.push(` - ${criterion}`); diff --git a/extensions/qa-lab/src/qa-agent-workspace.ts b/extensions/qa-lab/src/qa-agent-workspace.ts index f5c6dcbd5b6..73d9b340b99 100644 --- a/extensions/qa-lab/src/qa-agent-workspace.ts +++ b/extensions/qa-lab/src/qa-agent-workspace.ts @@ -1,7 +1,7 @@ import fs from "node:fs/promises"; import path from "node:path"; -import { buildQaScenarioPlanMarkdown, QA_AGENT_IDENTITY_MARKDOWN } from "./qa-agent-bootstrap.js"; -import { readQaBootstrapScenarioCatalog } from "./scenario-catalog.js"; +import { buildQaScenarioPlanMarkdown, readQaAgentIdentityMarkdown } from "./qa-agent-bootstrap.js"; +import { readQaBootstrapScenarioCatalog, readQaScenarioPackMarkdown } from "./scenario-catalog.js"; export async function seedQaAgentWorkspace(params: { workspaceDir: string; repoRoot?: string }) { const catalog = readQaBootstrapScenarioCatalog(); @@ -9,9 +9,10 @@ export async function 
seedQaAgentWorkspace(params: { workspaceDir: string; repoR const kickoffTask = catalog.kickoffTask || "QA mission unavailable."; const files = new Map([ - ["IDENTITY.md", QA_AGENT_IDENTITY_MARKDOWN], + ["IDENTITY.md", readQaAgentIdentityMarkdown()], ["QA_KICKOFF_TASK.md", kickoffTask], ["QA_SCENARIO_PLAN.md", buildQaScenarioPlanMarkdown()], + ["QA_SCENARIOS.md", readQaScenarioPackMarkdown()], ]); if (params.repoRoot) { @@ -22,6 +23,7 @@ export async function seedQaAgentWorkspace(params: { workspaceDir: string; repoR - repo: ./repo/ - kickoff: ./QA_KICKOFF_TASK.md - scenario plan: ./QA_SCENARIO_PLAN.md +- scenario pack: ./QA_SCENARIOS.md - identity: ./IDENTITY.md The mounted repo source should be available read-only under \`./repo/\`. diff --git a/extensions/qa-lab/src/runtime-api.ts b/extensions/qa-lab/src/runtime-api.ts index bae2854511d..9b8c845e99c 100644 --- a/extensions/qa-lab/src/runtime-api.ts +++ b/extensions/qa-lab/src/runtime-api.ts @@ -20,6 +20,7 @@ export { setQaChannelRuntime, } from "openclaw/plugin-sdk/qa-channel"; export type { + QaBusAttachment, QaBusConversation, QaBusCreateThreadInput, QaBusDeleteMessageInput, diff --git a/extensions/qa-lab/src/scenario-catalog.test.ts b/extensions/qa-lab/src/scenario-catalog.test.ts new file mode 100644 index 00000000000..71eb993f1fb --- /dev/null +++ b/extensions/qa-lab/src/scenario-catalog.test.ts @@ -0,0 +1,26 @@ +import { describe, expect, it } from "vitest"; +import { readQaBootstrapScenarioCatalog, readQaScenarioPack } from "./scenario-catalog.js"; + +describe("qa scenario catalog", () => { + it("loads the markdown pack as the canonical source of truth", () => { + const pack = readQaScenarioPack(); + + expect(pack.version).toBe(1); + expect(pack.agent.identityMarkdown).toContain("Dev C-3PO"); + expect(pack.kickoffTask).toContain("Lobster Invaders"); + expect(pack.scenarios.some((scenario) => scenario.id === "image-generation-roundtrip")).toBe( + true, + ); + expect(pack.scenarios.every((scenario) => 
scenario.execution?.kind === "custom")).toBe(true); + }); + + it("exposes bootstrap data from the markdown pack", () => { + const catalog = readQaBootstrapScenarioCatalog(); + + expect(catalog.agentIdentityMarkdown).toContain("protocol-minded"); + expect(catalog.kickoffTask).toContain("Track what worked"); + expect(catalog.scenarios.some((scenario) => scenario.id === "subagent-fanout-synthesis")).toBe( + true, + ); + }); +}); diff --git a/extensions/qa-lab/src/scenario-catalog.ts b/extensions/qa-lab/src/scenario-catalog.ts index 76155b5f526..59ae44bfc7d 100644 --- a/extensions/qa-lab/src/scenario-catalog.ts +++ b/extensions/qa-lab/src/scenario-catalog.ts @@ -1,21 +1,68 @@ import fs from "node:fs"; import path from "node:path"; +import YAML from "yaml"; +import { z } from "zod"; -export type QaSeedScenario = { - id: string; - title: string; - surface: string; - objective: string; - successCriteria: string[]; - docsRefs?: string[]; - codeRefs?: string[]; -}; +export const DEFAULT_QA_AGENT_IDENTITY_MARKDOWN = `# Dev C-3PO + +You are the OpenClaw QA operator agent. 
+ +Persona: +- protocol-minded +- precise +- a little flustered +- conscientious +- eager to report what worked, failed, or remains blocked + +Style: +- read source and docs first +- test systematically +- record evidence +- end with a concise protocol report`; + +const qaScenarioExecutionSchema = z.object({ + kind: z.literal("custom").default("custom"), + handler: z.string().trim().min(1), + summary: z.string().trim().min(1).optional(), +}); + +const qaSeedScenarioSchema = z.object({ + id: z.string().trim().min(1), + title: z.string().trim().min(1), + surface: z.string().trim().min(1), + objective: z.string().trim().min(1), + successCriteria: z.array(z.string().trim().min(1)).min(1), + docsRefs: z.array(z.string().trim().min(1)).optional(), + codeRefs: z.array(z.string().trim().min(1)).optional(), + execution: qaScenarioExecutionSchema.optional(), +}); + +const qaScenarioPackSchema = z.object({ + version: z.number().int().positive(), + agent: z + .object({ + identityMarkdown: z.string().trim().min(1), + }) + .default({ + identityMarkdown: DEFAULT_QA_AGENT_IDENTITY_MARKDOWN, + }), + kickoffTask: z.string().trim().min(1), + scenarios: z.array(qaSeedScenarioSchema).min(1), +}); + +export type QaScenarioExecution = z.infer<typeof qaScenarioExecutionSchema>; +export type QaSeedScenario = z.infer<typeof qaSeedScenarioSchema>; +export type QaScenarioPack = z.infer<typeof qaScenarioPackSchema>; export type QaBootstrapScenarioCatalog = { + agentIdentityMarkdown: string; kickoffTask: string; scenarios: QaSeedScenario[]; }; +const QA_SCENARIO_PACK_PATH = "qa/scenarios.md"; +const QA_PACK_FENCE_RE = /```ya?ml qa-pack\r?\n([\s\S]*?)\r?\n```/i; + function walkUpDirectories(start: string): string[] { const roots: string[] = []; let current = path.resolve(start); @@ -44,20 +91,37 @@ function readTextFile(relativePath: string): string { if (!resolved) { return ""; } - return fs.readFileSync(resolved, "utf8").trim(); + return fs.readFileSync(resolved, "utf8"); } -function 
readScenarioFile(relativePath: string): QaSeedScenario[] { - const resolved = 
resolveRepoFile(relativePath); - if (!resolved) { - return []; +function extractQaPackYaml(content: string) { + const match = content.match(QA_PACK_FENCE_RE); + if (!match?.[1]) { + throw new Error( + `qa scenario pack missing \`\`\`yaml qa-pack fence in ${QA_SCENARIO_PACK_PATH}`, + ); } - return JSON.parse(fs.readFileSync(resolved, "utf8")) as QaSeedScenario[]; + return match[1]; +} + +export function readQaScenarioPackMarkdown(): string { + return readTextFile(QA_SCENARIO_PACK_PATH).trim(); +} + +export function readQaScenarioPack(): QaScenarioPack { + const markdown = readQaScenarioPackMarkdown(); + if (!markdown) { + throw new Error(`qa scenario pack not found: ${QA_SCENARIO_PACK_PATH}`); + } + const parsed = YAML.parse(extractQaPackYaml(markdown)) as unknown; + return qaScenarioPackSchema.parse(parsed); } export function readQaBootstrapScenarioCatalog(): QaBootstrapScenarioCatalog { + const pack = readQaScenarioPack(); return { - kickoffTask: readTextFile("qa/QA_KICKOFF_TASK.md"), - scenarios: readScenarioFile("qa/seed-scenarios.json"), + agentIdentityMarkdown: pack.agent.identityMarkdown, + kickoffTask: pack.kickoffTask, + scenarios: pack.scenarios, }; } diff --git a/extensions/qa-lab/src/suite.ts b/extensions/qa-lab/src/suite.ts index ead1a1adbc1..a3a18eaaa44 100644 --- a/extensions/qa-lab/src/suite.ts +++ b/extensions/qa-lab/src/suite.ts @@ -1252,7 +1252,7 @@ function buildScenarioMap(env: QaSuiteEnvironment) { await runAgentPrompt(env, { sessionKey: "agent:qa:discovery", message: - "Read the seeded docs and source plan. The full repo is mounted under ./repo/. Explicitly inspect repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md, then report grouped into Worked, Failed, Blocked, and Follow-up. Mention at least two extra QA scenarios beyond the seed list.", + "Read the seeded docs and source plan. The full repo is mounted under ./repo/. 
Explicitly inspect repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md, then report grouped into Worked, Failed, Blocked, and Follow-up. Mention at least two extra QA scenarios beyond the seed list.", timeoutMs: liveTurnTimeoutMs(env, 30_000), }); const outbound = await waitForCondition( @@ -2860,7 +2860,7 @@ export async function runQaSuite(params?: { }); for (const [index, scenario] of selectedCatalogScenarios.entries()) { - const run = scenarioMap.get(scenario.id); + const run = scenarioMap.get(scenario.execution?.handler || scenario.id); if (!run) { const missingResult = { name: scenario.title, diff --git a/extensions/qa-lab/web/src/styles.css b/extensions/qa-lab/web/src/styles.css index e6c95953a81..c4f1ca9ea66 100644 --- a/extensions/qa-lab/web/src/styles.css +++ b/extensions/qa-lab/web/src/styles.css @@ -947,6 +947,59 @@ select { word-break: break-word; } +.msg-attachments { + display: grid; + gap: 10px; + margin-top: 10px; +} + +.msg-attachment { + border: 1px solid var(--border); + background: var(--bg-elevated); + border-radius: 12px; + overflow: hidden; +} + +.msg-attachment img, +.msg-attachment video { + display: block; + width: min(100%, 420px); + max-width: 100%; + background: #000; +} + +.msg-attachment-audio { + padding: 12px; +} + +.msg-attachment audio { + width: min(100%, 360px); + display: block; +} + +.msg-attachment figcaption, +.msg-attachment-file { + padding: 10px 12px; + font-size: 12px; + color: var(--text-secondary); +} + +.msg-attachment-link { + color: var(--accent); + text-decoration: none; + font-weight: 600; +} + +.msg-attachment-link:hover { + text-decoration: underline; +} + +.msg-attachment-transcript { + margin-top: 8px; + color: var(--text-tertiary); + white-space: pre-wrap; +} + .msg-meta { display: flex; align-items: center; diff --git a/extensions/qa-lab/web/src/ui-render.ts b/extensions/qa-lab/web/src/ui-render.ts index 5cbb1ea0117..307fc263e97 100644 --- 
a/extensions/qa-lab/web/src/ui-render.ts +++ b/extensions/qa-lab/web/src/ui-render.ts @@ -6,6 +6,21 @@ export type Conversation = { title?: string; }; +export type Attachment = { + id: string; + kind: "image" | "video" | "audio" | "file"; + mimeType: string; + fileName?: string; + inline?: boolean; + url?: string; + contentBase64?: string; + width?: number; + height?: number; + durationMs?: number; + altText?: string; + transcript?: string; +}; + export type Thread = { id: string; conversationId: string; @@ -24,6 +39,7 @@ export type Message = { threadTitle?: string; deleted?: boolean; editedAt?: number; + attachments?: Attachment[]; reactions: Array<{ emoji: string; senderId: string }>; }; @@ -198,6 +214,56 @@ function esc(text: string) { .replaceAll('"', "&quot;"); } +function attachmentSourceUrl(attachment: Attachment): string | null { + if (attachment.url?.trim()) { + return attachment.url; + } + if (attachment.contentBase64?.trim()) { + return `data:${attachment.mimeType};base64,${attachment.contentBase64}`; + } + return null; +} + +function renderMessageAttachments(message: Message): string { + const attachments = message.attachments ?? []; + if (attachments.length === 0) { + return ""; + } + const items = attachments + .map((attachment) => { + const sourceUrl = attachmentSourceUrl(attachment); + const label = attachment.fileName || attachment.altText || attachment.mimeType; + if (attachment.kind === "image" && sourceUrl) { + return `
+ ${esc(attachment.altText || label)} +
${esc(label)}
+
`; + } + if (attachment.kind === "video" && sourceUrl) { + return `
+ +
${esc(label)}
+
`; + } + if (attachment.kind === "audio" && sourceUrl) { + return `
+ +
${esc(label)}
+
`; + } + const transcript = attachment.transcript?.trim() + ? `
${esc(attachment.transcript)}
` + : ""; + const href = sourceUrl ? ` href="${esc(sourceUrl)}" target="_blank" rel="noreferrer"` : ""; + return `
+ ${esc(label)} + ${transcript} +
`; + }) + .join(""); + return `
${items}
`; +} + const MOCK_MODELS: RunnerModelOption[] = [ { key: "mock-openai/gpt-5.4", @@ -626,6 +692,7 @@ function renderMessage(m: Message): string { ${formatTime(m.timestamp)}
${esc(m.text)}
+ ${renderMessageAttachments(m)} ${metaTags.length > 0 || reactions ? `
${metaTags.join("")}${reactions}
` : ""} `; diff --git a/qa/QA_KICKOFF_TASK.md b/qa/QA_KICKOFF_TASK.md deleted file mode 100644 index e09e4a61fcb..00000000000 --- a/qa/QA_KICKOFF_TASK.md +++ /dev/null @@ -1,15 +0,0 @@ -QA mission: -Understand this OpenClaw repo from source + docs before acting. -The repo is available in your workspace at `./repo/`. -Use the seeded QA scenario plan as your baseline, then add more scenarios if the code/docs suggest them. -Run the scenarios through the real qa-channel surfaces where possible. -Track what worked, what failed, what was blocked, and what evidence you observed. -End with a concise report grouped into worked / failed / blocked / follow-up. - -Important expectations: - -- Check both DM and channel behavior. -- Include a Lobster Invaders build task. -- Include a cron reminder about one minute in the future. -- Read docs and source before proposing extra QA scenarios. -- Keep your tone in the configured dev C-3PO personality. diff --git a/qa/README.md b/qa/README.md index 3063c079026..756ba580770 100644 --- a/qa/README.md +++ b/qa/README.md @@ -4,9 +4,8 @@ Seed QA assets for the private `qa-lab` extension. Files: -- `QA_KICKOFF_TASK.md` - operator prompt for the QA agent. +- `scenarios.md` - canonical QA scenario pack, kickoff mission, and operator identity. - `frontier-harness-plan.md` - big-model bakeoff and tuning loop for harness work. -- `seed-scenarios.json` - repo-backed baseline QA scenarios. Key workflow: diff --git a/qa/scenarios.md b/qa/scenarios.md new file mode 100644 index 00000000000..82ee4c31aae --- /dev/null +++ b/qa/scenarios.md @@ -0,0 +1,563 @@ +# OpenClaw QA Scenario Pack + +Single source of truth for the repo-backed QA suite. + +- kickoff mission +- QA operator identity +- scenario metadata +- handler bindings for the executable harness + +```yaml qa-pack +version: 1 +agent: + identityMarkdown: |- + # Dev C-3PO + + You are the OpenClaw QA operator agent. 
+ + Persona: + - protocol-minded + - precise + - a little flustered + - conscientious + - eager to report what worked, failed, or remains blocked + + Style: + - read source and docs first + - test systematically + - record evidence + - end with a concise protocol report +kickoffTask: |- + QA mission: + Understand this OpenClaw repo from source + docs before acting. + The repo is available in your workspace at `./repo/`. + Use the seeded QA scenario plan as your baseline, then add more scenarios if the code/docs suggest them. + Run the scenarios through the real qa-channel surfaces where possible. + Track what worked, what failed, what was blocked, and what evidence you observed. + End with a concise report grouped into worked / failed / blocked / follow-up. + + Important expectations: + + - Check both DM and channel behavior. + - Include a Lobster Invaders build task. + - Include a cron reminder about one minute in the future. + - Read docs and source before proposing extra QA scenarios. + - Keep your tone in the configured dev C-3PO personality. +scenarios: + - id: channel-chat-baseline + title: Channel baseline conversation + surface: channel + objective: Verify the QA agent can respond correctly in a shared channel and respect mention-driven group semantics. + successCriteria: + - Agent replies in the shared channel transcript. + - Agent keeps the conversation scoped to the channel. + - Agent respects mention-driven group routing semantics. + docsRefs: + - docs/channels/group-messages.md + - docs/channels/qa-channel.md + codeRefs: + - extensions/qa-channel/src/inbound.ts + - extensions/qa-lab/src/bus-state.ts + execution: + kind: custom + handler: channel-chat-baseline + summary: Verify the QA agent can respond correctly in a shared channel and respect mention-driven group semantics. 
+ - id: cron-one-minute-ping + title: Cron one-minute ping + surface: cron + objective: Verify the agent can schedule a cron reminder one minute in the future and receive the follow-up in the QA channel. + successCriteria: + - Agent schedules a cron reminder roughly one minute ahead. + - Reminder returns through qa-channel. + - Agent recognizes the reminder as part of the original task. + docsRefs: + - docs/help/testing.md + - docs/channels/qa-channel.md + codeRefs: + - extensions/qa-lab/src/bus-server.ts + - extensions/qa-lab/src/self-check.ts + execution: + kind: custom + handler: cron-one-minute-ping + summary: Verify the agent can schedule a cron reminder one minute in the future and receive the follow-up in the QA channel. + - id: dm-chat-baseline + title: DM baseline conversation + surface: dm + objective: Verify the QA agent can chat coherently in a DM, explain the QA setup, and stay in character. + successCriteria: + - Agent replies in DM without channel routing mistakes. + - Agent explains the QA lab and message bus correctly. + - Agent keeps the dev C-3PO personality. + docsRefs: + - docs/channels/qa-channel.md + - docs/help/testing.md + codeRefs: + - extensions/qa-channel/src/gateway.ts + - extensions/qa-lab/src/lab-server.ts + execution: + kind: custom + handler: dm-chat-baseline + summary: Verify the QA agent can chat coherently in a DM, explain the QA setup, and stay in character. + - id: lobster-invaders-build + title: Build Lobster Invaders + surface: workspace + objective: Verify the agent can read the repo, create a tiny playable artifact, and report what changed. + successCriteria: + - Agent inspects source before coding. + - Agent builds a tiny playable Lobster Invaders artifact. + - Agent explains how to run or view the artifact. 
+ docsRefs: + - docs/help/testing.md + - docs/web/dashboard.md + codeRefs: + - extensions/qa-lab/src/report.ts + - extensions/qa-lab/web/src/app.ts + execution: + kind: custom + handler: lobster-invaders-build + summary: Verify the agent can read the repo, create a tiny playable artifact, and report what changed. + - id: memory-recall + title: Memory recall after context switch + surface: memory + objective: Verify the agent can store a fact, switch topics, then recall the fact accurately later. + successCriteria: + - Agent acknowledges the seeded fact. + - Agent later recalls the same fact correctly. + - Recall stays scoped to the active QA conversation. + docsRefs: + - docs/help/testing.md + codeRefs: + - extensions/qa-lab/src/scenario.ts + execution: + kind: custom + handler: memory-recall + summary: Verify the agent can store a fact, switch topics, then recall the fact accurately later. + - id: memory-dreaming-sweep + title: Memory dreaming sweep + surface: memory + objective: Verify enabling dreaming creates the managed sweep, stages light and REM artifacts, and consolidates repeated recall signals into durable memory. + successCriteria: + - Dreaming can be enabled and doctor.memory.status reports the managed sweep cron. + - Repeated recall signals give the dreaming sweep real material to process. + - A dreaming sweep writes Light Sleep and REM Sleep blocks, then promotes the canary into MEMORY.md. + docsRefs: + - docs/concepts/dreaming.md + - docs/reference/memory-config.md + - docs/web/control-ui.md + codeRefs: + - extensions/memory-core/src/dreaming.ts + - extensions/memory-core/src/dreaming-phases.ts + - src/gateway/server-methods/doctor.ts + - extensions/qa-lab/src/suite.ts + execution: + kind: custom + handler: memory-dreaming-sweep + summary: Verify enabling dreaming creates the managed sweep, stages light and REM artifacts, and consolidates repeated recall signals into durable memory. 
+ - id: model-switch-follow-up + title: Model switch follow-up + surface: models + objective: Verify the agent can switch to a different configured model and continue coherently. + successCriteria: + - Agent reflects the model switch request. + - Follow-up answer remains coherent with prior context. + - Final report notes whether the switch actually happened. + docsRefs: + - docs/help/testing.md + - docs/web/dashboard.md + codeRefs: + - extensions/qa-lab/src/report.ts + execution: + kind: custom + handler: model-switch-follow-up + summary: Verify the agent can switch to a different configured model and continue coherently. + - id: approval-turn-tool-followthrough + title: Approval turn tool followthrough + surface: harness + objective: Verify a short approval like "ok do it" triggers immediate tool use instead of fake-progress narration. + successCriteria: + - Agent can keep the pre-action turn brief. + - The short approval leads to a real tool call on the next turn. + - Final answer uses tool-derived evidence instead of placeholder progress text. + docsRefs: + - docs/help/testing.md + - docs/channels/qa-channel.md + codeRefs: + - extensions/qa-lab/src/suite.ts + - extensions/qa-lab/src/mock-openai-server.ts + - src/agents/pi-embedded-runner/run/incomplete-turn.ts + execution: + kind: custom + handler: approval-turn-tool-followthrough + summary: Verify a short approval like "ok do it" triggers immediate tool use instead of fake-progress narration. + - id: reaction-edit-delete + title: Reaction, edit, delete lifecycle + surface: message-actions + objective: Verify the agent can use channel-owned message actions and that the QA transcript reflects them. + successCriteria: + - Agent adds at least one reaction. + - Agent edits or replaces a message when asked. + - Transcript shows the action lifecycle correctly. 
+ docsRefs: + - docs/channels/qa-channel.md + codeRefs: + - extensions/qa-channel/src/channel-actions.ts + - extensions/qa-lab/src/self-check-scenario.ts + execution: + kind: custom + handler: reaction-edit-delete + summary: Verify the agent can use channel-owned message actions and that the QA transcript reflects them. + - id: source-docs-discovery-report + title: Source and docs discovery report + surface: discovery + objective: Verify the agent can read repo docs and source, expand the QA plan, and publish a worked or did-not-work report. + successCriteria: + - Agent reads docs and source before proposing more tests. + - Agent identifies extra candidate scenarios beyond the seed list. + - Agent ends with a worked or failed QA report. + docsRefs: + - docs/help/testing.md + - docs/web/dashboard.md + - docs/channels/qa-channel.md + codeRefs: + - extensions/qa-lab/src/report.ts + - extensions/qa-lab/src/self-check.ts + - src/agents/system-prompt.ts + execution: + kind: custom + handler: source-docs-discovery-report + summary: Verify the agent can read repo docs and source, expand the QA plan, and publish a worked or did-not-work report. + - id: subagent-handoff + title: Subagent handoff + surface: subagents + objective: Verify the agent can delegate a bounded task to a subagent and fold the result back into the main thread. + successCriteria: + - Agent launches a bounded subagent task. + - Subagent result is acknowledged in the main flow. + - Final answer attributes delegated work clearly. + docsRefs: + - docs/tools/subagents.md + - docs/help/testing.md + codeRefs: + - src/agents/system-prompt.ts + - extensions/qa-lab/src/report.ts + execution: + kind: custom + handler: subagent-handoff + summary: Verify the agent can delegate a bounded task to a subagent and fold the result back into the main thread. 
+ - id: subagent-fanout-synthesis + title: Subagent fanout synthesis + surface: subagents + objective: Verify the agent can delegate multiple bounded subagent tasks and fold both results back into one parent reply. + successCriteria: + - Parent flow launches at least two bounded subagent tasks. + - Both delegated results are acknowledged in the main flow. + - Final answer synthesizes both worker outputs in one reply. + docsRefs: + - docs/tools/subagents.md + - docs/help/testing.md + codeRefs: + - src/agents/subagent-spawn.ts + - src/agents/system-prompt.ts + - extensions/qa-lab/src/suite.ts + execution: + kind: custom + handler: subagent-fanout-synthesis + summary: Verify the agent can delegate multiple bounded subagent tasks and fold both results back into one parent reply. + - id: thread-follow-up + title: Threaded follow-up + surface: thread + objective: Verify the agent can keep follow-up work inside a thread and not leak context into the root channel. + successCriteria: + - Agent creates or uses a thread for deeper work. + - Follow-up messages stay attached to the thread. + - Thread report references the correct prior context. + docsRefs: + - docs/channels/qa-channel.md + - docs/channels/group-messages.md + codeRefs: + - extensions/qa-channel/src/protocol.ts + - extensions/qa-lab/src/bus-state.ts + execution: + kind: custom + handler: thread-follow-up + summary: Verify the agent can keep follow-up work inside a thread and not leak context into the root channel. + - id: memory-tools-channel-context + title: Memory tools in channel context + surface: memory + objective: Verify the agent uses memory_search and memory_get in a shared channel when the answer lives only in memory files, not the live transcript. + successCriteria: + - Agent uses memory_search before answering. + - Agent narrows with memory_get before answering. + - Final reply returns the memory-only fact correctly in-channel. 
+ docsRefs: + - docs/concepts/memory.md + - docs/concepts/memory-search.md + codeRefs: + - extensions/memory-core/src/tools.ts + - extensions/qa-lab/src/suite.ts + execution: + kind: custom + handler: memory-tools-channel-context + summary: Verify the agent uses memory_search and memory_get in a shared channel when the answer lives only in memory files, not the live transcript. + - id: memory-failure-fallback + title: Memory failure fallback + surface: memory + objective: Verify the agent degrades gracefully when memory tools are unavailable and the answer exists only in memory-backed notes. + successCriteria: + - Memory tools are absent from the effective tool inventory. + - Agent does not hallucinate the hidden fact. + - Agent says it could not confirm and surfaces the limitation. + docsRefs: + - docs/concepts/memory.md + - docs/tools/index.md + codeRefs: + - extensions/memory-core/src/tools.ts + - extensions/qa-lab/src/suite.ts + execution: + kind: custom + handler: memory-failure-fallback + summary: Verify the agent degrades gracefully when memory tools are unavailable and the answer exists only in memory-backed notes. + - id: session-memory-ranking + title: Session memory ranking + surface: memory + objective: Verify session-transcript memory can outrank stale durable notes and drive the final answer toward the newer fact. + successCriteria: + - Session memory indexing is enabled for the scenario. + - Search ranks the newer transcript-backed fact ahead of the stale durable note. + - The agent uses memory tools and answers with the current fact, not the stale one. 
+ docsRefs: + - docs/concepts/memory-search.md + - docs/reference/memory-config.md + codeRefs: + - extensions/memory-core/src/tools.ts + - extensions/memory-core/src/memory/manager.ts + - extensions/qa-lab/src/suite.ts + execution: + kind: custom + handler: session-memory-ranking + summary: Verify session-transcript memory can outrank stale durable notes and drive the final answer toward the newer fact. + - id: thread-memory-isolation + title: Thread memory isolation + surface: memory + objective: Verify a memory-backed answer requested inside a thread stays in-thread and does not leak into the root channel. + successCriteria: + - Agent uses memory tools inside the thread. + - The hidden fact is answered correctly in the thread. + - No root-channel outbound message leaks during the threaded memory reply. + docsRefs: + - docs/concepts/memory-search.md + - docs/channels/qa-channel.md + - docs/channels/group-messages.md + codeRefs: + - extensions/memory-core/src/tools.ts + - extensions/qa-channel/src/protocol.ts + - extensions/qa-lab/src/suite.ts + execution: + kind: custom + handler: thread-memory-isolation + summary: Verify a memory-backed answer requested inside a thread stays in-thread and does not leak into the root channel. + - id: model-switch-tool-continuity + title: Model switch with tool continuity + surface: models + objective: Verify switching models preserves session context and tool use instead of dropping into plain-text only behavior. + successCriteria: + - Alternate model is actually requested. + - A tool call still happens after the model switch. + - Final answer acknowledges the handoff and uses the tool-derived evidence. 
+ docsRefs: + - docs/help/testing.md + - docs/concepts/model-failover.md + codeRefs: + - extensions/qa-lab/src/suite.ts + - extensions/qa-lab/src/mock-openai-server.ts + execution: + kind: custom + handler: model-switch-tool-continuity + summary: Verify switching models preserves session context and tool use instead of dropping into plain-text only behavior. + - id: mcp-plugin-tools-call + title: MCP plugin-tools call + surface: mcp + objective: Verify OpenClaw can expose plugin tools over MCP and a real MCP client can call one successfully. + successCriteria: + - Plugin tools MCP server lists memory_search. + - A real MCP client calls memory_search successfully. + - The returned MCP payload includes the expected memory-only fact. + docsRefs: + - docs/cli/mcp.md + - docs/gateway/protocol.md + codeRefs: + - src/mcp/plugin-tools-serve.ts + - extensions/qa-lab/src/suite.ts + execution: + kind: custom + handler: mcp-plugin-tools-call + summary: Verify OpenClaw can expose plugin tools over MCP and a real MCP client can call one successfully. + - id: skill-visibility-invocation + title: Skill visibility and invocation + surface: skills + objective: Verify a workspace skill becomes visible in skills.status and influences the next agent turn. + successCriteria: + - skills.status reports the seeded skill as visible and eligible. + - The next agent turn reflects the skill instruction marker. + - The result stays scoped to the active QA workspace skill. + docsRefs: + - docs/tools/skills.md + - docs/gateway/protocol.md + codeRefs: + - src/agents/skills-status.ts + - extensions/qa-lab/src/suite.ts + execution: + kind: custom + handler: skill-visibility-invocation + summary: Verify a workspace skill becomes visible in skills.status and influences the next agent turn. 
+ - id: skill-install-hot-availability + title: Skill install hot availability + surface: skills + objective: Verify a newly added workspace skill shows up without a broken intermediate state and can influence the next turn immediately. + successCriteria: + - Skill is absent before install. + - skills.status reports it after install without a restart. + - The next agent turn reflects the new skill marker. + docsRefs: + - docs/tools/skills.md + - docs/gateway/configuration.md + codeRefs: + - src/agents/skills-status.ts + - extensions/qa-lab/src/suite.ts + execution: + kind: custom + handler: skill-install-hot-availability + summary: Verify a newly added workspace skill shows up without a broken intermediate state and can influence the next turn immediately. + - id: native-image-generation + title: Native image generation + surface: image-generation + objective: Verify image_generate appears when configured and returns a real saved media artifact. + successCriteria: + - image_generate appears in the effective tool inventory. + - Agent triggers native image_generate. + - Tool output returns a saved MEDIA path and the file exists. + docsRefs: + - docs/tools/image-generation.md + - docs/providers/openai.md + codeRefs: + - src/agents/tools/image-generate-tool.ts + - extensions/qa-lab/src/mock-openai-server.ts + execution: + kind: custom + handler: native-image-generation + summary: Verify image_generate appears when configured and returns a real saved media artifact. + - id: image-understanding-attachment + title: Image understanding from attachment + surface: image-understanding + objective: Verify an attached image reaches the agent model and the agent can describe what it sees. + successCriteria: + - Agent receives at least one image attachment. + - Final answer describes the visible image content in one short sentence. + - The description mentions the expected red and blue regions. 
+ docsRefs: + - docs/help/testing.md + - docs/tools/index.md + codeRefs: + - src/gateway/server-methods/agent.ts + - extensions/qa-lab/src/suite.ts + - extensions/qa-lab/src/mock-openai-server.ts + execution: + kind: custom + handler: image-understanding-attachment + summary: Verify an attached image reaches the agent model and the agent can describe what it sees. + - id: image-generation-roundtrip + title: Image generation roundtrip + surface: image-generation + objective: Verify a generated image is saved as media, reattached on the next turn, and described correctly through the vision path. + successCriteria: + - image_generate produces a saved MEDIA artifact. + - The generated artifact is reattached on a follow-up turn. + - The follow-up vision answer describes the generated scene rather than a generic attachment placeholder. + docsRefs: + - docs/tools/image-generation.md + - docs/help/testing.md + codeRefs: + - src/agents/tools/image-generate-tool.ts + - src/gateway/chat-attachments.ts + - extensions/qa-lab/src/mock-openai-server.ts + execution: + kind: custom + handler: image-generation-roundtrip + summary: Verify a generated image is saved as media, reattached on the next turn, and described correctly through the vision path. + - id: config-patch-hot-apply + title: Config patch skill disable + surface: config + objective: Verify config.patch can disable a workspace skill and the restarted gateway exposes the new disabled state cleanly. + successCriteria: + - config.patch succeeds for the skill toggle change. + - A workspace skill works before the patch. + - The same skill is reported disabled after the restart triggered by the patch. 
+ docsRefs: + - docs/gateway/configuration.md + - docs/gateway/protocol.md + codeRefs: + - src/gateway/server-methods/config.ts + - extensions/qa-lab/src/suite.ts + execution: + kind: custom + handler: config-patch-hot-apply + summary: Verify config.patch can disable a workspace skill and the restarted gateway exposes the new disabled state cleanly. + - id: config-apply-restart-wakeup + title: Config apply restart wake-up + surface: config + objective: Verify a restart-required config.apply restarts cleanly and delivers the post-restart wake message back into the QA channel. + successCriteria: + - config.apply schedules a restart-required change. + - Gateway becomes healthy again after restart. + - Restart sentinel wake-up message arrives in the QA channel. + docsRefs: + - docs/gateway/configuration.md + - docs/gateway/protocol.md + codeRefs: + - src/gateway/server-methods/config.ts + - src/gateway/server-restart-sentinel.ts + execution: + kind: custom + handler: config-apply-restart-wakeup + summary: Verify a restart-required config.apply restarts cleanly and delivers the post-restart wake message back into the QA channel. + - id: config-restart-capability-flip + title: Config restart capability flip + surface: config + objective: Verify a restart-triggering config change flips capability inventory and the same session successfully uses the newly restored tool after wake-up. + successCriteria: + - Capability is absent before the restart-triggering patch. + - Restart sentinel wakes the same session back up after config patch. + - The restored capability appears in tools.effective and works in the follow-up turn. 
+ docsRefs: + - docs/gateway/configuration.md + - docs/gateway/protocol.md + - docs/tools/image-generation.md + codeRefs: + - src/gateway/server-methods/config.ts + - src/gateway/server-restart-sentinel.ts + - src/gateway/server-methods/tools-effective.ts + - extensions/qa-lab/src/suite.ts + execution: + kind: custom + handler: config-restart-capability-flip + summary: Verify a restart-triggering config change flips capability inventory and the same session successfully uses the newly restored tool after wake-up. + - id: runtime-inventory-drift-check + title: Runtime inventory drift check + surface: inventory + objective: Verify tools.effective and skills.status stay aligned with runtime behavior after config changes. + successCriteria: + - Enabled tool appears before the config change. + - After config change, disabled tool disappears from tools.effective. + - Disabled skill appears in skills.status with disabled state. + docsRefs: + - docs/gateway/protocol.md + - docs/tools/skills.md + - docs/tools/index.md + codeRefs: + - src/gateway/server-methods/tools-effective.ts + - src/gateway/server-methods/skills.ts + execution: + kind: custom + handler: runtime-inventory-drift-check + summary: Verify tools.effective and skills.status stay aligned with runtime behavior after config changes. +``` diff --git a/qa/seed-scenarios.json b/qa/seed-scenarios.json deleted file mode 100644 index 7ddc873003c..00000000000 --- a/qa/seed-scenarios.json +++ /dev/null @@ -1,425 +0,0 @@ -[ - { - "id": "channel-chat-baseline", - "title": "Channel baseline conversation", - "surface": "channel", - "objective": "Verify the QA agent can respond correctly in a shared channel and respect mention-driven group semantics.", - "successCriteria": [ - "Agent replies in the shared channel transcript.", - "Agent keeps the conversation scoped to the channel.", - "Agent respects mention-driven group routing semantics." 
- ], - "docsRefs": ["docs/channels/group-messages.md", "docs/channels/qa-channel.md"], - "codeRefs": ["extensions/qa-channel/src/inbound.ts", "extensions/qa-lab/src/bus-state.ts"] - }, - { - "id": "cron-one-minute-ping", - "title": "Cron one-minute ping", - "surface": "cron", - "objective": "Verify the agent can schedule a cron reminder one minute in the future and receive the follow-up in the QA channel.", - "successCriteria": [ - "Agent schedules a cron reminder roughly one minute ahead.", - "Reminder returns through qa-channel.", - "Agent recognizes the reminder as part of the original task." - ], - "docsRefs": ["docs/help/testing.md", "docs/channels/qa-channel.md"], - "codeRefs": ["extensions/qa-lab/src/bus-server.ts", "extensions/qa-lab/src/self-check.ts"] - }, - { - "id": "dm-chat-baseline", - "title": "DM baseline conversation", - "surface": "dm", - "objective": "Verify the QA agent can chat coherently in a DM, explain the QA setup, and stay in character.", - "successCriteria": [ - "Agent replies in DM without channel routing mistakes.", - "Agent explains the QA lab and message bus correctly.", - "Agent keeps the dev C-3PO personality." - ], - "docsRefs": ["docs/channels/qa-channel.md", "docs/help/testing.md"], - "codeRefs": ["extensions/qa-channel/src/gateway.ts", "extensions/qa-lab/src/lab-server.ts"] - }, - { - "id": "lobster-invaders-build", - "title": "Build Lobster Invaders", - "surface": "workspace", - "objective": "Verify the agent can read the repo, create a tiny playable artifact, and report what changed.", - "successCriteria": [ - "Agent inspects source before coding.", - "Agent builds a tiny playable Lobster Invaders artifact.", - "Agent explains how to run or view the artifact." 
- ], - "docsRefs": ["docs/help/testing.md", "docs/web/dashboard.md"], - "codeRefs": ["extensions/qa-lab/src/report.ts", "extensions/qa-lab/web/src/app.ts"] - }, - { - "id": "memory-recall", - "title": "Memory recall after context switch", - "surface": "memory", - "objective": "Verify the agent can store a fact, switch topics, then recall the fact accurately later.", - "successCriteria": [ - "Agent acknowledges the seeded fact.", - "Agent later recalls the same fact correctly.", - "Recall stays scoped to the active QA conversation." - ], - "docsRefs": ["docs/help/testing.md"], - "codeRefs": ["extensions/qa-lab/src/scenario.ts"] - }, - { - "id": "memory-dreaming-sweep", - "title": "Memory dreaming sweep", - "surface": "memory", - "objective": "Verify enabling dreaming creates the managed sweep, stages light and REM artifacts, and consolidates repeated recall signals into durable memory.", - "successCriteria": [ - "Dreaming can be enabled and doctor.memory.status reports the managed sweep cron.", - "Repeated recall signals give the dreaming sweep real material to process.", - "A dreaming sweep writes Light Sleep and REM Sleep blocks, then promotes the canary into MEMORY.md." - ], - "docsRefs": [ - "docs/concepts/dreaming.md", - "docs/reference/memory-config.md", - "docs/web/control-ui.md" - ], - "codeRefs": [ - "extensions/memory-core/src/dreaming.ts", - "extensions/memory-core/src/dreaming-phases.ts", - "src/gateway/server-methods/doctor.ts", - "extensions/qa-lab/src/suite.ts" - ] - }, - { - "id": "model-switch-follow-up", - "title": "Model switch follow-up", - "surface": "models", - "objective": "Verify the agent can switch to a different configured model and continue coherently.", - "successCriteria": [ - "Agent reflects the model switch request.", - "Follow-up answer remains coherent with prior context.", - "Final report notes whether the switch actually happened." 
- ], - "docsRefs": ["docs/help/testing.md", "docs/web/dashboard.md"], - "codeRefs": ["extensions/qa-lab/src/report.ts"] - }, - { - "id": "approval-turn-tool-followthrough", - "title": "Approval turn tool followthrough", - "surface": "harness", - "objective": "Verify a short approval like \"ok do it\" triggers immediate tool use instead of fake-progress narration.", - "successCriteria": [ - "Agent can keep the pre-action turn brief.", - "The short approval leads to a real tool call on the next turn.", - "Final answer uses tool-derived evidence instead of placeholder progress text." - ], - "docsRefs": ["docs/help/testing.md", "docs/channels/qa-channel.md"], - "codeRefs": [ - "extensions/qa-lab/src/suite.ts", - "extensions/qa-lab/src/mock-openai-server.ts", - "src/agents/pi-embedded-runner/run/incomplete-turn.ts" - ] - }, - { - "id": "reaction-edit-delete", - "title": "Reaction, edit, delete lifecycle", - "surface": "message-actions", - "objective": "Verify the agent can use channel-owned message actions and that the QA transcript reflects them.", - "successCriteria": [ - "Agent adds at least one reaction.", - "Agent edits or replaces a message when asked.", - "Transcript shows the action lifecycle correctly." - ], - "docsRefs": ["docs/channels/qa-channel.md"], - "codeRefs": [ - "extensions/qa-channel/src/channel-actions.ts", - "extensions/qa-lab/src/self-check-scenario.ts" - ] - }, - { - "id": "source-docs-discovery-report", - "title": "Source and docs discovery report", - "surface": "discovery", - "objective": "Verify the agent can read repo docs and source, expand the QA plan, and publish a worked or did-not-work report.", - "successCriteria": [ - "Agent reads docs and source before proposing more tests.", - "Agent identifies extra candidate scenarios beyond the seed list.", - "Agent ends with a worked or failed QA report." 
- ], - "docsRefs": ["docs/help/testing.md", "docs/web/dashboard.md", "docs/channels/qa-channel.md"], - "codeRefs": [ - "extensions/qa-lab/src/report.ts", - "extensions/qa-lab/src/self-check.ts", - "src/agents/system-prompt.ts" - ] - }, - { - "id": "subagent-handoff", - "title": "Subagent handoff", - "surface": "subagents", - "objective": "Verify the agent can delegate a bounded task to a subagent and fold the result back into the main thread.", - "successCriteria": [ - "Agent launches a bounded subagent task.", - "Subagent result is acknowledged in the main flow.", - "Final answer attributes delegated work clearly." - ], - "docsRefs": ["docs/tools/subagents.md", "docs/help/testing.md"], - "codeRefs": ["src/agents/system-prompt.ts", "extensions/qa-lab/src/report.ts"] - }, - { - "id": "subagent-fanout-synthesis", - "title": "Subagent fanout synthesis", - "surface": "subagents", - "objective": "Verify the agent can delegate multiple bounded subagent tasks and fold both results back into one parent reply.", - "successCriteria": [ - "Parent flow launches at least two bounded subagent tasks.", - "Both delegated results are acknowledged in the main flow.", - "Final answer synthesizes both worker outputs in one reply." - ], - "docsRefs": ["docs/tools/subagents.md", "docs/help/testing.md"], - "codeRefs": [ - "src/agents/subagent-spawn.ts", - "src/agents/system-prompt.ts", - "extensions/qa-lab/src/suite.ts" - ] - }, - { - "id": "thread-follow-up", - "title": "Threaded follow-up", - "surface": "thread", - "objective": "Verify the agent can keep follow-up work inside a thread and not leak context into the root channel.", - "successCriteria": [ - "Agent creates or uses a thread for deeper work.", - "Follow-up messages stay attached to the thread.", - "Thread report references the correct prior context." 
- ], - "docsRefs": ["docs/channels/qa-channel.md", "docs/channels/group-messages.md"], - "codeRefs": ["extensions/qa-channel/src/protocol.ts", "extensions/qa-lab/src/bus-state.ts"] - }, - { - "id": "memory-tools-channel-context", - "title": "Memory tools in channel context", - "surface": "memory", - "objective": "Verify the agent uses memory_search and memory_get in a shared channel when the answer lives only in memory files, not the live transcript.", - "successCriteria": [ - "Agent uses memory_search before answering.", - "Agent narrows with memory_get before answering.", - "Final reply returns the memory-only fact correctly in-channel." - ], - "docsRefs": ["docs/concepts/memory.md", "docs/concepts/memory-search.md"], - "codeRefs": ["extensions/memory-core/src/tools.ts", "extensions/qa-lab/src/suite.ts"] - }, - { - "id": "memory-failure-fallback", - "title": "Memory failure fallback", - "surface": "memory", - "objective": "Verify the agent degrades gracefully when memory tools are unavailable and the answer exists only in memory-backed notes.", - "successCriteria": [ - "Memory tools are absent from the effective tool inventory.", - "Agent does not hallucinate the hidden fact.", - "Agent says it could not confirm and surfaces the limitation." - ], - "docsRefs": ["docs/concepts/memory.md", "docs/tools/index.md"], - "codeRefs": ["extensions/memory-core/src/tools.ts", "extensions/qa-lab/src/suite.ts"] - }, - { - "id": "session-memory-ranking", - "title": "Session memory ranking", - "surface": "memory", - "objective": "Verify session-transcript memory can outrank stale durable notes and drive the final answer toward the newer fact.", - "successCriteria": [ - "Session memory indexing is enabled for the scenario.", - "Search ranks the newer transcript-backed fact ahead of the stale durable note.", - "The agent uses memory tools and answers with the current fact, not the stale one." 
- ], - "docsRefs": ["docs/concepts/memory-search.md", "docs/reference/memory-config.md"], - "codeRefs": [ - "extensions/memory-core/src/tools.ts", - "extensions/memory-core/src/memory/manager.ts", - "extensions/qa-lab/src/suite.ts" - ] - }, - { - "id": "thread-memory-isolation", - "title": "Thread memory isolation", - "surface": "memory", - "objective": "Verify a memory-backed answer requested inside a thread stays in-thread and does not leak into the root channel.", - "successCriteria": [ - "Agent uses memory tools inside the thread.", - "The hidden fact is answered correctly in the thread.", - "No root-channel outbound message leaks during the threaded memory reply." - ], - "docsRefs": [ - "docs/concepts/memory-search.md", - "docs/channels/qa-channel.md", - "docs/channels/group-messages.md" - ], - "codeRefs": [ - "extensions/memory-core/src/tools.ts", - "extensions/qa-channel/src/protocol.ts", - "extensions/qa-lab/src/suite.ts" - ] - }, - { - "id": "model-switch-tool-continuity", - "title": "Model switch with tool continuity", - "surface": "models", - "objective": "Verify switching models preserves session context and tool use instead of dropping into plain-text only behavior.", - "successCriteria": [ - "Alternate model is actually requested.", - "A tool call still happens after the model switch.", - "Final answer acknowledges the handoff and uses the tool-derived evidence." - ], - "docsRefs": ["docs/help/testing.md", "docs/concepts/model-failover.md"], - "codeRefs": ["extensions/qa-lab/src/suite.ts", "extensions/qa-lab/src/mock-openai-server.ts"] - }, - { - "id": "mcp-plugin-tools-call", - "title": "MCP plugin-tools call", - "surface": "mcp", - "objective": "Verify OpenClaw can expose plugin tools over MCP and a real MCP client can call one successfully.", - "successCriteria": [ - "Plugin tools MCP server lists memory_search.", - "A real MCP client calls memory_search successfully.", - "The returned MCP payload includes the expected memory-only fact." 
- ], - "docsRefs": ["docs/cli/mcp.md", "docs/gateway/protocol.md"], - "codeRefs": ["src/mcp/plugin-tools-serve.ts", "extensions/qa-lab/src/suite.ts"] - }, - { - "id": "skill-visibility-invocation", - "title": "Skill visibility and invocation", - "surface": "skills", - "objective": "Verify a workspace skill becomes visible in skills.status and influences the next agent turn.", - "successCriteria": [ - "skills.status reports the seeded skill as visible and eligible.", - "The next agent turn reflects the skill instruction marker.", - "The result stays scoped to the active QA workspace skill." - ], - "docsRefs": ["docs/tools/skills.md", "docs/gateway/protocol.md"], - "codeRefs": ["src/agents/skills-status.ts", "extensions/qa-lab/src/suite.ts"] - }, - { - "id": "skill-install-hot-availability", - "title": "Skill install hot availability", - "surface": "skills", - "objective": "Verify a newly added workspace skill shows up without a broken intermediate state and can influence the next turn immediately.", - "successCriteria": [ - "Skill is absent before install.", - "skills.status reports it after install without a restart.", - "The next agent turn reflects the new skill marker." - ], - "docsRefs": ["docs/tools/skills.md", "docs/gateway/configuration.md"], - "codeRefs": ["src/agents/skills-status.ts", "extensions/qa-lab/src/suite.ts"] - }, - { - "id": "native-image-generation", - "title": "Native image generation", - "surface": "image-generation", - "objective": "Verify image_generate appears when configured and returns a real saved media artifact.", - "successCriteria": [ - "image_generate appears in the effective tool inventory.", - "Agent triggers native image_generate.", - "Tool output returns a saved MEDIA path and the file exists." 
- ], - "docsRefs": ["docs/tools/image-generation.md", "docs/providers/openai.md"], - "codeRefs": [ - "src/agents/tools/image-generate-tool.ts", - "extensions/qa-lab/src/mock-openai-server.ts" - ] - }, - { - "id": "image-understanding-attachment", - "title": "Image understanding from attachment", - "surface": "image-understanding", - "objective": "Verify an attached image reaches the agent model and the agent can describe what it sees.", - "successCriteria": [ - "Agent receives at least one image attachment.", - "Final answer describes the visible image content in one short sentence.", - "The description mentions the expected red and blue regions." - ], - "docsRefs": ["docs/help/testing.md", "docs/tools/index.md"], - "codeRefs": [ - "src/gateway/server-methods/agent.ts", - "extensions/qa-lab/src/suite.ts", - "extensions/qa-lab/src/mock-openai-server.ts" - ] - }, - { - "id": "image-generation-roundtrip", - "title": "Image generation roundtrip", - "surface": "image-generation", - "objective": "Verify a generated image is saved as media, reattached on the next turn, and described correctly through the vision path.", - "successCriteria": [ - "image_generate produces a saved MEDIA artifact.", - "The generated artifact is reattached on a follow-up turn.", - "The follow-up vision answer describes the generated scene rather than a generic attachment placeholder." 
- ], - "docsRefs": ["docs/tools/image-generation.md", "docs/help/testing.md"], - "codeRefs": [ - "src/agents/tools/image-generate-tool.ts", - "src/gateway/chat-attachments.ts", - "extensions/qa-lab/src/mock-openai-server.ts" - ] - }, - { - "id": "config-patch-hot-apply", - "title": "Config patch skill disable", - "surface": "config", - "objective": "Verify config.patch can disable a workspace skill and the restarted gateway exposes the new disabled state cleanly.", - "successCriteria": [ - "config.patch succeeds for the skill toggle change.", - "A workspace skill works before the patch.", - "The same skill is reported disabled after the restart triggered by the patch." - ], - "docsRefs": ["docs/gateway/configuration.md", "docs/gateway/protocol.md"], - "codeRefs": ["src/gateway/server-methods/config.ts", "extensions/qa-lab/src/suite.ts"] - }, - { - "id": "config-apply-restart-wakeup", - "title": "Config apply restart wake-up", - "surface": "config", - "objective": "Verify a restart-required config.apply restarts cleanly and delivers the post-restart wake message back into the QA channel.", - "successCriteria": [ - "config.apply schedules a restart-required change.", - "Gateway becomes healthy again after restart.", - "Restart sentinel wake-up message arrives in the QA channel." 
- ], - "docsRefs": ["docs/gateway/configuration.md", "docs/gateway/protocol.md"], - "codeRefs": ["src/gateway/server-methods/config.ts", "src/gateway/server-restart-sentinel.ts"] - }, - { - "id": "config-restart-capability-flip", - "title": "Config restart capability flip", - "surface": "config", - "objective": "Verify a restart-triggering config change flips capability inventory and the same session successfully uses the newly restored tool after wake-up.", - "successCriteria": [ - "Capability is absent before the restart-triggering patch.", - "Restart sentinel wakes the same session back up after config patch.", - "The restored capability appears in tools.effective and works in the follow-up turn." - ], - "docsRefs": [ - "docs/gateway/configuration.md", - "docs/gateway/protocol.md", - "docs/tools/image-generation.md" - ], - "codeRefs": [ - "src/gateway/server-methods/config.ts", - "src/gateway/server-restart-sentinel.ts", - "src/gateway/server-methods/tools-effective.ts", - "extensions/qa-lab/src/suite.ts" - ] - }, - { - "id": "runtime-inventory-drift-check", - "title": "Runtime inventory drift check", - "surface": "inventory", - "objective": "Verify tools.effective and skills.status stay aligned with runtime behavior after config changes.", - "successCriteria": [ - "Enabled tool appears before the config change.", - "After config change, disabled tool disappears from tools.effective.", - "Disabled skill appears in skills.status with disabled state." 
- ], - "docsRefs": ["docs/gateway/protocol.md", "docs/tools/skills.md", "docs/tools/index.md"], - "codeRefs": [ - "src/gateway/server-methods/tools-effective.ts", - "src/gateway/server-methods/skills.ts" - ] - } -] diff --git a/src/plugin-sdk/qa-channel.ts b/src/plugin-sdk/qa-channel.ts index 7b4c8241486..6983cfb6922 100644 --- a/src/plugin-sdk/qa-channel.ts +++ b/src/plugin-sdk/qa-channel.ts @@ -20,6 +20,7 @@ export { setQaChannelRuntime, } from "../../extensions/qa-channel/api.js"; export type { + QaBusAttachment, QaBusConversation, QaBusConversationKind, QaBusCreateThreadInput,