refactor: move qa suite definitions into markdown

Peter Steinberger
2026-04-07 23:39:13 +01:00
parent 11185f6397
commit c0aed59fca
24 changed files with 1449 additions and 502 deletions

@@ -56,8 +56,7 @@ asset hash changes.
Seed assets live in `qa/`:
- `qa/QA_KICKOFF_TASK.md`
- `qa/seed-scenarios.json`
- `qa/scenarios.md`
These are intentionally in git so the QA plan is visible to both humans and the
agent. The baseline list should stay broad enough to cover:

docs/refactor/qa.md (new file, 526 lines)

@@ -0,0 +1,526 @@
# QA Refactor
Status: foundational migration landed.
## Goal
Move OpenClaw QA from a split-definition model to a single source of truth covering:
- scenario metadata
- prompts sent to the model
- setup and teardown
- harness logic
- assertions and success criteria
- artifacts and report hints
The desired end state is a generic QA harness that loads powerful scenario definition files instead of hardcoding most behavior in TypeScript.
## Current State
Primary source of truth now lives in `qa/scenarios.md`.
Implemented:
- `qa/scenarios.md`
  - canonical QA pack
  - operator identity
  - kickoff mission
  - scenario metadata
  - handler bindings
- `extensions/qa-lab/src/scenario-catalog.ts`
  - markdown pack parser + zod validation
- `extensions/qa-lab/src/qa-agent-bootstrap.ts`
  - plan rendering from the markdown pack
- `extensions/qa-lab/src/qa-agent-workspace.ts`
  - seeds generated compatibility files plus `QA_SCENARIOS.md`
- `extensions/qa-lab/src/suite.ts`
  - selects executable scenarios through markdown-defined handler bindings
- QA bus protocol + UI
  - generic inline attachments for image/video/audio/file rendering
Remaining split surfaces:
- `extensions/qa-lab/src/suite.ts`
  - still owns most executable custom handler logic
- `extensions/qa-lab/src/report.ts`
  - still derives report structure from runtime outputs
So the source-of-truth split is fixed, but execution is still mostly handler-backed rather than fully declarative.
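The handler-backed selection can be sketched roughly as below. This is a minimal illustration, assuming a scenario carries an optional `execution.handler` name and that the suite keeps a map of named handlers; `resolveScenarioHandler`, the simplified types, and the handler names are hypothetical and not the suite's actual API.

```ts
// Hypothetical, simplified shapes for illustration only.
type ScenarioExecution = { kind: "custom"; handler: string; summary?: string };
type Scenario = { id: string; title: string; execution?: ScenarioExecution };
type ScenarioHandler = () => string;

// Prefer the handler named in the markdown pack; fall back to the scenario id.
function resolveScenarioHandler(
  scenario: Scenario,
  handlers: Map<string, ScenarioHandler>,
): ScenarioHandler | undefined {
  return handlers.get(scenario.execution?.handler || scenario.id);
}

// Made-up handler names; the real bindings are whatever the pack declares.
const handlers = new Map<string, ScenarioHandler>([
  ["channel-baseline", () => "channel ok"],
  ["image-roundtrip-handler", () => "image ok"],
]);

const bound: Scenario = {
  id: "image-generation-roundtrip",
  title: "Image generation roundtrip",
  execution: { kind: "custom", handler: "image-roundtrip-handler" },
};
const unbound: Scenario = { id: "channel-baseline", title: "Channel baseline" };
const missing: Scenario = { id: "no-such-scenario", title: "Missing" };
```

A scenario without a binding still resolves through its id, and an unknown name yields `undefined`, which the caller can report as a missing handler.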
## What The Real Scenario Surface Looks Like
Reading the current suite shows a few distinct scenario classes.
### Simple interaction
- channel baseline
- DM baseline
- threaded follow-up
- model switch
- approval followthrough
- reaction/edit/delete
### Config and runtime mutation
- config patch skill disable
- config apply restart wake-up
- config restart capability flip
- runtime inventory drift check
### Filesystem and repo assertions
- source/docs discovery report
- build Lobster Invaders
- generated image artifact lookup
### Memory orchestration
- memory recall
- memory tools in channel context
- memory failure fallback
- session memory ranking
- thread memory isolation
- memory dreaming sweep
### Tool and plugin integration
- MCP plugin-tools call
- skill visibility
- skill hot install
- native image generation
- image roundtrip
- image understanding from attachment
### Multi-turn and multi-actor
- subagent handoff
- subagent fanout synthesis
- restart recovery style flows
These categories matter because they drive DSL requirements. A flat list of prompt + expected text is not enough.
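To make the point concrete, here is a small sketch of why a flat prompt + expected-text pair falls short. It assumes a hypothetical discriminated `Step` union whose action names are borrowed from this document; `actionNamespaces` is an illustrative helper, not part of the harness.

```ts
// Hypothetical step shapes; only the action names come from this document.
type Step =
  | { action: "agent.send"; session: string; message: string }
  | { action: "config.patch"; patch: Record<string, unknown> }
  | { action: "file.write"; path: string; content: string }
  | { action: "gateway.restart" };

// List the distinct runner capability namespaces a scenario touches.
function actionNamespaces(steps: Step[]): string[] {
  const namespaces = new Set<string>();
  for (const step of steps) {
    namespaces.add(step.action.slice(0, step.action.indexOf(".")));
  }
  return Array.from(namespaces).sort();
}

// A restart-style flow needs config and gateway control, not just a prompt.
const configRestartFlow: Step[] = [
  { action: "config.patch", patch: { skills: { disabled: ["demo"] } } },
  { action: "gateway.restart" },
  { action: "agent.send", session: "agent:qa:demo", message: "Are you awake?" },
];
```

Here `actionNamespaces(configRestartFlow)` reports three namespaces (`agent`, `config`, `gateway`), whereas a simple interaction scenario would report only `agent`.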
## Direction
### Single source of truth
Use `qa/scenarios.md` as the authored source of truth.
The pack should stay:
- human-readable in review
- machine-parseable
- rich enough to drive:
  - suite execution
  - QA workspace bootstrap
  - QA Lab UI metadata
  - docs/discovery prompts
  - report generation
### Preferred authoring format
Use markdown as the top-level format, with structured YAML inside it.
Recommended shape:
- YAML frontmatter
  - id
  - title
  - surface
  - tags
  - docs refs
  - code refs
  - model/provider overrides
  - prerequisites
- prose sections
  - objective
  - notes
  - debugging hints
- fenced YAML blocks
  - setup
  - steps
  - assertions
  - cleanup
This gives:
- better PR readability than giant JSON
- richer context than pure YAML
- strict parsing and zod validation
Raw JSON is acceptable only as an intermediate generated form.
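One way the fenced YAML blocks could be pulled out of a scenario file is sketched below. It assumes each block is labeled after the language tag (for example `yaml scenario.steps`); `extractNamedYamlBlocks` is a hypothetical helper, and the returned YAML text would still need to go through a YAML parser and schema validation.

```ts
// Build the fence marker programmatically so this example can itself sit in a fence.
const FENCE = "`".repeat(3);

// Collect labeled YAML blocks keyed by the label after the language tag.
// Unlabeled or empty fenced blocks are ignored in this sketch.
function extractNamedYamlBlocks(markdown: string): Map<string, string> {
  const pattern = new RegExp(
    `^${FENCE}ya?ml[ \\t]+(\\S+)[ \\t]*\\r?\\n([\\s\\S]*?)\\r?\\n${FENCE}[ \\t]*$`,
    "gm",
  );
  const blocks = new Map<string, string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    blocks.set(match[1] ?? "", match[2] ?? "");
  }
  return blocks;
}

const sampleScenario = [
  "# Steps",
  `${FENCE}yaml scenario.steps`,
  "- action: agent.send",
  "  session: agent:qa:demo",
  FENCE,
  "",
  "# Expect",
  `${FENCE}yaml scenario.expect`,
  "- assert: outbound.textIncludes",
  "  value: lighthouse",
  FENCE,
].join("\n");

const sampleBlocks = extractNamedYamlBlocks(sampleScenario);
```

Keeping the label in the fence info string means prose headings can be reworded freely without affecting what the loader finds.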
## Proposed Scenario File Shape
Example:
````md
---
id: image-generation-roundtrip
title: Image generation roundtrip
surface: image
tags: [media, image, roundtrip]
models:
  primary: openai/gpt-5.4
requires:
  tools: [image_generate]
  plugins: [openai, qa-channel]
docsRefs:
  - docs/help/testing.md
  - docs/concepts/model-providers.md
codeRefs:
  - extensions/qa-lab/src/suite.ts
  - src/gateway/chat-attachments.ts
---
# Objective
Verify generated media is reattached on the follow-up turn.
# Setup
```yaml scenario.setup
- action: config.patch
  patch:
    agents:
      defaults:
        imageGenerationModel:
          primary: openai/gpt-image-1
- action: session.create
  key: agent:qa:image-roundtrip
```
# Steps
```yaml scenario.steps
- action: agent.send
  session: agent:qa:image-roundtrip
  message: |
    Image generation check: generate a QA lighthouse image and summarize it in one short sentence.
- action: artifact.capture
  kind: generated-image
  promptSnippet: Image generation check
  saveAs: lighthouseImage
- action: agent.send
  session: agent:qa:image-roundtrip
  message: |
    Roundtrip image inspection check: describe the generated lighthouse attachment in one short sentence.
  attachments:
    - fromArtifact: lighthouseImage
```
# Expect
```yaml scenario.expect
- assert: outbound.textIncludes
  value: lighthouse
- assert: requestLog.matches
  where:
    promptIncludes: Roundtrip image inspection check
    imageInputCountGte: 1
- assert: artifact.exists
  ref: lighthouseImage
```
````
## Runner Capabilities The DSL Must Cover
Based on the current suite, the generic runner needs more than prompt execution.
### Environment and setup actions
- `bus.reset`
- `gateway.waitHealthy`
- `channel.waitReady`
- `session.create`
- `thread.create`
- `workspace.writeSkill`
### Agent turn actions
- `agent.send`
- `agent.wait`
- `bus.injectInbound`
- `bus.injectOutbound`
### Config and runtime actions
- `config.get`
- `config.patch`
- `config.apply`
- `gateway.restart`
- `tools.effective`
- `skills.status`
### File and artifact actions
- `file.write`
- `file.read`
- `file.delete`
- `file.touchTime`
- `artifact.captureGeneratedImage`
- `artifact.capturePath`
### Memory and cron actions
- `memory.indexForce`
- `memory.searchCli`
- `doctor.memory.status`
- `cron.list`
- `cron.run`
- `cron.waitCompletion`
- `sessionTranscript.write`
### MCP actions
- `mcp.callTool`
### Assertions
- `outbound.textIncludes`
- `outbound.inThread`
- `outbound.notInRoot`
- `tool.called`
- `tool.notPresent`
- `skill.visible`
- `skill.disabled`
- `file.contains`
- `memory.contains`
- `requestLog.matches`
- `sessionStore.matches`
- `cron.managedPresent`
- `artifact.exists`
## Variables and Artifact References
The DSL must support saved outputs and later references.
Examples from the current suite:
- create a thread, then reuse `threadId`
- create a session, then reuse `sessionKey`
- generate an image, then attach the file on the next turn
- generate a wake marker string, then assert that it appears later
Needed capabilities:
- `saveAs`
- `${vars.name}`
- `${artifacts.name}`
- typed references for paths, session keys, thread ids, markers, tool outputs
Without variable support, the harness will keep leaking scenario logic back into TypeScript.
## What Should Stay As Escape Hatches
A purely declarative runner is not realistic in phase 1.
Some scenarios are inherently orchestration-heavy:
- memory dreaming sweep
- config apply restart wake-up
- config restart capability flip
- generated image artifact resolution by timestamp/path
- discovery-report evaluation
These should use explicit custom handlers for now.
Recommended rule:
- 85-90% declarative
- explicit `customHandler` steps for the hard remainder
- named and documented custom handlers only
- no anonymous inline code in the scenario file
That keeps the generic engine clean while still allowing progress.
## Architecture Change
### Current
Scenario markdown is already the source of truth for:
- suite execution
- workspace bootstrap files
- QA Lab UI scenario catalog
- report metadata
- discovery prompts
Generated compatibility:
- seeded workspace still includes `QA_KICKOFF_TASK.md`
- seeded workspace still includes `QA_SCENARIO_PLAN.md`
- seeded workspace now also includes `QA_SCENARIOS.md`
## Refactor Plan
### Phase 1: loader and schema
Done.
- added `qa/scenarios.md`
- added parser for named markdown YAML pack content
- validated with zod
- switched consumers to the parsed pack
- removed repo-level `qa/seed-scenarios.json` and `qa/QA_KICKOFF_TASK.md`
### Phase 2: generic engine
- split `extensions/qa-lab/src/suite.ts` into:
  - loader
  - engine
  - action registry
  - assertion registry
  - custom handlers
- keep existing helper functions as engine operations
Deliverable:
- engine executes simple declarative scenarios
### Phase 3: migrate simple scenarios
Start with scenarios that are mostly prompt + wait + assert:
- threaded follow-up
- image understanding from attachment
- skill visibility and invocation
- channel baseline
Deliverable:
- first real markdown-defined scenarios shipping through the generic engine
### Phase 4: migrate medium scenarios
- image generation roundtrip
- memory tools in channel context
- session memory ranking
- subagent handoff
- subagent fanout synthesis
Deliverable:
- variables, artifacts, tool assertions, request-log assertions proven out
### Phase 5: keep hard scenarios on custom handlers
- memory dreaming sweep
- config apply restart wake-up
- config restart capability flip
- runtime inventory drift
Deliverable:
- same authoring format, but with explicit custom-step blocks where needed
### Phase 6: delete hardcoded scenario map
Once the pack coverage is good enough:
- remove most scenario-specific TypeScript branching from `extensions/qa-lab/src/suite.ts`
## Fake Slack / Rich Media Support
The current QA bus is text-first.
Relevant files:
- `extensions/qa-channel/src/protocol.ts`
- `extensions/qa-lab/src/bus-state.ts`
- `extensions/qa-lab/src/bus-queries.ts`
- `extensions/qa-lab/src/bus-server.ts`
- `extensions/qa-lab/web/src/ui-render.ts`
Today the QA bus supports:
- text
- reactions
- threads
It does not yet model inline media attachments.
### Needed transport contract
Add a generic QA bus attachment model:
```ts
type QaBusAttachment = {
id: string;
kind: "image" | "video" | "audio" | "file";
mimeType: string;
fileName?: string;
inline?: boolean;
url?: string;
contentBase64?: string;
width?: number;
height?: number;
durationMs?: number;
altText?: string;
transcript?: string;
};
````
Then add `attachments?: QaBusAttachment[]` to:
- `QaBusMessage`
- `QaBusInboundMessageInput`
- `QaBusOutboundMessageInput`
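Once messages carry attachments, bus search can consider attachment metadata as well as message text. A minimal sketch, assuming a reduced message shape; `searchableText` and `SearchableMessage` are hypothetical names used only for illustration:

```ts
// Reduced attachment shape; fields mirror the transport contract above.
type QaBusAttachment = {
  id: string;
  kind: "image" | "video" | "audio" | "file";
  mimeType: string;
  fileName?: string;
  altText?: string;
  transcript?: string;
};

// Hypothetical minimal message shape for this sketch.
type SearchableMessage = { text: string; attachments?: QaBusAttachment[] };

// Lowercased haystack: message text plus any attachment metadata that is set.
function searchableText(message: SearchableMessage): string {
  const parts: string[] = [message.text];
  for (const attachment of message.attachments ?? []) {
    const fields = [
      attachment.fileName,
      attachment.altText,
      attachment.transcript,
      attachment.mimeType,
    ];
    for (const value of fields) {
      if (value) {
        parts.push(value);
      }
    }
  }
  return parts.join(" ").toLowerCase();
}

const outboundWithImage: SearchableMessage = {
  text: "artifact attached",
  attachments: [
    {
      id: "image-1",
      kind: "image",
      mimeType: "image/png",
      fileName: "qa-screenshot.png",
      altText: "QA dashboard screenshot",
    },
  ],
};
```

With this shape, a query such as `dashboard` matches through the alt text even though the message text never mentions it.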
### Why generic first
Do not build a Slack-only media model.
Instead:
- one generic QA transport model
- multiple renderers on top of it
  - current QA Lab chat
  - future fake Slack web
  - any other fake transport views
This prevents duplicate logic and lets media scenarios stay transport-agnostic.
### UI work needed
Update the QA UI to render:
- inline image preview
- inline audio player
- inline video player
- file attachment chip
The current UI can already render threads and reactions, so attachment rendering should layer onto the same message card model.
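The rendering decision could be sketched as follows: resolve a source URL from `url` or inline base64 content, then pick a view per attachment kind. `pickAttachmentView` and the view names are hypothetical; falling back to a file chip when no source is available is an assumption of this sketch.

```ts
// Reduced attachment shape for this sketch.
type Attachment = {
  kind: "image" | "video" | "audio" | "file";
  mimeType: string;
  url?: string;
  contentBase64?: string;
};

// Prefer an explicit URL; otherwise build a data URL from inline content.
function attachmentSourceUrl(attachment: Attachment): string | null {
  if (attachment.url?.trim()) {
    return attachment.url;
  }
  if (attachment.contentBase64?.trim()) {
    return `data:${attachment.mimeType};base64,${attachment.contentBase64}`;
  }
  return null;
}

// Hypothetical view names matching the four UI items listed above.
type AttachmentView = "image-preview" | "video-player" | "audio-player" | "file-chip";

// Media kinds need a playable source; everything else becomes a file chip.
function pickAttachmentView(attachment: Attachment): AttachmentView {
  if (attachmentSourceUrl(attachment) === null) {
    return "file-chip";
  }
  if (attachment.kind === "image") {
    return "image-preview";
  }
  if (attachment.kind === "video") {
    return "video-player";
  }
  if (attachment.kind === "audio") {
    return "audio-player";
  }
  return "file-chip";
}
```

Because the choice depends only on the generic attachment fields, the same function could back the QA Lab chat and any later fake transport view.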
### Scenario work enabled by media transport
Once attachments flow through QA bus, we can add richer fake-chat scenarios:
- inline image reply in fake Slack
- audio attachment understanding
- video attachment understanding
- mixed attachment ordering
- thread reply with media retained
## Recommendation
The next implementation chunk should be:
1. add markdown scenario loader + zod schema
2. generate the current catalog from markdown
3. migrate a few simple scenarios first
4. add generic QA bus attachment support
5. render inline image in the QA UI
6. then expand to audio and video
This is the smallest path that proves both goals:
- generic markdown-defined QA
- richer fake messaging surfaces
## Open Questions
- whether scenario files should allow embedded markdown prompt templates with variable interpolation
- whether setup/cleanup should be named sections or just ordered action lists
- whether artifact references should be strongly typed in schema or string-based
- whether custom handlers should live in one registry or per-surface registries
- whether the generated JSON compatibility file should remain checked in during migration
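On the first question, variable interpolation in prompt templates could be as small as the sketch below. It assumes the `${vars.name}` and `${artifacts.name}` reference syntax listed under Variables and Artifact References; `interpolate`, `Scope`, and failing on unknown references are hypothetical choices, not settled design.

```ts
// Hypothetical scope: saved variables and artifact paths, both as strings.
type Scope = { vars: Record<string, string>; artifacts: Record<string, string> };

// Replace ${vars.x} and ${artifacts.x}; fail loudly on unknown references.
function interpolate(template: string, scope: Scope): string {
  return template.replace(
    /\$\{(vars|artifacts)\.([A-Za-z0-9_]+)\}/g,
    (_whole: string, kind: "vars" | "artifacts", name: string) => {
      const value = scope[kind][name];
      if (value === undefined) {
        throw new Error(`unknown reference: ${kind}.${name}`);
      }
      return value;
    },
  );
}

const demoScope: Scope = {
  vars: { threadId: "t-1" },
  artifacts: { lighthouseImage: "/tmp/lighthouse.png" },
};
```

Throwing on an unknown reference surfaces typos at load or run time instead of sending a literal placeholder to the model.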

@@ -10,6 +10,7 @@ import type {
} from "./protocol.js";
export type {
QaBusAttachment,
QaBusConversation,
QaBusConversationKind,
QaBusCreateThreadInput,
@@ -140,6 +141,7 @@ export async function sendQaBusMessage(params: {
senderName?: string;
threadId?: string;
replyToId?: string;
attachments?: import("./protocol.js").QaBusAttachment[];
}) {
return await postJson<{ message: QaBusMessage }>(params.baseUrl, "/v1/outbound/message", params);
}

@@ -6,6 +6,21 @@ export type QaBusConversation = {
title?: string;
};
export type QaBusAttachment = {
id: string;
kind: "image" | "video" | "audio" | "file";
mimeType: string;
fileName?: string;
inline?: boolean;
url?: string;
contentBase64?: string;
width?: number;
height?: number;
durationMs?: number;
altText?: string;
transcript?: string;
};
export type QaBusMessage = {
id: string;
accountId: string;
@@ -20,6 +35,7 @@ export type QaBusMessage = {
replyToId?: string;
deleted?: boolean;
editedAt?: number;
attachments?: QaBusAttachment[];
reactions: Array<{
emoji: string;
senderId: string;
@@ -86,6 +102,7 @@ export type QaBusInboundMessageInput = {
threadId?: string;
threadTitle?: string;
replyToId?: string;
attachments?: QaBusAttachment[];
};
export type QaBusOutboundMessageInput = {
@@ -97,6 +114,7 @@ export type QaBusOutboundMessageInput = {
timestamp?: number;
threadId?: string;
replyToId?: string;
attachments?: QaBusAttachment[];
};
export type QaBusCreateThreadInput = {

@@ -1,5 +1,6 @@
import { normalizeOptionalLowercaseString } from "openclaw/plugin-sdk/text-runtime";
import type {
QaBusAttachment,
QaBusConversation,
QaBusEvent,
QaBusMessage,
@@ -52,10 +53,15 @@ export function cloneMessage(message: QaBusMessage): QaBusMessage {
return {
...message,
conversation: { ...message.conversation },
attachments: (message.attachments ?? []).map((attachment) => cloneAttachment(attachment)),
reactions: message.reactions.map((reaction) => ({ ...reaction })),
};
}
function cloneAttachment(attachment: QaBusAttachment): QaBusAttachment {
return { ...attachment };
}
export function cloneEvent(event: QaBusEvent): QaBusEvent {
switch (event.kind) {
case "inbound-message":
@@ -113,9 +119,24 @@ export function searchQaBusMessages(params: {
.filter((message) =>
params.input.threadId ? message.threadId === params.input.threadId : true,
)
.filter((message) =>
query ? normalizeOptionalLowercaseString(message.text)?.includes(query) === true : true,
)
.filter((message) => {
if (!query) {
return true;
}
const attachmentHaystack = message.attachments ?? [];
const searchableAttachmentText = attachmentHaystack
.flatMap((attachment) => [
attachment.fileName,
attachment.altText,
attachment.transcript,
attachment.mimeType,
])
.filter((value): value is string => Boolean(value))
.join(" ")
.toLowerCase();
const messageText = normalizeOptionalLowercaseString(message.text) ?? "";
return `${messageText} ${searchableAttachmentText}`.includes(query);
})
.slice(-limit)
.map((message) => cloneMessage(message));
}

@@ -91,4 +91,41 @@ describe("qa-bus state", () => {
}),
).rejects.toThrow("qa-bus wait timeout");
});
it("preserves inline attachments and lets search match attachment metadata", () => {
const state = createQaBusState();
const outbound = state.addOutboundMessage({
to: "dm:alice",
text: "artifact attached",
attachments: [
{
id: "image-1",
kind: "image",
mimeType: "image/png",
fileName: "qa-screenshot.png",
altText: "QA dashboard screenshot",
contentBase64: "aGVsbG8=",
},
],
});
const readback = state.readMessage({ messageId: outbound.id });
expect(readback.attachments).toHaveLength(1);
expect(readback.attachments?.[0]).toMatchObject({
kind: "image",
fileName: "qa-screenshot.png",
altText: "QA dashboard screenshot",
});
const byFilename = state.searchMessages({
query: "screenshot",
});
expect(byFilename.some((message) => message.id === outbound.id)).toBe(true);
const byAltText = state.searchMessages({
query: "dashboard",
});
expect(byAltText.some((message) => message.id === outbound.id)).toBe(true);
});
});

@@ -10,6 +10,7 @@ import {
} from "./bus-queries.js";
import { createQaBusWaiterStore } from "./bus-waiters.js";
import type {
QaBusAttachment,
QaBusConversation,
QaBusCreateThreadInput,
QaBusDeleteMessageInput,
@@ -86,6 +87,7 @@ export function createQaBusState() {
threadId?: string;
threadTitle?: string;
replyToId?: string;
attachments?: QaBusAttachment[];
}): QaBusMessage => {
const conversation = ensureConversation(params.conversation);
const message: QaBusMessage = {
@@ -100,6 +102,7 @@ export function createQaBusState() {
threadId: params.threadId,
threadTitle: params.threadTitle,
replyToId: params.replyToId,
attachments: params.attachments?.map((attachment) => ({ ...attachment })) ?? [],
reactions: [],
};
messages.set(message.id, message);
@@ -138,6 +141,7 @@ export function createQaBusState() {
threadId: input.threadId,
threadTitle: input.threadTitle,
replyToId: input.replyToId,
attachments: input.attachments,
});
pushEvent({
kind: "inbound-message",
@@ -159,6 +163,7 @@ export function createQaBusState() {
timestamp: input.timestamp,
threadId: input.threadId ?? threadId,
replyToId: input.replyToId,
attachments: input.attachments,
});
pushEvent({
kind: "outbound-message",

@@ -9,7 +9,7 @@ describe("qa discovery evaluation", () => {
it("accepts rich discovery reports that explicitly confirm all required files were read", () => {
const report = `
Worked
- Read all four requested files: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md.
- Read all three requested files: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md.
Failed
- None.
Blocked
@@ -28,8 +28,8 @@ The helper text mentions banned phrases like "not present", "missing files", "bl
it("accepts numeric 'all 4 required files read' confirmations", () => {
const report = `
Worked
- Source: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md
- all 4 required files read.
- Source: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md
- all 3 required files read.
Failed
- None.
Blocked
@@ -48,8 +48,8 @@ The report may quote phrases like "not present" while describing the evaluator,
it("accepts claude-style 'all four files retrieved' discovery summaries", () => {
const report = `
Worked
- All four files retrieved. Now let me compile the protocol report.
- All four mandated files read successfully: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md.
- All three files retrieved. Now let me compile the protocol report.
- All three mandated files read successfully: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md.
Failed
- None.
Blocked
@@ -83,7 +83,7 @@ Follow-up
it("flags discovery replies that drift into unrelated suite wrap-up claims", () => {
const report = `
Worked
- All four requested files were read: repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md.
- All three requested files were read: repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, repo/docs/help/testing.md.
Failed
- None.
Blocked

@@ -1,8 +1,7 @@
import { normalizeLowercaseStringOrEmpty } from "openclaw/plugin-sdk/text-runtime";
const REQUIRED_DISCOVERY_REFS = [
"repo/qa/seed-scenarios.json",
"repo/qa/QA_KICKOFF_TASK.md",
"repo/qa/scenarios.md",
"repo/extensions/qa-lab/src/suite.ts",
"repo/docs/help/testing.md",
] as const;
@@ -21,14 +20,15 @@ const DISCOVERY_SCOPE_LEAK_PHRASES = [
function confirmsDiscoveryFileRead(text: string) {
const lower = normalizeLowercaseStringOrEmpty(text);
const mentionsAllRefs = REQUIRED_DISCOVERY_REFS_LOWER.every((ref) => lower.includes(ref));
const requiredCountPattern = "(?:three|3|four|4)";
const confirmsRead =
/(?:read|retrieved|inspected|loaded|accessed|digested)\s+all\s+(?:four|4)\s+(?:(?:requested|required|mandated|seeded)\s+)?files/.test(
lower,
) ||
/all\s+(?:four|4)\s+(?:(?:requested|required|mandated|seeded)\s+)?files\s+(?:were\s+)?(?:read|retrieved|inspected|loaded|accessed|digested)(?:\s+\w+)?/.test(
lower,
) ||
/all (?:four|4) seeded files readable/.test(lower);
new RegExp(
`(?:read|retrieved|inspected|loaded|accessed|digested)\\s+all\\s+${requiredCountPattern}\\s+(?:(?:requested|required|mandated|seeded)\\s+)?files`,
).test(lower) ||
new RegExp(
`all\\s+${requiredCountPattern}\\s+(?:(?:requested|required|mandated|seeded)\\s+)?files\\s+(?:were\\s+)?(?:read|retrieved|inspected|loaded|accessed|digested)(?:\\s+\\w+)?`,
).test(lower) ||
new RegExp(`all\\s+${requiredCountPattern}\\s+seeded files readable`).test(lower);
return mentionsAllRefs && confirmsRead;
}

@@ -38,6 +38,7 @@ describe("qa docker harness", () => {
path.join(outputDir, "state", "openclaw.json"),
path.join(outputDir, "state", "seed-workspace", "QA_KICKOFF_TASK.md"),
path.join(outputDir, "state", "seed-workspace", "QA_SCENARIO_PLAN.md"),
path.join(outputDir, "state", "seed-workspace", "QA_SCENARIOS.md"),
path.join(outputDir, "state", "seed-workspace", "IDENTITY.md"),
]),
);
@@ -86,6 +87,13 @@ describe("qa docker harness", () => {
);
expect(kickoff).toContain("Lobster Invaders");
const scenarios = await readFile(
path.join(outputDir, "state", "seed-workspace", "QA_SCENARIOS.md"),
"utf8",
);
expect(scenarios).toContain("```yaml qa-pack");
expect(scenarios).toContain("subagent-fanout-synthesis");
const readme = await readFile(path.join(outputDir, "README.md"), "utf8");
expect(readme).toContain("in-process restarts inside Docker");
expect(readme).toContain("pnpm qa:lab:watch");

@@ -323,6 +323,7 @@ export async function writeQaDockerHarnessFiles(params: {
path.join(params.outputDir, "state", "seed-workspace", "IDENTITY.md"),
path.join(params.outputDir, "state", "seed-workspace", "QA_KICKOFF_TASK.md"),
path.join(params.outputDir, "state", "seed-workspace", "QA_SCENARIO_PLAN.md"),
path.join(params.outputDir, "state", "seed-workspace", "QA_SCENARIOS.md"),
],
};
}

@@ -1,22 +1,13 @@
import { readQaBootstrapScenarioCatalog } from "./scenario-catalog.js";
import {
DEFAULT_QA_AGENT_IDENTITY_MARKDOWN,
readQaBootstrapScenarioCatalog,
} from "./scenario-catalog.js";
export const QA_AGENT_IDENTITY_MARKDOWN = `# Dev C-3PO
You are the OpenClaw QA operator agent.
Persona:
- protocol-minded
- precise
- a little flustered
- conscientious
- eager to report what worked, failed, or remains blocked
Style:
- read source and docs first
- test systematically
- record evidence
- end with a concise protocol report
`;
export function readQaAgentIdentityMarkdown(): string {
return (
readQaBootstrapScenarioCatalog().agentIdentityMarkdown || DEFAULT_QA_AGENT_IDENTITY_MARKDOWN
);
}
export function buildQaScenarioPlanMarkdown(): string {
const catalog = readQaBootstrapScenarioCatalog();
@@ -27,6 +18,9 @@ export function buildQaScenarioPlanMarkdown(): string {
lines.push(`- id: ${scenario.id}`);
lines.push(`- surface: ${scenario.surface}`);
lines.push(`- objective: ${scenario.objective}`);
if (scenario.execution?.summary) {
lines.push(`- execution: ${scenario.execution.summary}`);
}
lines.push("- success criteria:");
for (const criterion of scenario.successCriteria) {
lines.push(` - ${criterion}`);

@@ -1,7 +1,7 @@
import fs from "node:fs/promises";
import path from "node:path";
import { buildQaScenarioPlanMarkdown, QA_AGENT_IDENTITY_MARKDOWN } from "./qa-agent-bootstrap.js";
import { readQaBootstrapScenarioCatalog } from "./scenario-catalog.js";
import { buildQaScenarioPlanMarkdown, readQaAgentIdentityMarkdown } from "./qa-agent-bootstrap.js";
import { readQaBootstrapScenarioCatalog, readQaScenarioPackMarkdown } from "./scenario-catalog.js";
export async function seedQaAgentWorkspace(params: { workspaceDir: string; repoRoot?: string }) {
const catalog = readQaBootstrapScenarioCatalog();
@@ -9,9 +9,10 @@ export async function seedQaAgentWorkspace(params: { workspaceDir: string; repoR
const kickoffTask = catalog.kickoffTask || "QA mission unavailable.";
const files = new Map<string, string>([
["IDENTITY.md", QA_AGENT_IDENTITY_MARKDOWN],
["IDENTITY.md", readQaAgentIdentityMarkdown()],
["QA_KICKOFF_TASK.md", kickoffTask],
["QA_SCENARIO_PLAN.md", buildQaScenarioPlanMarkdown()],
["QA_SCENARIOS.md", readQaScenarioPackMarkdown()],
]);
if (params.repoRoot) {
@@ -22,6 +23,7 @@ export async function seedQaAgentWorkspace(params: { workspaceDir: string; repoR
- repo: ./repo/
- kickoff: ./QA_KICKOFF_TASK.md
- scenario plan: ./QA_SCENARIO_PLAN.md
- scenario pack: ./QA_SCENARIOS.md
- identity: ./IDENTITY.md
The mounted repo source should be available read-only under \`./repo/\`.

@@ -20,6 +20,7 @@ export {
setQaChannelRuntime,
} from "openclaw/plugin-sdk/qa-channel";
export type {
QaBusAttachment,
QaBusConversation,
QaBusCreateThreadInput,
QaBusDeleteMessageInput,

@@ -0,0 +1,26 @@
import { describe, expect, it } from "vitest";
import { readQaBootstrapScenarioCatalog, readQaScenarioPack } from "./scenario-catalog.js";
describe("qa scenario catalog", () => {
it("loads the markdown pack as the canonical source of truth", () => {
const pack = readQaScenarioPack();
expect(pack.version).toBe(1);
expect(pack.agent.identityMarkdown).toContain("Dev C-3PO");
expect(pack.kickoffTask).toContain("Lobster Invaders");
expect(pack.scenarios.some((scenario) => scenario.id === "image-generation-roundtrip")).toBe(
true,
);
expect(pack.scenarios.every((scenario) => scenario.execution?.kind === "custom")).toBe(true);
});
it("exposes bootstrap data from the markdown pack", () => {
const catalog = readQaBootstrapScenarioCatalog();
expect(catalog.agentIdentityMarkdown).toContain("protocol-minded");
expect(catalog.kickoffTask).toContain("Track what worked");
expect(catalog.scenarios.some((scenario) => scenario.id === "subagent-fanout-synthesis")).toBe(
true,
);
});
});

@@ -1,21 +1,68 @@
import fs from "node:fs";
import path from "node:path";
import YAML from "yaml";
import { z } from "zod";
export type QaSeedScenario = {
id: string;
title: string;
surface: string;
objective: string;
successCriteria: string[];
docsRefs?: string[];
codeRefs?: string[];
};
export const DEFAULT_QA_AGENT_IDENTITY_MARKDOWN = `# Dev C-3PO
You are the OpenClaw QA operator agent.
Persona:
- protocol-minded
- precise
- a little flustered
- conscientious
- eager to report what worked, failed, or remains blocked
Style:
- read source and docs first
- test systematically
- record evidence
- end with a concise protocol report`;
const qaScenarioExecutionSchema = z.object({
kind: z.literal("custom").default("custom"),
handler: z.string().trim().min(1),
summary: z.string().trim().min(1).optional(),
});
const qaSeedScenarioSchema = z.object({
id: z.string().trim().min(1),
title: z.string().trim().min(1),
surface: z.string().trim().min(1),
objective: z.string().trim().min(1),
successCriteria: z.array(z.string().trim().min(1)).min(1),
docsRefs: z.array(z.string().trim().min(1)).optional(),
codeRefs: z.array(z.string().trim().min(1)).optional(),
execution: qaScenarioExecutionSchema.optional(),
});
const qaScenarioPackSchema = z.object({
version: z.number().int().positive(),
agent: z
.object({
identityMarkdown: z.string().trim().min(1),
})
.default({
identityMarkdown: DEFAULT_QA_AGENT_IDENTITY_MARKDOWN,
}),
kickoffTask: z.string().trim().min(1),
scenarios: z.array(qaSeedScenarioSchema).min(1),
});
export type QaScenarioExecution = z.infer<typeof qaScenarioExecutionSchema>;
export type QaSeedScenario = z.infer<typeof qaSeedScenarioSchema>;
export type QaScenarioPack = z.infer<typeof qaScenarioPackSchema>;
export type QaBootstrapScenarioCatalog = {
agentIdentityMarkdown: string;
kickoffTask: string;
scenarios: QaSeedScenario[];
};
const QA_SCENARIO_PACK_PATH = "qa/scenarios.md";
const QA_PACK_FENCE_RE = /```ya?ml qa-pack\r?\n([\s\S]*?)\r?\n```/i;
function walkUpDirectories(start: string): string[] {
const roots: string[] = [];
let current = path.resolve(start);
@@ -44,20 +91,37 @@ function readTextFile(relativePath: string): string {
if (!resolved) {
return "";
}
return fs.readFileSync(resolved, "utf8").trim();
return fs.readFileSync(resolved, "utf8");
}
function readScenarioFile(relativePath: string): QaSeedScenario[] {
const resolved = resolveRepoFile(relativePath);
if (!resolved) {
return [];
function extractQaPackYaml(content: string) {
const match = content.match(QA_PACK_FENCE_RE);
if (!match?.[1]) {
throw new Error(
`qa scenario pack missing \`\`\`yaml qa-pack fence in ${QA_SCENARIO_PACK_PATH}`,
);
}
return JSON.parse(fs.readFileSync(resolved, "utf8")) as QaSeedScenario[];
return match[1];
}
export function readQaScenarioPackMarkdown(): string {
return readTextFile(QA_SCENARIO_PACK_PATH).trim();
}
export function readQaScenarioPack(): QaScenarioPack {
const markdown = readQaScenarioPackMarkdown();
if (!markdown) {
throw new Error(`qa scenario pack not found: ${QA_SCENARIO_PACK_PATH}`);
}
const parsed = YAML.parse(extractQaPackYaml(markdown)) as unknown;
return qaScenarioPackSchema.parse(parsed);
}
export function readQaBootstrapScenarioCatalog(): QaBootstrapScenarioCatalog {
const pack = readQaScenarioPack();
return {
kickoffTask: readTextFile("qa/QA_KICKOFF_TASK.md"),
scenarios: readScenarioFile("qa/seed-scenarios.json"),
agentIdentityMarkdown: pack.agent.identityMarkdown,
kickoffTask: pack.kickoffTask,
scenarios: pack.scenarios,
};
}

@@ -1252,7 +1252,7 @@ function buildScenarioMap(env: QaSuiteEnvironment) {
await runAgentPrompt(env, {
sessionKey: "agent:qa:discovery",
message:
"Read the seeded docs and source plan. The full repo is mounted under ./repo/. Explicitly inspect repo/qa/seed-scenarios.json, repo/qa/QA_KICKOFF_TASK.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md, then report grouped into Worked, Failed, Blocked, and Follow-up. Mention at least two extra QA scenarios beyond the seed list.",
"Read the seeded docs and source plan. The full repo is mounted under ./repo/. Explicitly inspect repo/qa/scenarios.md, repo/extensions/qa-lab/src/suite.ts, and repo/docs/help/testing.md, then report grouped into Worked, Failed, Blocked, and Follow-up. Mention at least two extra QA scenarios beyond the seed list.",
timeoutMs: liveTurnTimeoutMs(env, 30_000),
});
const outbound = await waitForCondition(
@@ -2860,7 +2860,7 @@ export async function runQaSuite(params?: {
});
for (const [index, scenario] of selectedCatalogScenarios.entries()) {
const run = scenarioMap.get(scenario.id);
const run = scenarioMap.get(scenario.execution?.handler || scenario.id);
if (!run) {
const missingResult = {
name: scenario.title,

@@ -947,6 +947,59 @@ select {
word-break: break-word;
}
.msg-attachments {
display: grid;
gap: 10px;
margin-top: 10px;
}
.msg-attachment {
border: 1px solid var(--border);
background: var(--bg-elevated);
border-radius: 12px;
overflow: hidden;
}
.msg-attachment img,
.msg-attachment video {
display: block;
width: min(100%, 420px);
max-width: 100%;
background: #000;
}
.msg-attachment-audio {
padding: 12px;
}
.msg-attachment audio {
width: min(100%, 360px);
display: block;
}
.msg-attachment figcaption,
.msg-attachment-file {
padding: 10px 12px;
font-size: 12px;
color: var(--text-secondary);
}
.msg-attachment-link {
color: var(--accent);
text-decoration: none;
font-weight: 600;
}
.msg-attachment-link:hover {
text-decoration: underline;
}
.msg-attachment-transcript {
margin-top: 8px;
color: var(--text-tertiary);
white-space: pre-wrap;
}
.msg-meta {
display: flex;
align-items: center;

@@ -6,6 +6,21 @@ export type Conversation = {
title?: string;
};
export type Attachment = {
  id: string;
  kind: "image" | "video" | "audio" | "file";
  mimeType: string;
  fileName?: string;
  inline?: boolean;
  url?: string;
  contentBase64?: string;
  width?: number;
  height?: number;
  durationMs?: number;
  altText?: string;
  transcript?: string;
};
export type Thread = {
id: string;
conversationId: string;
@@ -24,6 +39,7 @@ export type Message = {
threadTitle?: string;
deleted?: boolean;
editedAt?: number;
attachments?: Attachment[];
reactions: Array<{ emoji: string; senderId: string }>;
};
@@ -198,6 +214,56 @@ function esc(text: string) {
.replaceAll('"', "&quot;");
}
function attachmentSourceUrl(attachment: Attachment): string | null {
  if (attachment.url?.trim()) {
    return attachment.url;
  }
  if (attachment.contentBase64?.trim()) {
    return `data:${attachment.mimeType};base64,${attachment.contentBase64}`;
  }
  return null;
}
function renderMessageAttachments(message: Message): string {
  const attachments = message.attachments ?? [];
  if (attachments.length === 0) {
    return "";
  }
  const items = attachments
    .map((attachment) => {
      const sourceUrl = attachmentSourceUrl(attachment);
      const label = attachment.fileName || attachment.altText || attachment.mimeType;
      if (attachment.kind === "image" && sourceUrl) {
        return `<figure class="msg-attachment msg-attachment-image">
          <img src="${esc(sourceUrl)}" alt="${esc(attachment.altText || label)}" loading="lazy" />
          <figcaption>${esc(label)}</figcaption>
        </figure>`;
      }
      if (attachment.kind === "video" && sourceUrl) {
        return `<figure class="msg-attachment msg-attachment-video">
          <video controls preload="metadata" src="${esc(sourceUrl)}"></video>
          <figcaption>${esc(label)}</figcaption>
        </figure>`;
      }
      if (attachment.kind === "audio" && sourceUrl) {
        return `<figure class="msg-attachment msg-attachment-audio">
          <audio controls preload="metadata" src="${esc(sourceUrl)}"></audio>
          <figcaption>${esc(label)}</figcaption>
        </figure>`;
      }
      const transcript = attachment.transcript?.trim()
        ? `<div class="msg-attachment-transcript">${esc(attachment.transcript)}</div>`
        : "";
      const href = sourceUrl ? ` href="${esc(sourceUrl)}" target="_blank" rel="noreferrer"` : "";
      return `<div class="msg-attachment msg-attachment-file">
        <a class="msg-attachment-link"${href}>${esc(label)}</a>
        ${transcript}
      </div>`;
    })
    .join("");
  return `<div class="msg-attachments">${items}</div>`;
}
const MOCK_MODELS: RunnerModelOption[] = [
{
key: "mock-openai/gpt-5.4",
@@ -626,6 +692,7 @@ function renderMessage(m: Message): string {
<span class="msg-time">${formatTime(m.timestamp)}</span>
</div>
<div class="msg-text">${esc(m.text)}</div>
${renderMessageAttachments(m)}
${metaTags.length > 0 || reactions ? `<div class="msg-meta">${metaTags.join("")}${reactions}</div>` : ""}
</div>
</div>`;
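
Reviewer note: the source precedence in `attachmentSourceUrl` (explicit `url` first, then inline `contentBase64` wrapped in a `data:` URL, otherwise no source) can be checked in isolation. The following is only a sketch under stated assumptions: the trimmed `Attachment` type, the sample attachments, and the `example.invalid` URL are hypothetical and not part of this commit.

```typescript
// Hypothetical, trimmed copy of the Attachment shape added in this diff.
type Attachment = {
  id: string;
  kind: "image" | "video" | "audio" | "file";
  mimeType: string;
  fileName?: string;
  url?: string;
  contentBase64?: string;
};

// Same precedence as the function in the diff: a non-blank URL wins,
// inline base64 becomes a data: URL, anything else has no source.
function attachmentSourceUrl(attachment: Attachment): string | null {
  if (attachment.url?.trim()) {
    return attachment.url;
  }
  if (attachment.contentBase64?.trim()) {
    return `data:${attachment.mimeType};base64,${attachment.contentBase64}`;
  }
  return null;
}

// Illustrative sample values (assumptions, not fixtures from the repo).
const inlinePng: Attachment = {
  id: "a1",
  kind: "image",
  mimeType: "image/png",
  contentBase64: "iVBORw0KGgo=",
};
const linkedVideo: Attachment = {
  id: "a2",
  kind: "video",
  mimeType: "video/mp4",
  url: "https://example.invalid/clip.mp4",
  contentBase64: "AAAA", // ignored because url takes precedence
};
const bareFile: Attachment = {
  id: "a3",
  kind: "file",
  mimeType: "text/plain",
  fileName: "notes.txt",
};

console.log(attachmentSourceUrl(inlinePng)); // data:image/png;base64,iVBORw0KGgo=
console.log(attachmentSourceUrl(linkedVideo)); // https://example.invalid/clip.mp4
console.log(attachmentSourceUrl(bareFile)); // null
```

An attachment with no source still renders in the diff's fallback branch, as a link element without an `href`.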

@@ -1,15 +0,0 @@
QA mission:
Understand this OpenClaw repo from source + docs before acting.
The repo is available in your workspace at `./repo/`.
Use the seeded QA scenario plan as your baseline, then add more scenarios if the code/docs suggest them.
Run the scenarios through the real qa-channel surfaces where possible.
Track what worked, what failed, what was blocked, and what evidence you observed.
End with a concise report grouped into worked / failed / blocked / follow-up.
Important expectations:
- Check both DM and channel behavior.
- Include a Lobster Invaders build task.
- Include a cron reminder about one minute in the future.
- Read docs and source before proposing extra QA scenarios.
- Keep your tone in the configured dev C-3PO personality.

@@ -4,9 +4,8 @@ Seed QA assets for the private `qa-lab` extension.
Files:
-- `QA_KICKOFF_TASK.md` - operator prompt for the QA agent.
+- `scenarios.md` - canonical QA scenario pack, kickoff mission, and operator identity.
 - `frontier-harness-plan.md` - big-model bakeoff and tuning loop for harness work.
-- `seed-scenarios.json` - repo-backed baseline QA scenarios.
Key workflow:

563
qa/scenarios.md Normal file

@@ -0,0 +1,563 @@
# OpenClaw QA Scenario Pack
Single source of truth for the repo-backed QA suite.
- kickoff mission
- QA operator identity
- scenario metadata
- handler bindings for the executable harness
```yaml qa-pack
version: 1
agent:
  identityMarkdown: |-
    # Dev C-3PO
    You are the OpenClaw QA operator agent.
    Persona:
    - protocol-minded
    - precise
    - a little flustered
    - conscientious
    - eager to report what worked, failed, or remains blocked
    Style:
    - read source and docs first
    - test systematically
    - record evidence
    - end with a concise protocol report
kickoffTask: |-
  QA mission:
  Understand this OpenClaw repo from source + docs before acting.
  The repo is available in your workspace at `./repo/`.
  Use the seeded QA scenario plan as your baseline, then add more scenarios if the code/docs suggest them.
  Run the scenarios through the real qa-channel surfaces where possible.
  Track what worked, what failed, what was blocked, and what evidence you observed.
  End with a concise report grouped into worked / failed / blocked / follow-up.
  Important expectations:
  - Check both DM and channel behavior.
  - Include a Lobster Invaders build task.
  - Include a cron reminder about one minute in the future.
  - Read docs and source before proposing extra QA scenarios.
  - Keep your tone in the configured dev C-3PO personality.
scenarios:
  - id: channel-chat-baseline
    title: Channel baseline conversation
    surface: channel
    objective: Verify the QA agent can respond correctly in a shared channel and respect mention-driven group semantics.
    successCriteria:
      - Agent replies in the shared channel transcript.
      - Agent keeps the conversation scoped to the channel.
      - Agent respects mention-driven group routing semantics.
    docsRefs:
      - docs/channels/group-messages.md
      - docs/channels/qa-channel.md
    codeRefs:
      - extensions/qa-channel/src/inbound.ts
      - extensions/qa-lab/src/bus-state.ts
    execution:
      kind: custom
      handler: channel-chat-baseline
      summary: Verify the QA agent can respond correctly in a shared channel and respect mention-driven group semantics.
  - id: cron-one-minute-ping
    title: Cron one-minute ping
    surface: cron
    objective: Verify the agent can schedule a cron reminder one minute in the future and receive the follow-up in the QA channel.
    successCriteria:
      - Agent schedules a cron reminder roughly one minute ahead.
      - Reminder returns through qa-channel.
      - Agent recognizes the reminder as part of the original task.
    docsRefs:
      - docs/help/testing.md
      - docs/channels/qa-channel.md
    codeRefs:
      - extensions/qa-lab/src/bus-server.ts
      - extensions/qa-lab/src/self-check.ts
    execution:
      kind: custom
      handler: cron-one-minute-ping
      summary: Verify the agent can schedule a cron reminder one minute in the future and receive the follow-up in the QA channel.
  - id: dm-chat-baseline
    title: DM baseline conversation
    surface: dm
    objective: Verify the QA agent can chat coherently in a DM, explain the QA setup, and stay in character.
    successCriteria:
      - Agent replies in DM without channel routing mistakes.
      - Agent explains the QA lab and message bus correctly.
      - Agent keeps the dev C-3PO personality.
    docsRefs:
      - docs/channels/qa-channel.md
      - docs/help/testing.md
    codeRefs:
      - extensions/qa-channel/src/gateway.ts
      - extensions/qa-lab/src/lab-server.ts
    execution:
      kind: custom
      handler: dm-chat-baseline
      summary: Verify the QA agent can chat coherently in a DM, explain the QA setup, and stay in character.
  - id: lobster-invaders-build
    title: Build Lobster Invaders
    surface: workspace
    objective: Verify the agent can read the repo, create a tiny playable artifact, and report what changed.
    successCriteria:
      - Agent inspects source before coding.
      - Agent builds a tiny playable Lobster Invaders artifact.
      - Agent explains how to run or view the artifact.
    docsRefs:
      - docs/help/testing.md
      - docs/web/dashboard.md
    codeRefs:
      - extensions/qa-lab/src/report.ts
      - extensions/qa-lab/web/src/app.ts
    execution:
      kind: custom
      handler: lobster-invaders-build
      summary: Verify the agent can read the repo, create a tiny playable artifact, and report what changed.
  - id: memory-recall
    title: Memory recall after context switch
    surface: memory
    objective: Verify the agent can store a fact, switch topics, then recall the fact accurately later.
    successCriteria:
      - Agent acknowledges the seeded fact.
      - Agent later recalls the same fact correctly.
      - Recall stays scoped to the active QA conversation.
    docsRefs:
      - docs/help/testing.md
    codeRefs:
      - extensions/qa-lab/src/scenario.ts
    execution:
      kind: custom
      handler: memory-recall
      summary: Verify the agent can store a fact, switch topics, then recall the fact accurately later.
  - id: memory-dreaming-sweep
    title: Memory dreaming sweep
    surface: memory
    objective: Verify enabling dreaming creates the managed sweep, stages light and REM artifacts, and consolidates repeated recall signals into durable memory.
    successCriteria:
      - Dreaming can be enabled and doctor.memory.status reports the managed sweep cron.
      - Repeated recall signals give the dreaming sweep real material to process.
      - A dreaming sweep writes Light Sleep and REM Sleep blocks, then promotes the canary into MEMORY.md.
    docsRefs:
      - docs/concepts/dreaming.md
      - docs/reference/memory-config.md
      - docs/web/control-ui.md
    codeRefs:
      - extensions/memory-core/src/dreaming.ts
      - extensions/memory-core/src/dreaming-phases.ts
      - src/gateway/server-methods/doctor.ts
      - extensions/qa-lab/src/suite.ts
    execution:
      kind: custom
      handler: memory-dreaming-sweep
      summary: Verify enabling dreaming creates the managed sweep, stages light and REM artifacts, and consolidates repeated recall signals into durable memory.
  - id: model-switch-follow-up
    title: Model switch follow-up
    surface: models
    objective: Verify the agent can switch to a different configured model and continue coherently.
    successCriteria:
      - Agent reflects the model switch request.
      - Follow-up answer remains coherent with prior context.
      - Final report notes whether the switch actually happened.
    docsRefs:
      - docs/help/testing.md
      - docs/web/dashboard.md
    codeRefs:
      - extensions/qa-lab/src/report.ts
    execution:
      kind: custom
      handler: model-switch-follow-up
      summary: Verify the agent can switch to a different configured model and continue coherently.
  - id: approval-turn-tool-followthrough
    title: Approval turn tool followthrough
    surface: harness
    objective: Verify a short approval like "ok do it" triggers immediate tool use instead of fake-progress narration.
    successCriteria:
      - Agent can keep the pre-action turn brief.
      - The short approval leads to a real tool call on the next turn.
      - Final answer uses tool-derived evidence instead of placeholder progress text.
    docsRefs:
      - docs/help/testing.md
      - docs/channels/qa-channel.md
    codeRefs:
      - extensions/qa-lab/src/suite.ts
      - extensions/qa-lab/src/mock-openai-server.ts
      - src/agents/pi-embedded-runner/run/incomplete-turn.ts
    execution:
      kind: custom
      handler: approval-turn-tool-followthrough
      summary: Verify a short approval like "ok do it" triggers immediate tool use instead of fake-progress narration.
  - id: reaction-edit-delete
    title: Reaction, edit, delete lifecycle
    surface: message-actions
    objective: Verify the agent can use channel-owned message actions and that the QA transcript reflects them.
    successCriteria:
      - Agent adds at least one reaction.
      - Agent edits or replaces a message when asked.
      - Transcript shows the action lifecycle correctly.
    docsRefs:
      - docs/channels/qa-channel.md
    codeRefs:
      - extensions/qa-channel/src/channel-actions.ts
      - extensions/qa-lab/src/self-check-scenario.ts
    execution:
      kind: custom
      handler: reaction-edit-delete
      summary: Verify the agent can use channel-owned message actions and that the QA transcript reflects them.
  - id: source-docs-discovery-report
    title: Source and docs discovery report
    surface: discovery
    objective: Verify the agent can read repo docs and source, expand the QA plan, and publish a worked or did-not-work report.
    successCriteria:
      - Agent reads docs and source before proposing more tests.
      - Agent identifies extra candidate scenarios beyond the seed list.
      - Agent ends with a worked or failed QA report.
    docsRefs:
      - docs/help/testing.md
      - docs/web/dashboard.md
      - docs/channels/qa-channel.md
    codeRefs:
      - extensions/qa-lab/src/report.ts
      - extensions/qa-lab/src/self-check.ts
      - src/agents/system-prompt.ts
    execution:
      kind: custom
      handler: source-docs-discovery-report
      summary: Verify the agent can read repo docs and source, expand the QA plan, and publish a worked or did-not-work report.
  - id: subagent-handoff
    title: Subagent handoff
    surface: subagents
    objective: Verify the agent can delegate a bounded task to a subagent and fold the result back into the main thread.
    successCriteria:
      - Agent launches a bounded subagent task.
      - Subagent result is acknowledged in the main flow.
      - Final answer attributes delegated work clearly.
    docsRefs:
      - docs/tools/subagents.md
      - docs/help/testing.md
    codeRefs:
      - src/agents/system-prompt.ts
      - extensions/qa-lab/src/report.ts
    execution:
      kind: custom
      handler: subagent-handoff
      summary: Verify the agent can delegate a bounded task to a subagent and fold the result back into the main thread.
  - id: subagent-fanout-synthesis
    title: Subagent fanout synthesis
    surface: subagents
    objective: Verify the agent can delegate multiple bounded subagent tasks and fold both results back into one parent reply.
    successCriteria:
      - Parent flow launches at least two bounded subagent tasks.
      - Both delegated results are acknowledged in the main flow.
      - Final answer synthesizes both worker outputs in one reply.
    docsRefs:
      - docs/tools/subagents.md
      - docs/help/testing.md
    codeRefs:
      - src/agents/subagent-spawn.ts
      - src/agents/system-prompt.ts
      - extensions/qa-lab/src/suite.ts
    execution:
      kind: custom
      handler: subagent-fanout-synthesis
      summary: Verify the agent can delegate multiple bounded subagent tasks and fold both results back into one parent reply.
  - id: thread-follow-up
    title: Threaded follow-up
    surface: thread
    objective: Verify the agent can keep follow-up work inside a thread and not leak context into the root channel.
    successCriteria:
      - Agent creates or uses a thread for deeper work.
      - Follow-up messages stay attached to the thread.
      - Thread report references the correct prior context.
    docsRefs:
      - docs/channels/qa-channel.md
      - docs/channels/group-messages.md
    codeRefs:
      - extensions/qa-channel/src/protocol.ts
      - extensions/qa-lab/src/bus-state.ts
    execution:
      kind: custom
      handler: thread-follow-up
      summary: Verify the agent can keep follow-up work inside a thread and not leak context into the root channel.
  - id: memory-tools-channel-context
    title: Memory tools in channel context
    surface: memory
    objective: Verify the agent uses memory_search and memory_get in a shared channel when the answer lives only in memory files, not the live transcript.
    successCriteria:
      - Agent uses memory_search before answering.
      - Agent narrows with memory_get before answering.
      - Final reply returns the memory-only fact correctly in-channel.
    docsRefs:
      - docs/concepts/memory.md
      - docs/concepts/memory-search.md
    codeRefs:
      - extensions/memory-core/src/tools.ts
      - extensions/qa-lab/src/suite.ts
    execution:
      kind: custom
      handler: memory-tools-channel-context
      summary: Verify the agent uses memory_search and memory_get in a shared channel when the answer lives only in memory files, not the live transcript.
  - id: memory-failure-fallback
    title: Memory failure fallback
    surface: memory
    objective: Verify the agent degrades gracefully when memory tools are unavailable and the answer exists only in memory-backed notes.
    successCriteria:
      - Memory tools are absent from the effective tool inventory.
      - Agent does not hallucinate the hidden fact.
      - Agent says it could not confirm and surfaces the limitation.
    docsRefs:
      - docs/concepts/memory.md
      - docs/tools/index.md
    codeRefs:
      - extensions/memory-core/src/tools.ts
      - extensions/qa-lab/src/suite.ts
    execution:
      kind: custom
      handler: memory-failure-fallback
      summary: Verify the agent degrades gracefully when memory tools are unavailable and the answer exists only in memory-backed notes.
  - id: session-memory-ranking
    title: Session memory ranking
    surface: memory
    objective: Verify session-transcript memory can outrank stale durable notes and drive the final answer toward the newer fact.
    successCriteria:
      - Session memory indexing is enabled for the scenario.
      - Search ranks the newer transcript-backed fact ahead of the stale durable note.
      - The agent uses memory tools and answers with the current fact, not the stale one.
    docsRefs:
      - docs/concepts/memory-search.md
      - docs/reference/memory-config.md
    codeRefs:
      - extensions/memory-core/src/tools.ts
      - extensions/memory-core/src/memory/manager.ts
      - extensions/qa-lab/src/suite.ts
    execution:
      kind: custom
      handler: session-memory-ranking
      summary: Verify session-transcript memory can outrank stale durable notes and drive the final answer toward the newer fact.
  - id: thread-memory-isolation
    title: Thread memory isolation
    surface: memory
    objective: Verify a memory-backed answer requested inside a thread stays in-thread and does not leak into the root channel.
    successCriteria:
      - Agent uses memory tools inside the thread.
      - The hidden fact is answered correctly in the thread.
      - No root-channel outbound message leaks during the threaded memory reply.
    docsRefs:
      - docs/concepts/memory-search.md
      - docs/channels/qa-channel.md
      - docs/channels/group-messages.md
    codeRefs:
      - extensions/memory-core/src/tools.ts
      - extensions/qa-channel/src/protocol.ts
      - extensions/qa-lab/src/suite.ts
    execution:
      kind: custom
      handler: thread-memory-isolation
      summary: Verify a memory-backed answer requested inside a thread stays in-thread and does not leak into the root channel.
  - id: model-switch-tool-continuity
    title: Model switch with tool continuity
    surface: models
    objective: Verify switching models preserves session context and tool use instead of dropping into plain-text only behavior.
    successCriteria:
      - Alternate model is actually requested.
      - A tool call still happens after the model switch.
      - Final answer acknowledges the handoff and uses the tool-derived evidence.
    docsRefs:
      - docs/help/testing.md
      - docs/concepts/model-failover.md
    codeRefs:
      - extensions/qa-lab/src/suite.ts
      - extensions/qa-lab/src/mock-openai-server.ts
    execution:
      kind: custom
      handler: model-switch-tool-continuity
      summary: Verify switching models preserves session context and tool use instead of dropping into plain-text only behavior.
  - id: mcp-plugin-tools-call
    title: MCP plugin-tools call
    surface: mcp
    objective: Verify OpenClaw can expose plugin tools over MCP and a real MCP client can call one successfully.
    successCriteria:
      - Plugin tools MCP server lists memory_search.
      - A real MCP client calls memory_search successfully.
      - The returned MCP payload includes the expected memory-only fact.
    docsRefs:
      - docs/cli/mcp.md
      - docs/gateway/protocol.md
    codeRefs:
      - src/mcp/plugin-tools-serve.ts
      - extensions/qa-lab/src/suite.ts
    execution:
      kind: custom
      handler: mcp-plugin-tools-call
      summary: Verify OpenClaw can expose plugin tools over MCP and a real MCP client can call one successfully.
  - id: skill-visibility-invocation
    title: Skill visibility and invocation
    surface: skills
    objective: Verify a workspace skill becomes visible in skills.status and influences the next agent turn.
    successCriteria:
      - skills.status reports the seeded skill as visible and eligible.
      - The next agent turn reflects the skill instruction marker.
      - The result stays scoped to the active QA workspace skill.
    docsRefs:
      - docs/tools/skills.md
      - docs/gateway/protocol.md
    codeRefs:
      - src/agents/skills-status.ts
      - extensions/qa-lab/src/suite.ts
    execution:
      kind: custom
      handler: skill-visibility-invocation
      summary: Verify a workspace skill becomes visible in skills.status and influences the next agent turn.
  - id: skill-install-hot-availability
    title: Skill install hot availability
    surface: skills
    objective: Verify a newly added workspace skill shows up without a broken intermediate state and can influence the next turn immediately.
    successCriteria:
      - Skill is absent before install.
      - skills.status reports it after install without a restart.
      - The next agent turn reflects the new skill marker.
    docsRefs:
      - docs/tools/skills.md
      - docs/gateway/configuration.md
    codeRefs:
      - src/agents/skills-status.ts
      - extensions/qa-lab/src/suite.ts
    execution:
      kind: custom
      handler: skill-install-hot-availability
      summary: Verify a newly added workspace skill shows up without a broken intermediate state and can influence the next turn immediately.
  - id: native-image-generation
    title: Native image generation
    surface: image-generation
    objective: Verify image_generate appears when configured and returns a real saved media artifact.
    successCriteria:
      - image_generate appears in the effective tool inventory.
      - Agent triggers native image_generate.
      - Tool output returns a saved MEDIA path and the file exists.
    docsRefs:
      - docs/tools/image-generation.md
      - docs/providers/openai.md
    codeRefs:
      - src/agents/tools/image-generate-tool.ts
      - extensions/qa-lab/src/mock-openai-server.ts
    execution:
      kind: custom
      handler: native-image-generation
      summary: Verify image_generate appears when configured and returns a real saved media artifact.
  - id: image-understanding-attachment
    title: Image understanding from attachment
    surface: image-understanding
    objective: Verify an attached image reaches the agent model and the agent can describe what it sees.
    successCriteria:
      - Agent receives at least one image attachment.
      - Final answer describes the visible image content in one short sentence.
      - The description mentions the expected red and blue regions.
    docsRefs:
      - docs/help/testing.md
      - docs/tools/index.md
    codeRefs:
      - src/gateway/server-methods/agent.ts
      - extensions/qa-lab/src/suite.ts
      - extensions/qa-lab/src/mock-openai-server.ts
    execution:
      kind: custom
      handler: image-understanding-attachment
      summary: Verify an attached image reaches the agent model and the agent can describe what it sees.
  - id: image-generation-roundtrip
    title: Image generation roundtrip
    surface: image-generation
    objective: Verify a generated image is saved as media, reattached on the next turn, and described correctly through the vision path.
    successCriteria:
      - image_generate produces a saved MEDIA artifact.
      - The generated artifact is reattached on a follow-up turn.
      - The follow-up vision answer describes the generated scene rather than a generic attachment placeholder.
    docsRefs:
      - docs/tools/image-generation.md
      - docs/help/testing.md
    codeRefs:
      - src/agents/tools/image-generate-tool.ts
      - src/gateway/chat-attachments.ts
      - extensions/qa-lab/src/mock-openai-server.ts
    execution:
      kind: custom
      handler: image-generation-roundtrip
      summary: Verify a generated image is saved as media, reattached on the next turn, and described correctly through the vision path.
  - id: config-patch-hot-apply
    title: Config patch skill disable
    surface: config
    objective: Verify config.patch can disable a workspace skill and the restarted gateway exposes the new disabled state cleanly.
    successCriteria:
      - config.patch succeeds for the skill toggle change.
      - A workspace skill works before the patch.
      - The same skill is reported disabled after the restart triggered by the patch.
    docsRefs:
      - docs/gateway/configuration.md
      - docs/gateway/protocol.md
    codeRefs:
      - src/gateway/server-methods/config.ts
      - extensions/qa-lab/src/suite.ts
    execution:
      kind: custom
      handler: config-patch-hot-apply
      summary: Verify config.patch can disable a workspace skill and the restarted gateway exposes the new disabled state cleanly.
  - id: config-apply-restart-wakeup
    title: Config apply restart wake-up
    surface: config
    objective: Verify a restart-required config.apply restarts cleanly and delivers the post-restart wake message back into the QA channel.
    successCriteria:
      - config.apply schedules a restart-required change.
      - Gateway becomes healthy again after restart.
      - Restart sentinel wake-up message arrives in the QA channel.
    docsRefs:
      - docs/gateway/configuration.md
      - docs/gateway/protocol.md
    codeRefs:
      - src/gateway/server-methods/config.ts
      - src/gateway/server-restart-sentinel.ts
    execution:
      kind: custom
      handler: config-apply-restart-wakeup
      summary: Verify a restart-required config.apply restarts cleanly and delivers the post-restart wake message back into the QA channel.
  - id: config-restart-capability-flip
    title: Config restart capability flip
    surface: config
    objective: Verify a restart-triggering config change flips capability inventory and the same session successfully uses the newly restored tool after wake-up.
    successCriteria:
      - Capability is absent before the restart-triggering patch.
      - Restart sentinel wakes the same session back up after config patch.
      - The restored capability appears in tools.effective and works in the follow-up turn.
    docsRefs:
      - docs/gateway/configuration.md
      - docs/gateway/protocol.md
      - docs/tools/image-generation.md
    codeRefs:
      - src/gateway/server-methods/config.ts
      - src/gateway/server-restart-sentinel.ts
      - src/gateway/server-methods/tools-effective.ts
      - extensions/qa-lab/src/suite.ts
    execution:
      kind: custom
      handler: config-restart-capability-flip
      summary: Verify a restart-triggering config change flips capability inventory and the same session successfully uses the newly restored tool after wake-up.
  - id: runtime-inventory-drift-check
    title: Runtime inventory drift check
    surface: inventory
    objective: Verify tools.effective and skills.status stay aligned with runtime behavior after config changes.
    successCriteria:
      - Enabled tool appears before the config change.
      - After config change, disabled tool disappears from tools.effective.
      - Disabled skill appears in skills.status with disabled state.
    docsRefs:
      - docs/gateway/protocol.md
      - docs/tools/skills.md
      - docs/tools/index.md
    codeRefs:
      - src/gateway/server-methods/tools-effective.ts
      - src/gateway/server-methods/skills.ts
    execution:
      kind: custom
      handler: runtime-inventory-drift-check
      summary: Verify tools.effective and skills.status stay aligned with runtime behavior after config changes.
```
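
The pack above is consumed in two steps: the fenced `qa-pack` block is pulled out of the markdown, then each scenario is matched to an executable handler through `execution.handler`, falling back to the scenario `id`. The sketch below illustrates both steps under stated assumptions. `extractQaPackYaml`, `resolveRun`, the trimmed `Scenario` type, and the `channel-smoke` scenario are hypothetical names for illustration; the real parser in `extensions/qa-lab/src/scenario-catalog.ts` may differ, and YAML parsing plus zod validation are left out.

```typescript
// Built at runtime so this example contains no literal fence line of its own.
const FENCE = "`".repeat(3);

// Hypothetical helper: return the body of the first yaml qa-pack fence, or null.
function extractQaPackYaml(markdown: string): string | null {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((line) => line.trim() === `${FENCE}yaml qa-pack`);
  if (start === -1) {
    return null;
  }
  const end = lines.findIndex((line, index) => index > start && line.trim() === FENCE);
  if (end === -1) {
    return null;
  }
  return lines.slice(start + 1, end).join("\n");
}

// Trimmed, assumed scenario shape: only the fields needed for handler lookup.
type Scenario = {
  id: string;
  title: string;
  execution?: { kind: string; handler?: string };
};
type ScenarioRun = () => string;

// Same lookup expression as the suite.ts change in this commit:
// prefer the markdown-defined handler binding, else fall back to the id.
function resolveRun(scenario: Scenario, handlers: Map<string, ScenarioRun>): ScenarioRun | undefined {
  return handlers.get(scenario.execution?.handler || scenario.id);
}

const sampleMarkdown = [
  "# Sample pack",
  `${FENCE}yaml qa-pack`,
  "version: 1",
  "scenarios: []",
  FENCE,
  "trailing prose",
].join("\n");

const handlers = new Map<string, ScenarioRun>([
  ["channel-chat-baseline", () => "channel ok"],
  ["dm-chat-baseline", () => "dm ok"],
]);

const bound: Scenario = {
  id: "channel-smoke", // hypothetical id that differs from its handler key
  title: "Channel smoke",
  execution: { kind: "custom", handler: "channel-chat-baseline" },
};
const unbound: Scenario = { id: "dm-chat-baseline", title: "DM baseline conversation" };
const missing: Scenario = { id: "not-registered", title: "No handler" };

console.log(extractQaPackYaml(sampleMarkdown)); // the two YAML lines between the fences
console.log(resolveRun(bound, handlers)?.()); // channel ok
console.log(resolveRun(unbound, handlers)?.()); // dm ok
console.log(resolveRun(missing, handlers)); // undefined
```

Because the binding is plain data, a scenario can be renamed or several scenarios can share one handler without touching the harness; a scenario with no matching handler resolves to `undefined`, which the suite treats as a missing result.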

@@ -1,425 +0,0 @@
[
{
"id": "channel-chat-baseline",
"title": "Channel baseline conversation",
"surface": "channel",
"objective": "Verify the QA agent can respond correctly in a shared channel and respect mention-driven group semantics.",
"successCriteria": [
"Agent replies in the shared channel transcript.",
"Agent keeps the conversation scoped to the channel.",
"Agent respects mention-driven group routing semantics."
],
"docsRefs": ["docs/channels/group-messages.md", "docs/channels/qa-channel.md"],
"codeRefs": ["extensions/qa-channel/src/inbound.ts", "extensions/qa-lab/src/bus-state.ts"]
},
{
"id": "cron-one-minute-ping",
"title": "Cron one-minute ping",
"surface": "cron",
"objective": "Verify the agent can schedule a cron reminder one minute in the future and receive the follow-up in the QA channel.",
"successCriteria": [
"Agent schedules a cron reminder roughly one minute ahead.",
"Reminder returns through qa-channel.",
"Agent recognizes the reminder as part of the original task."
],
"docsRefs": ["docs/help/testing.md", "docs/channels/qa-channel.md"],
"codeRefs": ["extensions/qa-lab/src/bus-server.ts", "extensions/qa-lab/src/self-check.ts"]
},
{
"id": "dm-chat-baseline",
"title": "DM baseline conversation",
"surface": "dm",
"objective": "Verify the QA agent can chat coherently in a DM, explain the QA setup, and stay in character.",
"successCriteria": [
"Agent replies in DM without channel routing mistakes.",
"Agent explains the QA lab and message bus correctly.",
"Agent keeps the dev C-3PO personality."
],
"docsRefs": ["docs/channels/qa-channel.md", "docs/help/testing.md"],
"codeRefs": ["extensions/qa-channel/src/gateway.ts", "extensions/qa-lab/src/lab-server.ts"]
},
{
"id": "lobster-invaders-build",
"title": "Build Lobster Invaders",
"surface": "workspace",
"objective": "Verify the agent can read the repo, create a tiny playable artifact, and report what changed.",
"successCriteria": [
"Agent inspects source before coding.",
"Agent builds a tiny playable Lobster Invaders artifact.",
"Agent explains how to run or view the artifact."
],
"docsRefs": ["docs/help/testing.md", "docs/web/dashboard.md"],
"codeRefs": ["extensions/qa-lab/src/report.ts", "extensions/qa-lab/web/src/app.ts"]
},
{
"id": "memory-recall",
"title": "Memory recall after context switch",
"surface": "memory",
"objective": "Verify the agent can store a fact, switch topics, then recall the fact accurately later.",
"successCriteria": [
"Agent acknowledges the seeded fact.",
"Agent later recalls the same fact correctly.",
"Recall stays scoped to the active QA conversation."
],
"docsRefs": ["docs/help/testing.md"],
"codeRefs": ["extensions/qa-lab/src/scenario.ts"]
},
{
"id": "memory-dreaming-sweep",
"title": "Memory dreaming sweep",
"surface": "memory",
"objective": "Verify enabling dreaming creates the managed sweep, stages light and REM artifacts, and consolidates repeated recall signals into durable memory.",
"successCriteria": [
"Dreaming can be enabled and doctor.memory.status reports the managed sweep cron.",
"Repeated recall signals give the dreaming sweep real material to process.",
"A dreaming sweep writes Light Sleep and REM Sleep blocks, then promotes the canary into MEMORY.md."
],
"docsRefs": [
"docs/concepts/dreaming.md",
"docs/reference/memory-config.md",
"docs/web/control-ui.md"
],
"codeRefs": [
"extensions/memory-core/src/dreaming.ts",
"extensions/memory-core/src/dreaming-phases.ts",
"src/gateway/server-methods/doctor.ts",
"extensions/qa-lab/src/suite.ts"
]
},
{
"id": "model-switch-follow-up",
"title": "Model switch follow-up",
"surface": "models",
"objective": "Verify the agent can switch to a different configured model and continue coherently.",
"successCriteria": [
"Agent reflects the model switch request.",
"Follow-up answer remains coherent with prior context.",
"Final report notes whether the switch actually happened."
],
"docsRefs": ["docs/help/testing.md", "docs/web/dashboard.md"],
"codeRefs": ["extensions/qa-lab/src/report.ts"]
},
{
"id": "approval-turn-tool-followthrough",
"title": "Approval turn tool followthrough",
"surface": "harness",
"objective": "Verify a short approval like \"ok do it\" triggers immediate tool use instead of fake-progress narration.",
"successCriteria": [
"Agent can keep the pre-action turn brief.",
"The short approval leads to a real tool call on the next turn.",
"Final answer uses tool-derived evidence instead of placeholder progress text."
],
"docsRefs": ["docs/help/testing.md", "docs/channels/qa-channel.md"],
"codeRefs": [
"extensions/qa-lab/src/suite.ts",
"extensions/qa-lab/src/mock-openai-server.ts",
"src/agents/pi-embedded-runner/run/incomplete-turn.ts"
]
},
{
"id": "reaction-edit-delete",
"title": "Reaction, edit, delete lifecycle",
"surface": "message-actions",
"objective": "Verify the agent can use channel-owned message actions and that the QA transcript reflects them.",
"successCriteria": [
"Agent adds at least one reaction.",
"Agent edits or replaces a message when asked.",
"Transcript shows the action lifecycle correctly."
],
"docsRefs": ["docs/channels/qa-channel.md"],
"codeRefs": [
"extensions/qa-channel/src/channel-actions.ts",
"extensions/qa-lab/src/self-check-scenario.ts"
]
},
{
"id": "source-docs-discovery-report",
"title": "Source and docs discovery report",
"surface": "discovery",
"objective": "Verify the agent can read repo docs and source, expand the QA plan, and publish a worked or did-not-work report.",
"successCriteria": [
"Agent reads docs and source before proposing more tests.",
"Agent identifies extra candidate scenarios beyond the seed list.",
"Agent ends with a worked or failed QA report."
],
"docsRefs": ["docs/help/testing.md", "docs/web/dashboard.md", "docs/channels/qa-channel.md"],
"codeRefs": [
"extensions/qa-lab/src/report.ts",
"extensions/qa-lab/src/self-check.ts",
"src/agents/system-prompt.ts"
]
},
{
"id": "subagent-handoff",
"title": "Subagent handoff",
"surface": "subagents",
"objective": "Verify the agent can delegate a bounded task to a subagent and fold the result back into the main thread.",
"successCriteria": [
"Agent launches a bounded subagent task.",
"Subagent result is acknowledged in the main flow.",
"Final answer attributes delegated work clearly."
],
"docsRefs": ["docs/tools/subagents.md", "docs/help/testing.md"],
"codeRefs": ["src/agents/system-prompt.ts", "extensions/qa-lab/src/report.ts"]
},
{
"id": "subagent-fanout-synthesis",
"title": "Subagent fanout synthesis",
"surface": "subagents",
"objective": "Verify the agent can delegate multiple bounded subagent tasks and fold both results back into one parent reply.",
"successCriteria": [
"Parent flow launches at least two bounded subagent tasks.",
"Both delegated results are acknowledged in the main flow.",
"Final answer synthesizes both worker outputs in one reply."
],
"docsRefs": ["docs/tools/subagents.md", "docs/help/testing.md"],
"codeRefs": [
"src/agents/subagent-spawn.ts",
"src/agents/system-prompt.ts",
"extensions/qa-lab/src/suite.ts"
]
},
{
"id": "thread-follow-up",
"title": "Threaded follow-up",
"surface": "thread",
"objective": "Verify the agent can keep follow-up work inside a thread and not leak context into the root channel.",
"successCriteria": [
"Agent creates or uses a thread for deeper work.",
"Follow-up messages stay attached to the thread.",
"Thread report references the correct prior context."
],
"docsRefs": ["docs/channels/qa-channel.md", "docs/channels/group-messages.md"],
"codeRefs": ["extensions/qa-channel/src/protocol.ts", "extensions/qa-lab/src/bus-state.ts"]
},
{
"id": "memory-tools-channel-context",
"title": "Memory tools in channel context",
"surface": "memory",
"objective": "Verify the agent uses memory_search and memory_get in a shared channel when the answer lives only in memory files, not the live transcript.",
"successCriteria": [
"Agent uses memory_search before answering.",
"Agent narrows with memory_get before answering.",
"Final reply returns the memory-only fact correctly in-channel."
],
"docsRefs": ["docs/concepts/memory.md", "docs/concepts/memory-search.md"],
"codeRefs": ["extensions/memory-core/src/tools.ts", "extensions/qa-lab/src/suite.ts"]
},
{
"id": "memory-failure-fallback",
"title": "Memory failure fallback",
"surface": "memory",
"objective": "Verify the agent degrades gracefully when memory tools are unavailable and the answer exists only in memory-backed notes.",
"successCriteria": [
"Memory tools are absent from the effective tool inventory.",
"Agent does not hallucinate the hidden fact.",
"Agent says it could not confirm and surfaces the limitation."
],
"docsRefs": ["docs/concepts/memory.md", "docs/tools/index.md"],
"codeRefs": ["extensions/memory-core/src/tools.ts", "extensions/qa-lab/src/suite.ts"]
},
{
"id": "session-memory-ranking",
"title": "Session memory ranking",
"surface": "memory",
"objective": "Verify session-transcript memory can outrank stale durable notes and drive the final answer toward the newer fact.",
"successCriteria": [
"Session memory indexing is enabled for the scenario.",
"Search ranks the newer transcript-backed fact ahead of the stale durable note.",
"The agent uses memory tools and answers with the current fact, not the stale one."
],
"docsRefs": ["docs/concepts/memory-search.md", "docs/reference/memory-config.md"],
"codeRefs": [
"extensions/memory-core/src/tools.ts",
"extensions/memory-core/src/memory/manager.ts",
"extensions/qa-lab/src/suite.ts"
]
},
{
"id": "thread-memory-isolation",
"title": "Thread memory isolation",
"surface": "memory",
"objective": "Verify a memory-backed answer requested inside a thread stays in-thread and does not leak into the root channel.",
"successCriteria": [
"Agent uses memory tools inside the thread.",
"The hidden fact is answered correctly in the thread.",
"No root-channel outbound message leaks during the threaded memory reply."
],
"docsRefs": [
"docs/concepts/memory-search.md",
"docs/channels/qa-channel.md",
"docs/channels/group-messages.md"
],
"codeRefs": [
"extensions/memory-core/src/tools.ts",
"extensions/qa-channel/src/protocol.ts",
"extensions/qa-lab/src/suite.ts"
]
},
{
"id": "model-switch-tool-continuity",
"title": "Model switch with tool continuity",
"surface": "models",
"objective": "Verify switching models preserves session context and tool use instead of dropping into plain-text-only behavior.",
"successCriteria": [
"Alternate model is actually requested.",
"A tool call still happens after the model switch.",
"Final answer acknowledges the handoff and uses the tool-derived evidence."
],
"docsRefs": ["docs/help/testing.md", "docs/concepts/model-failover.md"],
"codeRefs": ["extensions/qa-lab/src/suite.ts", "extensions/qa-lab/src/mock-openai-server.ts"]
},
{
"id": "mcp-plugin-tools-call",
"title": "MCP plugin-tools call",
"surface": "mcp",
"objective": "Verify OpenClaw can expose plugin tools over MCP and a real MCP client can call one successfully.",
"successCriteria": [
"Plugin tools MCP server lists memory_search.",
"A real MCP client calls memory_search successfully.",
"The returned MCP payload includes the expected memory-only fact."
],
"docsRefs": ["docs/cli/mcp.md", "docs/gateway/protocol.md"],
"codeRefs": ["src/mcp/plugin-tools-serve.ts", "extensions/qa-lab/src/suite.ts"]
},
{
"id": "skill-visibility-invocation",
"title": "Skill visibility and invocation",
"surface": "skills",
"objective": "Verify a workspace skill becomes visible in skills.status and influences the next agent turn.",
"successCriteria": [
"skills.status reports the seeded skill as visible and eligible.",
"The next agent turn reflects the skill instruction marker.",
"The result stays scoped to the active QA workspace skill."
],
"docsRefs": ["docs/tools/skills.md", "docs/gateway/protocol.md"],
"codeRefs": ["src/agents/skills-status.ts", "extensions/qa-lab/src/suite.ts"]
},
{
"id": "skill-install-hot-availability",
"title": "Skill install hot availability",
"surface": "skills",
"objective": "Verify a newly added workspace skill shows up without a broken intermediate state and can influence the next turn immediately.",
"successCriteria": [
"Skill is absent before install.",
"skills.status reports it after install without a restart.",
"The next agent turn reflects the new skill marker."
],
"docsRefs": ["docs/tools/skills.md", "docs/gateway/configuration.md"],
"codeRefs": ["src/agents/skills-status.ts", "extensions/qa-lab/src/suite.ts"]
},
{
"id": "native-image-generation",
"title": "Native image generation",
"surface": "image-generation",
"objective": "Verify image_generate appears when configured and returns a real saved media artifact.",
"successCriteria": [
"image_generate appears in the effective tool inventory.",
"Agent triggers native image_generate.",
"Tool output returns a saved MEDIA path and the file exists."
],
"docsRefs": ["docs/tools/image-generation.md", "docs/providers/openai.md"],
"codeRefs": [
"src/agents/tools/image-generate-tool.ts",
"extensions/qa-lab/src/mock-openai-server.ts"
]
},
{
"id": "image-understanding-attachment",
"title": "Image understanding from attachment",
"surface": "image-understanding",
"objective": "Verify an attached image reaches the agent model and the agent can describe what it sees.",
"successCriteria": [
"Agent receives at least one image attachment.",
"Final answer describes the visible image content in one short sentence.",
"The description mentions the expected red and blue regions."
],
"docsRefs": ["docs/help/testing.md", "docs/tools/index.md"],
"codeRefs": [
"src/gateway/server-methods/agent.ts",
"extensions/qa-lab/src/suite.ts",
"extensions/qa-lab/src/mock-openai-server.ts"
]
},
{
"id": "image-generation-roundtrip",
"title": "Image generation roundtrip",
"surface": "image-generation",
"objective": "Verify a generated image is saved as media, reattached on the next turn, and described correctly through the vision path.",
"successCriteria": [
"image_generate produces a saved MEDIA artifact.",
"The generated artifact is reattached on a follow-up turn.",
"The follow-up vision answer describes the generated scene rather than a generic attachment placeholder."
],
"docsRefs": ["docs/tools/image-generation.md", "docs/help/testing.md"],
"codeRefs": [
"src/agents/tools/image-generate-tool.ts",
"src/gateway/chat-attachments.ts",
"extensions/qa-lab/src/mock-openai-server.ts"
]
},
{
"id": "config-patch-hot-apply",
"title": "Config patch skill disable",
"surface": "config",
"objective": "Verify config.patch can disable a workspace skill and the restarted gateway exposes the new disabled state cleanly.",
"successCriteria": [
"config.patch succeeds for the skill toggle change.",
"A workspace skill works before the patch.",
"The same skill is reported disabled after the restart triggered by the patch."
],
"docsRefs": ["docs/gateway/configuration.md", "docs/gateway/protocol.md"],
"codeRefs": ["src/gateway/server-methods/config.ts", "extensions/qa-lab/src/suite.ts"]
},
{
"id": "config-apply-restart-wakeup",
"title": "Config apply restart wake-up",
"surface": "config",
"objective": "Verify a restart-required config.apply restarts cleanly and delivers the post-restart wake message back into the QA channel.",
"successCriteria": [
"config.apply schedules a restart-required change.",
"Gateway becomes healthy again after restart.",
"Restart sentinel wake-up message arrives in the QA channel."
],
"docsRefs": ["docs/gateway/configuration.md", "docs/gateway/protocol.md"],
"codeRefs": ["src/gateway/server-methods/config.ts", "src/gateway/server-restart-sentinel.ts"]
},
{
"id": "config-restart-capability-flip",
"title": "Config restart capability flip",
"surface": "config",
"objective": "Verify a restart-triggering config change flips the capability inventory and the same session successfully uses the newly restored tool after wake-up.",
"successCriteria": [
"Capability is absent before the restart-triggering patch.",
"Restart sentinel wakes the same session back up after config patch.",
"The restored capability appears in tools.effective and works in the follow-up turn."
],
"docsRefs": [
"docs/gateway/configuration.md",
"docs/gateway/protocol.md",
"docs/tools/image-generation.md"
],
"codeRefs": [
"src/gateway/server-methods/config.ts",
"src/gateway/server-restart-sentinel.ts",
"src/gateway/server-methods/tools-effective.ts",
"extensions/qa-lab/src/suite.ts"
]
},
{
"id": "runtime-inventory-drift-check",
"title": "Runtime inventory drift check",
"surface": "inventory",
"objective": "Verify tools.effective and skills.status stay aligned with runtime behavior after config changes.",
"successCriteria": [
"Enabled tool appears before the config change.",
"After config change, disabled tool disappears from tools.effective.",
"Disabled skill appears in skills.status with disabled state."
],
"docsRefs": ["docs/gateway/protocol.md", "docs/tools/skills.md", "docs/tools/index.md"],
"codeRefs": [
"src/gateway/server-methods/tools-effective.ts",
"src/gateway/server-methods/skills.ts"
]
}
]
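
Every entry in the list above carries the same seven fields: `id`, `title`, `surface`, `objective`, `successCriteria`, `docsRefs`, and `codeRefs`. A minimal sketch of a shape check for such entries could look like the following. The names `SeedScenario`, `checkScenario`, `isSeedScenario`, and `findDuplicateIds` are hypothetical and exist only for illustration. This is not the repository's parser, which the refactor notes describe as zod-based, and the rule that every list must be non-empty is an assumption drawn from the entries shown here.

```typescript
// Hypothetical shape inferred from the listed entries; not the repo's actual type.
interface SeedScenario {
  id: string;
  title: string;
  surface: string;
  objective: string;
  successCriteria: string[];
  docsRefs: string[];
  codeRefs: string[];
}

const STRING_FIELDS = ["id", "title", "surface", "objective"] as const;
const LIST_FIELDS = ["successCriteria", "docsRefs", "codeRefs"] as const;

// True only for arrays that hold at least one non-empty string and nothing else.
function isNonEmptyStringArray(value: unknown): value is string[] {
  return (
    Array.isArray(value) &&
    value.length > 0 &&
    value.every((item) => typeof item === "string" && item.length > 0)
  );
}

// Returns human-readable problems; an empty list means the entry looks well-formed.
function checkScenario(entry: unknown): string[] {
  if (typeof entry !== "object" || entry === null || Array.isArray(entry)) {
    return ["entry is not an object"];
  }
  const record = entry as Record<string, unknown>;
  const problems: string[] = [];
  for (const field of STRING_FIELDS) {
    const value = record[field];
    if (typeof value !== "string" || value.length === 0) {
      problems.push(`${field} must be a non-empty string`);
    }
  }
  for (const field of LIST_FIELDS) {
    if (!isNonEmptyStringArray(record[field])) {
      problems.push(`${field} must be a non-empty string array`);
    }
  }
  return problems;
}

// Type guard built on the check above.
function isSeedScenario(entry: unknown): entry is SeedScenario {
  return checkScenario(entry).length === 0;
}

// Reports each id that occurs more than once (assumed rule: ids are unique).
function findDuplicateIds(entries: Array<{ id: string }>): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const { id } of entries) {
    if (seen.has(id)) duplicates.add(id);
    seen.add(id);
  }
  return [...duplicates];
}
```

Returning a list of problems rather than throwing on the first one lets a caller report every malformed field of an entry in a single pass.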

View File

@@ -20,6 +20,7 @@ export {
setQaChannelRuntime,
} from "../../extensions/qa-channel/api.js";
export type {
QaBusAttachment,
QaBusConversation,
QaBusConversationKind,
QaBusCreateThreadInput,