openclaw/docs/tools/image-generation.md at 425592cf9cebaceec1f2a2ebcb4955cbf709aafd


Peter Steinberger 425592cf9c refactor: share media normalization across runtimes

2026-04-06 23:50:27 +01:00

8.0 KiB


---
summary: "Generate and edit images using configured providers (OpenAI, Google Gemini, fal, MiniMax, ComfyUI, Vydra)"
read_when:
  - "Generating images via the agent"
  - "Configuring image generation providers and models"
  - "Understanding the image_generate tool parameters"
title: "Image Generation"
---

The `image_generate` tool lets the agent create and edit images using your configured providers. Generated images are delivered automatically as media attachments in the agent's reply.

The tool only appears when at least one image generation provider is available. If you don't see `image_generate` in your agent's tools, configure `agents.defaults.imageGenerationModel` or set up a provider API key.

## Quick start

1. Set an API key for at least one provider (for example `OPENAI_API_KEY` or `GEMINI_API_KEY`).
2. Optionally set your preferred model:

   ```json5
   {
     agents: {
       defaults: {
         imageGenerationModel: {
           primary: "openai/gpt-image-1",
         },
       },
     },
   }
   ```

3. Ask the agent: "Generate an image of a friendly lobster mascot."

The agent calls `image_generate` automatically. No tool allow-listing is needed: the tool is enabled by default when a provider is available.

## Supported providers

| Provider | Default model | Edit support | API key |
| -------- | ------------- | ------------ | ------- |
| OpenAI | `gpt-image-1` | Yes (up to 5 images) | `OPENAI_API_KEY` |
| Google | `gemini-3.1-flash-image-preview` | Yes | `GEMINI_API_KEY` or `GOOGLE_API_KEY` |
| fal | `fal-ai/flux/dev` | Yes | `FAL_KEY` |
| MiniMax | `image-01` | Yes (subject reference) | `MINIMAX_API_KEY` or MiniMax OAuth (`minimax-portal`) |
| ComfyUI | `workflow` | Yes (1 image, workflow-configured) | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` for cloud |
| Vydra | `grok-imagine` | No | `VYDRA_API_KEY` |

Use `action: "list"` to inspect available providers and models at runtime:

```text
/tool image_generate action=list
```

## Tool parameters

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `prompt` | string | Image generation prompt (required for `action: "generate"`) |
| `action` | string | `"generate"` (default) or `"list"` to inspect providers |
| `model` | string | Provider/model override, e.g. `openai/gpt-image-1` |
| `image` | string | Single reference image path or URL for edit mode |
| `images` | string[] | Multiple reference images for edit mode (up to 5) |
| `size` | string | Size hint: `1024x1024`, `1536x1024`, `1024x1536`, `1024x1792`, `1792x1024` |
| `aspectRatio` | string | Aspect ratio: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9` |
| `resolution` | string | Resolution hint: `1K`, `2K`, or `4K` |
| `count` | number | Number of images to generate (1–4) |
| `filename` | string | Output filename hint |
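Read as a type, the parameter table corresponds roughly to the shape below. This is a hypothetical sketch and not the tool's real schema. `ImageGenerateParams` and `validateParams` are illustrative names. The checks cover only the limits stated in the table: a prompt for `action: "generate"`, a `count` from 1 to 4, and at most 5 `images`. Requiring `count` to be a whole number is an added assumption.

```typescript
// Hypothetical shape of the image_generate arguments, transcribed from the
// parameter table. Not the tool's actual schema.
type AspectRatio =
  | "1:1" | "2:3" | "3:2" | "3:4" | "4:3" | "4:5" | "5:4" | "9:16" | "16:9" | "21:9";

interface ImageGenerateParams {
  action?: "generate" | "list"; // defaults to "generate"
  prompt?: string;              // required when action is "generate"
  model?: string;               // e.g. "openai/gpt-image-1"
  image?: string;               // single reference image (path or URL)
  images?: string[];            // up to 5 reference images
  size?: "1024x1024" | "1536x1024" | "1024x1536" | "1024x1792" | "1792x1024";
  aspectRatio?: AspectRatio;
  resolution?: "1K" | "2K" | "4K";
  count?: number;               // 1 to 4
  filename?: string;
}

// Returns a list of problems; an empty list means the documented limits hold.
function validateParams(p: ImageGenerateParams): string[] {
  const problems: string[] = [];
  const action = p.action ?? "generate";
  if (action === "generate" && !p.prompt?.trim()) {
    problems.push('prompt is required for action "generate"');
  }
  // Assumption: count must be a whole number as well as within 1..4.
  if (p.count !== undefined && (!Number.isInteger(p.count) || p.count < 1 || p.count > 4)) {
    problems.push("count must be an integer from 1 to 4");
  }
  if (p.images !== undefined && p.images.length > 5) {
    problems.push("images accepts at most 5 references");
  }
  return problems;
}

// Example: an edit-mode request with one reference image.
const edit: ImageGenerateParams = {
  prompt: "Generate a watercolor version of this photo",
  image: "/path/to/photo.jpg",
};
console.log(validateParams(edit)); // prints []
```

Provider-specific limits, such as single-image editing on fal, are left out on purpose because they depend on which provider ends up handling the request.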

Not all providers support all parameters. When a fallback provider supports a nearby geometry option instead of the exact requested one, OpenClaw remaps to the closest supported size, aspect ratio, or resolution before submission. Truly unsupported overrides are still reported in the tool result.

Tool results report the applied settings. When OpenClaw remaps geometry during provider fallback, the returned `size`, `aspectRatio`, and `resolution` values reflect what was actually sent, and `details.normalization` captures the requested-to-applied translation.
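You can think of the remapping as a nearest-match search over the options a provider supports. The sketch below is hypothetical and covers aspect ratios only. `closestAspectRatio` and its `{ requested, applied }` result are assumptions loosely modeled on `details.normalization`, whose real shape may differ. Distances are compared on the logarithm of width/height, so `2:1` and `1:2` count as equally far from `1:1`.

```typescript
// Hypothetical sketch: remap a requested aspect ratio to the closest one a
// provider supports. Names and the result shape are illustrative assumptions.
interface Normalization {
  requested: string;
  applied: string;
}

// Parse "W:H" into a numeric width/height ratio.
function ratioValue(aspect: string): number {
  const [w, h] = aspect.split(":").map(Number);
  if (!Number.isFinite(w) || !Number.isFinite(h) || w <= 0 || h <= 0) {
    throw new Error(`Invalid aspect ratio: ${aspect}`);
  }
  return w / h;
}

// Pick the supported ratio whose log(width/height) is nearest to the request.
// On a tie, the earlier entry in `supported` wins.
function closestAspectRatio(requested: string, supported: string[]): Normalization {
  if (supported.length === 0) {
    throw new Error("Provider supports no aspect ratios");
  }
  const target = Math.log(ratioValue(requested));
  let best = supported[0];
  let bestDistance = Infinity;
  for (const candidate of supported) {
    const distance = Math.abs(Math.log(ratioValue(candidate)) - target);
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return { requested, applied: best };
}

const result = closestAspectRatio("21:9", ["1:1", "16:9", "9:16"]);
// result → { requested: "21:9", applied: "16:9" }
```

The log scale keeps portrait and landscape requests symmetric. Comparing raw ratios would instead treat `1:2` (0.5) as closer to `1:1` than `2:1` (2.0).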

## Configuration

### Model selection

```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openai/gpt-image-1",
        fallbacks: ["google/gemini-3.1-flash-image-preview", "fal/fal-ai/flux/dev"],
      },
    },
  },
}
```
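Each model reference combines a provider id and a model id. The model id can itself contain slashes, as in `fal/fal-ai/flux/dev`. One plausible way to read such a reference is to split on the first `/` only. That is an assumption based on the examples, not a description of OpenClaw's parser, and `parseModelRef` is a hypothetical helper.

```typescript
// Hypothetical helper: split "provider/model" on the FIRST slash only, so
// model ids that contain slashes (e.g. "fal-ai/flux/dev") stay intact.
interface ModelRef {
  provider: string;
  model: string;
}

function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Expected "provider/model", got: ${ref}`);
  }
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

// The refs from the config example above.
const configured = {
  primary: "openai/gpt-image-1",
  fallbacks: ["google/gemini-3.1-flash-image-preview", "fal/fal-ai/flux/dev"],
};

for (const ref of [configured.primary, ...configured.fallbacks]) {
  const { provider, model } = parseModelRef(ref);
  console.log(`${provider} -> ${model}`); // last line: "fal -> fal-ai/flux/dev"
}
```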

### Provider selection order

When generating an image, OpenClaw tries providers in this order:

1. `model` parameter from the tool call (if the agent specifies one)
2. `imageGenerationModel.primary` from config
3. `imageGenerationModel.fallbacks` in order
4. Auto-detection, using auth-backed provider defaults only:
   - current default provider first
   - remaining registered image-generation providers in provider-id order
If a provider fails (auth error, rate limit, etc.), the next candidate is tried automatically. If all fail, the error includes details from each attempt.
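The fallback behavior amounts to trying each candidate in turn and keeping every failure for the final error. This is a minimal sketch under stated assumptions. `generateWithFallback`, the `attempt` callback, and the error text are hypothetical. Because real provider calls are asynchronous, the sketch uses `async`/`await`.

```typescript
// Hypothetical sketch: try each candidate model ref in order and collect
// every failure. Names and the error format are illustrative assumptions.
type Attempt<T> = (modelRef: string) => Promise<T>;

async function generateWithFallback<T>(
  candidates: string[],
  attempt: Attempt<T>,
): Promise<{ modelRef: string; result: T }> {
  const failures: string[] = [];
  for (const modelRef of candidates) {
    try {
      // The first candidate that succeeds wins; later ones are never called.
      const result = await attempt(modelRef);
      return { modelRef, result };
    } catch (err) {
      // Keep the reason (auth error, rate limit, ...) for the final report.
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${modelRef}: ${reason}`);
    }
  }
  throw new Error(`All image generation candidates failed:\n${failures.join("\n")}`);
}

// Usage: the first candidate is rate limited, so the second one is used.
generateWithFallback(
  ["openai/gpt-image-1", "google/gemini-3.1-flash-image-preview"],
  async (ref) => {
    if (ref.startsWith("openai/")) throw new Error("rate limited");
    return `image from ${ref}`;
  },
).then(({ modelRef }) => console.log(`used ${modelRef}`)); // prints "used google/gemini-3.1-flash-image-preview"
```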

Notes:

- Auto-detection is auth-aware: a provider default only enters the candidate list when OpenClaw can actually authenticate that provider.
- Auto-detection is enabled by default. Set `agents.defaults.mediaGenerationAutoProviderFallback: false` if you want image generation to use only the explicit `model`, `primary`, and `fallbacks` entries.
- Use `action: "list"` to inspect the currently registered providers, their default models, and auth env-var hints.
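The ordering rules and notes above can be condensed into a candidate-list builder. This is an illustrative sketch only. `buildCandidates`, its option names, and the `ProviderInfo` shape are assumptions, not OpenClaw's API. `autoFallback` stands in for `agents.defaults.mediaGenerationAutoProviderFallback`. Skipping duplicate refs is also an assumption, because the docs do not say how repeated entries are handled. The provider ids in the usage (such as `vydra`) are illustrative.

```typescript
// Hypothetical sketch of the candidate order described above.
interface ProviderInfo {
  id: string;               // provider id, e.g. "openai"
  defaultModel: string;     // e.g. "gpt-image-1"
  canAuthenticate: boolean; // whether credentials are available
}

interface CandidateOptions {
  toolModel?: string;              // `model` from the tool call
  primary?: string;                // imageGenerationModel.primary
  fallbacks?: string[];            // imageGenerationModel.fallbacks
  currentDefaultProvider?: string; // provider id tried first during auto-detection
  providers: ProviderInfo[];       // registered image-generation providers
  autoFallback: boolean;           // stands in for mediaGenerationAutoProviderFallback
}

function buildCandidates(opts: CandidateOptions): string[] {
  const ordered: string[] = [];
  const add = (ref: string | undefined): void => {
    // Assumption: skip empty and duplicate refs so no model is tried twice.
    if (ref && !ordered.includes(ref)) ordered.push(ref);
  };

  add(opts.toolModel);
  add(opts.primary);
  for (const ref of opts.fallbacks ?? []) add(ref);

  if (opts.autoFallback) {
    // Auth-aware: only providers that can authenticate are considered.
    const usable = opts.providers.filter((p) => p.canAuthenticate);
    const current = usable.find((p) => p.id === opts.currentDefaultProvider);
    const rest = usable
      .filter((p) => p !== current)
      .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0)); // provider-id order
    for (const p of current ? [current, ...rest] : rest) {
      add(`${p.id}/${p.defaultModel}`);
    }
  }
  return ordered;
}

const providers: ProviderInfo[] = [
  { id: "openai", defaultModel: "gpt-image-1", canAuthenticate: true },
  { id: "google", defaultModel: "gemini-3.1-flash-image-preview", canAuthenticate: true },
  { id: "fal", defaultModel: "fal-ai/flux/dev", canAuthenticate: false },
  { id: "vydra", defaultModel: "grok-imagine", canAuthenticate: true },
];
const order = buildCandidates({
  primary: "openai/gpt-image-1",
  currentDefaultProvider: "vydra",
  providers,
  autoFallback: true,
});
// order → ["openai/gpt-image-1", "vydra/grok-imagine", "google/gemini-3.1-flash-image-preview"]
```

In this example, fal is never tried because it cannot authenticate. OpenAI appears only once, even though it is both the primary and an auto-detected default.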

### Image editing

OpenAI, Google, fal, MiniMax, and ComfyUI support editing reference images. Pass a reference image path or URL:

```text
"Generate a watercolor version of this photo" + image: "/path/to/photo.jpg"
```

OpenAI and Google support up to 5 reference images via the `images` parameter; fal, MiniMax, and ComfyUI accept a single reference image.

MiniMax image generation is available through both bundled MiniMax auth paths:

- `minimax/image-01` for API-key setups
- `minimax-portal/image-01` for OAuth setups

## Provider capabilities

| Capability | OpenAI | Google | fal | MiniMax | ComfyUI | Vydra |
| ---------- | ------ | ------ | --- | ------- | ------- | ----- |
| Generate | Yes (up to 4) | Yes (up to 4) | Yes (up to 4) | Yes (up to 9) | Yes (workflow-defined outputs) | Yes (1) |
| Edit/reference | Yes (up to 5 images) | Yes (up to 5 images) | Yes (1 image) | Yes (1 image, subject ref) | Yes (1 image, workflow-configured) | No |
| Size control | Yes | Yes | Yes | No | No | No |
| Aspect ratio | No | Yes | Yes (generate only) | Yes | No | No |
| Resolution (1K/2K/4K) | No | Yes | Yes | No | No | No |
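The capability table can also be read as data. The sketch below encodes two of its rows (reference images and output count) and filters providers for a request. The `Capability` shape, the `providersFor` helper, and the keys (display names rather than provider ids) are hypothetical. ComfyUI's workflow-defined output count is treated as unknown, so the sketch does not filter it out.

```typescript
// Hypothetical encoding of two rows of the capability table.
interface Capability {
  maxReferenceImages: number;      // 0 means no edit/reference support
  maxOutputs: number | "workflow"; // ComfyUI output count is workflow-defined
}

const CAPABILITIES: Record<string, Capability> = {
  OpenAI: { maxReferenceImages: 5, maxOutputs: 4 },
  Google: { maxReferenceImages: 5, maxOutputs: 4 },
  fal: { maxReferenceImages: 1, maxOutputs: 4 },
  MiniMax: { maxReferenceImages: 1, maxOutputs: 9 },
  ComfyUI: { maxReferenceImages: 1, maxOutputs: "workflow" },
  Vydra: { maxReferenceImages: 0, maxOutputs: 1 },
};

// Providers that accept `referenceImages` inputs and can return `count` images.
function providersFor(referenceImages: number, count: number): string[] {
  return Object.keys(CAPABILITIES).filter((name) => {
    const cap = CAPABILITIES[name];
    const refsOk = referenceImages <= cap.maxReferenceImages;
    // A workflow-defined output count is unknown here, so it is not excluded.
    const countOk = cap.maxOutputs === "workflow" || count <= cap.maxOutputs;
    return refsOk && countOk;
  });
}

const threeRefs = providersFor(3, 1); // → ["OpenAI", "Google"]
```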

## Related

- Tools Overview: all available agent tools
- fal: fal image and video provider setup
- ComfyUI: local ComfyUI and Comfy Cloud workflow setup
- Google (Gemini): Gemini image provider setup
- MiniMax: MiniMax image provider setup
- OpenAI: OpenAI Images provider setup
- Vydra: Vydra image, video, and speech setup
- Configuration Reference: `imageGenerationModel` config
- Models: model configuration and failover
