The image-generation page was 395 lines with a 3-step quick-start
written as plain numbered prose, a sprawling 'OpenAI gpt-image-2'
section that mixed routing/legacy/OpenAI options with five inline
slash-command examples, and provider tables that mixed alphabetic
and recency order.
Restructure for scan-first reading without losing technical content:
- Wrap Quick start in a Steps component (auth -> default model ->
ask the agent), pulling the Codex OAuth note inline with the model
step where it belongs and surfacing the LAN/SSRF caveat as a
Warning callout.
- Alphabetize the Supported providers table (ComfyUI, fal, Google,
LiteLLM, MiniMax, OpenAI, OpenRouter, Vydra, xAI) and the Provider
capabilities table (same order across both). Convert the Yes/No
capability table to checkmarks plus exact counts for readability.
- Replace the long inline OpenAI / OpenRouter / MiniMax / xAI prose
with a 'Provider deep dives' AccordionGroup so each backend's
routing, legacy URL handling, and provider-specific knobs collapse
by default.
- Move the four provider-selection-order notes into a small
AccordionGroup ('Per-call overrides are exact', 'Auto-detection is
auth-aware', 'Timeouts', 'Inspect at runtime').
- Collapse the five flat slash-command examples into a single Tabs
component (4K landscape / transparent PNG / two-square /
edit-one-ref / edit-multi-ref) with the matching CLI variant inline
on the transparent-PNG tab.
- Sentence-case the Related list (Tools overview, Configuration
reference) and drop the redundant generic introductory wording.
- Add sidebarTitle so the nav reads 'Image generation' explicitly.
Wording, schema fields, defaults, model refs, env vars, and the
detailed OpenAI/OpenRouter/Codex routing rules are unchanged.
| summary | read_when | title | sidebarTitle |
|---|---|---|---|
| Generate and edit images via image_generate across OpenAI, Google, fal, MiniMax, ComfyUI, OpenRouter, LiteLLM, xAI, Vydra | | Image generation | Image generation |
The image_generate tool lets the agent create and edit images using your
configured providers. Generated images are delivered automatically as media
attachments in the agent's reply.
Quick start
1. Set an API key for at least one provider (for example `OPENAI_API_KEY`, `GEMINI_API_KEY`, `OPENROUTER_API_KEY`) or sign in with OpenAI Codex OAuth.

2. Pick a default model:

   ```json5
   {
     agents: {
       defaults: {
         imageGenerationModel: {
           primary: "openai/gpt-image-2",
           timeoutMs: 180_000,
         },
       },
     },
   }
   ```

   Codex OAuth uses the same `openai/gpt-image-2` model ref. When an
   `openai-codex` OAuth profile is configured, OpenClaw routes image
   requests through that OAuth profile instead of first trying
   `OPENAI_API_KEY`. Explicit `models.providers.openai` config (API key,
   custom/Azure base URL) opts back into the direct OpenAI Images API
   route.

3. Ask the agent:

   _"Generate an image of a friendly robot mascot."_

   The agent calls `image_generate` automatically. No tool allow-listing
   is needed; it is enabled by default when a provider is available.
For OpenAI-compatible LAN endpoints such as LocalAI, keep the custom
`models.providers.openai.baseUrl` and explicitly opt in with
`browser.ssrfPolicy.dangerouslyAllowPrivateNetwork: true`. Private and
internal image endpoints remain blocked by default.
Common routes
| Goal | Model ref | Auth |
|---|---|---|
| OpenAI image generation with API billing | openai/gpt-image-2 | OPENAI_API_KEY |
| OpenAI image generation with Codex subscription auth | openai/gpt-image-2 | OpenAI Codex OAuth |
| OpenAI transparent-background PNG/WebP | openai/gpt-image-1.5 | OPENAI_API_KEY or OpenAI Codex OAuth |
| OpenRouter image generation | openrouter/google/gemini-3.1-flash-image-preview | OPENROUTER_API_KEY |
| LiteLLM image generation | litellm/gpt-image-2 | LITELLM_API_KEY |
| Google Gemini image generation | google/gemini-3.1-flash-image-preview | GEMINI_API_KEY or GOOGLE_API_KEY |
The same image_generate tool handles text-to-image and reference-image
editing. Use image for one reference or images for multiple references.
Provider-supported output hints such as quality, outputFormat, and
background are forwarded when available and reported as ignored when a
provider does not support them. Bundled transparent-background support is
OpenAI-specific; other providers may still preserve PNG alpha if their
backend emits it.
Supported providers
| Provider | Default model | Edit support | Auth |
|---|---|---|---|
| ComfyUI | workflow | Yes (1 image, workflow-configured) | COMFY_API_KEY or COMFY_CLOUD_API_KEY for cloud |
| fal | fal-ai/flux/dev | Yes | FAL_KEY |
| Google | gemini-3.1-flash-image-preview | Yes | GEMINI_API_KEY or GOOGLE_API_KEY |
| LiteLLM | gpt-image-2 | Yes (up to 5 input images) | LITELLM_API_KEY |
| MiniMax | image-01 | Yes (subject reference) | MINIMAX_API_KEY or MiniMax OAuth (minimax-portal) |
| OpenAI | gpt-image-2 | Yes (up to 4 images) | OPENAI_API_KEY or OpenAI Codex OAuth |
| OpenRouter | google/gemini-3.1-flash-image-preview | Yes (up to 5 input images) | OPENROUTER_API_KEY |
| Vydra | grok-imagine | No | VYDRA_API_KEY |
| xAI | grok-imagine-image | Yes (up to 5 images) | XAI_API_KEY |
Use action: "list" to inspect available providers and models at runtime:
```text
/tool image_generate action=list
```
Provider capabilities
| Capability | ComfyUI | fal | Google | MiniMax | OpenAI | Vydra | xAI |
|---|---|---|---|---|---|---|---|
| Generate (max count) | Workflow-defined | 4 | 4 | 9 | 4 | 1 | 4 |
| Edit / reference | 1 image (workflow) | 1 image | Up to 5 images | 1 image (subject ref) | Up to 5 images | — | Up to 5 images |
| Size control | — | ✓ | ✓ | — | Up to 4K | — | — |
| Aspect ratio | — | ✓ (generate only) | ✓ | ✓ | — | — | ✓ |
| Resolution (1K/2K/4K) | — | ✓ | ✓ | — | — | — | 1K, 2K |
Tool parameters
- `prompt`: Image generation prompt. Required for `action: "generate"`.
- `action`: Use `"list"` to inspect available providers and models at runtime.
- `model`: Provider/model override (e.g. `openai/gpt-image-2`). Use `openai/gpt-image-1.5` for transparent OpenAI backgrounds.
- `image`: Single reference image path or URL for edit mode.
- `images`: Multiple reference images for edit mode (up to 5 on supporting providers).
- `size`: Size hint: `1024x1024`, `1536x1024`, `1024x1536`, `2048x2048`, `3840x2160`.
- `aspectRatio`: Aspect ratio: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`.
- `resolution`: Resolution hint.
- `quality`: Quality hint when the provider supports it.
- `outputFormat`: Output format hint when the provider supports it.
- `background`: Background hint when the provider supports it. Use `transparent` with `outputFormat: "png"` or `"webp"` for transparency-capable providers.
- `count`: Number of images to generate (1–4).
- `timeoutMs`: Optional provider request timeout in milliseconds.
- Output filename hint.
- `openai`: OpenAI-only hints: `background`, `moderation`, `outputCompression`, and `user`.

Not all providers support all parameters. When a fallback provider supports a nearby geometry option instead of the exact requested one, OpenClaw remaps to the closest supported size, aspect ratio, or resolution before submission. Unsupported output hints are dropped for providers that do not declare support and reported in the tool result. Tool results report the applied settings; `details.normalization` captures any requested-to-applied translation.

Configuration
Model selection
```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openai/gpt-image-2",
        timeoutMs: 180_000,
        fallbacks: [
          "openrouter/google/gemini-3.1-flash-image-preview",
          "google/gemini-3.1-flash-image-preview",
          "fal/fal-ai/flux/dev",
        ],
      },
    },
  },
}
```
Provider selection order
OpenClaw tries providers in this order:
1. `model` parameter from the tool call (if the agent specifies one).
2. `imageGenerationModel.primary` from config.
3. `imageGenerationModel.fallbacks` in order.
4. Auto-detection (auth-backed provider defaults only):
   - current default provider first;
   - remaining registered image-generation providers in provider-id order.
If a provider fails (auth error, rate limit, etc.), the next configured candidate is tried automatically. If all fail, the error includes details from each attempt.
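The ordering above can be sketched as a small candidate-list builder. This is an illustrative Python model, not OpenClaw's implementation: `ImageModelConfig`, `resolve_candidates`, and the `authed_defaults` mapping are hypothetical names, and the de-duplication step is an assumption the page does not state.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ImageModelConfig:
    """Hypothetical mirror of agents.defaults.imageGenerationModel."""

    primary: str | None = None
    fallbacks: list[str] = field(default_factory=list)


def resolve_candidates(
    call_model: str | None,
    config: ImageModelConfig,
    authed_defaults: dict[str, str],
    default_provider: str | None = None,
    auto_fallback: bool = True,
) -> list[str]:
    """Return provider/model refs in the order they would be tried.

    authed_defaults maps provider id -> default model, and contains only
    providers that can actually authenticate.
    """
    # A per-call model override is exact: nothing else is attempted.
    if call_model:
        return [call_model]

    candidates: list[str] = []
    if config.primary:
        candidates.append(config.primary)
    candidates.extend(config.fallbacks)

    if auto_fallback:
        # Current default provider first, then the rest in provider-id order.
        provider_ids = sorted(authed_defaults)
        if default_provider in authed_defaults:
            provider_ids.remove(default_provider)
            provider_ids.insert(0, default_provider)
        candidates.extend(f"{pid}/{authed_defaults[pid]}" for pid in provider_ids)

    # Assumption: duplicates are dropped, keeping first-seen order.
    return list(dict.fromkeys(candidates))
```

Passing `auto_fallback=False` models `agents.defaults.mediaGenerationAutoProviderFallback: false`: only the explicit `primary` and `fallbacks` entries remain in the list.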
- Per-call overrides are exact: a per-call `model` override tries only that provider/model and does not continue to configured primary/fallback or auto-detected providers.
- Auto-detection is auth-aware: a provider default only enters the candidate list when OpenClaw can actually authenticate that provider. Set `agents.defaults.mediaGenerationAutoProviderFallback: false` to use only explicit `model`, `primary`, and `fallbacks` entries.
- Timeouts: set `agents.defaults.imageGenerationModel.timeoutMs` for slow image backends. A per-call `timeoutMs` tool parameter overrides the configured default.
- Inspect at runtime: use `action: "list"` to inspect the currently registered providers, their default models, and auth env-var hints.

Image editing
OpenAI, OpenRouter, Google, fal, MiniMax, ComfyUI, and xAI support editing reference images. Pass a reference image path or URL:
"Generate a watercolor version of this photo" + image: "/path/to/photo.jpg"
OpenAI, OpenRouter, Google, and xAI support up to 5 reference images via the
images parameter. fal, MiniMax, and ComfyUI support 1.
Provider deep dives
OpenAI image generation defaults to `openai/gpt-image-2`. If an `openai-codex` OAuth profile is configured, OpenClaw reuses the same OAuth profile used by Codex subscription chat models and sends the image request through the Codex Responses backend. Legacy Codex base URLs such as `https://chatgpt.com/backend-api` are canonicalized to `https://chatgpt.com/backend-api/codex` for image requests. OpenClaw does **not** silently fall back to `OPENAI_API_KEY` for that request. To force direct OpenAI Images API routing, configure `models.providers.openai` explicitly with an API key, custom base URL, or Azure endpoint.

The `openai/gpt-image-1.5`, `openai/gpt-image-1`, and
`openai/gpt-image-1-mini` models can still be selected explicitly. Use
`gpt-image-1.5` for transparent-background PNG/WebP output; the current
`gpt-image-2` API rejects `background: "transparent"`.
`gpt-image-2` supports both text-to-image generation and
reference-image editing through the same `image_generate` tool.
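The legacy Codex base URL handling mentioned above amounts to a small URL rewrite. The following is a minimal sketch, assuming the rewrite applies only to the exact `chatgpt.com/backend-api` path; the function name `canonicalize_codex_base_url` is hypothetical and the real matching rules may be broader.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize_codex_base_url(base_url: str) -> str:
    """Map the legacy Codex base URL to its /codex form; leave others as-is."""
    parts = urlsplit(base_url)
    # Tolerate a trailing slash on the legacy URL.
    if parts.netloc == "chatgpt.com" and parts.path.rstrip("/") == "/backend-api":
        return urlunsplit(
            (parts.scheme, parts.netloc, "/backend-api/codex", parts.query, parts.fragment)
        )
    return base_url
```

URLs that already end in `/backend-api/codex`, and unrelated hosts, pass through unchanged in this sketch.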
OpenClaw forwards `prompt`, `count`, `size`, `quality`, `outputFormat`,
and reference images to OpenAI. OpenAI does **not** receive
`aspectRatio` or `resolution` directly; when possible OpenClaw maps
those into a supported `size`, otherwise the tool reports them as
ignored overrides.
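One way to picture the `aspectRatio`-to-`size` mapping is a nearest-ratio lookup over the documented size hints. This is a sketch under stated assumptions: the `size_for_aspect_ratio` helper and its 5% tolerance are hypothetical, and OpenClaw's actual mapping rules are not described on this page.

```python
from __future__ import annotations

# Size hints listed under Tool parameters.
SIZES = ["1024x1024", "1536x1024", "1024x1536", "2048x2048", "3840x2160"]


def _ratio(size: str) -> float:
    width, height = size.split("x")
    return int(width) / int(height)


def size_for_aspect_ratio(aspect_ratio: str, tolerance: float = 0.05) -> str | None:
    """Return the closest size for an aspect ratio, or None when nothing is close.

    A None result models the case where the hint would be reported as an
    ignored override instead of being mapped.
    """
    w, h = (int(part) for part in aspect_ratio.split(":"))
    target = w / h
    best = min(SIZES, key=lambda size: abs(_ratio(size) - target))
    if abs(_ratio(best) - target) <= tolerance * target:
        return best
    return None
```

Under these assumptions `16:9` resolves to `3840x2160` and `3:2` to `1536x1024`, while `21:9` has no close match and returns `None`.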
OpenAI-specific options live under the `openai` object:
```json
{
  "quality": "low",
  "outputFormat": "jpeg",
  "openai": {
    "background": "opaque",
    "moderation": "low",
    "outputCompression": 60,
    "user": "end-user-42"
  }
}
```
`openai.background` accepts `transparent`, `opaque`, or `auto`;
transparent outputs require `outputFormat` `png` or `webp` and a
transparency-capable OpenAI image model. OpenClaw routes default
`gpt-image-2` transparent-background requests to `gpt-image-1.5`.
`openai.outputCompression` applies to JPEG/WebP outputs.
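The transparency rules above can be summarized in a short helper. This is an illustrative sketch, not OpenClaw code: `pick_openai_model` is a hypothetical name, raising `ValueError` for an unsupported output format is an assumption about error handling, and the sketch reroutes any `gpt-image-2` request without distinguishing default from explicit selection.

```python
TRANSPARENT_FORMATS = {"png", "webp"}
BACKGROUNDS = {"transparent", "opaque", "auto"}
DEFAULT_MODEL = "gpt-image-2"
TRANSPARENT_MODEL = "gpt-image-1.5"


def pick_openai_model(model: str, background: str = "auto", output_format: str = "png") -> str:
    """Return the OpenAI image model a request would be sent to."""
    if background not in BACKGROUNDS:
        raise ValueError(f"unsupported background: {background!r}")
    if background != "transparent":
        return model
    # Transparent output needs an alpha-capable format.
    if output_format not in TRANSPARENT_FORMATS:
        raise ValueError("transparent backgrounds require png or webp output")
    # gpt-image-2 rejects transparent backgrounds, so reroute to gpt-image-1.5.
    return TRANSPARENT_MODEL if model == DEFAULT_MODEL else model
```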
The top-level `background` hint is provider-neutral and currently maps
to the same OpenAI `background` request field when the OpenAI provider
is selected. Providers that do not declare background support return
it in `ignoredOverrides` instead of receiving the unsupported parameter.
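The forward-or-ignore behavior for output hints can be modeled as a simple partition. This is a minimal sketch: the `PROVIDER_HINTS` table and `split_hints` function are hypothetical, and the capability sets shown are placeholders, not the providers' real declarations.

```python
from __future__ import annotations

# Hypothetical capability declarations, for illustration only.
PROVIDER_HINTS: dict[str, set[str]] = {
    "openai": {"quality", "outputFormat", "background"},
    "vydra": set(),
}


def split_hints(provider: str, hints: dict[str, object]) -> tuple[dict[str, object], list[str]]:
    """Split requested hints into (forwarded, ignored_overrides)."""
    supported = PROVIDER_HINTS.get(provider, set())
    requested = {name: value for name, value in hints.items() if value is not None}
    forwarded = {name: value for name, value in requested.items() if name in supported}
    ignored = [name for name in requested if name not in supported]
    return forwarded, ignored
```

For example, `split_hints("vydra", {"background": "transparent"})` forwards nothing and lists `background` as ignored, which mirrors the `ignoredOverrides` behavior described above.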
To route OpenAI image generation through an Azure OpenAI deployment
instead of `api.openai.com`, see
[Azure OpenAI endpoints](/providers/openai#azure-openai-endpoints).
OpenRouter image generation uses the same `OPENROUTER_API_KEY` and
routes through OpenRouter's chat completions image API. Select
OpenRouter image models with the `openrouter/` prefix:
```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openrouter/google/gemini-3.1-flash-image-preview",
      },
    },
  },
}
```
OpenClaw forwards `prompt`, `count`, reference images, and
Gemini-compatible `aspectRatio` / `resolution` hints to OpenRouter.
Current built-in OpenRouter image model shortcuts include
`google/gemini-3.1-flash-image-preview`,
`google/gemini-3-pro-image-preview`, and `openai/gpt-5.4-image-2`. Use
`action: "list"` to see what your configured plugin exposes.
MiniMax image generation is available through both bundled MiniMax
auth paths:
- `minimax/image-01` for API-key setups
- `minimax-portal/image-01` for OAuth setups
The bundled xAI provider uses `/v1/images/generations` for prompt-only
requests and `/v1/images/edits` when `image` or `images` is present.
- Models: `xai/grok-imagine-image`, `xai/grok-imagine-image-pro`
- Count: up to 4
- References: one `image` or up to five `images`
- Aspect ratios: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2:3`, `3:2`
- Resolutions: `1K`, `2K`
- Outputs: returned as OpenClaw-managed image attachments
OpenClaw intentionally does not expose xAI-native `quality`, `mask`,
`user`, or extra native-only aspect ratios until those controls exist
in the shared cross-provider `image_generate` contract.
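The xAI endpoint choice and limits listed above can be expressed as a small request planner. This is a sketch only: `plan_xai_request` and the returned dictionary shape are hypothetical, and merging `image` with `images` into one reference list is an assumption about how the two parameters combine.

```python
from __future__ import annotations

MAX_COUNT = 4
MAX_REFERENCES = 5


def plan_xai_request(
    prompt: str,
    image: str | None = None,
    images: list[str] | None = None,
    count: int = 1,
) -> dict[str, object]:
    """Choose the xAI endpoint path and validate the documented limits."""
    references = ([image] if image else []) + list(images or [])
    if len(references) > MAX_REFERENCES:
        raise ValueError(f"xAI accepts at most {MAX_REFERENCES} reference images")
    if not 1 <= count <= MAX_COUNT:
        raise ValueError(f"count must be between 1 and {MAX_COUNT}")
    # Prompt-only requests generate; any reference image switches to edits.
    path = "/v1/images/edits" if references else "/v1/images/generations"
    return {"path": path, "prompt": prompt, "count": count, "references": references}
```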
Examples
4K landscape:

```text
/tool image_generate action=generate model=openai/gpt-image-2 prompt="A clean editorial poster for OpenClaw image generation" size=3840x2160 count=1
```

Transparent PNG:

```text
/tool image_generate action=generate model=openai/gpt-image-1.5 prompt="A simple red circle sticker on a transparent background" outputFormat=png background=transparent
```

Equivalent CLI:

```bash
openclaw infer image generate \
  --model openai/gpt-image-1.5 \
  --output-format png \
  --background transparent \
  --prompt "A simple red circle sticker on a transparent background" \
  --json
```
The same --output-format and --background flags are available on
openclaw infer image edit; --openai-background remains as an
OpenAI-specific alias. Bundled providers other than OpenAI do not declare
explicit background control today, so background: "transparent" is reported
as ignored for them.
Related
- Tools overview: all available agent tools
- ComfyUI: local ComfyUI and Comfy Cloud workflow setup
- fal: fal image and video provider setup
- Google (Gemini): Gemini image provider setup
- MiniMax: MiniMax image provider setup
- OpenAI: OpenAI Images provider setup
- Vydra: Vydra image, video, and speech setup
- xAI: Grok image, video, search, code execution, and TTS setup
- Configuration reference: `imageGenerationModel` config
- Models: model configuration and failover