From fbd6b3ce3cb4eecff4cc7eddfec4153d8f3e95f5 Mon Sep 17 00:00:00 2001
From: Vincent Koc
Date: Sat, 25 Apr 2026 22:05:46 -0700
Subject: [PATCH] docs(tts): A-Z order providers and add tools/tts to Tools nav group

- docs/tools/tts.md: alphabetize providers in three places that listed
  them: the supported-providers table (Azure Speech ... Xiaomi MiMo),
  the configuration Tabs (12 provider presets in A-Z), and the field
  reference AccordionGroup. Top-level fields stay first; provider
  tabs/accordions follow strict alphabetical order. Wording, schema,
  and defaults unchanged.
- docs/docs.json: add tools/tts to the main Tools sidebar group
  (slotted between trajectory and video-generation, matching the
  alphabetical neighborhood with image-generation, music-generation,
  video-generation). Previously tts only appeared under Nodes > Media
  capabilities, which was a discoverability gap for readers looking
  for TTS alongside the other generation tools.
---
 docs/docs.json    |   1 +
 docs/tools/tts.md | 326 +++++++++++++++++++++++-----------------------
 2 files changed, 164 insertions(+), 163 deletions(-)

diff --git a/docs/docs.json b/docs/docs.json
index 9252d129954..7157f88bfd2 100644
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -1238,6 +1238,7 @@
 "tools/tokenjuice",
 "tools/loop-detection",
 "tools/trajectory",
+"tools/tts",
 "tools/video-generation",
 {
 "group": "Web browser",
diff --git a/docs/tools/tts.md b/docs/tools/tts.md
index cb17494312d..0a0392a2f87 100644
--- a/docs/tools/tts.md
+++ b/docs/tools/tts.md
@@ -53,20 +53,20 @@ OpenClaw picks the first configured provider in registry auto-select order.
 
 | Provider | Auth | Notes |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------- |
-| **OpenAI** | `OPENAI_API_KEY` | Also used for auto-summary; supports persona `instructions`.
| +| **Azure Speech** | `AZURE_SPEECH_KEY` + `AZURE_SPEECH_REGION` (also `AZURE_SPEECH_API_KEY`, `SPEECH_KEY`, `SPEECH_REGION`) | Native Ogg/Opus voice-note output and telephony. | | **ElevenLabs** | `ELEVENLABS_API_KEY` or `XI_API_KEY` | Voice cloning, multilingual, deterministic via `seed`. | | **Google Gemini** | `GEMINI_API_KEY` or `GOOGLE_API_KEY` | Gemini API TTS; persona-aware via `promptTemplate: "audio-profile-v1"`. | -| **Azure Speech** | `AZURE_SPEECH_KEY` + `AZURE_SPEECH_REGION` (also `AZURE_SPEECH_API_KEY`, `SPEECH_KEY`, `SPEECH_REGION`) | Native Ogg/Opus voice-note output and telephony. | +| **Gradium** | `GRADIUM_API_KEY` | Voice-note and telephony output. | +| **Inworld** | `INWORLD_API_KEY` | Streaming TTS API. Native Opus voice-note and PCM telephony. | +| **Local CLI** | none | Runs a configured local TTS command. | | **Microsoft** | none | Public Edge neural TTS via `node-edge-tts`. Best-effort, no SLA. | | **MiniMax** | `MINIMAX_API_KEY` (or Token Plan: `MINIMAX_OAUTH_TOKEN`, `MINIMAX_CODE_PLAN_KEY`, `MINIMAX_CODING_API_KEY`) | T2A v2 API. Defaults to `speech-2.8-hd`. | -| **Inworld** | `INWORLD_API_KEY` | Streaming TTS API. Native Opus voice-note and PCM telephony. | -| **xAI** | `XAI_API_KEY` | xAI batch TTS. Native Opus voice-note is **not** supported. | -| **Volcengine** | `VOLCENGINE_TTS_API_KEY` or `BYTEPLUS_SEED_SPEECH_API_KEY` (legacy AppID/token: `VOLCENGINE_TTS_APPID`/`_TOKEN`) | BytePlus Seed Speech HTTP API. | -| **Xiaomi MiMo** | `XIAOMI_API_KEY` | MiMo TTS through Xiaomi chat completions. | +| **OpenAI** | `OPENAI_API_KEY` | Also used for auto-summary; supports persona `instructions`. | | **OpenRouter** | `OPENROUTER_API_KEY` (can reuse `models.providers.openrouter.apiKey`) | Default model `hexgrad/kokoro-82m`. | -| **Gradium** | `GRADIUM_API_KEY` | Voice-note and telephony output. 
| +| **Volcengine** | `VOLCENGINE_TTS_API_KEY` or `BYTEPLUS_SEED_SPEECH_API_KEY` (legacy AppID/token: `VOLCENGINE_TTS_APPID`/`_TOKEN`) | BytePlus Seed Speech HTTP API. | | **Vydra** | `VYDRA_API_KEY` | Shared image, video, and speech provider. | -| **Local CLI** | none | Runs a configured local TTS command. | +| **xAI** | `XAI_API_KEY` | xAI batch TTS. Native Opus voice-note is **not** supported. | +| **Xiaomi MiMo** | `XIAOMI_API_KEY` | MiMo TTS through Xiaomi chat completions. | If multiple providers are configured, the selected one is used first and the others are fallback options. Auto-summary uses `summaryModel` (or @@ -87,28 +87,21 @@ TTS config lives under `messages.tts` in `~/.openclaw/openclaw.json`. Pick a preset and adapt the provider block: - + ```json5 { messages: { tts: { auto: "always", - provider: "openai", - summaryModel: "openai/gpt-4.1-mini", - modelOverrides: { enabled: true }, + provider: "azure-speech", providers: { - openai: { - apiKey: "${OPENAI_API_KEY}", - model: "gpt-4o-mini-tts", - voice: "alloy", - }, - elevenlabs: { - apiKey: "${ELEVENLABS_API_KEY}", - model: "eleven_multilingual_v2", - voiceId: "EXAVITQu4vr4xnSDxMaL", - voiceSettings: { stability: 0.5, similarityBoost: 0.75, style: 0.0, useSpeakerBoost: true, speed: 1.0 }, - applyTextNormalization: "auto", - languageCode: "en", + "azure-speech": { + apiKey: "${AZURE_SPEECH_KEY}", + region: "eastus", + voice: "en-US-JennyNeural", + lang: "en-US", + outputFormat: "audio-24khz-48kbitrate-mono-mp3", + voiceNoteOutputFormat: "ogg-24khz-16bit-mono-opus", }, }, }, @@ -116,7 +109,7 @@ preset and adapt the provider block: } ``` - + ```json5 { messages: { @@ -157,21 +150,57 @@ preset and adapt the provider block: } ``` - + ```json5 { messages: { tts: { auto: "always", - provider: "azure-speech", + provider: "gradium", providers: { - "azure-speech": { - apiKey: "${AZURE_SPEECH_KEY}", - region: "eastus", - voice: "en-US-JennyNeural", - lang: "en-US", - outputFormat: 
"audio-24khz-48kbitrate-mono-mp3", - voiceNoteOutputFormat: "ogg-24khz-16bit-mono-opus", + gradium: { + apiKey: "${GRADIUM_API_KEY}", + voiceId: "YTpq7expH9539ERJ", + }, + }, + }, + }, +} +``` + + +```json5 +{ + messages: { + tts: { + auto: "always", + provider: "inworld", + providers: { + inworld: { + apiKey: "${INWORLD_API_KEY}", + modelId: "inworld-tts-1.5-max", + voiceId: "Sarah", + temperature: 0.7, + }, + }, + }, + }, +} +``` + + +```json5 +{ + messages: { + tts: { + auto: "always", + provider: "tts-local-cli", + providers: { + "tts-local-cli": { + command: "say", + args: ["-o", "{{OutputPath}}", "{{Text}}"], + outputFormat: "wav", + timeoutMs: 120000, }, }, }, @@ -223,78 +252,28 @@ preset and adapt the provider block: } ``` - + ```json5 { messages: { tts: { auto: "always", - provider: "inworld", + provider: "openai", + summaryModel: "openai/gpt-4.1-mini", + modelOverrides: { enabled: true }, providers: { - inworld: { - apiKey: "${INWORLD_API_KEY}", - modelId: "inworld-tts-1.5-max", - voiceId: "Sarah", - temperature: 0.7, + openai: { + apiKey: "${OPENAI_API_KEY}", + model: "gpt-4o-mini-tts", + voice: "alloy", }, - }, - }, - }, -} -``` - - -```json5 -{ - messages: { - tts: { - auto: "always", - provider: "xai", - providers: { - xai: { - apiKey: "${XAI_API_KEY}", - voiceId: "eve", - language: "en", - responseFormat: "mp3", - }, - }, - }, - }, -} -``` - - -```json5 -{ - messages: { - tts: { - auto: "always", - provider: "volcengine", - providers: { - volcengine: { - apiKey: "${VOLCENGINE_TTS_API_KEY}", - resourceId: "seed-tts-1.0", - voice: "en_female_anna_mars_bigtts", - }, - }, - }, - }, -} -``` - - -```json5 -{ - messages: { - tts: { - auto: "always", - provider: "xiaomi", - providers: { - xiaomi: { - apiKey: "${XIAOMI_API_KEY}", - model: "mimo-v2.5-tts", - voice: "mimo_default", - format: "mp3", + elevenlabs: { + apiKey: "${ELEVENLABS_API_KEY}", + model: "eleven_multilingual_v2", + voiceId: "EXAVITQu4vr4xnSDxMaL", + voiceSettings: { stability: 0.5, 
similarityBoost: 0.75, style: 0.0, useSpeakerBoost: true, speed: 1.0 }, + applyTextNormalization: "auto", + languageCode: "en", }, }, }, @@ -322,17 +301,18 @@ preset and adapt the provider block: } ``` - + ```json5 { messages: { tts: { auto: "always", - provider: "gradium", + provider: "volcengine", providers: { - gradium: { - apiKey: "${GRADIUM_API_KEY}", - voiceId: "YTpq7expH9539ERJ", + volcengine: { + apiKey: "${VOLCENGINE_TTS_API_KEY}", + resourceId: "seed-tts-1.0", + voice: "en_female_anna_mars_bigtts", }, }, }, @@ -340,19 +320,39 @@ preset and adapt the provider block: } ``` - + ```json5 { messages: { tts: { auto: "always", - provider: "tts-local-cli", + provider: "xai", providers: { - "tts-local-cli": { - command: "say", - args: ["-o", "{{OutputPath}}", "{{Text}}"], - outputFormat: "wav", - timeoutMs: 120000, + xai: { + apiKey: "${XAI_API_KEY}", + voiceId: "eve", + language: "en", + responseFormat: "mp3", + }, + }, + }, + }, +} +``` + + +```json5 +{ + messages: { + tts: { + auto: "always", + provider: "xiaomi", + providers: { + xiaomi: { + apiKey: "${XIAOMI_API_KEY}", + model: "mimo-v2.5-tts", + voice: "mimo_default", + format: "mp3", }, }, }, @@ -735,14 +735,14 @@ OpenAI and ElevenLabs output formats are fixed per channel as listed above. - - Falls back to `OPENAI_API_KEY`. - OpenAI TTS model id (e.g. `gpt-4o-mini-tts`). - Voice name (e.g. `alloy`, `cedar`). - Explicit OpenAI `instructions` field. When set, persona prompt fields are **not** auto-mapped. - - Override the OpenAI TTS endpoint. Resolution order: config → `OPENAI_TTS_BASE_URL` → `https://api.openai.com/v1`. Non-default values are treated as OpenAI-compatible TTS endpoints, so custom model and voice names are accepted. - + + Env: `AZURE_SPEECH_KEY`, `AZURE_SPEECH_API_KEY`, or `SPEECH_KEY`. + Azure Speech region (e.g. `eastus`). Env: `AZURE_SPEECH_REGION` or `SPEECH_REGION`. + Optional Azure Speech endpoint override (alias `baseUrl`). + Azure voice ShortName. Default `en-US-JennyNeural`. 
+ SSML language code. Default `en-US`. + Azure `X-Microsoft-OutputFormat` for standard audio. Default `audio-24khz-48kbitrate-mono-mp3`. + Azure `X-Microsoft-OutputFormat` for voice-note output. Default `ogg-24khz-16bit-mono-opus`. @@ -769,14 +769,27 @@ OpenAI and ElevenLabs output formats are fixed per channel as listed above. Only `https://generativelanguage.googleapis.com` is accepted. - - Env: `AZURE_SPEECH_KEY`, `AZURE_SPEECH_API_KEY`, or `SPEECH_KEY`. - Azure Speech region (e.g. `eastus`). Env: `AZURE_SPEECH_REGION` or `SPEECH_REGION`. - Optional Azure Speech endpoint override (alias `baseUrl`). - Azure voice ShortName. Default `en-US-JennyNeural`. - SSML language code. Default `en-US`. - Azure `X-Microsoft-OutputFormat` for standard audio. Default `audio-24khz-48kbitrate-mono-mp3`. - Azure `X-Microsoft-OutputFormat` for voice-note output. Default `ogg-24khz-16bit-mono-opus`. + + Env: `GRADIUM_API_KEY`. + Default `https://api.gradium.ai`. + Default Emma (`YTpq7expH9539ERJ`). + + + + Env: `INWORLD_API_KEY`. + Default `https://api.inworld.ai`. + Default `inworld-tts-1.5-max`. Also: `inworld-tts-1.5-mini`, `inworld-tts-1-max`, `inworld-tts-1`. + Default `Sarah`. + Sampling temperature `0..2`. + + + + Local executable or command string for CLI TTS. + Command arguments. Supports `{{Text}}`, `{{OutputPath}}`, `{{OutputDir}}`, `{{OutputBase}}` placeholders. + Expected CLI output format. Default `mp3` for audio attachments. + Command timeout in milliseconds. Default `120000`. + Optional command working directory. + Optional environment overrides for the command. @@ -801,20 +814,22 @@ OpenAI and ElevenLabs output formats are fixed per channel as listed above. Integer `-12..12`. Default `0`. Fractional values are truncated before the request. - - Env: `INWORLD_API_KEY`. - Default `https://api.inworld.ai`. - Default `inworld-tts-1.5-max`. Also: `inworld-tts-1.5-mini`, `inworld-tts-1-max`, `inworld-tts-1`. - Default `Sarah`. - Sampling temperature `0..2`. 
+ + Falls back to `OPENAI_API_KEY`. + OpenAI TTS model id (e.g. `gpt-4o-mini-tts`). + Voice name (e.g. `alloy`, `cedar`). + Explicit OpenAI `instructions` field. When set, persona prompt fields are **not** auto-mapped. + + Override the OpenAI TTS endpoint. Resolution order: config → `OPENAI_TTS_BASE_URL` → `https://api.openai.com/v1`. Non-default values are treated as OpenAI-compatible TTS endpoints, so custom model and voice names are accepted. + - - Env: `XAI_API_KEY`. - Default `https://api.x.ai/v1`. Env: `XAI_BASE_URL`. - Default `eve`. Live voices: `ara`, `eve`, `leo`, `rex`, `sal`, `una`. - BCP-47 language code or `auto`. Default `en`. - Default `mp3`. + + Env: `OPENROUTER_API_KEY`. Can reuse `models.providers.openrouter.apiKey`. + Default `https://openrouter.ai/api/v1`. Legacy `https://openrouter.ai/v1` is normalized. + Default `hexgrad/kokoro-82m`. Alias: `modelId`. + Default `af_alloy`. Alias: `voiceId`. + Default `mp3`. Provider-native speed override. @@ -829,6 +844,15 @@ OpenAI and ElevenLabs output formats are fixed per channel as listed above. Legacy Volcengine Speech Console fields. Env: `VOLCENGINE_TTS_APPID`, `VOLCENGINE_TTS_TOKEN`, `VOLCENGINE_TTS_CLUSTER` (default `volcano_tts`). + + Env: `XAI_API_KEY`. + Default `https://api.x.ai/v1`. Env: `XAI_BASE_URL`. + Default `eve`. Live voices: `ara`, `eve`, `leo`, `rex`, `sal`, `una`. + BCP-47 language code or `auto`. Default `en`. + Default `mp3`. + Provider-native speed override. + + Env: `XIAOMI_API_KEY`. Default `https://api.xiaomimimo.com/v1`. Env: `XIAOMI_BASE_URL`. @@ -837,30 +861,6 @@ OpenAI and ElevenLabs output formats are fixed per channel as listed above. Default `mp3`. Env: `XIAOMI_TTS_FORMAT`. Optional natural-language style instruction sent as the user message; not spoken. - - - Env: `OPENROUTER_API_KEY`. Can reuse `models.providers.openrouter.apiKey`. - Default `https://openrouter.ai/api/v1`. Legacy `https://openrouter.ai/v1` is normalized. - Default `hexgrad/kokoro-82m`. Alias: `modelId`. 
- Default `af_alloy`. Alias: `voiceId`. - Default `mp3`. - Provider-native speed override. - - - - Env: `GRADIUM_API_KEY`. - Default `https://api.gradium.ai`. - Default Emma (`YTpq7expH9539ERJ`). - - - - Local executable or command string for CLI TTS. - Command arguments. Supports `{{Text}}`, `{{OutputPath}}`, `{{OutputDir}}`, `{{OutputBase}}` placeholders. - Expected CLI output format. Default `mp3` for audio attachments. - Command timeout in milliseconds. Default `120000`. - Optional command working directory. - Optional environment overrides for the command. - ## Agent tool
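The ordering rule the commit message describes ("strict alphabetical order" by provider display name) can be checked mechanically. The sketch below is a hypothetical TypeScript helper, not something this patch adds to OpenClaw; treating the comparison as case-insensitive is an assumption inferred from the resulting table, where `xAI` precedes `Xiaomi MiMo` and `OpenAI` precedes `OpenRouter`.

```typescript
// Hypothetical helper (not part of the OpenClaw repo): returns true when the
// display names are in strict A-Z order, ignoring case. Strict means that a
// duplicate name also fails the check.
function isStrictlyAlphabetical(names: readonly string[]): boolean {
  for (let i = 1; i < names.length; i++) {
    const prev = names[i - 1].toLowerCase();
    const curr = names[i].toLowerCase();
    // "<" on strings compares UTF-16 code units, which is enough for ASCII names.
    if (!(prev < curr)) {
      return false;
    }
  }
  return true;
}

// Provider display names in the order used by the new supported-providers table.
const providerNames: readonly string[] = [
  "Azure Speech",
  "ElevenLabs",
  "Google Gemini",
  "Gradium",
  "Inworld",
  "Local CLI",
  "Microsoft",
  "MiniMax",
  "OpenAI",
  "OpenRouter",
  "Volcengine",
  "Vydra",
  "xAI",
  "Xiaomi MiMo",
];

console.log(isStrictlyAlphabetical(providerNames)); // prints true
// The pre-patch table listed OpenAI before ElevenLabs, so this prints false.
console.log(isStrictlyAlphabetical(["OpenAI", "ElevenLabs", "Google Gemini"]));
```

Lowercasing and comparing code units keeps the check deterministic across runtimes; for titles containing non-ASCII characters, a locale-aware `localeCompare` would be the safer choice.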