Whisper
OpenAI Whisper speech-to-text, built into your FlyMyAI account. Give it a public HTTPS audio URL and it returns the recognized text, which you can feed into the next step: an LLM reply, TTS, or an avatar pipeline. This is the preferred STT tool for voice-note and avatar workflows.
What it can do
| Method | What it does |
|---|---|
| transcribe_audio | Download a public HTTPS audio file (mp3, wav, m4a, webm, etc.) and return the recognized speech as plain text. Set task=transcribe to keep the original language or task=translate to get an English translation. Optional parameters: language (ISO-639-1 code; omit for auto-detection) and prompt (style/vocabulary hint for names and product terms). Returns the text plus the detected language and audio duration when available. |
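
The parameters in the table can be assembled and sanity-checked before the call. The following is a minimal sketch, not FlyMyAI's API: the helper name build_transcribe_args and the returned dictionary shape are assumptions made for illustration. Only the parameter names (audio_url, task, language, prompt) and their allowed values come from the table above.

```python
from urllib.parse import urlparse

# Task values listed in the method table.
VALID_TASKS = {"transcribe", "translate"}


def build_transcribe_args(audio_url, task="transcribe", language=None, prompt=None):
    """Hypothetical helper: validate and collect transcribe_audio arguments."""
    parsed = urlparse(audio_url)
    # The tool expects an HTTPS link; reject anything else early.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("audio_url must be a public HTTPS link")
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}")

    args = {"audio_url": audio_url, "task": task}
    if language is not None:
        # ISO-639-1 codes are two letters, e.g. "en" or "ru".
        if len(language) != 2 or not language.isalpha():
            raise ValueError("language must be a two-letter ISO-639-1 code")
        args["language"] = language.lower()
    # Leaving language out keeps auto-detection; prompt is optional too.
    if prompt:
        args["prompt"] = prompt
    return args


args = build_transcribe_args("https://example.com/voice.mp3", language="ru")
print(args)
```

This check only confirms that the URL uses HTTPS. It cannot tell whether the link is publicly reachable without authentication.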
Pass audio_url as a public HTTPS link, such as an agent_file.public_url
from an uploaded voice message or a prior tool. A typical chain is: user
voice -> agent_file.public_url -> transcribe_audio -> compose a text
reply -> (optional) elevenlabs.text_to_speech -> (optional)
musetalk.lip_sync.
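
The data flow of that chain can be sketched in a few lines. This is an illustration of step ordering only, under stated assumptions: voice_reply_chain and its callable parameters are hypothetical stand-ins, and the stub lambdas replace the real tools, whose actual call signatures are not documented here.

```python
from typing import Callable, Optional


def voice_reply_chain(
    public_url: str,
    transcribe: Callable[[str], str],
    compose_reply: Callable[[str], str],
    text_to_speech: Optional[Callable[[str], str]] = None,
    lip_sync: Optional[Callable[[str], str]] = None,
) -> dict:
    """Hypothetical sketch: run the voice -> text -> reply chain in order."""
    # Step 1: speech to text (stand-in for transcribe_audio).
    transcript = transcribe(public_url)
    # Step 2: compose the text reply from the transcript.
    reply = compose_reply(transcript)
    result = {"transcript": transcript, "reply": reply}
    # Step 3 (optional): synthesize speech from the reply.
    if text_to_speech is not None:
        result["speech_url"] = text_to_speech(reply)
        # Step 4 (optional): lip sync needs the synthesized audio.
        if lip_sync is not None:
            result["video_url"] = lip_sync(result["speech_url"])
    return result


# Stubs stand in for the real tools so the sketch runs on its own.
out = voice_reply_chain(
    "https://example.com/voice.ogg",
    transcribe=lambda url: "hello",
    compose_reply=lambda text: f"You said: {text}",
    text_to_speech=lambda reply: "https://example.com/reply.mp3",
)
print(out)
```

Passing the steps as callables keeps the optional stages optional: omitting text_to_speech yields a text-only reply, and lip sync runs only when synthesized audio exists.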
How to get credentials
None - Whisper is a built-in tool. It runs on FlyMyAI infrastructure using your account, with no API key to configure. Just enable it.
Fields to fill in FlyMyAI
None.
Troubleshooting
- Input URL not fetched - audio_url must be a direct public HTTPS link. Pre-upload via download_link or another agent tool if the source requires auth.
- Wrong language detected - pass an explicit language code (e.g. en, ru) instead of relying on auto-detection.
- English output when you wanted the original language - use task=transcribe; task=translate always returns English.