
Whisper

OpenAI Whisper speech-to-text, built into your FlyMyAI account. Give it a public HTTPS audio URL and it returns the recognized text, which you can feed into the next step: an LLM reply, text-to-speech (TTS), or an avatar pipeline. This is the preferred speech-to-text (STT) tool for voice-note and avatar workflows.

What it can do

Method | What it does
transcribe_audio | Downloads a public HTTPS audio file (mp3, wav, m4a, webm, etc.) and returns the recognized speech as plain text. Set task=transcribe to keep the original language, or task=translate to get an English translation. Optional parameters: language (an ISO-639-1 code; omit it for auto-detection) and prompt (a style or vocabulary hint for names and product terms). Returns the text plus the detected language and audio duration when available.
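As a rough sketch of how these parameters fit together, the helper below assembles an argument object for transcribe_audio and rejects obviously invalid values before the call is made. It assumes the tool accepts a flat key/value object using the field names listed above (audio_url, task, language, prompt). The helper name build_transcribe_args and its validation rules are hypothetical illustrations, not part of FlyMyAI.

```python
from typing import Optional
from urllib.parse import urlparse

# The two task values documented for transcribe_audio.
VALID_TASKS = {"transcribe", "translate"}


def build_transcribe_args(
    audio_url: str,
    task: str = "transcribe",
    language: Optional[str] = None,
    prompt: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build an argument object for transcribe_audio.

    Assumes the tool takes a flat dict keyed by the documented field names.
    """
    parsed = urlparse(audio_url)
    # The tool downloads the file itself, so it needs a direct HTTPS link.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"audio_url must be a public HTTPS link, got {audio_url!r}")
    # task=translate always yields English; task=transcribe keeps the source language.
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}, got {task!r}")

    args = {"audio_url": audio_url, "task": task}
    # Leaving language out lets Whisper auto-detect it.
    if language is not None:
        # Shape check only: ISO-639-1 codes are two lowercase ASCII letters, e.g. "en", "ru".
        if not (len(language) == 2 and language.isascii()
                and language.isalpha() and language.islower()):
            raise ValueError(f"language must be an ISO-639-1 code, got {language!r}")
        args["language"] = language
    # prompt is an optional hint for names and product terms.
    if prompt:
        args["prompt"] = prompt
    return args
```

For example, build_transcribe_args("https://example.com/voice-note.m4a", language="en", prompt="FlyMyAI, MuseTalk") pins the language instead of relying on auto-detection and passes the product names as a vocabulary hint. Checking these values before the call surfaces the URL and language problems listed under Troubleshooting earlier, without spending a transcription run.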

Pass audio_url as a public HTTPS link, such as the agent_file.public_url of an uploaded voice message or of a file produced by a prior tool. A typical chain is: user voice -> agent_file.public_url -> transcribe_audio -> compose a text reply -> (optional) elevenlabs.text_to_speech -> (optional) musetalk.lip_sync.
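The chain above can be sketched as a small orchestration function. Each tool is passed in as a plain Python callable, because the real invocation mechanism depends on your agent setup. The function name run_voice_chain, the single-argument callable signatures, and the "text" result key are assumptions for illustration; in particular, a real lip-sync call likely needs more input than the reply audio alone.

```python
from typing import Callable, Optional


def run_voice_chain(
    voice_url: str,
    transcribe: Callable[[str], dict],
    reply: Callable[[str], str],
    text_to_speech: Optional[Callable[[str], str]] = None,
    lip_sync: Optional[Callable[[str], str]] = None,
) -> dict:
    """Hypothetical orchestration of the voice -> text -> reply chain.

    voice_url is assumed to be a public HTTPS URL such as agent_file.public_url.
    The callables stand in for tool calls: transcribe for transcribe_audio,
    text_to_speech for elevenlabs.text_to_speech, lip_sync for musetalk.lip_sync.
    Their simplified signatures are assumptions, not the tools' real interfaces.
    """
    result = {}
    # Step 1: speech to text. Assumes the tool result carries the text under "text".
    result["transcript"] = transcribe(voice_url)["text"]

    # Step 2: compose a text reply, e.g. with an LLM.
    result["reply"] = reply(result["transcript"])

    # Step 3 (optional): synthesize the reply; assumed to return an audio URL.
    if text_to_speech is not None:
        result["reply_audio_url"] = text_to_speech(result["reply"])

        # Step 4 (optional): lip-sync an avatar to the reply audio.
        # Skipped when there is no reply audio to sync to.
        if lip_sync is not None:
            result["video_url"] = lip_sync(result["reply_audio_url"])
    return result


if __name__ == "__main__":
    # Stub callables standing in for the real tools.
    out = run_voice_chain(
        "https://example.com/voice-note.webm",
        transcribe=lambda url: {"text": "hello", "language": "en"},
        reply=lambda text: f"You said: {text}",
    )
    print(out)
```

Keeping each tool behind a callable makes the optional TTS and avatar steps easy to switch on or off, and lets you test the flow with stubs before wiring in the real tools.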

How to get credentials

None. Whisper is a built-in tool: it runs on FlyMyAI infrastructure under your account, so there is no API key to configure. Just enable it.

Fields to fill in FlyMyAI

None.

Troubleshooting

  • Input URL not fetched: audio_url must be a direct, public HTTPS link. If the source requires authentication, first re-host the file with download_link or another agent tool, then pass the resulting public URL.
  • Wrong language detected: pass an explicit language code (e.g. en, ru) instead of relying on auto-detection.
  • English output when you wanted the original language: use task=transcribe; task=translate always returns English.