MuseTalk
Built-in lip-sync on your FlyMyAI account. Give it a reference avatar video
and a speech audio track, and it renders a talking-head MP4 where the avatar's
mouth follows the audio. This is a non-realtime batch step: it submits the job,
waits for it to finish, and saves the result as an agent_file with a
public_url you can pass to the next tool.
What it can do
| Method | What it does |
|---|---|
| lip_sync | Lip-sync a reference avatar video (video_url) to speech audio (audio_url). Returns a talking-head MP4 agent_file plus an output_url on object storage. Optional: bbox_shift (mouth-region tuning; leave at 0) and an fps override. |
Both video_url and audio_url must be public HTTPS URLs reachable by the
service. The audio is typically the agent_file.public_url from a prior
elevenlabs.text_to_speech call.
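Before submitting a job, it can help to catch obviously unreachable inputs locally. The sketch below is a minimal Python illustration, not part of the FlyMyAI API: is_public_https_url and LipSyncArgs are hypothetical names, and treating bbox_shift and fps as integers is an assumption. The check only rejects non-HTTPS links, localhost, and private or loopback IP literals; it cannot prove that the service will actually be able to fetch a URL.

```python
from dataclasses import asdict, dataclass
from ipaddress import ip_address
from typing import Optional
from urllib.parse import urlparse


def is_public_https_url(url: str) -> bool:
    """Local pre-check: HTTPS scheme, a host, and not localhost or a non-global IP literal.

    This cannot prove the service can reach the URL (DNS names are not resolved);
    it only catches obvious mistakes such as http:// links or 10.x.x.x addresses.
    """
    parsed = urlparse(url)
    host = parsed.hostname
    if parsed.scheme != "https" or not host or host == "localhost":
        return False
    try:
        return ip_address(host).is_global
    except ValueError:
        return True  # a DNS name: only an actual fetch can confirm reachability


@dataclass
class LipSyncArgs:
    """Hypothetical container for the lip_sync inputs; field types are assumptions."""

    video_url: str
    audio_url: str
    bbox_shift: int = 0         # mouth-region tuning; leave at 0
    fps: Optional[int] = None   # None means "no fps override"

    def to_payload(self) -> dict:
        """Validate both URLs and return the arguments as a plain dict."""
        for name in ("video_url", "audio_url"):
            if not is_public_https_url(getattr(self, name)):
                raise ValueError(f"{name} must be a public HTTPS URL")
        payload = asdict(self)
        if payload["fps"] is None:
            del payload["fps"]  # omit the override instead of sending null
        return payload
```

Leaving fps out of the payload when it is None keeps the service's own default in charge, which matches the advice to treat both optional fields as overrides.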
Typical pipeline
Lip-sync is usually the last step in an avatar pipeline:
1. elevenlabs.text_to_speech: turn your script into speech audio, then take result.agent_file.public_url.
2. musetalk.lip_sync: pass that audio as audio_url and your avatar base clip as video_url.
Call lip_sync on its own when you already have both URLs and only need the
lip-sync.
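The two-step pipeline above can be sketched as plain Python data flow. Everything here is hypothetical: call_tool stands in for however your agent actually invokes tools, the text argument to elevenlabs.text_to_speech is an assumed name, and results are modeled as dicts carrying the agent_file.public_url and output_url fields named on this page. fake_call_tool is a local stand-in so the sketch runs without an account.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical tool runner: call_tool(tool_name, **arguments) -> result dict.
ToolCaller = Callable[..., Dict[str, Any]]


def render_talking_head(call_tool: ToolCaller, script: str, avatar_url: str) -> Dict[str, str]:
    """Run text_to_speech, then feed its public audio URL into lip_sync."""
    tts_result = call_tool("elevenlabs.text_to_speech", text=script)  # "text" is assumed
    audio_url = tts_result["agent_file"]["public_url"]
    sync_result = call_tool(
        "musetalk.lip_sync",
        video_url=avatar_url,  # avatar base clip, already public HTTPS
        audio_url=audio_url,   # speech from the previous step
        bbox_shift=0,
    )
    return {
        "video_public_url": sync_result["agent_file"]["public_url"],
        "output_url": sync_result["output_url"],
    }


# Local stand-in that records calls and returns canned results, so the sketch runs offline.
RECORDED_CALLS: List[Tuple[str, Dict[str, Any]]] = []


def fake_call_tool(name: str, **kwargs: Any) -> Dict[str, Any]:
    RECORDED_CALLS.append((name, kwargs))
    if name == "elevenlabs.text_to_speech":
        return {"agent_file": {"public_url": "https://files.example.com/speech.mp3"}}
    return {
        "agent_file": {"public_url": "https://files.example.com/talking_head.mp4"},
        "output_url": "https://storage.example.com/talking_head.mp4",
    }
```

The point of the design is that the audio is handed over by URL rather than as bytes: lip_sync only ever sees the public_url of the speech file, which is why both inputs must be publicly reachable.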
How to get credentials
None. MuseTalk is a built-in tool that runs on FlyMyAI infrastructure under your account; just enable it.
Fields to fill in FlyMyAI
None.
Troubleshooting
- Input URL not fetched: both video_url and audio_url must be direct public HTTPS links reachable by the service. If the source requires authentication, pre-upload the file via download_link or another agent tool.
- Job timed out: long videos take longer to render. Keep the avatar clip and audio short, or split the audio into segments.
- Wrong tool for realtime: MuseTalk is a batch render. For a live realtime avatar endpoint, use flymyai_deploy instead.
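For the "Job timed out" case, one way to split the audio is Python's standard-library wave module. This sketch assumes the speech is uncompressed PCM WAV (MP3 output would need converting first, for example with ffmpeg); split_wav and the segment length you pass are illustrative choices, not FlyMyAI parameters. Each segment would still need to be uploaded to its own public HTTPS URL and lip-synced in a separate lip_sync call.

```python
import io
import wave
from typing import List


def split_wav(data: bytes, max_seconds: float) -> List[bytes]:
    """Split a PCM WAV file (given as bytes) into consecutive WAV chunks of at most max_seconds."""
    chunks: List[bytes] = []
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(params.framerate * max_seconds))
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the input audio
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setparams(params)   # same channels, sample width, and rate as the source
                dst.writeframes(frames)  # header frame count is patched to the chunk length
            chunks.append(buf.getvalue())
    return chunks
```

Cutting on fixed frame boundaries can split a word in half; cutting at pauses in the script, or generating one text_to_speech call per sentence, tends to give cleaner seams.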