
MuseTalk

Built-in lip-sync on your FlyMyAI account. Give it a reference avatar video and a speech audio track, and it renders a talking-head MP4 where the avatar's mouth follows the audio. This is a non-realtime batch step - it submits the job, waits for it to finish, and saves the result as an agent_file with a public_url you can feed into the next tool.

What it can do

Method | What it does
lip_sync | Lip-syncs a reference avatar video (video_url) to speech audio (audio_url) and returns a talking-head MP4 as an agent_file, plus an output_url on object storage. Optional parameters: bbox_shift (mouth-region tuning; leave at 0) and fps (frame-rate override).

Both video_url and audio_url must be public HTTPS URLs reachable by the service. The audio is typically the agent_file.public_url from a prior elevenlabs.text_to_speech call.
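
As a rough illustration of the inputs described above, the sketch below assembles a lip_sync argument set and rejects URLs that are not absolute HTTPS links. The parameter names come from the method table; the helper names (require_https_url, build_lip_sync_args) and the idea of passing the arguments as a plain dictionary are assumptions for illustration, not part of the FlyMyAI API. The check is syntactic only and cannot tell whether the service can actually reach the URL.

```python
from urllib.parse import urlparse


def require_https_url(name: str, url: str) -> str:
    """Return url unchanged if it is an absolute HTTPS URL, else raise ValueError.

    Hypothetical helper: this only inspects the URL text. It does not verify
    that the link is public or reachable by the service.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"{name} must be a public HTTPS URL, got: {url!r}")
    return url


def build_lip_sync_args(video_url: str, audio_url: str, bbox_shift: int = 0, fps=None) -> dict:
    """Assemble lip_sync arguments (hypothetical dict shape, for illustration)."""
    args = {
        "video_url": require_https_url("video_url", video_url),
        "audio_url": require_https_url("audio_url", audio_url),
        "bbox_shift": bbox_shift,  # the docs recommend leaving this at 0
    }
    if fps is not None:
        args["fps"] = fps  # only include the override when one is given
    return args
```

Validating before submission surfaces a bad link immediately instead of after the batch job has been queued.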

Typical pipeline

Lip-sync is usually the last step in an avatar pipeline:

  1. elevenlabs.text_to_speech - turn your script into speech audio, then take result.agent_file.public_url.
  2. musetalk.lip_sync - pass that audio as audio_url and your avatar base clip as video_url.

Call lip_sync on its own when you already have both URLs and only need the lip-sync.
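
The hand-off between the two pipeline steps can be sketched as follows. This assumes the text-to-speech result is available as a nested dictionary with an agent_file.public_url field, mirroring the result.agent_file.public_url path mentioned above; the function names and the dictionary representation are hypothetical, and the actual tool calls are left out because their invocation syntax depends on your agent setup.

```python
def audio_url_from_tts(tts_result: dict) -> str:
    """Extract agent_file.public_url from a TTS result (assumed dict shape)."""
    try:
        return tts_result["agent_file"]["public_url"]
    except (KeyError, TypeError) as exc:
        # KeyError: a field is missing; TypeError: agent_file is not a mapping.
        raise ValueError("TTS result has no agent_file.public_url") from exc


def plan_lip_sync_call(tts_result: dict, avatar_video_url: str) -> dict:
    """Build lip_sync inputs from a prior TTS result and an avatar base clip."""
    return {
        "video_url": avatar_video_url,
        "audio_url": audio_url_from_tts(tts_result),
    }
```

For example, a result of {"agent_file": {"public_url": "https://example.com/speech.mp3"}} combined with an avatar clip URL yields a dictionary holding both video_url and audio_url, ready to pass to lip_sync.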

How to get credentials

None - MuseTalk is a built-in tool. It runs on FlyMyAI infrastructure using your account. Just enable it.

Fields to fill in FlyMyAI

None.

Troubleshooting

  • Input URL not fetched - both video_url and audio_url must be direct public HTTPS links reachable by the service. Pre-upload via download_link or another agent tool if the source requires auth.
  • Job timed out - long videos take longer to render. Keep the avatar clip and audio short, or split the audio into segments.
  • Wrong tool for realtime - MuseTalk is a batch render. For a realtime avatar endpoint, use flymyai_deploy instead.
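
For the timeout case, one way to plan the split of a long audio track is to compute fixed-length segment boundaries up front, as in the sketch below. The function name and the 30-second segment length used in the example are assumptions chosen for illustration; this page does not state a duration limit. Cutting at fixed offsets can also land mid-word, so in practice you may prefer to cut at pauses in the speech.

```python
def plan_segments(total_seconds: float, max_segment_seconds: float) -> list[tuple[float, float]]:
    """Split [0, total_seconds] into consecutive (start, end) windows.

    Every window is at most max_segment_seconds long; the last one may be shorter.
    """
    if total_seconds <= 0 or max_segment_seconds <= 0:
        raise ValueError("durations must be positive")
    total = float(total_seconds)
    segments = []
    start = 0.0
    while start < total:
        end = min(start + max_segment_seconds, total)
        segments.append((start, end))
        start = end
    return segments


# A 95-second track with an assumed 30-second cap gives four segments.
print(plan_segments(95, 30))
# [(0.0, 30.0), (30.0, 60.0), (60.0, 90.0), (90.0, 95.0)]
```

Each segment would then be rendered as its own lip_sync job, with the resulting clips joined afterwards.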