Overview
Sync (by Synchronize Labs) applies lip synchronization to existing videos. Provide a video and an audio file, and the model makes the person in the video appear to speak the audio. The available models differ in quality, speed, and cost.

| Model ID | Quality | Speed | Credits | ~Cost |
|---|---|---|---|---|
| sync-v2-pro | Best | ~60-90s | 80 | $0.80 |
| sync-v2 | Good | ~45-60s | 50 | $0.50 |
| lipsync | Basic | varies | 50 | $0.50 |
Quick start
Parameters
Each request takes two files: one video and one audio. The gateway auto-detects each file's type from its extension.
Sync models don’t use prompt, duration, or aspect_ratio parameters. The output matches the input video dimensions and the audio duration.
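
As a minimal sketch of the rules above, the following Python shows one way a client could check its inputs before submitting. The extension sets, the function names, and the payload field names (`model`, `video`, `audio`) are assumptions for illustration. They are not the gateway's documented schema or its actual detection logic.

```python
from pathlib import Path

# Assumed extension lists; the gateway's real lists are not documented here.
VIDEO_EXTS = {".mp4", ".mov", ".webm"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a"}
# Parameters that Sync models do not use, per the text above.
UNSUPPORTED = {"prompt", "duration", "aspect_ratio"}


def classify(filename: str) -> str:
    """Return 'video' or 'audio' based on the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in VIDEO_EXTS:
        return "video"
    if ext in AUDIO_EXTS:
        return "audio"
    raise ValueError(f"unrecognized extension: {filename!r}")


def build_request(model: str, files: list[str], **params) -> dict:
    """Build a hypothetical request body with one video and one audio file."""
    bad = UNSUPPORTED.intersection(params)
    if bad:
        raise ValueError(f"not used by Sync models: {sorted(bad)}")
    by_kind: dict[str, str] = {}
    for name in files:
        kind = classify(name)
        if kind in by_kind:
            raise ValueError(f"more than one {kind} file given")
        by_kind[kind] = name
    if set(by_kind) != {"video", "audio"}:
        raise ValueError("need exactly one video and one audio file")
    return {"model": model, **by_kind, **params}
```

Calling `build_request("sync-v2-pro", ["clip.mp4", "voice.wav"])` returns a dictionary with the video and audio files sorted into their own keys, while passing `prompt=...` raises a `ValueError` before anything is sent.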
Full talking head pipeline
The typical workflow: generate character image, create video, generate speech, apply lipsync.
Pricing

| Model | Credits | USD |
|---|---|---|
| sync-v2-pro | 80 | $0.80 |
| sync-v2 | 50 | $0.50 |
| lipsync | 50 | $0.50 |
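
For budgeting, the pricing table can be turned into a small estimator. This is a sketch only: the rate of one cent per credit is inferred from the table (80 credits = $0.80), and it assumes the listed credits are charged once per generation. The function name `estimate` is hypothetical.

```python
# Credits per generation, copied from the pricing table.
CREDITS = {"sync-v2-pro": 80, "sync-v2": 50, "lipsync": 50}
# Rate implied by the table (80 credits = $0.80); an inference, not a published rate.
USD_CENTS_PER_CREDIT = 1


def estimate(model: str, runs: int) -> tuple[int, float]:
    """Return (total credits, total USD) for `runs` generations of `model`."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    credits = CREDITS[model] * runs
    return credits, credits * USD_CENTS_PER_CREDIT / 100
```

Under these assumptions, 25 runs of `sync-v2-pro` come to 2000 credits, or $20.00.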
Tips
- Pro is recommended for production content. The quality difference is noticeable, especially around mouth movements.
- Input video should have a clear face — front-facing, well-lit, with the face occupying a good portion of the frame.
- Audio quality matters — clean speech audio produces much better lipsync results.
- Combine with ElevenLabs for the full pipeline: TTS -> Sync V2 Pro.
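
The TTS -> Sync V2 Pro combination in the last tip, extended to the four-stage talking head workflow, can be sketched as a plain function that chains the stages. Every stage here is a caller-supplied callable, and the names (`talking_head`, `make_image`, `make_video`, `make_speech`, `lipsync`) are hypothetical placeholders. No real gateway or ElevenLabs call is shown.

```python
from typing import Callable


def talking_head(
    script: str,
    make_image: Callable[[str], str],
    make_video: Callable[[str], str],
    make_speech: Callable[[str], str],
    lipsync: Callable[[str, str], str],
    character: str = "presenter",
) -> str:
    """Run the four stages in order and return the final video reference."""
    image = make_image(character)   # 1. generate character image
    video = make_video(image)       # 2. create video from the image
    audio = make_speech(script)     # 3. generate speech (for example, TTS)
    return lipsync(video, audio)    # 4. apply lipsync to video + audio
```

Keeping the stages as parameters means the lipsync step can be switched between `sync-v2` and `sync-v2-pro` without touching the rest of the chain.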
Related models
- VEED Fabric: simpler pipeline that takes an image and audio, no video needed.
- OmniHuman: full-body animation, not just lips.
- ElevenLabs: generate speech audio for lipsync.