Overview

Sync (by Synchronize Labs) applies lip synchronization to existing videos. Provide a video and an audio file, and the model makes the person in the video appear to speak the audio. Three model IDs are available, listed here from highest to lowest quality.
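| Model ID    | Quality | Speed   | Credits | Approx. cost |
|-------------|---------|---------|---------|--------------|
| sync-v2-pro | Best    | ~60-90s | 80      | $0.80        |
| sync-v2     | Good    | ~45-60s | 50      | $0.50        |
| lipsync     | Basic   | varies  | 50      | $0.50        |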

Quick start

import { createVarg } from "vargai/ai"

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! })

const result = await varg.videoModel("sync-v2-pro").generate({
  videoUrl: "https://example.com/talking-head.mp4",
  audioUrl: "https://example.com/speech.mp3",
})

console.log(result.video.url)

Parameters

files (array, required)
Two files: one video and one audio. The gateway auto-detects file types by extension.
Sync models don’t use prompt, duration, or aspect_ratio parameters. The output matches the input video dimensions and the audio duration.
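Because the gateway identifies the video and the audio by file extension, it can help to check the pair before sending a request. The following is a minimal sketch of such a pre-flight check. The helper names (`extensionOf`, `classifyFile`, `checkSyncFiles`) and the extension lists are assumptions made for illustration; they are not part of the varg SDK, and the gateway's actual detection rules may differ.

```typescript
// Hypothetical pre-flight check, not part of the varg SDK.
// The extension lists below are assumptions; the gateway may accept others.
const VIDEO_EXTENSIONS = new Set(["mp4", "mov", "webm"]);
const AUDIO_EXTENSIONS = new Set(["mp3", "wav", "m4a"]);

type FileKind = "video" | "audio" | "unknown";

// Return the lowercase extension of the last path segment, ignoring any
// query string or fragment. Returns "" when there is no extension.
function extensionOf(url: string): string {
  const path = url.split(/[?#]/)[0] ?? "";
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  return dot === -1 ? "" : name.slice(dot + 1).toLowerCase();
}

function classifyFile(url: string): FileKind {
  const ext = extensionOf(url);
  if (VIDEO_EXTENSIONS.has(ext)) return "video";
  if (AUDIO_EXTENSIONS.has(ext)) return "audio";
  return "unknown";
}

// Verify that the list holds exactly one video and one audio file,
// in either order, and return them labeled.
function checkSyncFiles(files: string[]): { video: string; audio: string } {
  if (files.length !== 2) {
    throw new Error(`expected 2 files, got ${files.length}`);
  }
  const video = files.find((f) => classifyFile(f) === "video");
  const audio = files.find((f) => classifyFile(f) === "audio");
  if (video === undefined || audio === undefined) {
    throw new Error("expected one video file and one audio file");
  }
  return { video, audio };
}
```

For example, `checkSyncFiles(["https://example.com/speech.mp3", "https://example.com/talking-head.mp4"])` returns the two URLs labeled as `audio` and `video`, while passing two `.mp4` files throws.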

Full talking head pipeline

The typical workflow has four steps: generate a character image, create a base video from it, generate speech, then apply lipsync.
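// Assumes `varg` was created with createVarg (see Quick start) and that
// Image, Video, and Speech are already imported and in scope.

// 1. Generate character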
const character = Image({
  model: varg.imageModel("soul"),
  prompt: "professional presenter, neutral expression",
  aspectRatio: "9:16",
})

// 2. Generate base video
const baseVideo = Video({
  model: varg.videoModel("kling-v3"),
  prompt: { text: "subtle head movement, blinking", images: [character] },
  duration: 10,
})

// 3. Generate speech
const speech = Speech({ model: varg.speechModel("eleven_v3"), text: "Welcome to our product demo..." })

// 4. Apply lipsync
const talkingHead = Video({
  model: varg.videoModel("sync-v2-pro"),
  prompt: { video: baseVideo, audio: speech },
})

Pricing

| Model       | Credits | USD   |
|-------------|---------|-------|
| sync-v2-pro | 80      | $0.80 |
| sync-v2     | 50      | $0.50 |
| lipsync     | 50      | $0.50 |
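To budget a batch of lipsync jobs, the figures in the pricing table can be turned into a small estimator. This is a sketch only: `estimateCost` is a hypothetical helper, not an SDK function. It assumes the listed credits are charged once per generation and that 100 credits equal $1.00, which is the rate implied by the table.

```typescript
// Credits per model, copied from the pricing table.
const CREDITS_PER_RUN = {
  "sync-v2-pro": 80,
  "sync-v2": 50,
  "lipsync": 50,
} as const;

type SyncModelId = keyof typeof CREDITS_PER_RUN;

// Assumption inferred from the table: 80 credits = $0.80, so 100 credits = $1.
const CREDITS_PER_USD = 100;

// Estimate the total credits and USD for a number of generations.
// Assumes the listed credits are charged once per generation.
function estimateCost(
  model: SyncModelId,
  runs: number,
): { credits: number; usd: number } {
  if (!Number.isInteger(runs) || runs < 0) {
    throw new RangeError("runs must be a non-negative integer");
  }
  const credits = CREDITS_PER_RUN[model] * runs;
  return { credits, usd: credits / CREDITS_PER_USD };
}
```

Under those assumptions, `estimateCost("sync-v2-pro", 10)` returns `{ credits: 800, usd: 8 }`, and the same ten runs on `sync-v2` come to 500 credits, or $5.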

Tips

  • Pro is recommended for production content. The quality difference is noticeable, especially around mouth movements.
  • Input video should have a clear face — front-facing, well-lit, with the face occupying a good portion of the frame.
  • Audio quality matters — clean speech audio produces much better lipsync results.
  • Combine with ElevenLabs for the full pipeline: TTS -> Sync V2 Pro.

VEED Fabric

Simpler pipeline — image + audio, no video needed.

OmniHuman

Full-body animation, not just lips.

ElevenLabs

Generate speech audio for lipsync.