Sync Lipsync

Overview

Sync (by Synchronize Labs) applies lip synchronization to existing videos. Provide a video and an audio file, and the model makes the person in the video appear to speak the audio. Two quality tiers are available, `sync-v2-pro` and `sync-v2`, alongside a basic `lipsync` model.

Model ID	Quality	Speed	Credits	~Cost
`sync-v2-pro`	Best	~60-90s	80	$0.80
`sync-v2`	Good	~45-60s	50	$0.50
`lipsync`	Basic	varies	50	$0.50

Quick start

import { createVarg } from "vargai/ai"

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! })

const result = await varg.videoModel("sync-v2-pro").generate({
  videoUrl: "https://example.com/talking-head.mp4",
  audioUrl: "https://example.com/speech.mp3",
})

console.log(result.video.url)

Parameters

files (array, required)

Two files: one video and one audio. The gateway auto-detects file types by extension.

Sync models don’t use prompt, duration, or aspect_ratio parameters. The output matches the input video dimensions and the audio duration.
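The extension-based detection described above can be checked on the client before a request is sent. The following is a minimal sketch, not the gateway's actual logic: `classifyLipsyncFiles`, `extensionOf`, and both extension lists are hypothetical names and values chosen for illustration, and the set of extensions the gateway really accepts may differ.

```typescript
// Hypothetical helper, not part of the vargai SDK.
// Assumption: these extension lists are illustrative only.
const VIDEO_EXTENSIONS = new Set(["mp4", "mov", "webm"]);
const AUDIO_EXTENSIONS = new Set(["mp3", "wav", "m4a"]);

type LipsyncInputs = { videoUrl: string; audioUrl: string };

// Returns the lowercase file extension of a URL, ignoring query string and fragment.
function extensionOf(url: string): string {
  const path = url.split(/[?#]/)[0];
  const lastSegment = path.slice(path.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  return dot === -1 ? "" : lastSegment.slice(dot + 1).toLowerCase();
}

// Validates that the array holds exactly one video and one audio file,
// in either order, and returns them labeled.
function classifyLipsyncFiles(files: string[]): LipsyncInputs {
  if (files.length !== 2) {
    throw new Error(`expected exactly 2 files, got ${files.length}`);
  }
  const videos = files.filter((f) => VIDEO_EXTENSIONS.has(extensionOf(f)));
  const audios = files.filter((f) => AUDIO_EXTENSIONS.has(extensionOf(f)));
  if (videos.length !== 1 || audios.length !== 1) {
    throw new Error("expected one video file and one audio file");
  }
  return { videoUrl: videos[0], audioUrl: audios[0] };
}
```

Failing fast on the client avoids submitting a request that lacks one of the two required inputs, for example two video files or a URL with no recognizable extension.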

Full talking head pipeline

The typical workflow has four steps: generate a character image, create a base video from it, generate speech, then apply lipsync to combine the video and the audio.

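// Assumes `varg` from the quick start and the Image, Video, and Speech helpers are already in scope.

// 1. Generate character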
const character = Image({
  model: varg.imageModel("soul"),
  prompt: "professional presenter, neutral expression",
  aspectRatio: "9:16",
})

// 2. Generate base video
const baseVideo = Video({
  model: varg.videoModel("kling-v3"),
  prompt: { text: "subtle head movement, blinking", images: [character] },
  duration: 10,
})

// 3. Generate speech
const speech = Speech({
  model: varg.speechModel("eleven_v3"),
  text: "Welcome to our product demo...",
})

// 4. Apply lipsync
const talkingHead = Video({
  model: varg.videoModel("sync-v2-pro"),
  prompt: { video: baseVideo, audio: speech },
})

Pricing

Model	Credits	USD
`sync-v2-pro`	80	$0.80
`sync-v2`	50	$0.50
`lipsync`	50	$0.50
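For batch jobs, the table can be turned into a rough budget estimate. The sketch below rests on two assumptions: that the listed credits are charged as a flat amount per generation, and that 1 credit equals $0.01, which is inferred from the table (80 credits = $0.80). `SYNC_CREDITS` and `estimateCost` are hypothetical names, not part of the SDK.

```typescript
// Credits per generation, copied from the pricing table above.
const SYNC_CREDITS = {
  "sync-v2-pro": 80,
  "sync-v2": 50,
  "lipsync": 50,
} as const;

type SyncModelId = keyof typeof SYNC_CREDITS;

// Assumption: 1 credit = 1 US cent, inferred from 80 credits = $0.80.
const CENTS_PER_CREDIT = 1;

// Estimates total credits and USD for a number of generations with one model.
function estimateCost(
  model: SyncModelId,
  generations: number,
): { credits: number; usd: string } {
  if (!Number.isInteger(generations) || generations < 0) {
    throw new RangeError("generations must be a non-negative integer");
  }
  const credits = SYNC_CREDITS[model] * generations;
  // Work in integer cents to avoid floating-point drift, then format as dollars.
  const cents = credits * CENTS_PER_CREDIT;
  return { credits, usd: `$${(cents / 100).toFixed(2)}` };
}

// 25 clips on the pro tier: credits is 2000, usd is "$20.00"
console.log(estimateCost("sync-v2-pro", 25));
```

Treat the result as an estimate only; actual billing may differ from the listed figures.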

Tips

`sync-v2-pro` is recommended for production content. The quality difference from `sync-v2` is noticeable, especially around mouth movements.
Input video should have a clear face — front-facing, well-lit, with the face occupying a good portion of the frame.
Audio quality matters — clean speech audio produces much better lipsync results.
Combine with ElevenLabs for the full pipeline: TTS -> Sync V2 Pro.

Related models

VEED Fabric

Simpler pipeline — image + audio, no video needed.

OmniHuman

Full-body animation, not just lips.

ElevenLabs

Generate speech audio for lipsync.