Create professional talking character videos with AI-generated characters, voice synthesis, lip synchronization, and auto-generated captions.

Simple Version

Basic talking character without lipsync:
/** @jsxImportSource vargai */
import { Render, Clip, Image, Video, Speech, Captions, Music } from "vargai/react"
import { createVarg } from "vargai/ai"

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! })

const character = Image({
  model: varg.imageModel("nano-banana-pro"),
  prompt: "friendly tech influencer, casual hoodie, ring light, bedroom background",
  aspectRatio: "9:16",
})

const voiceover = Speech({
  model: varg.speechModel("eleven_v3"),
  voice: "josh",
  children: "Hey everyone! Welcome back to my channel. Today I want to share something that completely changed how I approach productivity.",
})

export default (
  <Render width={1080} height={1920}>
    <Music prompt="upbeat tech podcast" model={varg.musicModel()} volume={0.1} duration={10} />
    <Clip duration={5}>
      <Video
        prompt={{ text: "person talking naturally, subtle head movements, gestures", images: [character] }}
        model={varg.videoModel("kling-v3")}
      />
    </Clip>
    <Captions src={voiceover} style="tiktok" color="#ffffff" withAudio />
  </Render>
)
Cost: ~$2.00 (image + video + speech + music)

With Lipsync

Full pipeline: generate the character image and the voiceover, animate the character, then sync its lips to the audio:
/** @jsxImportSource vargai */
import { Render, Clip, Image, Video, Speech, Captions, Music } from "vargai/react"
import { createVarg } from "vargai/ai"

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! })

// 1. Generate character
const character = Image({
  model: varg.imageModel("nano-banana-pro"),
  prompt: "friendly tech influencer, casual hoodie, ring light, close-up face, looking at camera",
  aspectRatio: "9:16",
})

// 2. Generate voiceover
const voiceover = Speech({
  model: varg.speechModel("eleven_v3"),
  voice: "rachel",
  children: "Hey everyone! Today I want to show you something amazing that will change your workflow.",
})

// 3. Animate character
const animated = Video({
  model: varg.videoModel("kling-v3"),
  prompt: { text: "person speaking naturally, subtle head movements, occasional nodding", images: [character] },
  duration: 5,
})

// 4. Sync lips to audio
const lipsynced = Video({
  model: varg.videoModel("sync-v2-pro"),
  prompt: { video: animated, audio: voiceover },
})

export default (
  <Render width={1080} height={1920}>
    <Music prompt="upbeat tech podcast intro" model={varg.musicModel()} volume={0.1} duration={8} />
    <Clip duration={5}>{lipsynced}</Clip>
    <Captions src={voiceover} style="tiktok" color="#ffffff" withAudio />
  </Render>
)
Cost: ~$3.20 (image + animation + lipsync + speech + music)
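
For budgeting a batch of videos, the two approximate per-video figures quoted on this page can be turned into a rough estimate. The sketch below is plain arithmetic, not part of vargai: `COST_CENTS` and `estimateBatchCost` are hypothetical names, and the real cost will vary with prompt length, clip duration, and current model pricing.

```typescript
// Approximate per-video costs in cents, taken from the "Cost:" lines on this page.
// These are rough figures, not a pricing API.
const COST_CENTS = {
  simple: 200,  // image + video + speech + music
  lipsync: 320, // image + animation + lipsync + speech + music
} as const

type Pipeline = keyof typeof COST_CENTS

// Hypothetical helper: rough total cost in cents for `count` videos.
function estimateBatchCost(count: number, pipeline: Pipeline): number {
  if (!Number.isInteger(count) || count < 0) {
    throw new RangeError("count must be a non-negative integer")
  }
  return count * COST_CENTS[pipeline]
}

// Extra cost in cents of adding lipsync to each video in a batch.
function lipsyncPremium(count: number): number {
  return estimateBatchCost(count, "lipsync") - estimateBatchCost(count, "simple")
}
```

Working in integer cents avoids floating-point rounding when the totals are multiplied; divide by 100 only when displaying the result.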

Voice Options

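| Voice  | Gender | Style            | Best For           |
|--------|--------|------------------|--------------------|
| rachel | Female | Calm, warm       | Narration, tutorials |
| bella  | Female | Soft, gentle     | Storytelling       |
| domi   | Female | Confident        | Presentations      |
| elli   | Female | Young, cheerful  | Social media, UGC  |
| adam   | Male   | Deep, warm       | Narration          |
| josh   | Male   | Young, energetic | Social media, UGC  |
| sam    | Male   | Raspy            | Character voices   |
| antoni | Male   | Calm             | Podcasts           |
| arnold | Male   | Authoritative    | Announcements      |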
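
If the voice is chosen programmatically, for example from a content type in a config file, the table above can be kept as data. This is a minimal sketch: the `VOICES` array simply transcribes the table, and `findVoices` is a hypothetical helper, not a vargai function.

```typescript
type Gender = "Female" | "Male"

interface VoiceInfo {
  voice: string
  gender: Gender
  style: string
  bestFor: string[]
}

// Transcribed from the Voice Options table on this page.
const VOICES: VoiceInfo[] = [
  { voice: "rachel", gender: "Female", style: "Calm, warm", bestFor: ["Narration", "tutorials"] },
  { voice: "bella", gender: "Female", style: "Soft, gentle", bestFor: ["Storytelling"] },
  { voice: "domi", gender: "Female", style: "Confident", bestFor: ["Presentations"] },
  { voice: "elli", gender: "Female", style: "Young, cheerful", bestFor: ["Social media", "UGC"] },
  { voice: "adam", gender: "Male", style: "Deep, warm", bestFor: ["Narration"] },
  { voice: "josh", gender: "Male", style: "Young, energetic", bestFor: ["Social media", "UGC"] },
  { voice: "sam", gender: "Male", style: "Raspy", bestFor: ["Character voices"] },
  { voice: "antoni", gender: "Male", style: "Calm", bestFor: ["Podcasts"] },
  { voice: "arnold", gender: "Male", style: "Authoritative", bestFor: ["Announcements"] },
]

// Hypothetical helper: voice names whose "Best For" column contains the
// given keyword (case-insensitive), optionally filtered by gender.
function findVoices(useCase: string, gender?: Gender): string[] {
  const needle = useCase.toLowerCase()
  return VOICES
    .filter(v => gender === undefined || v.gender === gender)
    .filter(v => v.bestFor.some(b => b.toLowerCase().includes(needle)))
    .map(v => v.voice)
}
```

The returned name can then be passed as the `voice` option of `Speech`, as in the examples on this page.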

Caption Styles

<Captions src={voiceover} style="tiktok" withAudio />    // Word-by-word highlight
<Captions src={voiceover} style="karaoke" withAudio />   // Fill left-to-right
<Captions src={voiceover} style="bounce" withAudio />    // Words bounce in
<Captions src={voiceover} style="typewriter" withAudio /> // Typing effect
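
When the caption style comes from user input or a config file, it can help to validate it before passing it to `Captions`. The sketch below assumes the four styles listed above are the full set; `CAPTION_STYLES`, `isCaptionStyle`, and `toCaptionStyle` are hypothetical local names, and vargai may export its own type for this.

```typescript
// The four styles listed on this page, with their descriptions.
const CAPTION_STYLES = {
  tiktok: "Word-by-word highlight",
  karaoke: "Fill left-to-right",
  bounce: "Words bounce in",
  typewriter: "Typing effect",
} as const

type CaptionStyle = keyof typeof CAPTION_STYLES

// Type guard: true only for the style names listed above.
function isCaptionStyle(value: string): value is CaptionStyle {
  return Object.prototype.hasOwnProperty.call(CAPTION_STYLES, value)
}

// Falls back to a known style when the input is not recognized.
function toCaptionStyle(value: string, fallback: CaptionStyle = "tiktok"): CaptionStyle {
  return isCaptionStyle(value) ? value : fallback
}
```

Using `hasOwnProperty` rather than the `in` operator keeps inherited keys such as `toString` from being accepted as styles.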

Multi-Scene Talking Video

/** @jsxImportSource vargai */
import { Render, Clip, Image, Video, Speech, Captions, Music } from "vargai/react"
import { createVarg } from "vargai/ai"

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! })

const character = Image({
  model: varg.imageModel("nano-banana-pro"),
  prompt: "friendly young woman, casual outfit, natural lighting, looking at camera",
  aspectRatio: "9:16",
})

const SCENES = [
  { text: "Hey guys! Today I'm reviewing the best coffee makers under 100 dollars.", motion: "person smiling, waving at camera" },
  { text: "Number one is the AeroPress. It's portable, easy to clean, and makes amazing coffee.", motion: "person talking enthusiastically, gesturing" },
  { text: "That's it for today! Don't forget to like and subscribe.", motion: "person waving goodbye, big smile" },
]

const voiceovers = SCENES.map(scene =>
  Speech({
    model: varg.speechModel("eleven_v3"),
    voice: "elli",
    children: scene.text,
  })
)

export default (
  <Render width={1080} height={1920}>
    <Music prompt="upbeat pop, cheerful, catchy" model={varg.musicModel()} volume={0.1} duration={20} />
    {SCENES.map((scene, i) => (
      <Clip key={i} duration={5} transition={{ name: "fade", duration: 0.3 }}>
        <Video
          prompt={{ text: scene.motion, images: [character] }}
          model={varg.videoModel("kling-v3")}
        />
      </Clip>
    ))}
    {voiceovers.map((vo, i) => (
      <Captions key={i} src={vo} style="tiktok" color="#ffffff" withAudio />
    ))}
  </Render>
)

Tips

  • Character: Use close-up, front-facing images for best lipsync results
  • Voiceover: Keep individual clips to 5-10 seconds for best quality
  • Lipsync: sync-v2-pro gives the best results but costs more; sync-v2 is faster and cheaper
  • Animation: Include “subtle head movements” or “natural gestures” in video prompts
  • Captions: style="tiktok" is the most popular for social media content
  • Music volume: Keep at 0.1-0.15 when there’s voiceover
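
To follow the 5-10 second guideline for voiceover clips, a long script can be split at sentence boundaries before it is passed to `Speech`. The sketch below is an illustration only: the speaking rate of 2.5 words per second (about 150 words per minute) is an assumption, not a measured property of any voice model, and `estimateSeconds` and `splitScript` are hypothetical helpers.

```typescript
// Assumption: roughly 150 words per minute. Real pacing depends on voice and text.
const WORDS_PER_SECOND = 2.5

// Rough spoken duration of a piece of text, based on word count only.
function estimateSeconds(text: string): number {
  const words = text.trim().split(/\s+/).filter(Boolean).length
  return words / WORDS_PER_SECOND
}

// Greedily groups sentences into chunks whose estimated duration stays
// at or below maxSeconds. A single sentence longer than maxSeconds is
// kept whole as its own chunk rather than cut mid-sentence.
function splitScript(script: string, maxSeconds = 10): string[] {
  // Naive sentence split on ., ! and ?; it will also split inside
  // abbreviations or decimals such as "1.50".
  const sentences = (script.match(/[^.!?]+[.!?]*/g) || [])
    .map(s => s.trim())
    .filter(Boolean)

  const chunks: string[] = []
  let current = ""
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence
    if (current && estimateSeconds(candidate) > maxSeconds) {
      chunks.push(current)
      current = sentence
    } else {
      current = candidate
    }
  }
  if (current) chunks.push(current)
  return chunks
}
```

Each returned chunk can be used as the `children` of its own `Speech` call, in the same way the multi-scene example maps over `SCENES`.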