
Overview

Transcribe audio to text using OpenAI’s Whisper models, available through two providers, fal and Groq, with different speed and cost profiles.
Model ID                    | Provider | Speed    | Credits | ~Cost
whisper                     | fal      | Standard | 10      | $0.10
whisper-large-v3            | fal      | Standard | 10      | $0.10
groq-whisper                | Groq     | Fast     | 5       | $0.05
groq-whisper-large-v3       | Groq     | Fast     | 5       | $0.05
groq-whisper-large-v3-turbo | Groq     | Fastest  | 3       | $0.03

Quick start

import { createVarg } from "vargai/ai"

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! })

const result = await varg.transcriptionModel("whisper").generate({
  file: "https://example.com/audio.mp3",
})

console.log(result.text)

Parameters

file (string, required)
URL or local path to the audio file. Supported formats: mp3, wav, m4a, ogg, flac, webm.

language (string, optional)
Language code (e.g., "en", "es", "fr"). Auto-detected if not specified.

prompt (string, optional)
Context to guide transcription, useful for domain-specific terms or names.

temperature (number, default: 0)
Sampling temperature. 0 is deterministic; higher values make the output more random. Rarely needed for transcription.
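Before calling generate(), it can help to check the options locally. The sketch below mirrors the documented parameters in a type and checks the file extension against the supported formats. `TranscriptionOptions`, `SUPPORTED_EXTENSIONS`, `isSupportedAudio` and `withDefaults` are hypothetical helpers for illustration, not exports of vargai.

```typescript
// Hypothetical type mirroring the documented generate() parameters.
// Not an export of vargai; the name is illustrative.
interface TranscriptionOptions {
  file: string;         // URL or local path (required)
  language?: string;    // e.g. "en", "es", "fr"; auto-detected when omitted
  prompt?: string;      // context for domain-specific terms or names
  temperature?: number; // documented default: 0 (deterministic)
}

// Formats listed as supported in the parameter docs.
const SUPPORTED_EXTENSIONS = new Set(["mp3", "wav", "m4a", "ogg", "flac", "webm"]);

// True when the file's extension is one of the supported formats.
function isSupportedAudio(file: string): boolean {
  const path = file.split(/[?#]/)[0];                 // drop query string / fragment
  const name = path.slice(path.lastIndexOf("/") + 1); // last path segment only
  const dot = name.lastIndexOf(".");
  if (dot === -1) return false;                       // no extension at all
  return SUPPORTED_EXTENSIONS.has(name.slice(dot + 1).toLowerCase());
}

// Fills in the documented temperature default; language stays unset so it is auto-detected.
function withDefaults(opts: TranscriptionOptions): TranscriptionOptions {
  return { ...opts, temperature: opts.temperature ?? 0 };
}
```

Since every field corresponds to a documented parameter, the result of `withDefaults(...)` should be passable to `varg.transcriptionModel("whisper").generate(...)`, though that is an assumption rather than something this page states.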

Choosing a model

Scenario     | Recommended                 | Credits | Why
Cheapest     | groq-whisper-large-v3-turbo | 3       | 70% cheaper than fal Whisper (10 credits)
Best quality | whisper-large-v3            | 10      | Full Large V3 model on fal
Fast + cheap | groq-whisper                | 5       | Good balance of speed and cost
Default      | whisper                     | 10      | Reliable, well-tested
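The table above can be encoded as a small lookup so the choice lives in one place. This is a minimal sketch: `Scenario`, `RECOMMENDED` and `pickModel` are hypothetical names, and the model IDs are the ones listed on this page.

```typescript
// Model IDs as listed in the overview table.
type WhisperModelId =
  | "whisper"
  | "whisper-large-v3"
  | "groq-whisper"
  | "groq-whisper-large-v3"
  | "groq-whisper-large-v3-turbo";

// Illustrative labels for the rows of the "Choosing a model" table.
type Scenario = "cheapest" | "best-quality" | "fast-and-cheap" | "default";

const RECOMMENDED: Record<Scenario, WhisperModelId> = {
  cheapest: "groq-whisper-large-v3-turbo",
  "best-quality": "whisper-large-v3",
  "fast-and-cheap": "groq-whisper",
  default: "whisper",
};

// Returns the recommended model ID for a scenario (falls back to the default row).
function pickModel(scenario: Scenario = "default"): WhisperModelId {
  return RECOMMENDED[scenario];
}
```

The returned ID would then be passed to `varg.transcriptionModel(...)` as in the quick start. A typed union keeps typos in model IDs from compiling.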

Use with captions

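A common use of transcription is captioning. The example below generates a narration with Speech and passes it to Captions as the caption source:

// Assumes `varg` was created as in the quick start, and that Speech, Clip,
// Video and Captions are imported from the varg SDK (import path not shown here).
const narration = Speech({
  model: varg.speechModel("eleven_v3"),
  text: "Welcome to the future of video creation.",
})

// A 5-second clip: generated video, the narration audio, and captions built from it.
const clip = (
  <Clip duration={5}>
    <Video model={varg.videoModel("kling-v3")} prompt="futuristic city" duration={5} />
    {narration}
    <Captions source={narration} />
  </Clip>
)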

Pricing

Model                                   | Credits | USD
whisper / whisper-large-v3              | 10      | $0.10
groq-whisper / groq-whisper-large-v3    | 5       | $0.05
groq-whisper-large-v3-turbo             | 3       | $0.03
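For budgeting a batch, the table can be turned into a rough estimator. This sketch assumes the listed credits are charged once per transcription request (the table does not state the billing unit, e.g. per request vs. per minute of audio), and derives 1 credit = $0.01 from the table (10 credits = $0.10). `CREDITS` and `estimateCost` are hypothetical names.

```typescript
// Credits per request, copied from the pricing table.
// ASSUMPTION: billing is per request; the page does not state the unit.
const CREDITS: Record<string, number> = {
  "whisper": 10,
  "whisper-large-v3": 10,
  "groq-whisper": 5,
  "groq-whisper-large-v3": 5,
  "groq-whisper-large-v3-turbo": 3,
};

// Estimates total credits and USD for a number of requests.
function estimateCost(modelId: string, requests: number): { credits: number; usd: number } {
  if (!Object.prototype.hasOwnProperty.call(CREDITS, modelId)) {
    throw new Error(`unknown model: ${modelId}`);
  }
  const credits = CREDITS[modelId] * requests;
  // Table implies 1 credit = 1 cent, so divide by 100 for dollars.
  return { credits, usd: credits / 100 };
}
```

For example, `estimateCost("groq-whisper-large-v3-turbo", 10)` gives 30 credits ($0.30), versus 100 credits ($1.00) for the same batch on `whisper`.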
