Overview
Transcribe audio to text using OpenAI’s Whisper model, available through two providers with different speed/cost profiles.

| Model ID | Provider | Speed | Credits | ~Cost |
|---|---|---|---|---|
| whisper | fal | Standard | 10 | $0.10 |
| whisper-large-v3 | fal | Standard | 10 | $0.10 |
| groq-whisper | Groq | Fast | 5 | $0.05 |
| groq-whisper-large-v3 | Groq | Fast | 5 | $0.05 |
| groq-whisper-large-v3-turbo | Groq | Fastest | 3 | $0.03 |
Quick start
Parameters
URL or local path to the audio file. Supports mp3, wav, m4a, ogg, flac, webm.
Language code (e.g., "en", "es", "fr"). Auto-detected if not specified.
Optional context to guide transcription. Useful for domain-specific terms or names.
Sampling temperature. 0 = deterministic, higher = more creative (not usually needed for transcription).
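Since only certain audio formats are accepted, it can help to check a file before submitting it. The following is a minimal sketch, not part of any SDK: `is_supported_audio` is a hypothetical helper name, and it checks only the file extension, not the actual contents.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions taken from the supported-format list in the parameter description.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".webm"}


def is_supported_audio(source: str) -> bool:
    """Return True if a URL or local path ends in a supported audio extension.

    Hypothetical helper for illustration; it does not inspect file contents.
    """
    # For URLs, .path drops the query string and fragment; plain local
    # paths pass through unchanged.
    path = urlparse(source).path
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

A URL with no file extension (for example a streaming endpoint) fails this check even if the service could read it, so treat a `False` result as a prompt to look closer rather than a hard rejection.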
Choosing a model
| Scenario | Recommended | Why |
|---|---|---|
| Cheapest | groq-whisper-large-v3-turbo (3 credits) | 70% cheaper than fal Whisper |
| Best quality | whisper-large-v3 (10 credits) | Full Large V3 model on fal |
| Fast + cheap | groq-whisper (5 credits) | Good balance |
| Default | whisper (10 credits) | Reliable, well-tested |
Use with captions
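A caption file needs timed text rather than a plain transcript. As a minimal sketch, assuming you already have timed segments as `(start_seconds, end_seconds, text)` tuples, the code below renders them in the SubRip (`.srt`) format. The segment shape and the function names are assumptions for illustration; this page does not state what the transcription response looks like.

```python
from typing import Iterable, Tuple

# Assumed segment shape: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Render timed segments as the text of an .srt file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: sequence number, time range, caption text, blank line.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```

For example, `segments_to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "world")])` produces two numbered cues that most video players and editors can load directly.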
Transcription is commonly used to generate captions for videos.
Pricing
| Model | Credits | USD |
|---|---|---|
| whisper / whisper-large-v3 | 10 | $0.10 |
| groq-whisper / groq-whisper-large-v3 | 5 | $0.05 |
| groq-whisper-large-v3-turbo | 3 | $0.03 |
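The table implies a rate of roughly $0.01 per credit (10 credits for about $0.10). A minimal sketch of a cost estimate under that assumption follows. What a single charge covers (one request, one minute of audio, or something else) is not stated here, so `units` is a hypothetical multiplier and `estimate_cost` is an illustrative name.

```python
from typing import Tuple

# Credits per model, copied from the pricing table.
CREDITS = {
    "whisper": 10,
    "whisper-large-v3": 10,
    "groq-whisper": 5,
    "groq-whisper-large-v3": 5,
    "groq-whisper-large-v3-turbo": 3,
}

# Assumption: implied by the table (10 credits is about $0.10).
USD_PER_CREDIT = 0.01


def estimate_cost(model_id: str, units: int = 1) -> Tuple[int, float]:
    """Return (credits, approximate USD) for `units` billable units."""
    credits = CREDITS[model_id] * units
    return credits, round(credits * USD_PER_CREDIT, 2)
```

Comparing `CREDITS["groq-whisper-large-v3-turbo"]` with `CREDITS["whisper"]` gives 3 versus 10 credits, which is where the 70% saving quoted in the model comparison comes from.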
Related models
ElevenLabs Speech
Generate speech from text (the reverse operation).