
whisper
Robust Speech Recognition via Large-Scale Weak Supervision
The Lens
Whisper turns speech into text, and it became the baseline for open speech recognition as soon as OpenAI released it. Feed it an audio file in any of 99 supported languages (accuracy varies by language) and you get back a transcript, optionally translated to English. The model weights and the code are MIT licensed, so you can run the whole thing on your own machine for nothing.
Running it yourself is a pip install and an ffmpeg dependency away; the real constraint is hardware. The tiny model fits in about 1GB of VRAM and is fast and rough; the large model wants roughly 10GB and a real GPU to run at a sane speed. On a CPU it works, but you will wait. A newer turbo model is much faster for plain transcription, though it is not trained for translation to English.
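A minimal sketch of a local run, assuming the openai-whisper package and ffmpeg are already installed. The helper names (format_timestamp, format_segments, transcribe_file) are made up for illustration; the whisper.load_model and model.transcribe calls, and the start, end, and text keys on each segment, come from the package itself.

```python
def format_timestamp(seconds):
    """Render a second count as HH:MM:SS (fractions of a second are dropped)."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments):
    """Turn Whisper segments into one timestamped line per segment."""
    return [
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    ]


def transcribe_file(path, model_name="turbo"):
    """Transcribe one audio file locally and return timestamped lines."""
    # Imported lazily so the helpers above work without the package installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(path)         # dict with "text", "segments", "language"
    return format_segments(result["segments"])
```

Calling transcribe_file("meeting.mp3") with a real file path would return one line per segment. Note that the lines carry timestamps only, not speaker labels.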
For a one-off transcript, OpenAI's hosted Whisper API runs about $0.006 per minute (a little over half a cent) and saves you the setup. Run it locally when the audio is sensitive, when you are processing a lot of it, or when you just do not want a per-minute bill. Solo and small teams: local on a decent GPU is plenty. Higher volume: budget a GPU box and self-host.
The catch is that Whisper is a model, not an app. It does straight transcription, not speaker labels or live captioning out of the box. If you want a GUI with those niceties, look at buzz or vibe, which both wrap this exact model.
Free vs Self-Hosted vs Paid
What's Free
Everything in the open-source release. MIT license covers both the code and the model weights, for all model sizes (tiny through large) plus the turbo model. No paid tier on the open-source side.
Self-Hosted
pip install -U openai-whisper, plus ffmpeg and PyTorch. Cost is hardware, not licensing:
- CPU only: works, slow on anything but the tiny model.
- Consumer GPU (8-12GB): handles the medium and turbo models easily; the large model wants roughly 10GB, so it needs the upper end of this range.
- Apple Silicon: runs well via community ports like whisper.cpp.
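The hardware tiers above can be turned into a rough model picker. This is a sketch, not official guidance: the tiny (1GB) and large (10GB) figures are the ones quoted on this page, while the values for base, small, medium, and turbo are approximations recalled from the project README and worth checking against it. The names APPROX_VRAM_GB and pick_model are hypothetical.

```python
# Approximate VRAM needs in GB. Only "tiny" and "large" are quoted on this page;
# the other values are assumptions to verify against the Whisper README.
APPROX_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "turbo": 6,
    "large": 10,
}

# Tried in order, most capable first.
PREFERENCE = ["large", "turbo", "medium", "small", "base", "tiny"]


def pick_model(vram_gb, need_translation=False):
    """Return the most capable model that fits in vram_gb, or None if none fit."""
    for name in PREFERENCE:
        if need_translation and name == "turbo":
            continue  # turbo is not trained for translation
        if APPROX_VRAM_GB[name] <= vram_gb:
            return name
    return None
```

With these numbers, pick_model(8) returns "turbo", while pick_model(8, need_translation=True) falls back to "medium".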
Paid Cloud Option
OpenAI's hosted Whisper API charges roughly $0.006/min (about $0.36/hour of audio). Worth it for low volume or when you don't want to manage a GPU. At scale, local wins on cost and privacy.
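The arithmetic behind that trade-off is simple enough to sketch. The $0.006/min rate is the figure quoted above; the function names and the hardware_usd parameter are hypothetical, and the break-even estimate deliberately ignores electricity, your time, and resale value.

```python
HOSTED_USD_PER_MIN = 0.006  # hosted rate quoted above; check current pricing


def hosted_cost_usd(audio_minutes):
    """Cost of sending this many minutes of audio to the hosted API."""
    return audio_minutes * HOSTED_USD_PER_MIN


def breakeven_hours(hardware_usd):
    """Hours of audio at which hosted fees equal a one-time hardware spend."""
    return hardware_usd / (HOSTED_USD_PER_MIN * 60)
```

For example, hosted_cost_usd(60) comes to about $0.36, and a hypothetical $360 GPU would pay for itself after roughly 1,000 hours of audio under these assumptions.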
vs Alternatives
- Local GUIs (buzz, vibe): same model, friendlier interface, still free.
- Hosted APIs (OpenAI, Deepgram, AssemblyAI): zero setup, per-minute pricing, your audio leaves your machine.
Free and open source, weights included. Run it locally for free, or pay ~$0.006/min for OpenAI's hosted API to skip the GPU.
Similar Tools

the subtitle editor :)

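A local-first, all-in-one desktop subtitle tool with 6 built-in ASR engines, GPU acceleration on every platform, and 17+ translation providers. It covers the full workflow of audio and video transcription, translation, proofreading, and subtitle burn-in and muxing, and runs on Windows, macOS, and Linux.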

Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.

Transcribe on your own!
About
- Owner: openai (Organization)
- Stars: 103,584
- Forks: 12,612