The Lens

aTrain turns speech recordings into text on your own machine, with no cloud upload and no subscription. It runs OpenAI's Whisper model locally for transcription in 99 languages and adds speaker diarization (working out who said what) through pyannote. It is a real desktop app with installers on the Microsoft Store and Flathub, not a script you have to babysit. AGPL-3.0, fully free.

Because everything runs on your device, nothing you record leaves your computer, which is the whole point for anyone handling interview, medical, or legal audio. Speed depends on your hardware: on a current laptop CPU the best model takes roughly three times the audio length, while an NVIDIA GPU with the CUDA toolkit installed cuts that to about a fifth of the audio length. It exports straight into MAXQDA, ATLAS.ti, and NVivo, so qualitative researchers are clearly the target audience.

Weigh this against Otter.ai, Rev, and Trint, which are faster and need no setup but send your audio to their servers and bill you monthly. If privacy matters, or you transcribe enough hours that subscriptions add up, aTrain wins outright. Solo researchers and journalists can install it and stop paying per minute. Teams with sensitive recordings may find local processing is the only option compliance allows.

The catch: local means your hardware is the bottleneck. Without a decent GPU, a long recording can take several times its own length to process, and accuracy still depends on audio quality, as it does with every transcription tool. It trades a monthly bill for your own patience and a CUDA install.