Kalam

Local speech-to-text for macOS, powered by Whisper

By Harshvardhan in coding tools

February 20, 2026


This is a work-in-progress app, so kindly excuse the rough edges. Feel free to try it out and open issues or contribute on GitHub.

Kalam (कलम) means “pen” in Hindi and Urdu, but it also carries the sense of speech and words — the things you write with a pen. It felt like the right name for an app that turns your voice into text.

Website: https://harshvardhaniimi.github.io/kalam/

The Problem

macOS has built-in dictation, but it can send your audio to Apple’s servers. There are third-party transcription services too, but most of them want a subscription and your data. I wanted something that runs entirely on my Mac: no cloud, no subscriptions, no audio leaving my device.

OpenAI’s Whisper model is one of the best open-source speech recognition systems available, and thanks to WhisperKit by Argmax, it runs natively on Apple Silicon using CoreML and the Neural Engine. That’s the foundation Kalam is built on.

How It Works

Kalam sits in your menu bar as a small waveform icon. The fastest way to use it:

1. Press Cmd+Shift+Space from anywhere in macOS
2. Speak
3. Press Cmd+Shift+Space again
4. The transcribed text appears at your cursor and is copied to your clipboard

That’s it. Writing an email? Click in the body, speak, and the text appears. Taking notes in Obsidian? Same thing. The global hotkey works across all applications.

You can also open the menu bar popover for a more visual interface with a waveform display, or use the full window mode for drag-and-drop file transcription.

Models

Kalam supports five Whisper model sizes, from Tiny (75 MB) to Large (2.9 GB). The Base model is the default — it downloads automatically on first launch (142 MB) and can transcribe a minute of audio in about 6 seconds on Apple Silicon. If you need better accuracy, you can switch to a larger model in settings. All models support 50+ languages.
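
The 6-seconds-per-minute figure works out to roughly 10x real time for the Base model. As a back-of-the-envelope sketch, assuming that ratio holds linearly for longer recordings (it may not, and larger models will be slower), a small shell function can estimate the wait. estimate_seconds is a hypothetical helper for illustration, not part of Kalam:

```shell
# Rough estimate only. The default speedup of 10 comes from the figure above
# (60 s of audio in about 6 s on the Base model); other models will differ.
estimate_seconds() {
  audio_seconds=$1
  speedup=${2:-10}   # assumed real-time factor, override as second argument
  # Integer ceiling division so short clips never round down to 0
  echo $(( (audio_seconds + speedup - 1) / speedup ))
}

estimate_seconds 60     # one minute of audio, prints 6
estimate_seconds 1800   # a 30-minute recording, prints 180
```

Pass a smaller second argument (for example 3) to model a slower, larger model; the right value is something you would have to measure on your own machine.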

Privacy

This was the whole point of building Kalam:

All processing happens locally on your Mac
No internet connection required (except for the initial model download)
No telemetry, no analytics, no cloud APIs
Audio never leaves your device
Transcription history is stored locally

Models are downloaded once from Hugging Face and cached in ~/Library/Application Support/Kalam/models/.
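
Since the larger models run to gigabytes, it can be handy to see what the cache is holding. A minimal sketch, using the path quoted above; cache_report is a hypothetical helper, and the idea that each model appears as its own entry in that folder is an assumption about the layout:

```shell
# Path taken from the text above; set MODELS_DIR to point somewhere else.
MODELS_DIR="${MODELS_DIR:-$HOME/Library/Application Support/Kalam/models}"

cache_report() {
  dir=$1
  if [ -d "$dir" ]; then
    du -sh "$dir"   # total disk space used by the cache
    ls -1 "$dir"    # entries in the cache (one per model is an assumption)
  else
    echo "no model cache at $dir"
  fi
}

cache_report "$MODELS_DIR"
```

Deleting an entry from that folder should free the space, though whether Kalam re-downloads a missing model automatically is not something stated here.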

Install

The quickest way:

curl -sL https://raw.githubusercontent.com/harshvardhaniimi/kalam/main/install.sh | bash
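
If piping a script straight into bash makes you uneasy, a slightly more cautious variant is to download it first, read it, and only then run it. A minimal sketch with the same URL as above; fetch_installer is a hypothetical helper name:

```shell
# URL copied from the one-liner above
INSTALL_URL="https://raw.githubusercontent.com/harshvardhaniimi/kalam/main/install.sh"

# fetch_installer URL DEST: save the script to DEST instead of piping it to bash.
# -f makes curl fail on HTTP errors rather than saving an error page,
# -L follows redirects, -sS stays quiet but still prints errors.
fetch_installer() {
  curl -fsSL "$1" -o "$2"
}

# Usage (run manually):
#   fetch_installer "$INSTALL_URL" install.sh
#   less install.sh      # read what the script will do
#   bash install.sh      # run it once you are satisfied
```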

Or clone the repo and run ./build-app.sh to build from source. Requires macOS 14+ (Sonoma).
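
The source build can be sketched as follows. The repository URL is an assumption inferred from the raw install URL above, and macos_version_ok is a hypothetical helper that checks the macOS 14+ requirement before you spend time cloning:

```shell
# Assumption: the repo lives at the GitHub path implied by the install URL.
REPO_URL="https://github.com/harshvardhaniimi/kalam.git"

# Succeeds if the given macOS version string has a major version of 14 or higher.
macos_version_ok() {
  major=${1%%.*}        # "14.3.1" -> "14"
  [ "$major" -ge 14 ]
}

# Usage on a Mac (run manually):
#   macos_version_ok "$(sw_vers -productVersion)" || echo "Kalam needs macOS 14+"
#   git clone "$REPO_URL" && cd kalam && ./build-app.sh
```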

Tags: macos, swift, speech-to-text, whisper
