
mlx-lm
Run LLMs with MLX
The Lens
mlx-lm runs and fine-tunes large language models directly on a Mac. Point it at a model on Hugging Face and one command downloads it and runs it locally on Apple's MLX framework, with no cloud API or separate GPU rig involved. It is MIT licensed, free, and built by ml-explore, the Apple team behind MLX itself.
It does more than run models. You can quantize them down to 4-bit, fine-tune with LoRA or full-model training, serve them with streaming and prompt caching, and even split work across multiple machines. Setup is close to trivial: pip install mlx-lm, then a single command chats with a model. The real constraint is memory. MLX uses the Mac's unified memory, so the model has to roughly fit in RAM, and models that fill most of that memory run best on macOS 15 or newer with some system tuning. It is also Apple Silicon only: no M-series chip, no mlx-lm.
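To make "quantize them down to 4-bit" concrete, here is a minimal pure-Python sketch of group-wise affine quantization, the general kind of scheme MLX uses: each group of weights shares a scale and an offset (bias), and every weight is stored as a small integer. The function names, the min/max scaling, and the group size of 64 are illustrative assumptions. This is not mlx-lm's API or MLX's actual kernel; in practice mlx-lm handles quantization for you when it converts a model.

```python
import math


def quantize_group(values, bits=4):
    """Affine-quantize one group of floats to integers in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    # A constant group would give a zero step size; any nonzero scale works there.
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return q, scale, lo  # lo acts as the per-group bias (offset)


def dequantize_group(q, scale, bias):
    """Map integers back to approximate floats: w ~= q * scale + bias."""
    return [x * scale + bias for x in q]


def quantize(weights, group_size=64, bits=4):
    """Split a flat weight list into groups; each group gets its own scale and bias."""
    return [quantize_group(weights[i:i + group_size], bits)
            for i in range(0, len(weights), group_size)]


def dequantize(groups):
    """Reassemble the flat weight list from quantized groups."""
    out = []
    for q, scale, bias in groups:
        out.extend(dequantize_group(q, scale, bias))
    return out


if __name__ == "__main__":
    # Toy "weights": a smooth curve standing in for one row of a weight matrix.
    weights = [math.sin(0.1 * i) for i in range(256)]
    groups = quantize(weights)
    restored = dequantize(groups)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"groups: {len(groups)}, max abs error: {max_err:.4f}")
```

Each restored weight lands within half a step (scale / 2) of the original. Assuming a 16-bit scale and a 16-bit bias per 64-weight group, that bookkeeping adds 32 / 64 = 0.5 bits per weight, so "4-bit" works out to roughly 4.5 bits in memory.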
The honest framing on competition: this is a building block, not a finished app. llama.cpp is the closest peer and runs on more hardware; Ollama and LM Studio are more packaged and app-like, and increasingly use MLX under the hood anyway; vLLM is for datacenter GPUs, a different world. mlx-lm's edge is being the MLX-native option, which means the best raw performance on a Mac and the cleanest fine-tuning story. Solo developers and researchers on Apple Silicon: this is the fast path. Small teams can build on it; larger production serving will want something server-side.
The catch is that you're trading convenience and reach for Mac-native speed. It's lower-level than Ollama, locked to Apple hardware, and capped by how much RAM you bought. Within those lines, nothing runs models on a Mac better.
Free vs Self-Hosted vs Paid
Free: MIT-licensed, fully open, no paid tier. Running, quantizing, fine-tuning, and serving are all free.
Cost you'll actually pay: A Mac with enough RAM. The software is free; the hardware is the spend, and RAM is the limiting factor for model size.
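A back-of-the-envelope way to see why RAM caps model size: the weights alone take roughly parameters × bits ÷ 8 bytes. The sketch below assumes that formula and ignores the KV cache, activations, and everything else running on the Mac. The function names are hypothetical, and the `usable_fraction` headroom is an illustrative rule of thumb, not a documented macOS limit.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 10**9 bytes).

    Billions of parameters x bits per weight / 8 bits per byte gives GB directly.
    Ignores KV cache, activations, and runtime overhead.
    """
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float, ram_gb: float,
         usable_fraction: float = 0.75) -> bool:
    """Hypothetical check: do the weights fit in a share of unified memory?

    usable_fraction is an assumed headroom for the OS, other apps, and the
    KV cache; it is not a documented macOS limit.
    """
    return weight_memory_gb(params_billions, bits_per_weight) <= ram_gb * usable_fraction


if __name__ == "__main__":
    for params, bits in [(8, 16), (8, 4), (70, 4)]:
        print(f"{params}B @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

By this estimate an 8B model drops from about 16 GB at 16-bit to about 4 GB at 4-bit, while a 70B model still needs about 35 GB at 4-bit: too much for a 32 GB Mac under the 75% assumption, but within reach of a 64 GB one.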
The trade: Self-hosting on a Mac you already own means no per-token cloud bill. The ceiling is your RAM, not your budget.
Free and open source. Your only cost is owning an Apple Silicon Mac with enough RAM.
License: MIT License
Use freely, including commercial. Just keep the license.
Commercial use: ✓ Yes
About
- Owner: ml-explore (Organization)
- Stars: 5,745