
llama.cpp
LLM inference in C/C++
Coldcast Lens
llama.cpp is the project that proved you don't need a data center to run an LLM. Pure C/C++ inference for large language models — no Python, no PyTorch, no CUDA requirement. It runs Llama, Mistral, Phi, and dozens of other models on CPUs, Apple Silicon, and consumer GPUs. The engine behind nearly every local AI app.
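The typical workflow is short: build from source with CMake, then point the `llama-cli` binary at a GGUF model file. The commands below are a sketch of that flow — the model path is a placeholder, and you'd substitute any GGUF file you've downloaded (e.g. from Hugging Face):

```shell
# Build llama.cpp from source (CMake is the supported build system)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Run a quantized GGUF model; ./models/model.gguf is a placeholder path
./build/bin/llama-cli -m ./models/model.gguf \
    -p "Explain quantization in one sentence." -n 64
```

GPU backends (Metal, CUDA, Vulkan) are enabled via CMake flags at build time, which is part of why performance tuning is hardware-specific.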
If you want to run AI models locally — for privacy, cost savings, or offline use — llama.cpp is the foundation everything else is built on. Ollama wraps it in a friendly CLI. LM Studio wraps it in a GUI. vLLM is faster for GPU serving but Python-only. ExLlamaV2 squeezes more performance from NVIDIA GPUs.
Best for developers building local AI products or anyone who wants to understand how LLM inference actually works at the metal level.
The catch: it's C/C++, so building from source and debugging isn't for everyone. Model quantization tradeoffs (quality vs. speed vs. memory) require experimentation. Performance tuning is hardware-specific. And the project moves so fast that tutorials from three months ago may already be outdated.
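To make the memory side of the quantization tradeoff concrete, here is a back-of-envelope sketch. The bits-per-weight figures are rough illustrative values, not exact numbers for any specific GGUF quant format, and the estimate covers weights only (KV cache and activations add more on top):

```python
# Rough memory estimate for model weights at different quantization levels.
# Bits-per-weight values below are illustrative approximations, not exact
# figures for llama.cpp's GGUF quant formats.
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """GB needed for the weights alone (ignores KV cache and activations)."""
    return n_params * bits_per_weight / 8 / 1e9

params_7b = 7e9  # a typical "7B" model
for name, bpw in [("FP16", 16.0), ("8-bit", 8.0), ("~4-bit", 4.5)]:
    print(f"{name}: ~{weight_memory_gb(params_7b, bpw):.1f} GB")
```

This is why a 7B model that won't fit in 8 GB of RAM at FP16 runs comfortably once quantized to around 4 bits — at some cost in output quality that you have to evaluate for your use case.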
About
- Stars: 99,301
- Forks: 15,772