Run and serve large language models: local inference, production serving, and model management.
Ranked by Coldcast Score. Updated weekly.
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma, and other models.
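This tagline matches Ollama's README, so the sketch below assumes an Ollama-style REST API on localhost:11434; the endpoint, port, and model tag are assumptions, not details given in this listing.

```python
# Hedged sketch: one-shot generation against an assumed Ollama-style
# REST endpoint. Port 11434 and the "qwen" model tag are assumptions.
import json
import urllib.request

payload = {
    "model": "qwen",                  # hypothetical local model tag
    "prompt": "Why is the sky blue?",
    "stream": False,                  # ask for one JSON object, not a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```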
LLM inference in C/C++
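The C/C++ tagline reads like llama.cpp's. Assuming its community Python bindings (llama-cpp-python) and a GGUF model file on disk, neither of which this listing specifies, local inference is a few lines:

```python
# Hedged sketch via the llama-cpp-python bindings; the model path is
# a placeholder for whatever GGUF file you have locally.
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf")  # hypothetical path
out = llm("Q: What is 2 + 2? A:", max_tokens=8, stop=["\n"])
print(out["choices"][0]["text"])
```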
Model-definition framework for state-of-the-art ML
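If this entry is Hugging Face Transformers, which the wording suggests but the listing does not confirm, the pipeline API is the shortest path to running a model:

```python
# Minimal pipeline sketch; "distilgpt2" is an arbitrary small model
# chosen for illustration, not prescribed by this listing.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
print(generator("Local inference is", max_new_tokens=20)[0]["generated_text"])
```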
High-throughput LLM inference and serving engine
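A high-throughput serving engine in this mold (vLLM's tagline is nearly identical) typically supports offline batch inference like this; the model and sampling values below are illustrative assumptions:

```python
# Offline batch inference in the vLLM style; model choice and sampling
# parameters are examples only.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)
for output in llm.generate(["The capital of France is"], params):
    print(output.outputs[0].text)
```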
Open-source AI engine for running any model locally
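Local engines like this usually expose an OpenAI-compatible server, in which case the official openai SDK can be pointed at it; the port and model ID below are hypothetical:

```python
# Hedged sketch: the official openai SDK against an assumed local
# OpenAI-compatible server. Port 1337 and the model ID are made up.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1337/v1",
                api_key="not-needed")  # local servers often ignore the key
resp = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```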
Framework for building data apps fast
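The tagline matches Streamlit's; assuming so, a complete data app is a single script run with `streamlit run app.py`:

```python
# Tiny Streamlit-style app; the words-to-tokens factor is a crude
# illustrative heuristic, not a real tokenizer.
import streamlit as st

st.title("Token budget estimator")
words = st.slider("Words in your prompt", 1, 1000, 100)
st.write(f"Roughly {int(words * 1.3)} tokens")
```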
Build and share ML demo apps in Python
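For a Gradio-style demo (the tagline matches Gradio's), wrapping any Python function yields a shareable web UI; the echo function stands in for a real model call:

```python
# Minimal demo app; echo() is a placeholder for an actual model.
import gradio as gr

def echo(prompt: str) -> str:
    return f"Model would answer: {prompt}"

gr.Interface(fn=echo, inputs="text", outputs="text").launch()
```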
AI compute engine for ML workloads at scale
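If this is Ray (its tagline is close to this wording), scaling a function across workers means decorating it and collecting futures; the workload below is a placeholder:

```python
# Sketch of fan-out/fan-in with a Ray-style API.
import ray

ray.init()  # starts a local cluster if none is running

@ray.remote
def square(x: int) -> int:
    return x * x

print(ray.get([square.remote(i) for i in range(8)]))
```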
Multi-type data labeling and annotation
Open-source AI/ML lifecycle platform
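An MLflow-style lifecycle platform (an assumption based on the tagline) records experiments as runs; the parameter and metric names here are illustrative:

```python
# Logging one run with an MLflow-style tracking API.
import mlflow

with mlflow.start_run(run_name="demo"):
    mlflow.log_param("learning_rate", 3e-4)
    mlflow.log_metric("val_loss", 0.42)
```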
Array framework for Apple silicon
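If this entry is Apple's MLX, which the tagline matches, computation is lazy: operations build a graph that runs only when evaluated.

```python
# Lazy arrays in the MLX style; mx.eval() forces computation.
import mlx.core as mx

a = mx.array([1.0, 2.0, 3.0])
b = (a * a).sum()   # builds the graph; nothing has run yet
mx.eval(b)          # evaluates on the default device
print(b.item())     # 14.0
```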
ML experiment tracking
LLM inference server with continuous batching and SSD caching for Apple Silicon, managed from the macOS menu bar.
Local LLM interface supporting text, vision, and model training
SDK and proxy to call 100+ LLM APIs in OpenAI format
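A proxy/SDK of this kind (LiteLLM's tagline is essentially this sentence) lets one OpenAI-format call target many providers; the model IDs below are examples and assume the relevant keys or local servers are configured:

```python
# Hedged sketch: one interface, two backends, in the LiteLLM style.
from litellm import completion

messages = [{"role": "user", "content": "One-word greeting, please."}]
for model in ("gpt-4o-mini", "ollama/llama3"):
    resp = completion(model=model, messages=messages)
    print(model, "->", resp.choices[0].message.content)
```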
Open-source LLM engineering platform
Running a big model on a small laptop