Run and serve large language models: local inference, production serving, and model management.
Ranked by score. Updated weekly.
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
The official Python library for the OpenAI API (usage sketch after this list)
Model framework for state-of-the-art ML
LLM inference in C/C++
High-throughput LLM inference and serving engine (offline-generation sketch after this list)
SDK and proxy to call 100+ LLM APIs in OpenAI format (provider-swap sketch after this list)
Self-hosted AI interface for LLMs
Open-source AI engine: run any model locally
Build and share ML demo apps in Python (demo sketch after this list)
AI compute engine for ML workloads at scale
Wraps Gemini CLI, Antigravity, ChatGPT Codex, Claude Code, Qwen Code, and iFlow as an OpenAI/Gemini/Claude/Codex-compatible API service, letting you use the free Gemini 2.5 Pro, GPT-5, Claude, and Qwen models through an API (client sketch after this list)
Multi-type data labeling and annotation
Array framework for Apple silicon
Open source AI/ML lifecycle platform
LLM inference server with continuous batching and SSD caching for Apple Silicon, managed from the macOS menu bar.
Open Multi-Agent Interactive Classroom: get an immersive, multi-agent learning experience in just one click
Open source LLM engineering platform
ML experiment tracking
Local LLM interface with text, vision, and training
TensorRT-LLM provides an easy-to-use Python API to define large language models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components for building Python and C++ runtimes that orchestrate inference execution efficiently. A sketch of the Python API follows this list.
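
For "The official Python library for the OpenAI API": the basic call is a chat completion. A minimal sketch, assuming the `openai` package is installed and `OPENAI_API_KEY` is set; the model name is illustrative.

```python
from openai import OpenAI

# Reads OPENAI_API_KEY from the environment by default.
client = OpenAI()

# Single-turn chat completion; the model name is an illustrative choice.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize continuous batching in one sentence."}],
)
print(response.choices[0].message.content)
```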
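
For the high-throughput inference and serving engine (the description matches vLLM): offline batched generation is the core pattern. A sketch under that assumption; the model ID and sampling settings are illustrative.

```python
from vllm import LLM, SamplingParams

# Load the model once; the engine batches prompts internally for throughput.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # illustrative model ID
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What is paged attention?", "Define KV cache."], params)
for out in outputs:
    print(out.outputs[0].text)
```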
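
For the SDK/proxy that calls 100+ LLM APIs in OpenAI format (the description matches LiteLLM): the point is that one call shape works across providers. A sketch under that assumption; the model names are illustrative and provider keys are expected in the environment.

```python
from litellm import completion

messages = [{"role": "user", "content": "Hello"}]

# Same call shape, different providers; keys come from env vars
# (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY). Model names are illustrative.
openai_resp = completion(model="gpt-4o-mini", messages=messages)
claude_resp = completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)

print(openai_resp.choices[0].message.content)
print(claude_resp.choices[0].message.content)
```

Because the responses come back in OpenAI format, downstream code does not change when the provider does.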
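
For "Build and share ML demo apps in Python" (the description matches Gradio): the core pattern wraps a Python function in a web interface. A minimal sketch; the echo function is a stand-in for a real model call.

```python
import gradio as gr

def reply(prompt: str) -> str:
    # Stand-in for a real model call.
    return f"You said: {prompt}"

# Interface wires the function to web UI components; launch() serves it locally.
demo = gr.Interface(fn=reply, inputs="text", outputs="text")
demo.launch()
```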
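
For the CLI-wrapping proxy entry: since it exposes an OpenAI-compatible endpoint, a stock OpenAI client can target it by overriding the base URL. A sketch; the port, key, and model name are placeholders, not values taken from the project.

```python
from openai import OpenAI

# Point the stock OpenAI client at the local proxy.
# Base URL and key are illustrative placeholders; use whatever the proxy is configured with.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="proxy-key")

response = client.chat.completions.create(
    model="gemini-2.5-pro",  # model name routed by the proxy; illustrative
    messages=[{"role": "user", "content": "Hello through the proxy"}],
)
print(response.choices[0].message.content)
```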
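
For the TensorRT-LLM entry: its high-level `LLM` Python API looks like the sketch below, assuming a supported NVIDIA GPU; the model ID is illustrative and details may vary by version.

```python
from tensorrt_llm import LLM, SamplingParams

# The LLM class builds or loads a TensorRT engine for the model behind the scenes.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # illustrative model ID
params = SamplingParams(max_tokens=64)

for output in llm.generate(["What does TensorRT optimize?"], params):
    print(output.outputs[0].text)
```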