Run and serve large language models: local inference, production serving, and model management.
Ranked by score. Updated weekly.
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
The official Python library for the OpenAI API
Model framework for state-of-the-art ML
LLM inference in C/C++
High-throughput LLM inference and serving engine
RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine that combines RAG with agent capabilities to build a context layer for LLMs
SDK and proxy to call 100+ LLM APIs in OpenAI format
SGLang is a high-performance serving framework for large language models and multimodal models.
Self-hosted AI interface for LLMs
Open-source AI engine, run any model locally
AI compute engine for ML workloads at scale
Build and share ML demo apps in Python
Wrap Gemini CLI, Antigravity, ChatGPT Codex, Claude Code, Qwen Code, and iFlow as an OpenAI/Gemini/Claude/Codex-compatible API service, giving you API access to the free Gemini 2.5 Pro, GPT 5, Claude, and Qwen models
Multi-type data labeling and annotation
Array framework for Apple silicon
Open source AI/ML lifecycle platform
Open source LLM engineering platform
Tensors and dynamic neural networks in Python with strong GPU acceleration
The Context Optimization Layer for LLM Applications
Local LLM interface with text, vision, and training