Best LLM Inference Tools

Run and serve large language models: local inference, production serving, and model management.

Ranked by score. Updated weekly.

1

ollama

100

Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

174,817 · Go · permissive
2

openai-python

97

The official Python library for the OpenAI API

31,068 · Python · permissive
3

Transformers

91

Model framework for state-of-the-art ML

161,851 · Python · permissive
4

llama.cpp

91

LLM inference in C/C++

117,873 · C++ · permissive
5

vLLM

91

High-throughput LLM inference and serving engine

83,682 · Python · permissive
6

ragflow

90

RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine that combines RAG with agent capabilities to build a context layer for LLMs.

83,486 · Python · permissive
7

LiteLLM

89

SDK and proxy to call 100+ LLM APIs in OpenAI format

51,337 · Python · permissive
8

sglang

85

SGLang is a high-performance serving framework for large language models and multimodal models.

29,583 · Python · permissive
9

Open WebUI

84

Self-hosted AI interface for LLMs

142,804 · TypeScript · permissive
10

LocalAI

83

Open-source AI engine, run any model locally

47,093 · Go · permissive
11

Ray

83

AI compute engine for ML workloads at scale

42,998 · Python · permissive
12

Gradio

83

Build and share ML demo apps in Python

42,981 · Python · permissive
13

CLIProxyAPI

83

Wraps Gemini CLI, Antigravity, ChatGPT Codex, Claude Code, Qwen Code, and iFlow as an OpenAI/Gemini/Claude/Codex-compatible API service, giving API access to the free Gemini 2.5 Pro, GPT 5, Claude, and Qwen models.

38,231 · Go · permissive
14

Label Studio

83

Multi-type data labeling and annotation

27,678 · TypeScript · permissive
15

MLX

83

Array framework for Apple silicon

27,216 · C++ · permissive
16

MLflow

83

Open source AI/ML lifecycle platform

26,708 · Python · permissive
17

Langfuse

81

Open source LLM engineering platform

29,641 · TypeScript · permissive
18

pytorch

80

Tensors and dynamic neural networks in Python with strong GPU acceleration

101,024 · Python · unknown
19

headroom

77

The Context Optimization Layer for LLM Applications

48,841 · Python · permissive
20

text-generation-webui

71

Local LLM interface with text, vision, and training

47,369 · Python · strong-copyleft
