Best LLM Inference Tools

Run and serve large language models: local inference, production serving, and model management.

Ranked by score. Updated weekly.

1. ollama | score 100 | 170,963 stars | Go | permissive
   Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
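Once a model has been pulled, a locally running ollama server is typically queried over its documented REST endpoint. A minimal stdlib-only sketch, assuming the default port 11434 and the `/api/generate` endpoint (the model name below is illustrative):

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # ollama's default local endpoint

def build_generate_request(model: str, prompt: str) -> dict:
    """Assemble the JSON body for ollama's /api/generate endpoint."""
    return {
        "model": model,    # e.g. "gemma3" -- must already be pulled locally
        "prompt": prompt,
        "stream": False,   # return one JSON object instead of a token stream
    }

def generate(model: str, prompt: str) -> str:
    """POST the request to a local ollama server (requires `ollama serve` running)."""
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("gemma3", "Why is the sky blue?")` returns the completed text once the server is up; with `"stream": True` the endpoint instead emits one JSON object per token.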
2. openai-python | score 97 | 30,708 stars | Python | permissive
   The official Python library for the OpenAI API.
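The library wraps chat-completion requests in the OpenAI message format. A minimal sketch of that request shape (the model name is illustrative, not a recommendation):

```python
def build_chat_payload(model: str, system: str, user: str) -> dict:
    """Assemble a chat-completion request body in the OpenAI message format:
    a model name plus an ordered list of role/content messages."""
    return {
        "model": model,  # illustrative; use whatever model your account offers
        "messages": [
            {"role": "system", "content": system},  # sets assistant behavior
            {"role": "user", "content": user},      # the actual query
        ],
    }
```

With the package installed and an API key configured, the same payload corresponds to `client.chat.completions.create(**payload)` on an `openai.OpenAI()` client.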
3. Transformers | score 91 | 160,371 stars | Python | permissive
   Model framework for state-of-the-art ML.

4. llama.cpp | score 91 | 108,903 stars | C++ | permissive
   LLM inference in C/C++.

5. vLLM | score 91 | 79,229 stars | Python | permissive
   High-throughput LLM inference and serving engine.

6. LiteLLM | score 86 | 46,094 stars | Python | permissive
   SDK and proxy to call 100+ LLM APIs in OpenAI format.
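LiteLLM addresses calls by a provider-prefixed model string such as "openai/gpt-4o" or "ollama/llama3". A minimal sketch of that routing convention; the helper below is hypothetical, not part of the litellm package:

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split a LiteLLM-style 'provider/model' identifier into its parts.
    Hypothetical helper illustrating the naming convention only."""
    provider, sep, model = model_id.partition("/")
    if not sep:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model
```

With the package installed, `litellm.completion(model="ollama/llama3", messages=...)` accepts the same OpenAI-format message list regardless of which backend the prefix selects.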
7. Open WebUI | score 84 | 136,019 stars | TypeScript | permissive
   Self-hosted AI interface for LLMs.

8. LocalAI | score 83 | 46,120 stars | Go | permissive
   Open-source AI engine; run any model locally.

9. Gradio | score 83 | 42,528 stars | Python | permissive
   Build and share ML demo apps in Python.

10. Ray | score 83 | 42,442 stars | Python | permissive
    AI compute engine for ML workloads at scale.
11

CLIProxyAPI

83

Wrap Gemini CLI, Antigravity, ChatGPT Codex, Claude Code, Qwen Code, iFlow as an OpenAI/Gemini/Claude/Codex compatible API service, allowing you to enjoy the free Gemini 2.5 Pro, GPT 5, Claude, Qwen model through API

31,040Gopermissive
12. Label Studio | score 83 | 27,229 stars | TypeScript | permissive
    Multi-type data labeling and annotation.

13. MLX | score 83 | 26,036 stars | C++ | permissive
    Array framework for Apple silicon.

14. MLflow | score 83 | 25,815 stars | Python | permissive
    Open source AI/ML lifecycle platform.

15. omlx | score 83 | 12,528 stars | Python | permissive
    LLM inference server with continuous batching and SSD caching for Apple silicon, managed from the macOS menu bar.
16

OpenMAIC

82

Open Multi-Agent Interactive Classroom — Get an immersive, multi-agent learning experience in just one click

16,923TypeScriptstrong-copyleft
17. Langfuse | score 81 | 26,786 stars | TypeScript | permissive
    Open source LLM engineering platform.

18. Weights & Biases | score 81 | 11,048 stars | Python | permissive
    ML experiment tracking.

19. text-generation-webui | score 71 | 46,958 stars | Python | strong-copyleft
    Local LLM interface with text, vision, and training.
20. TensorRT-LLM | score 71 | 13,578 stars | Python | license unknown
    Easy-to-use Python API for defining Large Language Models, with state-of-the-art optimizations for efficient inference on NVIDIA GPUs, plus Python and C++ runtime components that orchestrate execution.