
LiteRT-LM
LiteRT-LM is Google's production-ready, high-performance, open-source inference framework for deploying Large Language Models on edge devices.
The Lens
LiteRT-LM runs language models directly on a device, no cloud and no internet required. The model lives on the phone, laptop, smartwatch, or even in the browser, so data never leaves the hardware, it works offline, and there's no per-query bill. This is Google's own framework, and Google uses it to power on-device AI in Chrome, Chromebook Plus, and the Pixel Watch. Apache 2.0, completely free.
It's cross-platform by design, targeting Android, iOS, desktop, the web via WebGPU, and small boards like Raspberry Pi, and it taps GPU and NPU acceleration instead of grinding on the CPU. It runs open models like Gemma, Llama, Phi, and Qwen. The work isn't running a server, because there is no server. The work is on the build side: you obtain models and convert them into the right format, wire up the native SDK for each platform you ship to, and manage on-device memory per device class. That's heavier than calling a cloud API and far lighter than operating an inference cluster.
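To make "manage on-device memory per device class" concrete, here is a rough back-of-envelope sketch in Python. It is not part of LiteRT-LM and uses none of its APIs. It only applies the general rule that weight memory is roughly parameter count times bits per weight divided by eight. The function names, the device classes, their RAM figures, and the 50% usable-RAM budget are all hypothetical placeholders. The estimate also ignores KV cache, activations, and runtime overhead, so real usage will be higher.

```python
# Back-of-envelope sizing only; every name and number here is illustrative.

def weight_memory_gib(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billions: float, bits_per_weight: int,
         device_ram_gib: float, usable_fraction: float = 0.5) -> bool:
    """True if the weights fit within an assumed share of device RAM.

    usable_fraction is a guess at how much RAM an app can claim;
    tune it per platform rather than trusting the default.
    """
    budget = device_ram_gib * usable_fraction
    return weight_memory_gib(params_billions, bits_per_weight) <= budget


# Hypothetical device classes with assumed total RAM in GiB.
DEVICE_CLASSES = {"watch": 2.0, "phone": 8.0, "laptop": 16.0}

if __name__ == "__main__":
    # Example: a 2B-parameter model quantized to 4 bits per weight.
    for name, ram in DEVICE_CLASSES.items():
        verdict = "fits" if fits(2, 4, ram) else "too large"
        print(f"{name}: {verdict}")
```

A check like this is only a first filter for deciding which model size and quantization level to try on each device class. Measuring on real hardware is still necessary.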
The real competition is other on-device runtimes. llama.cpp has broader model coverage and a bigger community; Meta's ExecuTorch is the closest vendor-backed rival; Apple's MLX wins on Macs but only on Macs. LiteRT-LM's edge is tight, official integration with Android and Google silicon. It doesn't replace a paid product so much as move certain workloads off the paid-API meter: the small and mid-size models you'd otherwise rent from a cloud. Solo and small teams shipping mobile or edge apps: this is the Google-blessed path. Larger teams already on Android get first-party support.
The catch is maturity. The core runtime is production-ready and shipping in real Google products, but some bindings, Swift and JavaScript among them, are still early preview, and the project is young. And on-device models are not frontier models. If you need GPT-class quality, this isn't that. It's for when private, offline, free, and good-enough beats cloud-quality.
Free vs Self-Hosted vs Paid
Free: Everything. Apache 2.0 framework with no paid tier, no hosted upsell, no license fee. The models it runs (Gemma and friends) are downloaded separately and are themselves free.
Cost you'll actually pay: Your own device compute and the engineering time to integrate it per platform. That's it.
The trade: Running inference on-device means you avoid paid cloud-API costs entirely for the workloads it can handle. The ceiling is model quality, not price.
Free and open source. The only cost is your device's compute and the engineering to integrate it.
License: Apache License 2.0
Use freely. Patent grant included.
Commercial use: ✓ Yes
About
- Owner: google-ai-edge (Organization)
- Stars: 5,585
- Forks: 577