
flash-moe
Running a big model on a small laptop
The Lens
Flash-moe makes running a big model on a small laptop possible. It uses a technique called Mixture of Experts (MoE) to run only the parts of the model that matter for each request, dramatically cutting the memory and compute needed.
The pitch is simple: big model intelligence on small hardware. Models that normally need 32GB+ of VRAM can run on a laptop with 8-16GB of regular RAM. It's slower than running on a GPU, but it works.
The catch: the project is growing fast but is still very early. The "runs on a laptop" promise depends heavily on the model and your hardware, and MoE optimization is an active research area. Expect the approach to evolve quickly.
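To make the idea concrete, here is a minimal sketch of the top-k routing at the heart of MoE: a gate scores every expert, only the best k are actually run, and their outputs are blended. This is an illustrative NumPy toy (all names and shapes are assumptions), not flash-moe's actual implementation.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Run only the top_k best-matching experts for input x (toy MoE routing)."""
    scores = x @ gate_w                       # one gate score per expert
    top = np.argsort(scores)[-top_k:]         # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                  # softmax over the chosen experts only
    # Only top_k expert matrices are touched; the rest never load or compute.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts)           # uses 2 of 16 experts
```

With 16 experts and top_k=2, only an eighth of the expert weights are exercised per token, which is why MoE models can run in far less memory than their total parameter count suggests.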
Free vs Self-Hosted vs Paid
Fully free. Open source, no paid tier: you clone it and run it. The license isn't specified in the repo metadata, so check the repo directly before commercial use.
Similar Tools
- Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
- LLM inference in C/C++
- LLM inference server with continuous batching and SSD caching for Apple Silicon, managed from the macOS menu bar.
- Open-source AI engine, run any model locally
About
- Owner: Dan Woods (User)
- Stars: 3,820
- Forks: 473