7 open source tools compared. Sorted by stars — scroll down for our analysis.
| Tool | Stars | Velocity | Language | License | Score |
|---|---|---|---|---|---|
| cs249r_book Machine Learning Systems | 22.9k | +121/wk | JavaScript | — | 77 |
| AutoResearchClaw Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞 | 8.7k | +2700/wk | Python | MIT License | 80 |
| Auto-claude-code-research-in-sleep ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works with Claude Code, Codex, OpenClaw, or any LLM agent. | 4.0k | +1752/wk | Python | MIT License | 72 |
| pi-autoresearch Autonomous experiment loop extension for pi | 2.9k | +755/wk | TypeScript | MIT License | 72 |
| Attention-Residuals Research implementation of attention residual connections for transformer models. | 2.7k | +966/wk | — | — | 70 |
| autoresearch Claude Autoresearch Skill — Autonomous goal-directed iteration for Claude Code. Inspired by Karpathy's autoresearch. Modify → Verify → Keep/Discard → Repeat forever. | 2.3k | +950/wk | Shell | MIT License | 74 |
| autoresearch-genealogy Structured prompts, vault templates, and archive guides for AI-assisted genealogy research. Built for Claude Code. | 921 | +921/wk | — | MIT License | 60 |
This is Harvard's open-source textbook on Machine Learning Systems — not how to train a model, but how to engineer the entire system around it: deployment, edge inference, privacy, MLOps, and production constraints. It's the curriculum behind CS249r, now being published by MIT Press. If you're an indie hacker building ML-powered products and want to understand why your model works in a notebook but fails in production, this book fills the gap. Fast.ai teaches you to train models. Stanford CS229 teaches the math. This book teaches the engineering. There's nothing else quite like it in the open-source textbook space. The catch: it's an academic textbook, so the writing can be dense. The hands-on labs use TinyTorch and Marimo notebooks — niche tools your team might not know. Volumes I and II arrive in Summer 2026, so the content is still evolving. And "ML Systems" is broad — some chapters will be deeply relevant to your work and others won't apply at all. Read selectively, not cover-to-cover.
AutoResearchClaw is the "just press go" button for academic papers — and that should make you nervous and excited in equal measure. Feed it a research idea, and a 23-stage pipeline handles literature review (real papers from OpenAlex and arXiv), hypothesis generation, sandboxed experiments, statistical analysis, multi-agent peer review, and LaTeX output targeting NeurIPS/ICML. It even learns from past runs. If you're an ML researcher prototyping ideas overnight, this is your unfair advantage. The anti-fabrication guards (NaN detection, citation relevance scoring) are a step above what you'd get cobbling together AutoGPT chains. Alternatives like Karpathy's original autoresearch pattern are lighter but manual. ARIS is markdown-only and less opinionated. No commercial tool does this end-to-end yet. The catch: "fully autonomous research" is a bold claim. The papers need human review before submission — treating this as a first-draft accelerator, not a paper factory, is the right call. And the pipeline is complex enough that debugging failures takes real effort.
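The staged-pipeline-with-guards idea is worth seeing in miniature. This is a hypothetical sketch, not AutoResearchClaw's actual code — the stage names and the shape of the NaN guard are assumptions based on the description above.

```python
import math

def nan_guard(results):
    """Anti-fabrication check: reject any stage output whose metrics are NaN."""
    for name, value in results.items():
        if isinstance(value, float) and math.isnan(value):
            raise ValueError(f"fabrication guard tripped: {name} is NaN")
    return results

def run_pipeline(idea, stages):
    """Run a staged research pipeline; every stage's output passes the guard
    before the next stage sees it."""
    state = {"idea": idea}
    for stage in stages:
        state = nan_guard(stage(state))
    return state

# Toy stand-ins for the real stages (literature review, experiments, ...).
def literature_review(state):
    return {**state, "citations": 12.0}

def experiment(state):
    return {**state, "accuracy": 0.91}

paper_state = run_pipeline("attention residuals", [literature_review, experiment])
print(paper_state["accuracy"])  # 0.91
```

The point of the pattern: a fabricated or broken result fails loudly at the stage boundary instead of propagating into the final LaTeX.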
ARIS is the lightweight, agent-agnostic cousin of AutoResearchClaw. It orchestrates full ML research lifecycles — literature survey, idea generation, experiment automation, paper writing — using nothing but Markdown skill files. No framework, no database, no Docker. Works with Claude Code, Codex, OpenClaw, or any LLM agent. The secret sauce is adversarial collaboration: Claude Code executes fast, GPT-5.4 reviews slowly and rigorously, probing weaknesses the executor missed. This cross-model tension produces better papers than single-model loops. Compared to AutoResearchClaw (23-stage pipeline, heavier), ARIS is more flexible. Compared to autoresearch (general-purpose), ARIS is research-specific. Use this when you want autonomous ML research that runs overnight across 20+ GPU experiments. Skip this if you need a polished paper — ARIS improves drafts, it doesn't write final submissions. The catch: cross-model collaboration means paying two API providers. And "autonomous overnight research" can burn serious GPU hours and API credits if your guard rails aren't tight.
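The adversarial executor/reviewer loop can be sketched in a few lines. The `execute` and `review` callables below are hypothetical placeholders for calls to two different model providers — this illustrates the control flow, not ARIS's actual skill files.

```python
def cross_model_loop(task, execute, review, max_rounds=3):
    """Adversarial collaboration: a fast executor drafts, a slower reviewer
    probes for weaknesses, and the executor revises until the reviewer has
    no objections left (or the round budget runs out)."""
    draft = execute(task, feedback=None)
    for _ in range(max_rounds):
        objections = review(draft)
        if not objections:
            break
        draft = execute(task, feedback=objections)
    return draft

# Toy stand-ins: the executor fixes whatever the reviewer flags.
def execute(task, feedback):
    return {"text": task, "fixed": list(feedback or [])}

def review(draft):
    return [] if "missing baseline" in draft["fixed"] else ["missing baseline"]

final = cross_model_loop("write related-work section", execute, review)
print(final["fixed"])  # ['missing baseline']
```

The cross-model tension the README describes lives in `review` coming from a different model than `execute` — a single model reviewing its own output tends to miss its own blind spots.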
pi-autoresearch brings the Karpathy autoresearch loop to the pi agent platform. Edit, commit, benchmark, log, keep or revert, repeat — fully autonomous. Works for any measurable target: test speed, bundle size, build time, Lighthouse scores, training loss. The confidence scoring after 3+ experiments is smart — it distinguishes real gains from benchmark noise. Correctness checks via autoresearch.checks.sh prevent optimizations that break things. A built-in dashboard lets you visualize experiment history. Compared to uditgoenka's autoresearch (Claude Code-specific), this is pi-native. Compared to ARIS (research-focused), this is more general-purpose. Use this when you're on the pi platform and want overnight autonomous optimization of any measurable metric. Skip this if you're not using pi — the extension is platform-specific. The catch: autonomous agents making commits in a loop can create messy git histories. And the experiment loop assumes your benchmark is deterministic — flaky tests or variable CI environments will produce misleading results.
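The "confidence after 3+ experiments" idea boils down to a noise-aware comparison. The threshold rule below is an assumption for illustration, not pi-autoresearch's actual scoring:

```python
from statistics import mean, stdev

def confident_gain(baseline, candidate, min_runs=3, z=2.0):
    """Decide whether a candidate's benchmark numbers beat the baseline by
    more than measurement noise. Requires 3+ runs of each; the improvement
    must exceed z times the larger run-to-run deviation. Lower is better
    (e.g. seconds of test time)."""
    if len(baseline) < min_runs or len(candidate) < min_runs:
        return False  # not enough evidence yet
    noise = max(stdev(baseline), stdev(candidate))
    return (mean(baseline) - mean(candidate)) > z * noise

# Three timing runs each (seconds): a real gain vs. a change lost in noise.
print(confident_gain([10.1, 10.0, 10.2], [9.0, 9.1, 8.9]))   # True
print(confident_gain([10.1, 10.0, 10.2], [9.9, 10.3, 9.8]))  # False
```

This is also why the deterministic-benchmark caveat matters: if `noise` is large relative to real gains, nothing ever clears the bar — or worse, with too few runs, noise clears it.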
Attention Residuals is a drop-in fix for how every transformer stacks its layers. Standard residual connections accumulate all layer outputs with fixed weights, diluting each layer's contribution as depth grows. AttnRes replaces this with softmax attention over preceding layers — giving every layer selective, content-aware access to earlier representations. Moonshot AI (the Kimi team) reports meaningful gains: MMLU 73.5 to 74.6, GPQA-Diamond 36.9 to 44.4, HumanEval 59.1 to 62.2. Block AttnRes reduces the overhead to under 4% during training and under 2% at inference. Compared to standard PreNorm (what nearly every modern transformer uses), the reported benchmarks favor it across the board. No direct open-source competitors exist — this is a research contribution, not a product. Use this when you're pre-training or fine-tuning transformers and want free performance gains with minimal overhead. Skip this if you're using models, not building them. The catch: research-stage implementation. No pretrained models with AttnRes are publicly available yet — you'd need to train from scratch or adapt existing architectures. Integration requires modifying your transformer stack.
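The core mechanism — a softmax-weighted combination of earlier layer outputs instead of a fixed-weight sum — can be sketched without any framework. The `scores` here are given directly for illustration; in the actual method they come from a learned, content-dependent attention computation:

```python
import math

def softmax(scores):
    """Numerically plain softmax over a small list of scores."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attn_residual(layer_outputs, scores):
    """Combine all preceding layer outputs with softmax weights, rather
    than summing them with fixed unit weights as a standard residual
    stream does. Returns the weighted mixture, one value per dimension."""
    weights = softmax(scores)
    dim = len(layer_outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, layer_outputs))
            for i in range(dim)]

# Three earlier layers' outputs (toy 2-d vectors); layer 0 scored most relevant.
outs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
mixed = attn_residual(outs, scores=[2.0, 0.0, 0.0])
print(mixed)
```

Because the weights are a softmax, the mixture is convex — each layer's contribution stays bounded no matter how deep the stack grows, which is exactly the dilution problem the fixed-weight sum suffers from.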
autoresearch generalizes Karpathy's autonomous ML iteration loop to any domain. The pattern is dead simple: modify, verify, keep or discard, repeat. Claude iterates autonomously with mechanical verification and automatic rollback — works on backend code, frontend UI, content, performance, anything with a measurable outcome. The genius is the constraint: one metric, one direction, fast verification, git as memory. 608 stars in 3 days because developers recognized the pattern immediately. Compared to AutoResearchClaw (heavier, academic papers only) and ARIS (cross-model collaboration), this is the lightest and most general-purpose. It's a Claude Code skill, not a framework. Use this when you want to optimize anything measurable overnight — test speed, bundle size, Lighthouse scores, training loss. Skip this if your problem can't be reduced to a single improvable metric. The catch: autonomous iteration without clear guard rails can burn through API credits fast. And "overnight optimization" sounds magical until your agent makes 50 commits that each improve the metric by 0.01%.
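The modify → verify → keep/discard loop fits in a dozen lines. In the real skill, "keep" is a git commit and "discard" is a checkout; here a plain Python value stands in for the repo, and the hill-climbing target is a toy:

```python
import random

def autoresearch_step(state, metric, mutate):
    """One iteration of the pattern: propose a change, verify it against
    the single metric, keep it only if the metric improves."""
    candidate = mutate(state)
    return candidate if metric(candidate) > metric(state) else state

def optimize(state, metric, mutate, steps=100):
    """Repeat forever (well, `steps` times): git-as-memory reduces to
    'the last kept state'."""
    for _ in range(steps):
        state = autoresearch_step(state, metric, mutate)
    return state

# Toy target: maximize -(x - 3)^2 via random nudges; converges toward 3.0.
random.seed(0)
best = optimize(0.0, metric=lambda x: -(x - 3.0) ** 2,
                mutate=lambda x: x + random.uniform(-1, 1), steps=500)
print(round(best, 2))
```

Note how the constraint the paragraph praises is load-bearing: one metric, one direction, and a verification step cheap enough to run hundreds of times overnight.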
autoresearch-genealogy applies Karpathy's autonomous iteration pattern to family history research. Structured prompts, Obsidian vault templates, and archive guides for 12 research workflows — from OCR pipelines to oral history protocols — all built for Claude Code. This is niche but deeply thoughtful. Born from a real project that produced 105 files across 9 generations and 6 family lines. The confidence tier system and source hierarchy methodology prevent the AI from guessing when it should be verifying. Compared to generic AI genealogy tools, this has actual methodology. FamilySearch and Ancestry are commercial platforms, not research frameworks. Use this when you're doing serious genealogy research and want AI to help systematically, not just chat about ancestors. Skip this if you want a point-and-click family tree builder — this is a research methodology, not a product. The catch: 921 stars suggest early adoption. The prompts are Claude Code-specific (though adaptable), and genealogy archives vary wildly by region — the guides focus on certain sources that may not cover your family's geography.