
promptfoo
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic.
The Lens
Promptfoo answers a question every team shipping LLM features eventually hits: did that prompt change actually make things better, or did I just break something I cannot see? It is a developer-first tool to evaluate prompts, compare models side by side, and red-team your app for vulnerabilities like prompt injection and data leakage. It runs as a CLI or library, supports the major providers, and is built to live in CI so regressions get caught before users do.
Ops burden is close to zero. You write your test cases in YAML, run npx promptfoo, and the eval executes entirely on your machine. There is no server to stand up for the core eval workflow, which is a big part of why developers like it. It plugs into your pipeline the same way your unit tests do.
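To make the YAML workflow concrete, here is a minimal sketch of what a config file might look like. The prompt wording, model IDs, test input, and thresholds are all illustrative assumptions, not taken from the project; check the promptfoo docs for the provider IDs and assertion types your installed version supports.

```yaml
# promptfooconfig.yaml (illustrative sketch; every value below is an assumption)
description: Compare two summarization prompts across two models

# Two prompt variants to compare; {{text}} is filled in from each test's vars.
prompts:
  - "Summarize in one sentence: {{text}}"
  - "You are a concise editor. Summarize in one sentence: {{text}}"

# Models to run every prompt against (example IDs; swap in your own).
providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022

# Each test supplies variables plus assertions checked against the model output.
tests:
  - vars:
      text: "Promptfoo runs prompt evaluations locally from a YAML config."
    assert:
      # Case-insensitive substring check on the output.
      - type: icontains
        value: promptfoo
      # Inline JavaScript expression; `output` is the model's response string.
      - type: javascript
        value: output.length < 200
```

With a file like this in the working directory and provider API keys set in the environment (for example OPENAI_API_KEY and ANTHROPIC_API_KEY), running npx promptfoo@latest eval should evaluate every prompt against every provider, and npx promptfoo@latest view should open the side-by-side results in a local browser view.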
The core is MIT-licensed and free, and the free tier does real work: full local evals, model comparisons, and red-teaming up to 10,000 probes a month. The paid Enterprise tier is custom-priced and buys the team layer: a centralized security dashboard, access controls with SSO, and managed or on-prem deployment. Solo developers and small teams can live on the free CLI indefinitely. Reach for Enterprise when you need shared results and SSO across a security team. Promptfoo substitutes for paid eval platforms like Braintrust and LangSmith. Worth noting: Promptfoo is now part of OpenAI, while staying MIT open source.
The catch is the one that defines most open-core tools. The free tier is yours, local and uncapped for evals, but a shared dashboard, team controls, and collaboration are where the paid pitch begins. Most individual developers never cross that line.
Free vs Self-Hosted vs Paid
Free / self-hosted: The MIT core. Full local prompt evals, side-by-side model comparisons, and red-teaming up to 10,000 probes/month. Runs on your machine and in CI with no server.
Enterprise / Cloud: Custom pricing (dollar amounts unpublished). Adds advanced vulnerability detection, a centralized security dashboard, team access controls with SSO, CI/CD and API integrations, webhooks, priority SLA support, and managed cloud or on-prem deployment.
The free local CLI does real work. Pay for Enterprise only when you need a shared dashboard and SSO.
License: MIT License
Use freely, including commercial. Just keep the license.
Commercial use: ✓ Yes
About
- Owner
- promptfoo (Organization)
- Stars
- 22,256
- Forks
- 1,983
Explore Further
More tools in the directory
everything-claude-code
The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
216.9k ★
hermes-agent
The agent that grows with you
195.6k ★
ollama
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
174.3k ★

