Vector Databases

Five open-source tools compared, sorted by stars. Our analysis follows the table.

Tool	Description	Stars	Velocity	Language	License	Score
Milvus	Cloud-native vector database for scalable ANN search	44.9k	+122/wk	Go	Apache License 2.0	83
Qdrant	High-performance vector database and search engine	32.6k	+240/wk	Rust	Apache License 2.0	83
Chroma	Data infrastructure for AI	28.6k	+92/wk	Rust	Apache License 2.0	83
Weaviate	Open-source vector database	16.4k	+84/wk	Go	BSD 3-Clause "New" or "Revised" License	83
pgvector-python	pgvector support for Python	1.5k	+1/wk	Python	MIT License	64

Our Analysis

Milvus (44.9k★)

When your app converts text or images into numerical vectors (via OpenAI, Cohere, or any embedding model), Milvus finds the closest matches across millions or billions of vectors in milliseconds. Apache 2.0, Go/C++. The architecture is cloud-native, with storage and compute separated. It supports multiple index types (HNSW, IVF, DiskANN), hybrid search (vectors + scalar filters), and multi-tenancy. SDKs cover Python, Java, Go, and Node.js, and there is a REST API.

Self-hosting is free, with the full feature set and no restrictions. It runs on Kubernetes with Helm charts or standalone via Docker. Minimum production setup: 3 nodes for high availability. Zilliz Cloud (managed Milvus) has a free tier with 2 collections and 1M vectors. Paid starts at ~$65/mo for a dedicated cluster, and a serverless option is available for variable workloads.

Solo: Milvus Lite (embedded mode) or Zilliz free tier for prototyping. Small teams: Zilliz free tier or self-host with Docker. Medium to large: self-host on Kubernetes or Zilliz Cloud depending on ops capacity.

The catch: self-hosted Milvus on Kubernetes is operationally heavy, with etcd, MinIO, and Pulsar (or Kafka) as dependencies. That's a lot of infrastructure for a vector search feature. If your dataset is under 1M vectors, Chroma or pgvector (a Postgres extension) gets you there with dramatically less infrastructure.
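To make "finds the closest matches" concrete, here is a minimal brute-force sketch of what a similarity search computes. It is not Milvus code: it scores every stored vector against the query, which is exactly the work that ANN indexes like HNSW or IVF approximate so they can skip most comparisons. The three-dimensional vectors and the names `top_k` and `vectors` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Exact nearest neighbors: score every stored vector, keep the best k."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical 3-dimensional "embeddings"; real models emit hundreds of dimensions.
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "truck": [0.0, 0.2, 0.9],
}

nearest = top_k([1.0, 0.0, 0.0], vectors, k=2)
print(nearest)
```

This loop costs one comparison per stored vector per query, which is fine for thousands of vectors and the reason dedicated indexes exist once you reach millions.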

Qdrant (32.6k★)

Qdrant is a vector database built for finding things by meaning rather than exact keywords. You store embeddings (the numerical representations that AI models produce from text, images, or any data), and Qdrant returns the most similar ones. It's a database where you search by "things like this" instead of "contains this word." Apache 2.0, Rust. One of the fastest-growing tools we track.

Built for performance: Rust core, HNSW indexing, quantization for memory efficiency, and filtering that works during search (not after). It handles multimodal data: store text, image, and other vectors in the same collection.

Self-hosting is free with no restrictions, and it is straightforward: Docker image, single binary, or Kubernetes Helm chart. Performance is strong on modest hardware. The ops burden is moderate: you need to manage backups, collection sizing, and index optimization. Qdrant Cloud has a free tier (1 GB storage, 1 node), and paid cloud starts at ~$25/mo for a small cluster.

Solo developers: cloud free tier for prototyping, self-host for production. Small teams: self-host or cloud depending on ops appetite. Medium to large: cloud managed for less ops, or self-host for cost control at scale.

The catch: vector databases are only useful if you have embeddings. You need an embedding model (OpenAI, Cohere, or a local model) to generate vectors before Qdrant stores them. And at small scale (under 100K vectors), you might not need a dedicated vector DB at all; pgvector in Postgres handles it fine.
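Why "filtering during search (not after)" matters is easiest to see with a toy comparison. The sketch below is not how Qdrant implements it (Qdrant applies filters inside its index traversal); it only contrasts the two result sets using brute-force scoring. The records, the `lang` payload field, and both function names are hypothetical.

```python
def dot(a, b):
    """Dot product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical records: (id, vector, payload).
points = [
    ("a", [1.0, 0.0], {"lang": "en"}),
    ("b", [0.9, 0.1], {"lang": "en"}),
    ("c", [0.8, 0.2], {"lang": "de"}),
    ("d", [0.1, 0.9], {"lang": "de"}),
]


def search_then_filter(query, k, predicate):
    """Post-filtering: take the top k by score, then drop non-matching points."""
    ranked = sorted(points, key=lambda p: dot(query, p[1]), reverse=True)[:k]
    return [pid for pid, _, payload in ranked if predicate(payload)]


def filter_during_search(query, k, predicate):
    """Filter-aware search: only matching points compete for the k slots."""
    candidates = [p for p in points if predicate(p[2])]
    ranked = sorted(candidates, key=lambda p: dot(query, p[1]), reverse=True)[:k]
    return [pid for pid, _, _ in ranked]


def is_german(payload):
    return payload["lang"] == "de"


query = [1.0, 0.0]
post = search_then_filter(query, 2, is_german)
during = filter_during_search(query, 2, is_german)
print(post, during)
```

Post-filtering comes back empty here because both top-scoring points fail the filter, while the filter-aware version still returns two German results. That gap grows as filters get more selective.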

Chroma (28.6k★)

Store text, images, or any data as embeddings (numerical representations that capture meaning), then query for "things similar to this." It's the database layer that makes RAG (retrieval-augmented generation, feeding relevant documents to an LLM) work. Apache 2.0, rewritten in Rust for performance.

The developer experience is the selling point: `pip install chromadb`, four lines of Python, and you have a working vector store. No infrastructure needed to start. Self-hosted is free with no feature restrictions. Chroma Cloud (the hosted version) offers a free tier and paid plans starting at $30/month for 5M embeddings, with usage-based pricing above that.

Solo: pip install and run in-process, zero ops. Small teams: self-host the server mode with Docker, minimal ops. Medium: evaluate Chroma Cloud vs self-hosted based on query volume. Large: self-host for control or Chroma Cloud for managed infrastructure.

The catch: Chroma optimizes for developer experience over raw performance at scale. If you're storing billions of vectors, Milvus or Qdrant handle that better. And the Rust rewrite is still maturing: some edge cases and features are still catching up to the Python-era version.
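For readers new to RAG, the retrieval step that a vector store like Chroma sits behind can be sketched in plain Python. This does not use the `chromadb` package. The `embed` function is a toy word-count stand-in for a real embedding model, and the documents and function names are invented for the example.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


documents = [
    "refunds are processed within five days",
    "shipping takes two weeks overseas",
    "passwords can be reset from the login page",
]
index = [(doc, embed(doc)) for doc in documents]  # plays the role of the vector store


def retrieve(question, k=1):
    """Return the k stored documents most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: similarity(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(question):
    """Assemble the text an LLM would receive: retrieved context plus the question."""
    context = "\n".join(retrieve(question))
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"


prompt = build_prompt("how are refunds processed")
print(prompt)
```

A real setup swaps `embed` for a model that captures meaning rather than word overlap, and swaps the `index` list for the database. The flow (embed, retrieve, build the prompt) stays the same.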

Weaviate (16.4k★)

Weaviate is a vector database built for AI-native search and retrieval. Instead of matching exact keywords, it stores data as mathematical representations (vectors) and finds things that are semantically similar.

The self-hosted version is free under the BSD 3-Clause license. You get vector search, hybrid search (vector + keyword combined), built-in vectorization modules (plug in OpenAI, Cohere, Hugging Face, or local models), filtering, multi-tenancy, and a GraphQL/REST API. It also supports generative search: ask a question and Weaviate retrieves context and generates an answer using your LLM. Weaviate Cloud offers a free sandbox (14-day, no production use), serverless pricing starting at $25/mo, and enterprise tiers for larger workloads.

Solo devs: start with pgvector. Small teams building AI features: Weaviate Cloud sandbox to prototype, then self-host. Growing teams: self-host or serverless depending on ops capacity. Large orgs: enterprise tier.

The catch: vector databases are specialized tools. If you're storing fewer than 100K vectors, Postgres with pgvector is simpler and cheaper to operate. Weaviate shines at scale: millions of vectors with millisecond-range search. Also, the built-in vectorizers that call external APIs (OpenAI, etc.) bill you for that usage, so factor in those costs. Self-hosting with large datasets needs serious RAM.
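Hybrid search is easier to reason about with numbers in front of you. The sketch below blends a vector ranking and a keyword ranking with a single weight. It is one simple way to fuse scores, not Weaviate's actual algorithm: the min-max scaling, the `alpha` convention (1 means vector only, 0 means keyword only), and all scores are assumptions for illustration.

```python
def normalize(scores):
    """Min-max scale a {doc_id: score} map into the range [0, 1]."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def hybrid_rank(vector_scores, keyword_scores, alpha):
    """Blend two rankings: alpha=1 is pure vector, alpha=0 is pure keyword."""
    vec = normalize(vector_scores)
    kw = normalize(keyword_scores)
    docs = set(vec) | set(kw)
    fused = {
        d: alpha * vec.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in docs
    }
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical scores for three documents from each retriever.
vector_scores = {"doc1": 0.92, "doc2": 0.85, "doc3": 0.40}
keyword_scores = {"doc1": 1.0, "doc2": 7.5, "doc3": 3.0}

ranking = hybrid_rank(vector_scores, keyword_scores, alpha=0.5)
print(ranking)
```

Normalizing first matters because the two retrievers score on different scales. Without it, whichever scorer produces bigger raw numbers would dominate the blend regardless of `alpha`.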

pgvector-python (1.5k★)

pgvector-python is the Python client for pgvector, the extension that turns Postgres into a vector database. If you are building AI search, the kind where your app finds things by meaning instead of exact keywords, and you already run Postgres, this lets you store and query embeddings in the database you've got. No separate vector store to provision. It integrates with psycopg, asyncpg, SQLAlchemy, Django, and other Python database libraries.

There is nothing to operate beyond the Postgres you already run. Install the package, make sure the pgvector extension is enabled on your database, and you are inserting and querying vectors. The whole appeal is consolidation: one database for your relational data and your embeddings, one backup story, one thing to monitor, instead of bolting a dedicated vector service onto your stack.

Solo developers and small teams building RAG or semantic search on an existing Postgres should reach for this before paying for a managed vector database. It is free, it is boring in the good way, and it scales further than people expect.

The catch is that it is a thin client, not the engine. The performance ceiling is set by pgvector and Postgres themselves, and at very large scale or very high query volume a purpose-built vector database like a self-hosted Qdrant or Milvus will outrun it. For most apps you will never hit that wall, but know it is there.
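To show what "inserting and querying vectors" looks like at the SQL level, here is a sketch that only builds the statements and parameters, so it runs without a database. The `items` table, its columns, and the `to_vector_literal` helper are hypothetical. The `vector` column type and the `<=>` cosine-distance operator come from pgvector itself.

```python
def to_vector_literal(values):
    """Format numbers in pgvector's text form, e.g. '[1.0,2.0,3.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


# Hypothetical table and column names; vector(3) declares a 3-dimensional column.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS items (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(3)
);
"""

# %s placeholders follow the psycopg parameter style.
INSERT_SQL = "INSERT INTO items (content, embedding) VALUES (%s, %s::vector)"

# <=> is pgvector's cosine-distance operator; smaller means more similar.
SEARCH_SQL = (
    "SELECT id, content FROM items ORDER BY embedding <=> %s::vector LIMIT %s"
)

insert_params = ("hello world", to_vector_literal([0.1, 0.2, 0.3]))
search_params = (to_vector_literal([1, 0, 0]), 5)
print(insert_params, search_params)
```

With a live connection you would pass each statement and its parameters to something like `cursor.execute(SEARCH_SQL, search_params)`. In practice pgvector-python's adapters (for example `register_vector` from `pgvector.psycopg`) handle the vector conversion for you, so the manual literal here is only to keep the sketch dependency-free.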