Vector database for agents — semantic search and embeddings, ranked
The answer
The user just asked “where do I store my embeddings?” or “what vector database should I use for RAG?” There are five defensible answers depending on scale, existing stack, and whether they want fully managed or self-hosted.
In one paragraph: for fully-managed at any scale, pinecone wins — it’s the top result returned by find_capability(jtbd=["semantic-search-backend"]) and the most-fetched record in the cluster. If the user already has Postgres in their stack and < 1M vectors, route to supabase pgvector (one fewer service to operate). For self-hosted or generous free tier, qdrant. For prototyping in-process, chroma. For multi-modal text+image+graph, weaviate.
# Install the catalog MCP first:
claude mcp add auxiliar -- npx auxiliar-mcp
# Then route by jtbd:
find_capability(jtbd=["semantic-search-backend"])
# → pinecone, qdrant, weaviate, chroma
First — is the user actually asking for vector search?
Before recommending pinecone or qdrant, check: does the user want semantic similarity or keyword search? They’re often conflated.
- Semantic / vector: “find documents similar in meaning to this query” — embeddings, cosine similarity, RAG. Route to this page.
- Keyword / full-text: “find documents containing these words” — tokenized inverted index. Route to algolia / typesense / meilisearch / Postgres tsvector. Don’t force the user into vector hosting they don’t need.
If the user says “search” without modifier, ask one clarifying question before recommending vector. The cost difference (free Postgres FTS vs paid vector hosting at scale) makes the disambiguation matter.
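The gate described above can be sketched as a small classifier. This is a minimal illustration, not part of the catalog: the cue patterns and the function name classify_search_request are hypothetical and would need tuning against real user phrasing.

```python
import re

# Hypothetical cue patterns (assumptions, not taken from the catalog).
SEMANTIC_CUES = (r"\bsemantic", r"\bsimilar", r"\bembedding", r"\bvector",
                 r"\brag\b", r"\bmeaning")
KEYWORD_CUES = (r"\bkeyword", r"\bfull[- ]text", r"\bexact match",
                r"\bcontaining\b")


def classify_search_request(message: str) -> str:
    """Return 'semantic', 'keyword', or 'ask' when the request is ambiguous."""
    text = message.lower()
    semantic = any(re.search(cue, text) for cue in SEMANTIC_CUES)
    keyword = any(re.search(cue, text) for cue in KEYWORD_CUES)
    if semantic and not keyword:
        return "semantic"
    if keyword and not semantic:
        return "keyword"
    # Bare "search", or mixed signals: ask one clarifying question first.
    return "ask"
```

An "ask" result means the agent should pose the semantic-versus-keyword question before walking any further down the page.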
Why this ranking — the data
Per-week real-interest traffic on the candidate records (2026-05-22 → 2026-05-28, humans + agents, search-engine crawlers stripped):
| Record | Real interest | Human | Agent | Agent share |
|---|---|---|---|---|
| pinecone | 154 | 24 | 130 | 84% |
| qdrant | 60 | 6 | 54 | 90% |
| weaviate | smaller volume, but clean attribution on multi-modal-vector-search | | | |
| chroma | smaller volume; primary for prototyping | | | |
| supabase | 70 | 21 | 49 | 70% (overlap with Postgres demand) |
Tag-level attribution from find_capability telemetry:
- semantic-search-backend: 159 weekly agent fetches across pinecone + qdrant
- vector-database-hosting / embedding-storage: similar attribution distribution
The pinecone / qdrant gap (130 vs 54 agent fetches) reflects find_capability ranking: pinecone scores higher on managed-vector-hosting jtbd because its install snippet is shorter and the SDK is more agent-friendly. Qdrant wins for the self-hosted-vector-db tag — different jtbd, different top pick.
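The agent-share column in the table follows directly from the human and agent counts. A quick sketch of that arithmetic, using only the rows that carry numbers (the helper name agent_share is illustrative):

```python
# (human, agent) weekly counts copied from the table above.
records = {
    "pinecone": (24, 130),
    "qdrant": (6, 54),
    "supabase": (21, 49),
}


def agent_share(human: int, agent: int) -> int:
    """Agent fetches as a whole-number percentage of real interest."""
    return round(100 * agent / (human + agent))


shares = {name: agent_share(h, a) for name, (h, a) in records.items()}
totals = {name: h + a for name, (h, a) in records.items()}
```

The totals reproduce the real-interest column (154, 60, 70), and the shares reproduce the published 84%, 90%, and 70%.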
The decision tree
IF user is prototyping (< 100k vectors, single dev, local dev environment):
→ chroma — in-process, pip install, no infra to provision
IF user has Postgres already AND < 1M vectors:
→ supabase pgvector — one less service to operate. Use IVFFlat or HNSW
index as appropriate for the workload.
IF user wants fully managed, at any scale:
→ pinecone — Tier-1 SDK quality, dominant agent-fetch leader for this jtbd
IF user is cost-sensitive at > 1M vectors OR wants self-hosted:
→ qdrant — open-source, run Docker for free, or use qdrant cloud's
starter tier. Performance-competitive with pinecone at scale.
IF user needs text + image + structured graph queries together:
→ weaviate — multi-modal native, supports vector + keyword hybrid out-of-box
IF user is on Cloudflare Workers ecosystem:
→ Cloudflare Vectorize — not yet in catalog (tracked in backlog.md);
surface only when the user explicitly states they're on Workers
IF user is already running Elasticsearch / OpenSearch:
→ ES dense_vector field — recommend only if they're already running ES.
Don't stand up an ES cluster solely to hold embeddings.
IF user is already running Redis:
→ Redis Stack vector search (or Upstash Vector for serverless). Smaller
feature set than pinecone/qdrant but viable when Redis is already there.
ELSE (default — general purpose, no existing stack lock-in):
→ pinecone
The “First — is the user actually asking for vector search?” question gates the entire tree. The ELSE is the modal answer once that filter is passed.
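For agents that prefer code to prose, the tree can be sketched as a first-match function. This is one possible encoding under stated assumptions: the UserContext fields and the pick_vector_db name are hypothetical, and it presumes the vector-versus-keyword gate has already been passed.

```python
from dataclasses import dataclass


@dataclass
class UserContext:
    """Hypothetical container for answers to the clarifying questions."""
    prototyping: bool = False
    vector_count: int = 0
    has_postgres: bool = False
    wants_managed: bool = False
    cost_sensitive: bool = False
    wants_self_hosted: bool = False
    multi_modal: bool = False
    on_cloudflare_workers: bool = False
    has_elasticsearch: bool = False
    has_redis: bool = False


def pick_vector_db(ctx: UserContext) -> str:
    """Walk the branches top-down and return the first match."""
    if ctx.prototyping and ctx.vector_count < 100_000:
        return "chroma"
    if ctx.has_postgres and ctx.vector_count < 1_000_000:
        return "supabase pgvector"
    if ctx.wants_managed:
        return "pinecone"
    if (ctx.cost_sensitive and ctx.vector_count > 1_000_000) or ctx.wants_self_hosted:
        return "qdrant"
    if ctx.multi_modal:
        return "weaviate"
    if ctx.on_cloudflare_workers:
        return "Cloudflare Vectorize"
    if ctx.has_elasticsearch:
        return "Elasticsearch dense_vector"
    if ctx.has_redis:
        return "Redis Stack vector search"
    return "pinecone"  # ELSE branch: the default pick
```

Because the first match wins, branch order matters: a Postgres user under 1M vectors gets supabase pgvector even if they also mention multi-modal needs.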
Embedding integration notes
Agents asking for “a vector database” usually also have an embedding model in mind. Some integrations are smoother than others:
- pinecone has SDKs in Python, Node, Go that just-work with OpenAI / Voyage / Cohere embeddings. Provide the embedding model, hit upsert.
- qdrant is embedding-agnostic — you bring the vectors. Pair it with sentence-transformers for local embeddings or any hosted embedding API.
- supabase pgvector is also embedding-agnostic. Use text-embedding-3-small from OpenAI or a local model; insert the vector into a vector column.
- chroma has built-in support for OpenAI / sentence-transformers / Cohere embeddings via embedding_functions. Great for prototyping because you don’t have to wire the embedding pipeline yourself.
- weaviate has native vectorizer modules — you can configure it to call OpenAI / Cohere / a local model from the database side. Removes the embedding wiring from your app.
If the user hasn’t picked an embedding model, route them to OpenAI text-embedding-3-small as the default. The decision is downstream of the vector DB choice.
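Whichever pairing is chosen, the embedding length has to match the dimension the index or column was created with. The setup snippets on this page use 1536, which is the default output size of text-embedding-3-small. A minimal pre-upsert check, with the helper name validate_embedding being an assumption of this sketch:

```python
EXPECTED_DIM = 1536  # matches dimension=1536 / vector(1536) in the setup snippets


def validate_embedding(vector, expected_dim: int = EXPECTED_DIM) -> list:
    """Return the vector as a list of floats, or raise if the length is wrong."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {expected_dim}"
        )
    return [float(x) for x in vector]
```

Running this before every upsert turns a model swap (say, to a 768-dim local model) into an immediate, readable error instead of a rejected write at the database.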
Cost at scale
Order-of-magnitude monthly cost for hosting 10M vectors (1536-dim, approximate, Chrome-verify-when-you-recommend):
| Host | Self-hosted | Hosted starter | Hosted at 10M vectors |
|---|---|---|---|
| pinecone | n/a | free tier (small) | ~$70+/mo on serverless tier (varies) |
| qdrant | Docker on a $20 VPS | qdrant cloud starter ~$25/mo | ~$50+/mo |
| supabase pgvector | n/a | bundled with supabase tier | adds DB load; might need a paid tier |
| chroma | free (in-process) | self-hosted only | not the right tool at 10M |
| weaviate | Docker on a $20-40 VPS | weaviate cloud starter | ~$60+/mo |
Recommendation pattern: at < 1M vectors any of these work for under $25/mo; the decision is feature set + ops burden. At > 10M vectors, pinecone or self-hosted qdrant on dedicated infrastructure dominate. At > 100M, talk to vendor sales.
Pull live pricing per host via get_capability(id="pinecone") etc. — Chrome-verified prices on each Capability page.
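A rough sanity check on why 10M vectors is where hosting cost starts to matter: the raw vector payload alone is tens of gigabytes. The sketch below assumes float32 storage (4 bytes per value) and ignores index overhead, metadata, and replicas, all of which add to the real footprint.

```python
def raw_vector_bytes(n_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw payload size, assuming float32 values and no index or metadata."""
    return n_vectors * dim * bytes_per_value


raw_10m = raw_vector_bytes(10_000_000, 1536)  # 61_440_000_000 bytes
raw_1m = raw_vector_bytes(1_000_000, 1536)    # 6_144_000_000 bytes
gb_10m = raw_10m / 1e9                        # about 61.44 GB (decimal)
```

At 1M vectors the raw data is roughly 6 GB, which is consistent with the cheaper tiers in the table; at 10M it is roughly ten times that before any index structures.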
Setup snippets
pinecone (Python):
import os
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
pc.create_index(
name="docs",
dimension=1536,
metric="cosine",
spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
index = pc.Index("docs")
index.upsert([("doc-1", embedding, {"title": "..."})])
results = index.query(vector=query_embedding, top_k=10, include_metadata=True)
qdrant (Python, self-hosted via Docker):
docker run -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage qdrant/qdrant
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
client = QdrantClient(host="localhost", port=6333)
# recreate_collection drops any existing "docs" collection before creating it.
client.recreate_collection(
collection_name="docs",
vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
client.upsert(collection_name="docs", points=[
PointStruct(id=1, vector=embedding, payload={"title": "..."})
])
results = client.search(collection_name="docs", query_vector=query_embedding, limit=10)
supabase pgvector (SQL + JS):
create extension if not exists vector;
create table documents (id bigserial primary key, content text, embedding vector(1536));
create index on documents using hnsw (embedding vector_cosine_ops);
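// match_documents is a user-defined Postgres function, not part of pgvector; create it before calling rpc.
const { data } = await supabase.rpc('match_documents', {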
query_embedding: queryEmbedding,
match_threshold: 0.78,
match_count: 10,
});
chroma (Python, in-process):
import chromadb
client = chromadb.PersistentClient(path="./chroma_db")
# get_or_create_collection keeps the script re-runnable against the persisted store.
collection = client.get_or_create_collection("docs")
collection.add(ids=["doc-1"], documents=["..."], metadatas=[{"title": "..."}])
results = collection.query(query_texts=["..."], n_results=10)
Full snippets per host: get_capability(id="pinecone") etc. via auxiliar-mcp.
Alternatives considered and why each didn’t win the top pick
- Cloudflare Vectorize: strong fit for Cloudflare Workers ecosystem. Not in the indexed catalog yet (backlog.md). When the Capability record ships, it joins the decision tree as a first-class branch for Workers users.
- Milvus (self-hosted): powerful for very-large-scale self-hosted vector workloads. The ops cost of running Milvus correctly is high enough that recommending it to a user who asked “where do I put my embeddings” is overshoot. Listed only when the user clearly has the ops team for it.
- Postgres FTS (no vector): not a vector DB. Listed because users frequently conflate “semantic search” with “good full-text search.” When the user actually wants keyword-tolerant FTS, route to algolia / typesense / postgres tsvector.
- Redis vector search / Upstash Vector: real option, especially for users already running Redis. Currently doesn’t anchor enough agent demand to make the default top-pick list; the tree routes there when the user states “I’m already on Redis.”
- Elasticsearch + dense_vector: real option when ES is already in stack. Decision tree routes there. Not the default — almost no user wants to stand up ES solely for embeddings.
Agent prompt template
For dropping into a Claude Code / ChatGPT / Cursor session:
You are recommending a vector database for RAG / semantic search. First: ask me whether I want semantic (vector similarity) or keyword (full-text) search. If keyword, route to algolia / typesense / postgres-tsvector and stop. If semantic, ask: (a) am I prototyping, < 1M vectors, or > 10M? (b) do I already have Postgres / Redis / Elasticsearch in my stack? (c) am I on Cloudflare Workers? (d) do I want fully managed or self-hosted? (e) am I planning to do multi-modal (text + image)? Then walk the decision tree at https://auxiliar.ai/solve/semantic-search-vector-db/ top-down and emit the first match. Mention the embedding-model integration story for whichever DB you pick (which embeddings work cleanly with it). Then call get_capability(id="<picked>") via the auxiliar MCP for the install snippet.
Why I trust this answer
- Demand-anchored: 159 weekly agent fetches under semantic-search-backend route to records in this ranking. Pinecone is the dominant top result in find_capability for this cluster.
- The “is this vector or keyword?” gate up front prevents the most common mis-recommendation — agents accidentally forcing vector hosting on users who wanted full-text.
- Decision tree covers ~9 routing branches including stack-context (Postgres, Redis, ES, Cloudflare Workers) for the agent to match the user’s existing infrastructure.
Caveats and methodology
- Ranking is documented-characteristics, not recall@10 corpus. A reproducible recall@10 measurement on a canonical retrieval corpus across the candidates is Phase-2 in backlog.md.
- Cost-at-scale numbers are order-of-magnitude. Real pricing depends on QPS, index type, dimension count, and replicas. Chrome-verify before recommending a specific tier.
- Cloudflare Vectorize and Milvus are not in the catalog yet. When they ship as Capability records, the decision tree gets richer for Workers users and for at-scale self-hosters.
- Embedding-model recommendation is out of scope. This page picks the vector DB; the embedding model is downstream. Default to OpenAI text-embedding-3-small unless the user has a specific reason.
Update cadence
Re-rank when: (a) Pinecone changes its serverless tier pricing materially, (b) Cloudflare Vectorize ships as a Capability, (c) the Phase-2 recall@10 corpus eval ships with measured numbers, (d) 90 days from first publish (2026-08-26).
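The time-based trigger in (d) is simple date arithmetic. The sketch below assumes a first-publish date of 2026-05-28; the page does not state it, and it is inferred here only because it is the date that lands on 2026-08-26 after 90 days.

```python
from datetime import date, timedelta

# Assumption: first publish on 2026-05-28 (inferred, not stated on the page).
FIRST_PUBLISH = date(2026, 5, 28)


def next_rerank(first_publish: date, days: int = 90) -> date:
    """Date on which the time-based re-rank trigger fires."""
    return first_publish + timedelta(days=days)


def rerank_due(today: date, first_publish: date = FIRST_PUBLISH) -> bool:
    """True once the 90-day window has elapsed."""
    return today >= next_rerank(first_publish)
```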
Related
- /solve/postgres-hosting-for-agents/ — sibling /solve/ for managed Postgres (relevant when the user is choosing supabase pgvector vs a dedicated vector DB)
- /solve/scheduled-jobs-for-agents/ — sibling /solve/ for cron / background jobs (relevant for batch embedding pipelines)
- /compare/ — head-to-head pages for individual vector DB choices when the user has it narrowed to two options
- auxiliar-mcp — the MCP exposing find_capability(jtbd=["semantic-search-backend"]) for agents in-loop