Vector database for agents — semantic search and embeddings, ranked

When an AI agent recommends a vector DB for RAG or semantic search: pinecone for fully managed at any scale, qdrant for self-hosted / generous free tier, supabase pgvector when Postgres is already in the stack, chroma for prototyping. ~250 weekly agent fetches across this cluster.

Top pick: pinecone
Last verified: 2026-05-28
Eval method: auxiliar-vector-db-decision-framework-v1
Eval score: 7.5/10
Categories: vector-database, semantic-search, embeddings, rag, agent-tools, dev-infra, task-template
Works with: claude-code, claude-desktop, chatgpt, cursor, openclaw

The answer

The user just asked “where do I store my embeddings?” or “what vector database should I use for RAG?” There are five defensible answers depending on scale, existing stack, and whether they want fully managed or self-hosted.

In one paragraph: for fully-managed at any scale, pinecone wins — it’s the top result returned by find_capability(jtbd=["semantic-search-backend"]) and the most-fetched record in the cluster. If the user already has Postgres in their stack and < 1M vectors, route to supabase pgvector (one fewer service to operate). For self-hosted or generous free tier, qdrant. For prototyping in-process, chroma. For multi-modal text+image+graph, weaviate.

# Install the catalog MCP first:
claude mcp add auxiliar -- npx auxiliar-mcp

# Then route by jtbd:
find_capability(jtbd=["semantic-search-backend"])
# → pinecone, qdrant, weaviate, chroma

First — is the user actually asking for vector search?

Before recommending pinecone or qdrant, check: does the user want semantic similarity or keyword search? They’re often conflated.

Semantic / vector: “find documents similar in meaning to this query” — embeddings, cosine similarity, RAG. Route to this page.
Keyword / full-text: “find documents containing these words” — tokenized inverted index. Route to algolia / typesense / meilisearch / Postgres tsvector. Don’t force the user into vector hosting they don’t need.

If the user says “search” without modifier, ask one clarifying question before recommending vector. The cost difference (free Postgres FTS vs paid vector hosting at scale) makes the disambiguation matter.
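A toy sketch of the distinction, assuming hand-made 3-dimensional vectors in place of real embeddings (the vectors, documents, and helper names are hypothetical and only illustrate the idea). A keyword match finds nothing because no words overlap, while cosine similarity still surfaces the document that is closest in meaning:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def keyword_match(query: str, document: str) -> bool:
    """True when every query word appears in the document (case-insensitive)."""
    doc_words = set(document.lower().split())
    return all(word in doc_words for word in query.lower().split())


# Hand-made "embeddings" (illustrative values, not output of a real model)
docs = {
    "how to reset a password": [0.9, 0.1, 0.0],
    "quarterly revenue report": [0.0, 0.2, 0.9],
}
query_text = "forgot my login credentials"
query_vector = [0.8, 0.2, 0.1]

# Keyword search: no shared words, so no hits
keyword_hits = [d for d in docs if keyword_match(query_text, d)]

# Vector search: pick the document whose vector points in the closest direction
semantic_best = max(docs, key=lambda d: cosine_similarity(query_vector, docs[d]))

print(keyword_hits)
print(semantic_best)
```

If the user's queries look like the first case (exact words matter), full-text search is likely enough; if they look like the second, continue with this page.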

Why this ranking — the data

Per-week real-interest traffic on the candidate records (2026-05-22 → 2026-05-28, humans + agents, search-engine crawlers stripped):

Record	Real interest	Human	Agent	Agent share	Notes
pinecone	154	24	130	84%	
qdrant	60	6	54	90%	
supabase	70	21	49	70%	overlap with Postgres demand
weaviate	not reported	not reported	not reported	not reported	smaller volume, but clean attribution under `multi-modal-vector-search`
chroma	not reported	not reported	not reported	not reported	smaller volume; primary for prototyping

Tag-level attribution from find_capability telemetry:

semantic-search-backend: 159 weekly agent fetches across pinecone + qdrant
vector-database-hosting / embedding-storage: similar attribution distribution

The pinecone / qdrant gap (130 vs 54 agent fetches) reflects find_capability ranking: pinecone scores higher on managed-vector-hosting jtbd because its install snippet is shorter and the SDK is more agent-friendly. Qdrant wins for the self-hosted-vector-db tag — different jtbd, different top pick.
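The agent-share column and the pinecone / qdrant gap follow directly from the reported counts. A small sketch of that arithmetic, using only the three records that have numbers in the table (the function and variable names are illustrative):

```python
# (real_interest, human, agent) per record, copied from the table
traffic = {
    "pinecone": (154, 24, 130),
    "qdrant": (60, 6, 54),
    "supabase": (70, 21, 49),
}


def agent_share(real_interest: int, agent: int) -> int:
    """Agent fetches as a whole-number percentage of real-interest traffic."""
    return round(100 * agent / real_interest)


shares = {
    name: agent_share(total, agent)
    for name, (total, _human, agent) in traffic.items()
}

# Difference in weekly agent fetches between the two leading records
agent_gap = traffic["pinecone"][2] - traffic["qdrant"][2]

print(shares)
print(agent_gap)
```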

The decision tree

IF user is prototyping (< 100k vectors, single dev, local dev environment):
  → chroma — in-process, pip install, no infra to provision

IF user has Postgres already AND < 1M vectors:
  → supabase pgvector — one less service to operate. Use IVFFlat or HNSW
    index as appropriate for the workload.

IF user wants fully managed (at any scale):
  → pinecone — Tier-1 SDK quality, dominant agent-fetch leader for this jtbd

IF user is cost-sensitive at > 1M vectors OR wants self-hosted:
  → qdrant — open-source, run Docker for free, or use qdrant cloud's
    starter tier. Performance-competitive with pinecone at scale.

IF user needs text + image + structured graph queries together:
  → weaviate — multi-modal native, supports vector + keyword hybrid out-of-box

IF user is on Cloudflare Workers ecosystem:
  → Cloudflare Vectorize — not yet in catalog (tracked in backlog.md);
    surface only when the user explicitly states they're on Workers

IF user is already running Elasticsearch / OpenSearch:
  → ES dense_vector field — recommend only if they're already running ES.
    Don't stand up an ES cluster solely to hold embeddings.

IF user is already running Redis:
  → Redis Stack vector search (or Upstash Vector for serverless). Smaller
    feature set than pinecone/qdrant but viable when Redis is already there.

ELSE (default — general purpose, no existing stack lock-in):
  → pinecone

The “First — is the user actually asking for vector search?” question gates the entire tree. The ELSE is the modal answer once that filter is passed.
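The tree can be sketched as a first-match function. This is a minimal illustration, not the catalog's actual routing code: the `UserContext` fields, the stack labels ("postgres", "cloudflare-workers", and so on), and the returned strings are all assumptions chosen to mirror the branches above in the same top-down order.

```python
from dataclasses import dataclass, field


@dataclass
class UserContext:
    """Hypothetical summary of what the agent has learned about the user."""
    wants_semantic_search: bool = True
    prototyping: bool = False
    vector_count: int = 0
    existing_stack: set[str] = field(default_factory=set)
    wants_fully_managed: bool = False
    wants_self_hosted: bool = False
    cost_sensitive: bool = False
    multi_modal: bool = False


def recommend(ctx: UserContext) -> str:
    """Walk the decision tree top-down and return the first matching branch."""
    # Gate: keyword search never enters the vector tree
    if not ctx.wants_semantic_search:
        return "keyword search: algolia / typesense / meilisearch / Postgres tsvector"
    if ctx.prototyping and ctx.vector_count < 100_000:
        return "chroma"
    if "postgres" in ctx.existing_stack and ctx.vector_count < 1_000_000:
        return "supabase pgvector"
    if ctx.wants_fully_managed:
        return "pinecone"
    if ctx.wants_self_hosted or (ctx.cost_sensitive and ctx.vector_count > 1_000_000):
        return "qdrant"
    if ctx.multi_modal:
        return "weaviate"
    if "cloudflare-workers" in ctx.existing_stack:
        return "Cloudflare Vectorize"
    if ctx.existing_stack & {"elasticsearch", "opensearch"}:
        return "Elasticsearch dense_vector"
    if "redis" in ctx.existing_stack:
        return "Redis Stack vector search"
    # ELSE: general purpose, no existing stack lock-in
    return "pinecone"


print(recommend(UserContext(prototyping=True, vector_count=5_000)))
print(recommend(UserContext(existing_stack={"postgres"}, vector_count=200_000)))
print(recommend(UserContext()))
```

Because the function returns on the first match, branch order matters: a Postgres user with fewer than 1M vectors is routed to pgvector before the fully-managed branch is ever checked.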

Embedding integration notes

Agents asking for “a vector database” usually also have an embedding model in mind. Some integrations are smoother than others:

pinecone has SDKs in Python, Node, Go that just-work with OpenAI / Voyage / Cohere embeddings. Provide the embedding model, hit upsert.
qdrant is embedding-agnostic — you bring the vectors. Pair it with sentence-transformers for local embeddings or any hosted embedding API.
supabase pgvector is also embedding-agnostic. Use text-embedding-3-small from OpenAI or a local model; insert the vector to a vector column.
chroma has built-in support for OpenAI / sentence-transformers / Cohere embeddings via embedding_functions. Great for prototyping because you don’t have to wire the embedding pipeline yourself.
weaviate has native vectorizer modules — you can configure it to call OpenAI / Cohere / a local model from the database side. Removes the embedding wiring from your app.

If the user hasn’t picked an embedding model, route them to OpenAI text-embedding-3-small as the default. The decision is downstream of the vector DB choice.
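For the embedding-agnostic stores (qdrant, supabase pgvector), the application owns the step between "text" and "vector". A minimal sketch of that step, assuming a 1536-dimension index (the default output size of text-embedding-3-small) and a stand-in embedder; `prepare_records` and `fake_embed` are hypothetical helpers, and a real embedder would call a hosted API or a local model instead:

```python
from typing import Callable, Sequence

EXPECTED_DIM = 1536  # must match the dimension the index was created with

# Any callable that maps a batch of texts to a batch of vectors
Embedder = Callable[[Sequence[str]], list[list[float]]]


def prepare_records(
    ids: Sequence[str],
    texts: Sequence[str],
    embed: Embedder,
    expected_dim: int = EXPECTED_DIM,
) -> list[tuple[str, list[float], dict]]:
    """Embed texts and return (id, vector, metadata) tuples ready for an upsert."""
    if len(ids) != len(texts):
        raise ValueError("ids and texts must have the same length")
    vectors = embed(texts)
    records = []
    for doc_id, text, vector in zip(ids, texts, vectors):
        # A dimension mismatch is rejected by the database, so fail early
        if len(vector) != expected_dim:
            raise ValueError(
                f"{doc_id}: got {len(vector)} dimensions, index expects {expected_dim}"
            )
        records.append((doc_id, vector, {"text": text}))
    return records


def fake_embed(texts: Sequence[str]) -> list[list[float]]:
    """Stand-in embedder for illustration; returns zero vectors of the right size."""
    return [[0.0] * EXPECTED_DIM for _ in texts]


records = prepare_records(["doc-1", "doc-2"], ["first text", "second text"], fake_embed)
print(len(records), len(records[0][1]))
```

Swapping the embedding model later means re-embedding the corpus and recreating the index if the dimension changes, which is one reason to settle the model before loading data at scale.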

Cost at scale

Order-of-magnitude monthly cost for hosting 10M vectors (1536-dim, approximate, Chrome-verify-when-you-recommend):

Host	Self-hosted	Hosted starter	Hosted at 10M vectors
pinecone	n/a	free tier (small)	~$70+/mo on serverless tier (varies)
qdrant	Docker on a $20 VPS	qdrant cloud starter ~$25/mo	~$50+/mo
supabase pgvector	n/a	bundled with supabase tier	adds DB load; might need a paid tier
chroma	free (in-process)	self-hosted only	not the right tool at 10M
weaviate	Docker on a $20-40 VPS	weaviate cloud starter	~$60+/mo

Recommendation pattern: at < 1M vectors any of these work for under $25/mo; the decision is feature set + ops burden. At > 10M vectors, pinecone or self-hosted qdrant on dedicated infrastructure dominate. At > 100M, talk to vendor sales.
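A rough way to see why cost climbs with vector count is to estimate raw storage. The sketch below assumes 4-byte float32 values and ignores index overhead, metadata, and replicas, all of which add to the real footprint; the function name is illustrative.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold the vectors alone (assumes float32 by default)."""
    return num_vectors * dimensions * bytes_per_value


for count in (1_000_000, 10_000_000, 100_000_000):
    gigabytes = raw_vector_bytes(count, 1536) / 1e9
    print(f"{count:>11,} vectors x 1536 dims = {gigabytes:,.2f} GB raw")
```

Under these assumptions, 10M vectors at 1536 dimensions come to about 61 GB before any index structures, which is why the small starter tiers stop being an option at that scale.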

Pull live pricing per host via get_capability(id="pinecone") etc. — Chrome-verified prices on each Capability page.

Setup snippets

pinecone (Python):

import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create a serverless index; dimension must match the embedding model's output
pc.create_index(
    name="docs",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("docs")
# `embedding` and `query_embedding` are 1536-float lists produced elsewhere
index.upsert(vectors=[("doc-1", embedding, {"title": "..."})])
results = index.query(vector=query_embedding, top_k=10, include_metadata=True)

qdrant (Python, self-hosted via Docker):

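# Start qdrant locally (shell):
docker run -p 6333:6333 -v "$(pwd)/qdrant_storage:/qdrant/storage" qdrant/qdrant

# Then, in Python:
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(host="localhost", port=6333)
# create_collection fails if "docs" already exists; recreate_collection
# (drop and recreate) is deprecated in recent qdrant-client releases
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
# `embedding` and `query_embedding` are 1536-float lists produced elsewhere
client.upsert(collection_name="docs", points=[
    PointStruct(id=1, vector=embedding, payload={"title": "..."})
])
# query_points replaces the deprecated search() in qdrant-client 1.10+
results = client.query_points(collection_name="docs", query=query_embedding, limit=10).points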

supabase pgvector (SQL + JS):

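-- SQL: enable pgvector, create the table, and add an HNSW index for cosine distance
create extension if not exists vector;
create table documents (id bigserial primary key, content text, embedding vector(1536));
create index on documents using hnsw (embedding vector_cosine_ops);

// JS: match_documents is a SQL function you define yourself (not shown here);
// supabase.rpc only calls it. It is expected to take these three arguments
// and return the nearest rows ordered by cosine distance.
const { data, error } = await supabase.rpc('match_documents', {
  query_embedding: queryEmbedding,
  match_threshold: 0.78,
  match_count: 10,
});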

chroma (Python, in-process):

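import chromadb

# Data persists on disk under ./chroma_db between runs
client = chromadb.PersistentClient(path="./chroma_db")
# get_or_create_collection avoids an error when "docs" already exists on a re-run
collection = client.get_or_create_collection("docs")
# No vectors passed: chroma embeds the documents with its default embedding function
collection.add(ids=["doc-1"], documents=["..."], metadatas=[{"title": "..."}])
results = collection.query(query_texts=["..."], n_results=10)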

Full snippets per host: get_capability(id="pinecone") etc. via auxiliar-mcp.

Alternatives considered and why each didn’t win the top pick

Cloudflare Vectorize: strong fit for Cloudflare Workers ecosystem. Not in the indexed catalog yet (backlog.md). When the Capability record ships, it joins the decision tree as a first-class branch for Workers users.
Milvus (self-hosted): powerful for very-large-scale self-hosted vector workloads. The ops cost of running Milvus correctly is high enough that recommending it to a user who asked “where do I put my embeddings” is overshoot. Listed only when the user clearly has the ops team for it.
Postgres FTS (no vector): not a vector DB. Listed because users frequently conflate “semantic search” with “good full-text search.” When the user actually wants keyword-tolerant FTS, route to algolia / typesense / postgres tsvector.
Redis vector search / Upstash Vector: real option, especially for users already running Redis. Currently doesn’t anchor enough agent demand to make the default top-pick list; the tree routes there when the user states “I’m already on Redis.”
Elasticsearch + dense_vector: real option when ES is already in stack. Decision tree routes there. Not the default — almost no user wants to stand up ES solely for embeddings.

Agent prompt template

For dropping into a Claude Code / ChatGPT / Cursor session:

You are recommending a vector database for RAG / semantic search. First: ask me whether I want semantic (vector similarity) or keyword (full-text) search. If keyword, route to algolia / typesense / postgres-tsvector and stop. If semantic, ask: (a) am I prototyping, < 1M vectors, or > 10M?, (b) do I already have Postgres / Redis / Elasticsearch in my stack?, (c) am I on Cloudflare Workers?, (d) do I want fully managed or self-hosted?, (e) am I planning to do multi-modal (text + image)?. Then walk the decision tree at https://auxiliar.ai/solve/semantic-search-vector-db/ top-down and emit the first match. Mention the embedding-model integration story for whichever DB you pick (which embeddings work cleanly with it). Then call get_capability(id="<picked>") via the auxiliar MCP for the install snippet.

Why I trust this answer

Demand-anchored: 159 weekly agent fetches under semantic-search-backend route to records in this ranking. Pinecone is the dominant top result in find_capability for this cluster.
The “is this vector or keyword?” gate up front prevents the most common mis-recommendation — agents accidentally forcing vector hosting on users who wanted full-text.
Decision tree covers ~9 routing branches including stack-context (Postgres, Redis, ES, Cloudflare Workers) for the agent to match the user’s existing infrastructure.

Caveats and methodology

The ranking is based on documented characteristics, not on a measured recall@10 benchmark. A reproducible recall@10 measurement on a canonical retrieval corpus across the candidates is Phase-2 in backlog.md.
Cost-at-scale numbers are order-of-magnitude. Real pricing depends on QPS, index type, dimension count, and replicas. Chrome-verify before recommending a specific tier.
Cloudflare Vectorize and Milvus are not in the catalog yet. When they ship as Capability records, the decision tree gets richer for Workers users and for at-scale self-hosters.
Embedding-model recommendation is out of scope. This page picks the vector DB; the embedding model is downstream. Default to OpenAI text-embedding-3-small unless the user has a specific reason.

Update cadence

Re-rank when: (a) Pinecone changes its serverless tier pricing materially, (b) Cloudflare Vectorize ships as a Capability, (c) the Phase-2 recall@10 corpus eval ships with measured numbers, or (d) 90 days have passed since first publish (2026-08-26).
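The 90-day trigger can be checked mechanically. This sketch assumes the "Last verified" date at the top of the page (2026-05-28) doubles as the first-publish date; `is_rerank_due` is a hypothetical helper.

```python
from datetime import date, timedelta

# Assumption: first publish coincides with the page's "Last verified" date
FIRST_PUBLISH = date(2026, 5, 28)
RERANK_DUE = FIRST_PUBLISH + timedelta(days=90)


def is_rerank_due(today: date) -> bool:
    """True once the 90-day window since first publish has elapsed."""
    return today >= RERANK_DUE


print(RERANK_DUE.isoformat())
print(is_rerank_due(date(2026, 7, 1)))
```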

Related

/solve/postgres-hosting-for-agents/: sibling /solve/ page for managed Postgres (relevant when the user is choosing supabase pgvector vs a dedicated vector DB)
/solve/scheduled-jobs-for-agents/: sibling /solve/ page for cron / background jobs (relevant for batch embedding pipelines)
/compare/: head-to-head pages for individual vector DB choices when the user has it narrowed to two options
auxiliar-mcp: the MCP exposing find_capability(jtbd=["semantic-search-backend"]) for agents in-loop

Query this ranking from your agent

Install the auxiliar-mcp MCP server and call solve_task:

# Install
claude mcp add auxiliar -- npx auxiliar-mcp

# Query from your agent
solve_task(task_slug="semantic-search-vector-db")

Returns the full JSON ranking with scorecards, install commands, alternatives considered, FAQ, and methodological caveats.
