Qdrant vs Pinecone vs Weaviate: Production Vector DB Comparison 2026
Three vector databases keep showing up on every RAG stack in 2026: Qdrant, Pinecone, and Weaviate. I get asked which one to pick at least once a week, usually by someone who already spent two days reading benchmarks and still has no answer.
The short version, because you have real work to do: Qdrant for most self-hosted production RAG in 2026. Pinecone when the requirement is “managed, don’t touch the servers”. Weaviate when you need the extra primitives like GraphQL or the module ecosystem. I run Qdrant in production for Teedian and recommend it to most consulting clients. The reasons are below, including the edge cases where I pick something else.
This guide is a head-to-head Qdrant vs Pinecone vs Weaviate comparison, with a feature matrix, realistic performance numbers, and a cost model. It is not a generic listicle. If you want the 30-second answer, read the next section and close the tab.
Verdict up front
- Self-hosted, cost sensitive, EU data residency, single team: Qdrant. One binary, excellent filtering, binary quantization, runs on a €20/mo Hetzner box up to a few million vectors.
- Want managed, don’t want to think about ops, budget is flexible: Pinecone. The serverless tier is the least hands-on vector DB on the market.
- Need a module ecosystem (generative, rerankers, multi-tenant SaaS out of the box), OK with GraphQL: Weaviate.
- Already on Postgres, under 1M vectors, query volume modest: pgvector. Don’t add a new database.
- Over 100M vectors, billion scale, dedicated ML platform team: Milvus.
Everything else is details. If your situation matches one of those lines, you can stop reading and start building. If you want the reasoning, keep going.
The three on the table
Qdrant is a Rust vector database, open source (Apache 2.0), self-host or managed via Qdrant Cloud. Single binary, no external dependencies, native hybrid search, scalar and binary quantization. Started shipping in 2021 and has become the default self-hosted choice for teams that want predictable performance without running a cluster babysitter.
Pinecone is fully managed, closed source, US company. The serverless tier (Pinecone v3) is the flagship now: you upsert vectors, you query, you pay per read and per gigabyte stored. Zero infrastructure exposure. You cannot self-host it, you cannot run it in your VPC in most plans, and that is the whole point.
Weaviate is a Go-written open source vector DB (BSD 3-clause). GraphQL-first API, modular architecture with official modules for generative-openai, text2vec-cohere, reranker-cohere, ref2vec, and more. Multi-tenancy is first class. Self-host or Weaviate Cloud Services.
Also worth naming briefly:
- Milvus (Zilliz): scale monster, runs on Kubernetes, overkill for under 10M vectors but the right answer at billions.
- Chroma: developer-first, great for notebooks and prototypes, I would not put it in production yet for anything serious.
- pgvector: a Postgres extension. If you already run Postgres and have fewer than about 1M vectors, this is the answer 80% of the time.
Feature comparison matrix
| Feature | Qdrant | Pinecone | Weaviate |
|---|---|---|---|
| Hosting | Self-host + Cloud | Cloud only | Self-host + Cloud |
| Open source | Yes (Apache 2.0) | No | Yes (BSD 3) |
| Written in | Rust | Closed | Go |
| Max vector dim | 65,536 | 20,000 | 65,535 |
| Distance metrics | Cosine, Dot, Euclidean, Manhattan | Cosine, Dot, Euclidean | Cosine, Dot, L2, Hamming |
| Metadata filtering | Rich (nested, range, geo, full-text) | Solid (eq, in, range) | Rich (GraphQL where) |
| Hybrid search (dense + sparse) | Native | Native | Native (BM25 + vector) |
| Scalar quantization | Yes | Managed | Yes |
| Binary quantization | Yes (up to 32x compression) | Managed | Yes (experimental) |
| Product quantization | Yes | Managed | Yes |
| Replication / sharding | Yes | Managed | Yes |
| Snapshots / backup | First class | Managed | Yes |
| RBAC | Yes (API keys, JWT) | Yes | Yes |
| Multi-tenancy | Collection per tenant, native since 1.8 | Namespaces | First class, per-tenant isolation |
| Air-gap / on-prem | Yes | No | Yes |
| Client SDKs | TS, Python, Rust, Go, Java, .NET | TS, Python, Java, Go | TS, Python, Go, Java |
A few of these rows deserve more than a one-liner, and I cover them in the deep dives below.
Performance at realistic scales
I get suspicious when vendors publish benchmarks, so take what follows as order of magnitude, not gospel. Numbers below match what I see in production and what independent benchmarks (ann-benchmarks, VectorDBBench) have shown repeatedly.
1 million vectors, 1536 dimensions (OpenAI ada-002 / text-embedding-3-small shape), top-10 recall: all three hit sub-10ms p95 latency on a single modest node (8 vCPU, 16GB RAM). At this scale the database choice is not the bottleneck, your embedding model is. Don’t over-optimize.
10 million vectors, same shape: Qdrant and Milvus handle this scale with the fewest foot-guns. Pinecone abstracts the problem away: it works, and you pay more for it. Weaviate works but expect to tune HNSW parameters and watch memory more carefully.
100 million vectors plus: Milvus territory. Qdrant can do it with cluster mode and binary quantization. Pinecone does it but cost climbs. Weaviate is possible with sharding but not the first pick.
Ingestion throughput: Qdrant with batched upserts pushes over 5,000 vectors/second on a 4 vCPU box without breaking a sweat. Pinecone serverless batches up to 100 vectors per request and throughput is rate-limited by the managed tier. Weaviate is in the same order as Qdrant on similar hardware.
Memory footprint with quantization: this is where Qdrant’s binary quantization earns its keep. For high-dimensional embeddings and approximate top-k search with rescoring, binary quantization gives roughly 32x memory reduction with minimal recall loss (typically 2-5% recall drop). A 10M vector index that wants 60GB of RAM in full precision fits in under 2GB with binary quantization. That is the difference between a €40/mo VPS and a €300/mo one.
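The arithmetic behind those numbers is worth sanity-checking yourself. A quick sketch, assuming float32 full-precision vectors and 1 bit per dimension after binarization (using decimal gigabytes):

```typescript
// Rough memory math for binary quantization at the scale quoted above.
const vectors = 10_000_000;
const dims = 1536;

// float32 = 4 bytes per dimension
const fullPrecisionGB = (vectors * dims * 4) / 1e9;
// binary quantization = 1 bit per dimension
const binarizedGB = (vectors * dims) / 8 / 1e9;

console.log(fullPrecisionGB.toFixed(1)); // ≈ 61.4 GB
console.log(binarizedGB.toFixed(2));     // ≈ 1.92 GB
console.log(Math.round(fullPrecisionGB / binarizedGB)); // 32x
```

Index overhead and payloads add on top of this, so treat it as a floor, not a quote.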
Filter performance: this is where real systems differ from benchmarks. Many vector DBs have great raw ANN performance but collapse when you add metadata filters (tenant_id, doc_id, category). Qdrant was built around filter-first search from the start and handles filtered queries without the typical 10-50x slowdown you see elsewhere. If your RAG involves per-user or per-tenant filtering (and it almost always does in production), test this specifically.
Qdrant deep dive
Why Qdrant wins for most self-hosted production RAG:
- Single binary, no external deps. No Kafka, no Zookeeper, no etcd. `docker run qdrant/qdrant` and you have a vector DB. I cannot overstate how much this matters when you are debugging production at 2am.
- Filter-first search. Metadata filters do not tank latency. In a benchmark I ran on 5M product vectors with a tenant_id filter, Qdrant stayed under 15ms p95 while another popular DB went from 8ms to 180ms.
- Binary quantization. Free 32x memory compression, tunable rescoring for accuracy recovery. The killer feature for cost-sensitive deployments.
- Snapshots. First class. `POST /collections/{name}/snapshots`, restore from S3, done. Backup story is solved.
- Hybrid search. Native sparse and dense in the same query. Works with SPLADE, BM25-style sparse vectors, and dense embeddings side by side.
- Deployment. Docker, Kubernetes Helm chart, static binary. Runs on 1GB RAM for small indexes. Scales to multi-node clusters with replication when you need it.
- Qdrant Cloud. If you want managed, their hosted offering starts cheap (around $0.014/hour for tiny) and scales. I still mostly self-host.
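As a concrete sketch: binary quantization is enabled at collection creation time. This is roughly the request body shape for Qdrant's `PUT /collections/{collection_name}` endpoint in recent versions; verify the field names against the current docs before relying on it.

```json
{
  "vectors": { "size": 1536, "distance": "Cosine" },
  "quantization_config": {
    "binary": { "always_ram": true }
  }
}
```

At query time you can ask Qdrant to oversample quantized candidates and rescore them against the original vectors (the `quantization` search params with `rescore` and `oversampling`), which recovers most of the lost recall.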
Weak spots, because nothing is perfect:
- Fewer generative/rerank modules than Weaviate. If you want “vector DB plus built-in OpenAI generation”, Qdrant does not ship that. You write the generation layer yourself (which I prefer anyway, one less leaky abstraction).
- Smaller community than Pinecone in raw numbers. Documentation is excellent but if you want 500 Stack Overflow answers, Pinecone wins.
- Multi-tenancy is solid but Weaviate is still a step ahead for SaaS isolation patterns.
Install and run locally for a test:
```shell
docker run -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage qdrant/qdrant
```
That is your development vector DB. For production I run it under systemd on a Hetzner VPS with a volume for persistence. Related: Hetzner vs AWS for AI workloads.
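If you keep it in compose instead of systemd, a minimal service definition looks like this; the volume path and image tag are illustrative, pin a version in production:

```yaml
services:
  qdrant:
    image: qdrant/qdrant:latest
    restart: unless-stopped
    ports:
      - "6333:6333"   # REST API
      - "6334:6334"   # gRPC
    volumes:
      - ./qdrant_storage:/qdrant/storage
```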
Pinecone deep dive
Pinecone’s pitch is simple: you never deal with infrastructure. For a lot of teams that is worth real money.
Pinecone v3 serverless is the current flagship. Pay per read, per write, per gigabyte stored. No pod sizing, no capacity planning, no “oh we need to resize the index”. You upsert, you query, you get billed monthly. This is the best managed vector DB UX on the market.
Pinecone Pods (the legacy offering) is still available and gives you more predictable pricing if your query volume is steady. You pick a pod size, you pay that per hour regardless of usage.
Where Pinecone is strong:
- Zero ops. No servers, no backups, no scaling decisions.
- Solid metadata filtering with a clean query syntax.
- Hybrid search (sparse-dense) is supported.
- SDK is mature, documentation is abundant, Stack Overflow has answers.
- SOC 2, HIPAA available on higher plans.
Where Pinecone hurts:
- Closed source, vendor lock-in. If you want out later, you export vectors and re-upsert somewhere else. Not the end of the world but a real cost.
- Serverless pricing is unpredictable. Your bill can move 5x in a month if query patterns shift. Budget accordingly.
- No on-prem, no air-gap, no VPC deployment except on their most expensive tiers.
- EU data residency is available but fewer regions than AWS or Hetzner give you natively.
- Cannot run it in a local dev container. Every integration test hits their API or a mock.
I use Pinecone on two client projects where the non-negotiable was “we don’t have DevOps, and we won’t hire for it”. That is a real requirement, Pinecone solves it well.
Weaviate deep dive
Weaviate is the most feature-rich of the three. It is also the most complex.
GraphQL first. Every query goes through GraphQL. If you love GraphQL, you will love this. If you don’t, you will tolerate it. They shipped a gRPC/REST option more recently, but GraphQL is still the canonical path.
Module ecosystem. This is Weaviate’s superpower. Official modules:
- `text2vec-openai`, `text2vec-cohere`, `text2vec-huggingface`: auto-vectorize text on upsert.
- `generative-openai`, `generative-anthropic`, `generative-cohere`: ask Weaviate to run the LLM call for you and return the generated answer. RAG in one query.
- `reranker-cohere`, `reranker-transformers`: rerank results in-database.
- `ref2vec-centroid`: reference-based recommendations.
If you want “vector DB that does the whole RAG pipeline”, Weaviate gets closer than anyone. Whether that abstraction is the right call is your judgment. I prefer composing the pipeline myself (see my RAG pipeline tutorial), but I understand the pitch.
Multi-tenancy. Weaviate has the best per-tenant isolation story of the three. For SaaS apps where you want each customer’s vectors in a logically separate space with independent HNSW indexes, this is the one to pick.
Hybrid search. BM25 plus vector in a single query, tunable alpha weight. Works well.
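Conceptually, the alpha weight blends the two score streams. Weaviate's actual fusion algorithms (ranked fusion and relative score fusion) normalize scores before combining, so this sketch only illustrates what the knob controls, not the exact math:

```typescript
// Conceptual hybrid scoring: alpha blends vector similarity against BM25.
// alpha = 1 → pure vector search, alpha = 0 → pure keyword (BM25) search.
function hybridScore(vectorScore: number, bm25Score: number, alpha: number): number {
  return alpha * vectorScore + (1 - alpha) * bm25Score;
}

console.log(hybridScore(0.9, 0.2, 0.75)); // vector-leaning blend
```

In practice you tune alpha against your eval set; 0.5 to 0.75 is a common starting range for RAG queries.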
Weak spots:
- More moving parts. Self-hosting a production Weaviate cluster is more involved than Qdrant. Expect Kubernetes if you scale past a single node.
- GraphQL adds ceremony. Simple queries get verbose.
- Resource-heavier than Qdrant for the same workload in my tests.
- Module config lives in database config, not application code, which some teams find awkward.
Self-hosted vs managed: the real cost math
This is where most comparison posts wave their hands. Real numbers, based on 2026-04 pricing (verify before committing, this moves).
Scenario A: 1 million vectors, 1536 dim, light query load (10k queries/day)
| Option | Rough monthly cost |
|---|---|
| Qdrant self-host on Hetzner CX32 (4 vCPU, 8GB RAM) | €13 |
| Qdrant self-host on Hetzner CCX13 (dedicated 2 vCPU, 8GB) | €24 |
| Qdrant Cloud (smallest production tier) | ~$50 |
| Pinecone serverless (1M vectors, 10k queries/day) | ~$40-60 |
| Weaviate Cloud (sandbox to starter) | ~$25-60 |
At this scale, self-hosting Qdrant is effectively free (you’re already paying for a VPS for something). The Hetzner box also runs your app, Redis, and Postgres. Setup is 1-2 hours if you have done it before.
Scenario B: 10 million vectors, 1536 dim, moderate query load (500k queries/day)
| Option | Rough monthly cost |
|---|---|
| Qdrant self-host on Hetzner CCX23 (4 vCPU, 16GB RAM) + binary quantization | €40 |
| Qdrant Cloud production cluster | ~$200-400 |
| Pinecone serverless | ~$150-500 depending on query mix |
| Weaviate Cloud production | ~$200-500 |
Self-hosted Qdrant with binary quantization is the clear cost winner, roughly 5-10x cheaper than any managed option. The tradeoff is that you are responsible for uptime.
Scenario C: 100 million vectors, enterprise query load
At this scale you are in managed cluster territory regardless of vendor. Managed Qdrant and Pinecone are in the $2-10k/month range depending on query volume and replication. Self-host Qdrant works but you will want a platform engineer watching it.
The breakeven: self-hosted Qdrant dominates economically up to about 20M vectors with moderate query volume, assuming you have the ops capacity. Above that, the engineering cost of running it starts to approach the price of managed.
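One way to frame that breakeven is in ops hours: how much monthly engineering time makes self-hosting cost the same as managed. The inputs below are assumptions drawn from the scenario tables plus an assumed blended engineering rate; substitute your own numbers.

```typescript
// Breakeven ops hours: the point where self-hosted total cost equals managed.
const vpsPerMonth = 40;      // € — Scenario B self-host box (assumed)
const managedPerMonth = 300; // € — midpoint of the managed quotes above (assumed)
const engineerHourly = 80;   // € — assumed blended rate, adjust for your team

const breakevenHours = (managedPerMonth - vpsPerMonth) / engineerHourly;
console.log(breakevenHours.toFixed(2)); // 3.25 ops hours per month
```

If running the box reliably costs you more than a few hours a month, the cost argument for self-hosting weakens fast.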
Pricing snapshot
As of 2026-04 (always verify on vendor sites, these change):
| Vendor | Entry point | Scaling model |
|---|---|---|
| Qdrant self-host | €5-50/mo VPS | Linear with VPS size |
| Qdrant Cloud | ~$0.014/hr tiny cluster | Cluster size + replicas |
| Pinecone serverless | Pay per use, ~$40 minimum | $0.15/M reads, $2/M writes, $0.33/GB/mo storage |
| Pinecone Pods (legacy) | ~$70/mo smallest pod | Per pod per hour |
| Weaviate Cloud | $25/mo sandbox | Per cluster size |
| Weaviate self-host | VPS cost only | Linear with nodes |
Compare this against LLM API cost for the rest of your RAG stack when you budget.
Integration ergonomics
How the SDKs feel when you actually write the code.
Qdrant TypeScript SDK:
```typescript
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({ url: "http://localhost:6333" });

// embedding and queryEmbedding are number[] from your embedding model
await client.upsert("docs", {
  points: [
    { id: 1, vector: embedding, payload: { tenant_id: "acme", doc_id: "abc" } }
  ]
});

const results = await client.search("docs", {
  vector: queryEmbedding,
  filter: { must: [{ key: "tenant_id", match: { value: "acme" } }] },
  limit: 10
});
```
Clean, typed, fast. The filter DSL is expressive without being baroque.
Pinecone TypeScript SDK:
```typescript
import { Pinecone } from "@pinecone-database/pinecone";

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pc.index("docs");

await index.upsert([
  { id: "1", values: embedding, metadata: { tenant_id: "acme", doc_id: "abc" } }
]);

const results = await index.query({
  vector: queryEmbedding,
  filter: { tenant_id: { $eq: "acme" } },
  topK: 10,
  includeMetadata: true
});
```
Fine, slightly older-feeling API shape, does the job.
Weaviate TypeScript SDK (v3):
```typescript
import weaviate from "weaviate-client";

const client = await weaviate.connectToLocal();
const docs = client.collections.get("Docs");

await docs.data.insert({
  properties: { tenantId: "acme", docId: "abc" },
  vectors: embedding
});

const results = await docs.query.nearVector(queryEmbedding, {
  filters: docs.filter.byProperty("tenantId").equal("acme"),
  limit: 10
});
```
Verbose but readable. The v3 client removed a lot of the GraphQL ceremony from v2.
All three have LangChain, LlamaIndex, and Haystack integrations. The SDK difference is maybe 30 minutes of developer time to learn each. Not a deciding factor.
Production reliability
Qdrant self-hosted with one replica and daily snapshots to S3: 99.9% is straightforward. I run this for Teedian. Systemd unit, health check, automated snapshot to B2, restore tested quarterly. No drama in 14 months.
Pinecone managed: SLA depends on plan. Starter has no SLA. Standard is 99.9%. Enterprise is 99.95%. In practice I have seen one 45 minute outage in two years of use across three client projects. That is acceptable for most non-finance applications.
Weaviate self-hosted multi-node: possible but more Kubernetes-heavy than Qdrant. Expect to invest in proper observability (Prometheus, Grafana) and a platform engineer to babysit it at scale. Weaviate Cloud offloads this for a price.
The reliability question is not “which is more reliable” but “where does the reliability work happen”. With Pinecone, it happens inside their company and you pay for it. With Qdrant or Weaviate self-hosted, it happens inside yours. Neither is free.
Migration between them
Moving between vector DBs is easier than moving between SQL databases. The data is simple: vectors plus metadata plus IDs. What actually breaks:
- SDK changes in the retrieval layer. Every DB has its own query syntax. Expect to rewrite the 100-300 lines of your RAG retrieval code.
- Metadata format differences. Pinecone metadata is shallow key-value. Qdrant supports nested payloads. Weaviate has strict schemas. You may need to flatten or reshape.
- Filter syntax. Your filter expressions will need rewriting.
- Hybrid search tuning. Sparse vector formats differ. You will re-tune alpha weights.
Rough migration time from Pinecone to Qdrant for a production RAG app (1-10M vectors, one tenant filter, hybrid search): 2-5 engineering days including testing. The hard part is not the export, it is re-validating recall on your eval set.
Export pattern: iterate through IDs, fetch vectors and metadata, upsert into the target DB in batches of 500-1000. Every vendor has a list-IDs or scroll endpoint that handles this. Budget a few hours for the script plus a day of re-ingestion on realistic hardware.
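The batching itself is the only generic part worth sketching; the fetch and upsert calls are whatever your source and target SDKs provide:

```typescript
// Split a list of point IDs into upsert-sized batches for the migration loop.
function* batches<T>(items: T[], size = 500): Generator<T[]> {
  for (let i = 0; i < items.length; i += size) {
    yield items.slice(i, i + size);
  }
}

// Usage sketch: for each batch, fetch vectors+metadata from the source DB,
// then upsert into the target DB.
// for (const ids of batches(allIds, 500)) { ... }
const ids = Array.from({ length: 1200 }, (_, i) => i);
console.log([...batches(ids)].length); // 3 batches: 500 + 500 + 200
```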
My choice and why
I run Qdrant self-hosted on Hetzner for Teedian, which is my production AI content operations engine. The reasons, in order:
- EU data residency. My customers are in DACH. Hetzner is in Germany and Finland. Done.
- Cost. €20/mo for a box that runs Qdrant, Redis, and a Go service. Pinecone would be $150+ for the same workload.
- Filter performance. Tenant isolation via metadata filter is the dominant query shape. Qdrant does this without slowdown.
- Binary quantization. I’m over 2M vectors now and the quantized index is under 800MB of RAM. Without it I’d be renting more box.
- Snapshots. Automated snapshot to Backblaze B2 every night, tested restore. Solved backup problem.
- Open source. If Qdrant’s company goes sideways, I have the binary and the data. Vendor lock-in risk is near zero.
- Single binary, single dep. Fewer things that can break at 3am.
I also run Pinecone on two client projects. In both cases the client said “we don’t want any infrastructure, point blank”. Pinecone serverless solves that. It costs them $80-200/mo each and they sleep fine. That is a fair trade.
I have not shipped Weaviate to production for myself. I evaluated it for a multi-tenant SaaS proof of concept last year. The per-tenant isolation was genuinely better than Qdrant’s at the time, but the team didn’t want to run a Kubernetes cluster for a vector DB, so we went with Qdrant and built tenant isolation via collection prefixes. It worked.
Decision framework
Run down these questions in order and take the first “yes”:
- Do you already run Postgres and have under 1M vectors? Use pgvector. Don’t add a database.
- Do you need zero infrastructure exposure and budget is flexible? Pinecone serverless.
- Are you building multi-tenant SaaS with per-customer isolation and want built-in generative modules? Weaviate.
- Are you at 100M+ vectors with a dedicated platform team? Milvus.
- Everything else in self-hosted or hybrid production RAG? Qdrant.
That covers maybe 95% of the decisions I see. The remaining 5% are edge cases (GPU-accelerated search, specific compliance frameworks, geographic constraints) and those need a real conversation, not a guide.
If you’re building agents on top of this, see my production AI agent architecture guide for where the vector DB fits in the broader stack. For the retrieval pipeline itself, the RAG pipeline tutorial walks through the full flow. If you want a deeper argument on why most vector DB decisions are over-thought, read your vector database decision is simpler than you think.
What to test before committing
Do not pick a vector DB from a blog post. Do not pick one from this blog post. Run the test.
Pre-commitment checklist:
- Load 10,000 representative vectors from your actual data (not synthetic).
- Run your top-20 real queries (not random vectors).
- Measure p95 latency with your real metadata filters applied.
- Measure recall against a ground-truth set of at least 100 queries.
- Test backup and restore at least once. Yes, actually restore it.
- Load test at 2x your projected peak query rate for 10 minutes.
- Project cost at 6-month and 12-month data volume, not current.
- Try upgrading to the next version. Breaking changes exist.
- Run it inside your docker-compose stack. See my docker-compose AI development stack for a reference setup.
- If self-hosting: confirm you can redeploy from scratch in under 30 minutes.
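For the latency and load-test items above, a nearest-rank percentile helper is all you need. Collect per-query latencies in milliseconds with your real filters applied, then:

```typescript
// Nearest-rank percentile: good enough for a pre-commitment latency check.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[idx];
}

// With only 10 samples, nearest-rank p95 is effectively the max — collect
// hundreds of samples before trusting the number.
const latencies = [4, 5, 5, 6, 7, 8, 9, 12, 15, 40]; // example run, ms
console.log(percentile(latencies, 95)); // 40 with this sample
console.log(percentile(latencies, 50)); // 7
```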
This checklist takes two days. It will save you two months.