How long does an integration engagement take?

Starter engagements ship in about 3 weeks, Standard in about 6, and Enterprise in about 8. Custom platform initiatives run 10-12+ weeks. AI-augmented delivery cuts the typical 8-12 week industry timeline to 3-8 weeks without cutting quality.

What is the engagement floor price?

A basic integration starts at $10,000 (one source → one target, standard fields, low volume). Multi-integration engagements start at $25,000 (3–5 integrations sharing a pattern) and scale in $25K increments ($50K, $75K, $100K+). Every engagement is delivered as a fixed-bid SOW with a target-state architecture diagram, returned within 3 business days of intake.

What integration platforms does Green Dolphin support?

MuleSoft Anypoint, Dell Boomi, Workato, Oracle Integration Cloud (OIC), TIBCO, Talend, SnapLogic, Informatica, Azure Integration Services, SAP CPI, Apigee, Kong. Plus custom Java (Spring Boot), .NET, and Node.js. Plus AWS (Lambda + EventBridge), Azure (Functions + Logic Apps), and GCP (Cloud Functions + Pub/Sub).

Do you offer time-and-materials engagements?

No. All Green Dolphin engagements are fixed-bid SOWs; T&M billing is not offered. If scope changes mid-engagement, a written change order with a new fixed price is issued for client approval.

What managed services options are available after delivery?

Managed services are an optional add-on to any fixed-bid SOW: 10 hours/week of senior architect time. Terms are 3-month ($25K), 6-month ($48K, 4% off), 12-month ($90K, 10% off), and 24-month ($168K, 16% off).

What industries does Green Dolphin work in?

Financial Services, Healthcare, Retail, Telecommunications, Aerospace & Defense, Public Sector, Logistics & Supply Chain, and Manufacturing, including regulated environments under HIPAA, SOX, FedRAMP, GDPR, and PCI-DSS.

Vector Databases for Enterprise RAG: Pinecone, Weaviate, Qdrant, and the In-Warehouse Option

May 16, 2026

by Green Dolphin Software, Data architecture practice


The vector database market got crowded fast: Pinecone, Weaviate, Qdrant, Milvus, Chroma, LanceDB, Vespa, and Marqo, plus pgvector, plus the in-warehouse options inside Snowflake (Cortex Search) and Databricks (Mosaic AI Vector Search). For a buyer evaluating RAG infrastructure in 2026, the question is not "which is the best vector DB" but "what is the right vector layer for our architecture."

This post is the framework we use on $25K+ Data Architecture engagements when the AI roadmap requires retrieval-augmented generation. It is vendor-neutral: we have no kickback agreements with any vendor.

The five-way decision

Five clusters of vector storage, each appropriate for a different architecture:

1. Managed dedicated vector DB (Pinecone, Weaviate Cloud, Qdrant Cloud)

Pinecone — fully managed, serverless or dedicated, mature production story, premium pricing
Weaviate — managed or self-hosted, strong hybrid-search story (BM25 + vector), modular embedding integrations
Qdrant — managed or self-hosted, Rust-based, strong performance/cost ratio, good filtered-search ergonomics

Best fit when: RAG is a first-class workload, retrieval latency matters (sub-50ms p95), you want a vendor accountable for uptime, and the cost of standing up a dedicated team for vector infra is not justified.

2. Open-source vector DB self-hosted (Milvus, Qdrant OSS, Weaviate OSS, Vespa)

Milvus — high-scale (billions of vectors), broad ANN algorithm support, Kubernetes-native
Vespa — extreme-scale serving, Yahoo-grade infra, steep learning curve
Qdrant / Weaviate OSS — easier to operate than Milvus / Vespa, similar features to their managed counterparts

Best fit when: data residency requirements forbid managed SaaS, you have a platform team that operates Kubernetes infra, or scale exceeds managed-tier economics (billions of vectors with high QPS).

3. In-warehouse vector search (Snowflake Cortex Search, Databricks Mosaic AI Vector Search)

Cortex Search — managed inside Snowflake, hybrid lexical + vector retrieval, Snowpark-friendly
Mosaic AI Vector Search — Delta-native, Unity Catalog-governed, MLflow-integrated

Best fit when: the source data already lives in your warehouse / lakehouse, you want governance + lineage + access controls aligned with the rest of your data stack, latency requirements are 100-500ms (not sub-50ms), and you do not want to duplicate data into a separate vector layer.

4. Postgres + pgvector (or AlloyDB, Aurora pgvector, Supabase, Neon)

pgvector in vanilla Postgres, plus the managed flavors above

Best fit when: total vector count is under ~10M, your team already runs Postgres, retrieval is one feature among many in an OLTP app, and you do not need horizontal scale beyond what Postgres provides.

5. Embedded local vector store (Chroma, LanceDB, FAISS)

Single-node, file-based or in-process

Best fit when: you are building a notebook prototype, a per-user local cache, or an edge inference scenario. Not appropriate for production multi-tenant enterprise RAG.

Capability comparison (the production-grade options)

Capability	Pinecone	Weaviate	Qdrant	Cortex Search	Mosaic AI VS	pgvector
Managed offering	✓	✓	✓	✓ (in Snowflake)	✓ (in Databricks)	✓ (managed PG)
Self-host option	✗	✓	✓	✗	✗	✓
Hybrid lexical + vector	partial	✓	✓	✓	✓	basic
Metadata filtering at scale	✓	✓	✓	✓	✓	depends on index
Multi-tenant isolation	namespaces	tenants	collections	schemas	catalogs	schemas/roles
Sub-50ms p95 at 10M vectors	✓	✓	✓	usually no	usually no	depends
Native governance (RBAC + audit)	API keys	enterprise tier	enterprise tier	Snowflake-native	Unity Catalog-native	Postgres-native
Native to existing data	✗	✗	✗	✓ (Snowflake)	✓ (Delta)	✓ (Postgres)
Easy embedding model swap	✓	✓	✓	model-locked	flexible	DIY

Where the in-warehouse option wins

The pattern we ship most often in 2026 is Cortex Search or Mosaic AI Vector Search sitting on top of Silver-tier data that is already cleansed, governed, and access-controlled. Three reasons:

No data duplication. Source-of-truth data stays in the warehouse. The vector index is a derived asset, not a parallel store. Governance, lineage, and access controls are unified.
Auditor-ready. "Who accessed this PHI / cardholder data" answers itself. With a separate vector DB, you are duplicating the access-control logic and probably getting it wrong.
Lower TCO at moderate scale. Below ~50M vectors, and where the workload tolerates retrieval latency of 100ms or more, in-warehouse pricing beats a separate vector DB once you factor in the data egress + sync infrastructure.

Where it breaks down: sub-50ms p95 requirements (consumer-facing chat with strict UX latency budgets) or hundreds of millions of vectors with high QPS. Then a dedicated vector layer earns its cost.
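Whether a workload really sits on the sub-50ms side of that line is worth measuring rather than assuming. A minimal sketch in Python, assuming a hypothetical `search` callable that wraps whichever vector layer is under test. The nearest-rank percentile method is one common choice used here for illustration, not something any vendor prescribes:

```python
import time
from typing import Callable, Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of latency samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # 1-based nearest rank: ceil(0.95 * n), computed with integers to avoid
    # floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def measure_p95(search: Callable[[str], object], queries: Sequence[str]) -> float:
    """Time each call to `search` (hypothetical client wrapper) and return p95 in ms."""
    samples = []
    for query in queries:
        start = time.perf_counter()
        search(query)
        samples.append((time.perf_counter() - start) * 1000.0)
    return p95(samples)
```

Run it with a query set that resembles production traffic, from the same network location as the application, since client-to-service round trips count against the latency budget too.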

Where Pinecone (or Qdrant managed) wins

Three scenarios where we recommend a dedicated managed vector DB:

Latency-critical UX. Consumer chat, in-product semantic search, agent runtimes with tight tool-call budgets. Sub-50ms p95 at scale is what these vendors are built for.
Source data not in a warehouse. SharePoint, Confluence, customer support tickets — if the source-of-truth lives in SaaS, replicating it into Snowflake just to use Cortex Search is more work than indexing it into Pinecone directly.
Multi-cloud / multi-warehouse strategy. If you might move warehouses, a vendor-neutral vector layer is the right insurance against vendor lock-in.

Where Postgres + pgvector wins

Often overlooked. If your app is already on Postgres and you have under ~10M vectors, pgvector + an IVFFlat or HNSW index is enough. The simplicity payoff is real:

One database for OLTP + vectors = one connection pool, one backup strategy, one ACL model
Joins between vectors and metadata are native SQL
No new vendor relationship

Where it breaks: vector counts above ~10M with high QPS start to require careful tuning, replica strategies, and eventually a dedicated vector layer.
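To make the "one database" point concrete, here is a minimal sketch of the pgvector pattern, written as SQL strings in Python so it can sit next to application code. The table name `chunks`, its columns, and the 1536-dimension embedding are assumptions for illustration; the dimension must match whatever embedding model you choose. `<=>` is pgvector's cosine-distance operator, and the tenant filter is ordinary SQL on the same table:

```python
from typing import Dict, Sequence

# Hypothetical schema: table name, columns, and dimension are assumptions.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    tenant_id bigint NOT NULL,
    body      text   NOT NULL,
    embedding vector(1536) NOT NULL
);
-- HNSW index for cosine distance; IVFFlat is the other index type.
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# Nearest neighbours for one tenant, filtered and ranked in a single query.
SEARCH_SQL = """
SELECT id, body
FROM chunks
WHERE tenant_id = %(tenant_id)s
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format an embedding in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def search_params(embedding: Sequence[float], tenant_id: int, k: int = 5) -> Dict[str, object]:
    """Named parameters for SEARCH_SQL, suitable for a driver using %(name)s placeholders."""
    return {"query": to_vector_literal(embedding), "tenant_id": tenant_id, "k": k}
```

With a driver such as psycopg, the call would look roughly like `cursor.execute(SEARCH_SQL, search_params(vec, tenant_id=42))`. The sketch stops short of opening a connection so that it stays runnable without a database.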

RAG pipeline decisions beyond the vector DB

The vector DB choice is one decision among six. The other five matter more for quality:

Chunking strategy — semantic chunking (LangChain SemanticChunker, LlamaIndex SemanticSplitterNodeParser) usually beats fixed-size chunking for technical content
Embedding model — text-embedding-3-large, Cohere embed-v3, Voyage voyage-3-large, e5-mistral-7b-instruct — domain matters more than rank on a leaderboard
Retrieval strategy — hybrid (lexical + vector + reranker) beats vector-only for almost every enterprise workload
Reranker — Cohere rerank-v3.5, Voyage rerank-2, or in-warehouse equivalents — usually adds 10-30% top-k quality
Evaluation — RAGAS, TruEra, MLflow Evaluate, or custom — measure retrieval quality before declaring victory
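As an illustration of the hybrid retrieval item above, one widely used way to merge a lexical ranking with a vector ranking is reciprocal rank fusion (RRF). This is a sketch of that general technique, not a description of how any particular vendor implements hybrid search; the document IDs are made up and `k = 60` is a conventional default rather than a tuned value:

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a BM25 query and a vector query over the same corpus.
lexical_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion([lexical_hits, vector_hits])
```

Documents that appear in both lists ("a" and "c" here) rise above documents found by only one retriever. A reranker would then rescore the top of the fused list.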

The vector DB sits at the foundation, but a bad chunking strategy or a missing reranker hurts RAG quality more than picking the "wrong" vector DB.
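A claim like this is only checkable with a retrieval metric in hand. Below is a minimal sketch of two standard ones, recall@k and reciprocal rank, assuming you have a small labeled set mapping each query to its relevant chunk IDs. The function names and sample IDs are illustrative, and frameworks such as RAGAS compute richer metrics than these:

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("each query needs at least one relevant ID")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical single-query example: two relevant chunks, one found in the top 2.
retrieved_ids = ["a", "x", "b"]
relevant_ids = {"a", "b"}
recall_top2 = recall_at_k(retrieved_ids, relevant_ids, k=2)
first_hit = reciprocal_rank(["x", "b"], relevant_ids)
```

Averaging these over the labeled queries, before and after a change to chunking or reranking, gives a like-for-like comparison that does not depend on which vector backend is underneath.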

How we pick

The decision tree on a $25K+ Data Architecture engagement:

Where does the source data live? Warehouse → in-warehouse vector. SaaS / files → dedicated vector DB. Postgres app → pgvector first.
What is the latency SLA? Sub-50ms p95 → Pinecone or Qdrant managed. 100-500ms → in-warehouse. Best-effort → anything works.
What is the governance posture? Regulated → in-warehouse strongly preferred (single audit boundary). Non-regulated → optimize for cost + latency.
What is the team's skill set? Platform team that operates K8s → OSS options are viable. Lean team → managed every time.
What is the scale projection? Above ~100M vectors → start with a dedicated vector vendor; in-warehouse will hit cost walls.
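The five questions can be sketched as a small function. The thresholds come from the text above, but the precedence between questions is an assumption made for this sketch (hard latency and scale limits are checked before data locality). The field names and return values are hypothetical. Treat it as a way to make the inputs explicit, not as a substitute for the TCO and governance analysis:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Requirements:
    source: str                      # "warehouse", "saas", or "postgres"
    latency_p95_ms: Optional[float]  # None means best-effort
    regulated: bool
    has_platform_team: bool
    projected_vectors: int


@dataclass
class Recommendation:
    layer: str        # "in-warehouse", "dedicated", or "pgvector"
    hosting: str
    notes: List[str]


def recommend(req: Requirements) -> Recommendation:
    notes: List[str] = []
    strict_latency = req.latency_p95_ms is not None and req.latency_p95_ms < 50

    # Assumed precedence: latency and scale limits override data locality.
    if strict_latency or req.projected_vectors > 100_000_000:
        layer = "dedicated"
    elif req.source == "warehouse":
        layer = "in-warehouse"
    elif req.source == "postgres" and req.projected_vectors <= 10_000_000:
        layer = "pgvector"
    else:
        layer = "dedicated"

    # Governance does not change the pick here; it flags extra audit work.
    if req.regulated and layer != "in-warehouse":
        notes.append("regulated workload outside the warehouse: plan a second audit boundary")

    if layer == "dedicated":
        hosting = "managed or self-hosted OSS" if req.has_platform_team else "managed"
    elif layer == "in-warehouse":
        hosting = "warehouse-managed"
    else:
        hosting = "existing Postgres"
    return Recommendation(layer, hosting, notes)
```

For example, `recommend(Requirements("warehouse", 200, True, False, 5_000_000))` lands on the in-warehouse layer with no notes, while a regulated SaaS-sourced workload with a 40ms budget lands on a dedicated layer with the audit-boundary note attached.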

Concrete next step

If the RAG infrastructure decision is upcoming, a $25K Data Architecture engagement returns a fixed-bid recommendation with:

Target-state diagram (data source → chunker → embedder → vector layer → retriever → reranker → LLM)
3-year TCO for at least two viable vector backends at your projected scale
Evaluation framework recommendation (which retrieval metrics to track and how)
Governance design that survives the choice

Start the intake. Fixed-bid SOW returned in 3 business days. See also the warehouse-side AI comparison and the broader platform-selection framework.
