How long does an integration engagement take?

Starter engagements ship in about 3 weeks, Standard in about 6, and Enterprise in about 8. Custom platform initiatives run 10-12+ weeks. AI-augmented delivery cuts the typical 8-12 week industry timeline to 3-8 weeks without cutting quality.

What is the engagement floor price?

A basic integration starts at $10,000 (one source → one target, standard fields, low volume). Multi-integration engagements start at $25,000 (3–5 integrations sharing a pattern) and scale in $25K increments — $50K, $75K, $100K+. Every engagement is delivered as a fixed-bid SOW with target-state architecture diagram, returned within 3 business days of intake.

What integration platforms does Green Dolphin support?

MuleSoft Anypoint, Dell Boomi, Workato, Oracle Integration Cloud (OIC), TIBCO, Talend, SnapLogic, Informatica, Azure Integration Services, SAP CPI, Apigee, and Kong. Also supported: custom Java (Spring Boot), .NET, and Node.js builds, plus cloud-native stacks on AWS (Lambda + EventBridge), Azure (Functions + Logic Apps), and GCP (Cloud Functions + Pub/Sub).

Do you offer time-and-materials engagements?

No. All Green Dolphin engagements are fixed-bid SOWs. T&M is not a billing model offered. If scope changes mid-engagement, a written change order with a new fixed price is issued for client approval.

What managed services options are available after delivery?

Managed services provide 10 hours/week of senior architect time as an optional add-on to any fixed-bid SOW. Terms are 3 months ($25K), 6 months ($48K, 4% off), 12 months ($90K, 10% off), and 24 months ($168K, 16% off).

What industries does Green Dolphin work in?

Financial Services, Healthcare, Retail, Telecommunications, Aerospace & Defense, Public Sector, Logistics & Supply Chain, and Manufacturing. This includes regulated environments under HIPAA, SOX, FedRAMP, GDPR, and PCI-DSS.

Enterprise RAG Standardization: One Governed Retrieval Layer for Every Dev AI Tool

May 21, 2026

by Green Dolphin Software, AI / Integration practice

Every dev AI tool ships its own retrieval. Claude calls one thing, Cursor calls another, ChatGPT does whatever a user pastes into it. In a small engineering org that is fine. At enterprise scale, "every tool retrieves its own way" is the single most expensive architectural mistake we see in 2026 enterprise AI rollouts.

This post is the vendor-neutral playbook for fixing it: one governed retrieval substrate that every dev AI tool routes through, regardless of which tool the developer picks. It draws on production engagements where iPaaS platforms (MuleSoft Anypoint, Workato, Dell Boomi, Oracle Integration Cloud) have become the AI-data-plane backbone.

The problem we keep solving

Without a standard, enterprises end up with this picture:

Each AI tool implements its own retrieval against source systems (GitHub, Confluence, Jira, internal wikis, ticketing).
Developers paste sensitive content into chat windows to compensate when retrieval is poor.
Local vector stores accumulate on workstations — uncontrolled IP sprawl, zero audit.
Same question, different tools, different answers — the team loses trust in any of them.
Every new tool re-implements the same source connectors. Three tools, three integrations, no reuse.

The risks scale with the team. Audit fails because nobody knows what enterprise IP got indexed where. Compliance teams ban the tools to be safe, costing the productivity gains the tools were bought for in the first place.

Goals for a real standard

A useful enterprise RAG standard has four properties:

One canonical retrieval path for engineering knowledge (and other corpora later) across every AI tool.
Governance enforced at the integration tier — auth, audit, rate-limiting, data classification, retention.
Tool choice preserved — developers keep using Claude, Cursor, ChatGPT, GitHub Copilot, whatever wins the productivity argument.
No local RAG stores on workstations — by policy AND by ergonomics. If the centralized path is faster and better than DIY, the policy enforces itself.

Explicitly out of scope for the first standard: customer and regulated data (HIPAA, PII, PCI-scoped). Those belong in a separate workstream with BAA-covered vector stores and de-identification pipelines. Starting with engineering knowledge only is the right opening move because the risk surface is lower and the win is faster.

Why an iPaaS as the backbone

The retrieval substrate is best built on top of an existing integration platform — MuleSoft Anypoint, Workato, Dell Boomi, Oracle Integration Cloud, or whichever iPaaS your platform team already operates. The platform brings:

Source connectors to the systems that hold the knowledge (GitHub, Confluence, Jira, Salesforce, SharePoint, internal docs).
API platform with OAuth, rate-limiting, structured logging, audit retention.
Versioned flows or recipes so the retrieval logic is reviewable, testable, and reversible.
A single observability surface — one place to see which AI tool queried what, when, by whom.
Existing operations muscle — the team that runs the iPaaS already does incident response, on-call, change management.

What the iPaaS does NOT give you natively (in 2026): vector storage, embedding generation, MCP protocol support, semantic ranking. Those come from companion tools (managed vector DBs, embedding APIs, a thin MCP wrapper). The iPaaS is the spine; the AI-specific components are the limbs.

The target topology (vendor-agnostic)

   Dev AI tools (Claude, Cursor, ChatGPT, GitHub Copilot, ...)
                          |
                          | MCP / native protocol
                          v
        Thin MCP server (wraps the iPaaS HTTP API)
                          |
                          | HTTPS + OAuth
                          v
              Enterprise iPaaS (audit + auth + rate-limit)
                          |
              +-----------+-----------+
              |                       |
              v                       v
       Source systems       Vector DB + embedding API
       (GitHub, Confluence,  (Pinecone / Vectara /
        Jira, Salesforce,    Azure AI Search /
        SharePoint, ...)     Bedrock KB / pgvector)

The MCP server is a thin shim (a few hundred lines of code) that lets dev AI tools speak their native protocol (Model Context Protocol) while the iPaaS speaks plain HTTPS. The interesting architectural choice is what sits behind the iPaaS: the bottom tier of the diagram, called the right-of-iPaaS layer here.
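To make the shim concrete, here is a minimal sketch of its forwarding half only, assuming the iPaaS exposes a JSON search endpoint behind OAuth bearer tokens. The URL, payload fields, and function names are hypothetical placeholders, and the MCP protocol handling itself is left out.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute whatever your iPaaS API actually exposes.
IPAAS_SEARCH_URL = "https://ipaas.example.com/api/v1/retrieval/search"


def build_ipaas_request(tool_args: dict, access_token: str,
                        client_name: str) -> urllib.request.Request:
    """Translate one incoming tool call into an HTTPS request for the iPaaS."""
    payload = {
        "query": tool_args["query"],
        "sources": tool_args.get("sources", []),
        # Identify the calling AI tool so the iPaaS audit log can record it.
        "client_tool": client_name,
    }
    return urllib.request.Request(
        IPAAS_SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_ipaas(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from the network call means the shim's translation logic can be unit-tested without a live iPaaS.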

Three options for the right-of-iPaaS layer

Option A: Live Retrieval Gateway (no vector store)

The iPaaS translates every AI query into a live call against the source system's native search API.

Pros: always fresh (no staleness window), source ACLs are authoritative (no parallel permission model), lowest cost (no embeddings, no vector DB), fastest to ship (weeks not quarters).

Cons: latency (every query hits live source APIs), no semantic search (limited to source-system search quality), source rate limits become user-facing, multi-source fan-out logic lives in iPaaS flows.

Fit: strong pilot. Proves the architectural surface — devs hit the iPaaS instead of going direct — without committing to a vector store.
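A rough sketch of the fan-out logic, written in Python for readability even though in practice it would live in iPaaS flows or recipes. The `Hit` type, the `Searcher` signature, and the idea of passing the user's own token to each source (so source ACLs stay authoritative) are illustrative assumptions, not a prescribed design.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hit:
    source: str
    title: str
    url: str


# A searcher takes (query, user_token) and returns hits from one source system.
Searcher = Callable[[str, str], list[Hit]]


def live_search(query: str, user_token: str, searchers: dict[str, Searcher],
                per_source_limit: int = 5) -> tuple[list[Hit], dict[str, str]]:
    """Fan one query out to every source in parallel and merge the results."""
    hits: list[Hit] = []
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(searchers))) as pool:
        futures = {name: pool.submit(fn, query, user_token)
                   for name, fn in searchers.items()}
        for name, future in futures.items():
            try:
                hits.extend(future.result(timeout=5)[:per_source_limit])
            except Exception as exc:  # rate limit, timeout, auth failure, ...
                # One failing source degrades the answer instead of failing it.
                errors[name] = str(exc)
    return hits, errors
```

Returning per-source errors alongside the hits is one way to surface the "source rate limits become user-facing" problem explicitly rather than hiding it.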

Option B: Pipeline + Managed Vector Store

The iPaaS schedules ingestion against source systems, calls an embedding API, writes embeddings to a managed vector DB. Retrieval queries hit the vector DB through the iPaaS.

Pros: semantic + hybrid retrieval, predictable latency, centralized index for data classification + audit, consistent quality bar across sources.

Cons: embedding + vector DB recurring cost, staleness window between ingest cycles, ACL inheritance requires mirroring source permissions into the index, vector DB choice creates a lock-in surface.
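The ingest-then-retrieve shape, including the ACL mirroring cost, can be sketched in a few lines. Everything here is a stand-in: `toy_embed` is a hashed bag-of-words substitute for a real embedding API, `InMemoryIndex` substitutes for a managed vector DB, and the group-based ACL model is an assumption.

```python
import hashlib
import math
from dataclasses import dataclass

DIM = 64  # toy dimension; real embedding models use far more


def toy_embed(text: str) -> list[float]:
    """Stand-in for an embedding API call: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # mirrored from the source system at ingest
    vector: list[float]


class InMemoryIndex:
    """Stand-in for a managed vector DB."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def ingest(self, doc_id: str, text: str, allowed_groups: set[str]) -> None:
        # Re-ingesting a document replaces its previous entry.
        self.chunks = [c for c in self.chunks if c.doc_id != doc_id]
        self.chunks.append(
            Chunk(doc_id, text, frozenset(allowed_groups), toy_embed(text)))

    def search(self, query: str, user_groups: set[str],
               top_k: int = 3) -> list[tuple[float, str]]:
        q = toy_embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, c.vector)), c.doc_id)
            for c in self.chunks
            if c.allowed_groups & user_groups  # ACL filter before ranking
        ]
        return sorted(scored, reverse=True)[:top_k]
```

Note that `allowed_groups` is only as current as the last ingest cycle, which is exactly the staleness and permission-mirroring risk listed in the cons.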

The vector DB shortlist depends on cloud commitments:

Candidate	Fit shape	Lock-in	Cost shape
Pinecone	Managed-vector pure-play, mature SDK + ops	Vendor-only	Per-pod or serverless
Vectara	Managed RAG-as-a-service (ingest + embed + rank)	Vendor-only	Per-query + storage
Azure AI Search	Azure-native shop	Cloud-only	Per-tier
AWS Bedrock KB	AWS-native shop, Bedrock model coupling	Cloud-only	Per-query + storage
pgvector in Postgres	Already operate a suitable Postgres	Portable	Effectively $0 to start

The right pick depends on five axes: primary cloud, query volume, hybrid-search needs (vector + keyword), operational appetite (managed SaaS vs in-our-cloud), and iPaaS connector ergonomics. We do NOT recommend picking the vector DB on day one; it is the kind of decision that should come out of a real pilot rather than precede it.

Option C: Hybrid Routing

The iPaaS routes each incoming AI query to the right backend: vector store for semantic recall, live source for fresh or structured lookups (issue numbers, PR numbers, exact-match ticket IDs).

Pros: best developer experience (right tool for each query type), vector DB stays small (only what benefits from embeddings), fresh data for issue/PR lookups + semantic recall for code/docs, the iPaaS is the natural router since routing is what it was built for.

Cons: more complexity in the flow/recipe layer (routing rules, query classifier), two systems to operate behind one surface, risk of "neither well" if routing rules are unclear.

Fit: the mature end-state. Earns its complexity only after Option A or B has been load-bearing for a quarter.
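As an illustration of what "routing rules" might look like before any learned classifier is involved, here is a deliberately naive rule-based router. The regex patterns and the freshness keywords are hypothetical and would need tuning against real query logs.

```python
import re
from enum import Enum


class Backend(Enum):
    LIVE = "live"
    VECTOR = "vector"


# Hypothetical patterns for identifiers that deserve an exact, fresh lookup.
EXACT_ID_PATTERNS = [
    re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b"),   # ticket keys such as PROJ-1234
    re.compile(r"(?<!\w)#\d+\b"),            # issue or PR numbers such as #482
    re.compile(r"\b(?:PR|pull request|issue)\s+#?\d+\b", re.IGNORECASE),
]
# Hypothetical keywords suggesting the user wants current state, not recall.
FRESHNESS_HINTS = {"latest", "today", "currently"}


def route(query: str) -> Backend:
    """Send exact-ID and freshness-sensitive queries live; everything else semantic."""
    if any(pattern.search(query) for pattern in EXACT_ID_PATTERNS):
        return Backend.LIVE
    words = set(re.findall(r"[a-z]+", query.lower()))
    if words & FRESHNESS_HINTS:
        return Backend.LIVE
    return Backend.VECTOR
```

Rules this simple will misroute some queries (a string like UTF-8 looks like a ticket key to the first pattern), which is a small-scale version of the "neither well" risk and a reason to derive the rules from observed usage.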

Comparison matrix

Axis	Live Gateway	Indexed Vector	Hybrid
Time to value	weeks	quarter	quarter + delta
Build complexity	low	medium	high
Recurring cost	low	medium-high	medium-high
Semantic retrieval quality	weak	strong	strong
Data freshness	live	lagged	mixed (best-of)
ACL inheritance	natural	needs mirroring	mixed
Lock-in surface	none	vector DB	vector DB
Governance posture	strong	strong	strong

All three score the same on governance — that is the point. The iPaaS is the audit / auth gate regardless of which backend serves the query.
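One way to picture why governance is backend-independent: the gate wraps any callable that answers a query. This sketch assumes a hypothetical OAuth scope name and a simple sliding-window rate limit; a real iPaaS would supply both natively, and audit logging is omitted for brevity.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

RetrievalBackend = Callable[[str], list[str]]  # query -> result snippets


class RateLimitExceeded(Exception):
    pass


@dataclass
class GovernedGateway:
    backend: RetrievalBackend
    required_scope: str = "retrieval:read"  # hypothetical scope name
    max_per_minute: int = 60
    _recent: dict[str, list[float]] = field(default_factory=dict)

    def query(self, user: str, scopes: set[str], text: str) -> list[str]:
        # Auth check happens before the backend is ever touched.
        if self.required_scope not in scopes:
            raise PermissionError(f"{user} lacks scope {self.required_scope}")
        # Sliding 60-second window per user.
        now = time.monotonic()
        window = [t for t in self._recent.get(user, []) if now - t < 60]
        if len(window) >= self.max_per_minute:
            raise RateLimitExceeded(user)
        self._recent[user] = window + [now]
        return self.backend(text)
```

Swapping `backend` from a live-search function to a vector-store query changes nothing about the checks, which is the property the matrix row describes.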

Recommended phased approach

Going straight to Option B or C is the most common mistake. You pay for a vector store before you know what to index, then optimize the wrong things. The right shape is phased:

Phase 1 (this quarter): ship the Live Gateway. Two or three source connectors, the iPaaS endpoints with OAuth + audit, MCP server pushed to dev workstations. Goal: prove adoption and collect 90 days of usage data.

Phase 2 (next quarter): add the indexed path for the highest-value sources. Pick the vector DB based on Phase 1 usage signals, not pre-pilot hypotheticals. Keep live retrieval for tickets and dynamic data.

Phase 3 (six-plus months out): formalize the hybrid router. Once you know which query types benefit from semantic vs live, codify the routing rules. Now you have a real platform.

Governance: three concentric controls

   1. Make the iPaaS path the easiest path
      (best retrieval, lowest friction, no setup)
   2. Push the MCP server config to every workstation
      (MDM or dotfile baseline; no per-dev setup)
   3. Network policy blocking direct egress from dev tools
      to source APIs (last resort, after carrot)

Lead with ergonomics, not enforcement. If the centralized path is genuinely faster and produces better answers, most developers will switch on their own. Network policy is the last lever, not the first.

Audit data comes for free: every query through the iPaaS is logged with user, source, latency, result count. That log stream feeds the quarterly governance review and the post-incident forensics if something goes wrong.
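A possible shape for one audit record, using the fields named above plus an assumed `client_tool` field for the calling AI tool. The JSON-lines serialization and the roll-up helper are illustrative choices, not a description of any particular iPaaS log format.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    timestamp: float     # epoch seconds
    user: str
    client_tool: str     # which AI tool issued the query (assumed field)
    source: str
    latency_ms: float
    result_count: int


def to_log_line(record: AuditRecord) -> str:
    """Serialize one record as a JSON line for the log stream."""
    return json.dumps(asdict(record), sort_keys=True)


def queries_per_source(log_lines: list[str]) -> Counter:
    """Roll up a log stream by source, e.g. for a governance review."""
    return Counter(json.loads(line)["source"] for line in log_lines)
```

The same roll-up keyed on `user` or `client_tool` would answer the "who queried what, through which tool" questions that come up in forensics.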

What to do next

If you are running multiple dev AI tools across multiple engineering teams and have no shared retrieval substrate, the cost of inaction compounds weekly. Every new tool added makes the eventual standardization harder.

A scoped engagement to design + ship Phase 1 typically takes 6-8 weeks, uses the iPaaS you already operate, and produces:

iPaaS flows / recipes for 2-3 source connectors
iPaaS API endpoints with OAuth + rate-limit + audit
A thin MCP server (a few hundred lines, fits in a small companion repo)
Audit dashboard
Pilot with one engineering team

A vector-DB / embedding-model evaluation can run in parallel as a Phase 2 spike; Phase 1 does not have to wait for it.

If this is the decision in front of you, our Architecture & Design engagement is $25K, ~2-3 weeks, design-only — we hand you a fundable target-state architecture, source-connector inventory, vector-DB eval criteria, and a 90-day phased roadmap. Fixed bid, no time-and-materials, no vendor kickback agreements.

Start a 6-step intake and we will return a fixed-bid SOW within 3 business days. See also the related vector database framework and the broader AI-enabled integration playbook.
