Vector Databases and the API Layer

Key takeaway: Production RAG systems need both vector search for unstructured data and SQL queries for structured data. The API layer is where you unify access to both, routing queries to the right backend while enforcing consistent authentication and governance.

What Vector Databases Do (and Don't Do)

A vector database is a storage system optimized for high-dimensional embeddings—numerical representations of text, images, or other unstructured data. When you convert a document into an embedding using a model like OpenAI's text-embedding-ada-002, a vector database indexes that embedding so you can later retrieve it by semantic similarity rather than exact keyword match.

The major vector databases in production today include Pinecone (fully managed, cloud-native), Weaviate (open-source, supports hybrid search), Qdrant (open-source, Rust-based), Milvus (CNCF project, designed for scale), and pgvector (a PostgreSQL extension that adds vector similarity search to an existing relational database). Each occupies a different point on the managed-vs-self-hosted and specialized-vs-integrated spectrum.

Vector databases excel at one thing: given a query embedding, return the nearest neighbors from a corpus. They support approximate nearest neighbor (ANN) algorithms like HNSW and IVF that trade marginal accuracy for significant speed gains at scale.
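The core operation a vector database optimizes can be sketched in a few lines: exact nearest-neighbor search by cosine similarity, which ANN indexes like HNSW approximate at a fraction of the cost. A minimal NumPy sketch (the corpus and query vectors are illustrative):

```python
import numpy as np

def top_k_cosine(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> list[int]:
    """Exact k-nearest-neighbor search by cosine similarity.

    Vector databases approximate this with ANN indexes (HNSW, IVF)
    to avoid the full O(n * d) scan over the corpus.
    """
    # Normalize so a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q
    # Indices of the k highest similarities, best first.
    return [int(i) for i in np.argsort(-sims)[:k]]

# Toy corpus of four 3-dimensional "embeddings".
corpus = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.05, 0.0])
print(top_k_cosine(query, corpus, k=2))  # → [0, 1]
```

ANN indexes exist precisely because this exact scan stops being affordable once the corpus reaches millions of vectors.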

What vector databases do not do is replace your relational database. They don't handle transactions, enforce referential integrity, support complex joins, or manage the structured business data that drives most enterprise applications. A vector database can tell you which support tickets are semantically similar to a new query. It cannot tell you the ticket's SLA deadline, the customer's contract tier, or whether the assigned engineer is on PTO. That data lives in SQL.

Structured + Unstructured: The Hybrid RAG Pattern

Retrieval-augmented generation (RAG) is the dominant pattern for grounding LLM responses in enterprise data. The basic pipeline is straightforward: embed a user query, search a vector database for relevant context, and pass that context to the LLM alongside the original question. But most production RAG systems outgrow this simple pattern within weeks.

The reason is that real business questions span both unstructured and structured data. Consider an internal AI assistant for a SaaS company. A sales rep asks: "What did the customer say about our pricing in their last support ticket, and what's their current contract value?" The first half requires vector search across ticket transcripts. The second half requires a SQL query against the billing database.

This is the hybrid RAG pattern. The AI application determines that the query needs data from two sources, issues parallel requests—one to a vector search endpoint, one to a SQL API—and combines the results into a single context window for the LLM. The structured data grounds the response with facts. The unstructured data provides nuance and detail.
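The parallel-fetch step can be sketched with asyncio. The two backends are stubbed here, and the endpoint paths and payload shapes in the comments are hypothetical; in production each coroutine would make an authenticated HTTP call.

```python
import asyncio

# Hypothetical backends: in production these would be HTTP calls to a
# vector search endpoint and a SQL API respectively.
async def vector_search(query: str) -> list[str]:
    # e.g. POST /vectors/search with the query embedding
    await asyncio.sleep(0)  # stand-in for network I/O
    return ["Ticket 881: customer pushed back on per-seat pricing at renewal."]

async def sql_lookup(customer_id: str) -> dict:
    # e.g. GET /api/v2/billing/_table/contracts?filter=customer_id=...
    await asyncio.sleep(0)
    return {"customer_id": customer_id, "contract_value": 48000}

async def build_context(query: str, customer_id: str) -> str:
    # Issue both requests in parallel and merge results into one context window.
    docs, facts = await asyncio.gather(vector_search(query), sql_lookup(customer_id))
    return "\n".join([
        "Structured data: " + str(facts),
        "Relevant documents:",
        *docs,
    ])

context = asyncio.run(build_context("what did they say about pricing?", "CUST-1042"))
print(context)
```

The LLM then receives `context` alongside the original question: verified facts from SQL, qualitative detail from vector search.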

Without both sources, the AI either hallucinates structured facts (contract values, dates, status codes) or provides sterile answers that lack the qualitative context decision-makers need. The hybrid pattern is not an advanced architecture. It is the baseline for useful enterprise RAG.

API Architecture for Hybrid Data Access

The cleanest way to implement hybrid RAG is to put both data sources behind APIs and let the AI application (or an orchestration layer) decide which to call. The architecture has three tiers.

First, the AI application tier. This is your LLM orchestration framework—LangChain, LlamaIndex, or a custom agent. It receives a user query, classifies the intent, and determines which data sources are needed. For hybrid queries, it issues parallel requests.

Second, the API gateway tier. This handles authentication, rate limiting, logging, and routing. Every request from the AI application passes through the gateway regardless of the backend. This gives you a single control plane for access policies, audit trails, and usage monitoring across both vector and relational data sources.
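A minimal sketch of what the gateway tier enforces on every request, regardless of backend. The key store, routes, and log fields are illustrative stand-ins, not any particular gateway product:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("gateway")

API_KEYS = {"demo-key": "sales-assistant"}  # illustrative key store

# Both the vector and SQL backends sit behind the same gateway (stubbed here).
ROUTES = {
    "/vectors/search": lambda body: {"matches": []},
    "/sql/contracts": lambda body: {"rows": []},
}

def handle(path: str, api_key: str, body: dict) -> dict:
    """Single control plane: authenticate, route, and audit every request."""
    client = API_KEYS.get(api_key)
    if client is None:
        log.warning("rejected request to %s: invalid key", path)
        return {"status": 401}
    backend = ROUTES.get(path)
    if backend is None:
        return {"status": 404}
    started = time.monotonic()
    result = backend(body)
    # Audit trail: who accessed what, when, and how long it took.
    log.info("client=%s path=%s latency_ms=%.1f",
             client, path, (time.monotonic() - started) * 1e3)
    return {"status": 200, "data": result}
```

The point of the sketch is the shape: one `handle` path for every backend means one place to enforce policy and one log to answer audit questions from.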

Third, the data tier. Vector databases expose their own REST or gRPC APIs (Pinecone's REST API, Weaviate's GraphQL endpoint, Qdrant's gRPC interface). Relational databases are exposed through auto-generated REST or OData APIs. The AI application doesn't connect to databases directly. It calls APIs.

This separation matters for security and governance. Direct database connections from AI applications create ungoverned access paths. APIs create auditable, policy-enforced access paths. When a compliance team asks "what data did the AI access and when," the API gateway's logs provide the answer.

A practical implementation typically looks like this: the AI agent calls a vector search endpoint to retrieve semantically relevant documents, extracts entity identifiers from those documents (customer IDs, order numbers, product SKUs), then calls the SQL API with those identifiers to fetch current structured data. The LLM receives both the narrative context and the factual data.
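That three-step flow, retrieve, extract identifiers, fetch structured data, can be sketched as follows. The vector and SQL calls are stubbed, and the `CUST-` ID pattern is illustrative; real systems extract whatever identifier scheme their documents carry.

```python
import re

def extract_customer_ids(documents: list[str]) -> list[str]:
    """Pull entity identifiers out of retrieved text, preserving order
    and dropping duplicates. The CUST-#### pattern is illustrative."""
    ids: list[str] = []
    for doc in documents:
        for match in re.findall(r"CUST-\d+", doc):
            if match not in ids:
                ids.append(match)
    return ids

def hybrid_retrieve(query: str) -> dict:
    # Step 1: semantic retrieval (stubbed; would call the vector search API).
    documents = [
        "CUST-1042 complained about renewal pricing in ticket 881.",
        "Follow-up call notes for CUST-1042: positive on support quality.",
    ]
    # Step 2: extract entity identifiers from the retrieved text.
    customer_ids = extract_customer_ids(documents)
    # Step 3: fetch current structured data for those IDs (stubbed; would be
    # a filtered SQL API call against the billing database).
    records = [{"customer_id": cid, "contract_tier": "enterprise"}
               for cid in customer_ids]
    return {"documents": documents, "records": records}

result = hybrid_retrieve("pricing feedback")
```

The LLM prompt is then assembled from `result["documents"]` (narrative) and `result["records"]` (current facts), so structured values come from the database rather than the model's guess.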

Pairing DreamFactory with Vector Search

DreamFactory is an API generation platform that auto-generates REST APIs from relational databases including SQL Server, PostgreSQL, MySQL, Oracle, and Snowflake, with built-in role-based access control, API key management, rate limiting, and request logging. It handles the SQL side of hybrid RAG architectures.

DreamFactory does not manage vector databases directly—and that is fine. Vector databases have their own well-designed APIs. The architecture works best when each backend is served by the tool purpose-built for it: DreamFactory for governed SQL access, and the vector database's native API (or a thin wrapper) for embedding search.

The integration point is the AI orchestration layer. A LangChain agent, for example, can be configured with two tools: one that calls DreamFactory's auto-generated REST API for structured queries, and another that calls Pinecone or Weaviate for semantic search. DreamFactory's role is ensuring the SQL side has the same security posture as the vector side—authenticated, rate-limited, field-level access controls, and full audit logging.
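The two-tool setup can be sketched framework-agnostically; the same shape maps onto LangChain's tool interface (a name, a description the LLM uses to pick the tool, and a function to call). Both backends are stubbed, and the endpoint details in the comments are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    """Minimal stand-in for an agent tool; LangChain's Tool carries
    the same three fields."""
    name: str
    description: str
    func: Callable[[str], str]

def dreamfactory_sql(query: str) -> str:
    # Would call a DreamFactory auto-generated REST endpoint with an API
    # key header; governed, rate-limited, and logged. Stubbed here.
    return '{"contract_value": 48000}'

def vector_search(query: str) -> str:
    # Would call Pinecone/Weaviate/Qdrant's native search API. Stubbed here.
    return "Ticket 881: customer pushed back on per-seat pricing."

tools = [
    Tool("structured_data",
         "Current facts from SQL: contracts, billing, ticket status.",
         dreamfactory_sql),
    Tool("semantic_search",
         "Find documents similar in meaning to the query.",
         vector_search),
]

def run(tool_name: str, query: str) -> str:
    # In a real agent, the LLM selects the tool name from the descriptions.
    tool = next(t for t in tools if t.name == tool_name)
    return tool.func(query)
```

An agent answering the earlier sales-rep question would invoke both tools and hand the combined output to the LLM.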

For teams using pgvector, the overlap is tighter. Since pgvector runs inside PostgreSQL, DreamFactory can expose the relational tables in the same database where vector embeddings are stored. The vector similarity search itself still goes through pgvector-specific SQL (using the <=> operator for cosine distance, or <-> for Euclidean distance), but the surrounding structured queries—fetching metadata, joining related tables, filtering by business attributes—go through DreamFactory's generated API.
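A sketch of how the two access paths coexist. The similarity query uses pgvector's <=> cosine-distance operator and would run over a direct SQL connection; the metadata fetch goes through a generated REST API. The table name `doc_chunks`, the service name `corpus_db`, and the filter syntax are hypothetical stand-ins.

```python
# Illustrative pgvector query: top-5 chunks nearest to a query embedding
# by cosine distance, run as plain SQL against the same PostgreSQL
# instance the REST API fronts.
SIMILARITY_SQL = """
SELECT id, document_id, chunk_text
FROM doc_chunks
ORDER BY embedding <=> %(query_embedding)s
LIMIT 5;
"""

def metadata_url(base: str, document_ids: list[int]) -> str:
    """Build the REST call for the surrounding structured data.
    The service name 'corpus_db' and table 'documents' are hypothetical."""
    id_list = ",".join(str(i) for i in document_ids)
    return f"{base}/api/v2/corpus_db/_table/documents?filter=id in ({id_list})"

print(metadata_url("https://df.example.com", [12, 47]))
```

The application runs `SIMILARITY_SQL` first, collects `document_id` values from the hits, then fetches the business metadata for those documents through the governed API.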

Choosing a Vector Database for Enterprise RAG

Pinecone is the most straightforward option for teams that want fully managed infrastructure. It handles indexing, scaling, and replication without operational overhead. The tradeoff is vendor lock-in and cost at scale. If your priority is speed to production and your embedding corpus is under a few hundred million vectors, Pinecone is a reasonable default.

pgvector is the right choice for teams already running PostgreSQL who want to avoid introducing a new data store. It supports HNSW and IVFFlat indexes, handles millions of vectors comfortably, and keeps your operational footprint small. The tradeoff is that it shares resources with your transactional database and lacks some of the advanced features (multi-tenancy, hybrid search modes) of purpose-built vector databases.

Weaviate stands out for hybrid search, combining dense vector search with BM25 keyword search in a single query. This matters for RAG use cases where pure semantic search misses important keyword-specific results. It also supports built-in vectorization modules, reducing the pipeline complexity of embedding at ingest time.

Qdrant offers strong performance characteristics thanks to its Rust implementation and supports advanced filtering (attaching metadata payloads to vectors and filtering during search, not after). For RAG architectures where you need to combine semantic similarity with metadata constraints—find similar documents but only from this department, or only from this date range—Qdrant's filtering is a differentiator.

Milvus is designed for large-scale deployments. As a CNCF project with a Kubernetes-native architecture, it fits enterprises that need to handle billions of vectors with high availability. The operational complexity is higher than managed alternatives, but for teams with the infrastructure expertise, Milvus provides the most control over scaling, sharding, and resource allocation.

The vector database you choose matters less than the architecture you wrap around it. If your API layer cleanly separates vector search from structured data access, swapping one vector database for another is a configuration change, not a rewrite. Build the abstraction first. Optimize the backend later.
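"Build the abstraction first" can be made concrete with a minimal interface: if application code depends only on this, swapping Pinecone for Qdrant or pgvector means writing one new adapter, not rewriting callers. A sketch, with a trivial in-memory adapter standing in for a real backend:

```python
from typing import Protocol

class VectorSearchBackend(Protocol):
    """The only surface the application depends on; each vector database
    gets its own adapter implementing this method."""
    def search(self, embedding: list[float], top_k: int) -> list[str]: ...

class InMemoryBackend:
    """Toy adapter for illustration; a Pinecone or Qdrant adapter would
    make the corresponding API call instead of scanning a dict."""
    def __init__(self, docs: dict[str, list[float]]):
        self.docs = docs

    def search(self, embedding: list[float], top_k: int) -> list[str]:
        def dist(v: list[float]) -> float:
            # Squared Euclidean distance; smaller is more similar.
            return sum((a - b) ** 2 for a, b in zip(embedding, v))
        ranked = sorted(self.docs, key=lambda key: dist(self.docs[key]))
        return ranked[:top_k]

def retrieve_context(backend: VectorSearchBackend, embedding: list[float]) -> list[str]:
    # Application code sees only the abstraction, never a vendor client.
    return backend.search(embedding, top_k=2)

backend = InMemoryBackend({"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0], "doc-c": [0.9, 0.1]})
print(retrieve_context(backend, [1.0, 0.0]))  # → ['doc-a', 'doc-c']
```

Switching backends is then the configuration change the text describes: construct a different adapter and pass it to the same `retrieve_context`.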