
RAG (Retrieval Augmented Generation)

Definition

RAG (Retrieval Augmented Generation) is a method that lets an LLM "look at" company-specific documents beyond its training data. Documents are embedded into a vector DB; at question time, the most relevant chunks are added to the LLM's context, which reduces hallucination and raises answer quality.

Published: 2026-05-05 · Updated: 2026-05-05

Detailed explanation

RAG's 7 steps: document collection → preprocessing → chunking (200–1000 tokens) → embedding → writing to a vector DB → retrieval (top-K chunks) → passing the chunks as LLM context, generating the answer, and citing the source.
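The chunking step above can be sketched in a few lines. This is a minimal fixed-size chunker with overlap, a simplification for illustration: it splits on whitespace rather than real tokens, and the function name and sizes are hypothetical, not from any specific library.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping word-based chunks (token-based in practice)."""
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break  # last window already covered the tail
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both sides, at the cost of slightly more storage and embedding calls.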

Vector DB options (2026): Pinecone (managed, most mature), Weaviate (open source), Qdrant (Rust-based, fast), pgvector (Postgres extension), Chroma (good for POCs). Cost: roughly $70–500/month in production.

Embedding models: OpenAI text-embedding-3-large, Cohere embed-v3, Voyage AI voyage-3-large. Each involves quality vs. cost trade-offs; Cohere and Voyage are stronger on Turkish.
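Whatever the embedding model, retrieval boils down to ranking stored vectors by similarity to the query vector. A minimal sketch of top-K retrieval with cosine similarity, using toy 2-D vectors in place of real embeddings (the function names are illustrative; a vector DB does this with approximate-nearest-neighbor indexes, not a full sort):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k documents most similar to the query."""
    ranked = sorted(doc_vecs.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

Real embeddings have hundreds to thousands of dimensions, but the ranking logic is the same.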

RAG vs fine-tuning: RAG is fast, current, and economical (a new document just needs embedding). Fine-tuning gives deeper domain adaptation but is costly and static. RAG covers roughly 95% of enterprise needs.
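The final generation step, passing retrieved chunks to the LLM with source citations, is mostly prompt assembly. A hypothetical sketch (the prompt wording and `build_prompt` helper are illustrative, not a standard API):

```python
def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Assemble a grounded prompt from (source_id, text) chunk pairs."""
    # Label each chunk with its source so the model can cite it.
    context = "\n\n".join(f"[{src}] {text}" for src, text in chunks)
    return (
        "Answer using ONLY the context below. Cite sources as [name].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string is what gets sent to the LLM; the source labels are what make the audit trail in the pros list possible.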

Use cases

Customer support chatbot (RAG over company docs)

Internal knowledge base search (Confluence, Notion)

Legal contract analysis

Healthcare medical guideline lookup

E-commerce product search + recommendation

Pros

  • Reduces hallucination (sourced answers)
  • Integrates company-specific knowledge
  • New doc = just embed (model unchanged)
  • Audit + traceability (every answer cites its source)

Cons

  • Vector DB + embedding cost ($70-500/month)
  • Chunking strategy is critical (quality-defining)
  • Handling multi-turn conversation context is complex
  • 3-6 months to production (POC easy, scale hard)

Related terms

LLM · Vector DB · Embedding · AI Agent · MCP

Related services

Planning a project around RAG (Retrieval Augmented Generation)?

In a 30-minute discovery call, we share a written architecture, cost, and team recommendation tailored to your project.

Start a discovery call