RAG (Retrieval-Augmented Generation)
Definition
RAG (Retrieval-Augmented Generation) is a method that lets an LLM 'look at' company-specific documents beyond its training data. Documents are embedded into a vector DB; when a question arrives, the most relevant chunks are retrieved and added to the LLM's context, which reduces hallucination and raises answer quality.
Detailed explanation
A typical RAG pipeline has 7 steps: document collection → preprocessing → chunking (200-1000 tokens) → embedding → writing to a vector DB → retrieval (top-K chunks per query) → passing the chunks as LLM context to generate an answer with source citations.
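A minimal sketch of the chunking step, assuming tiktoken for token counting; the 500-token window and 50-token overlap below are illustrative choices within the 200-1000 range above, not fixed rules:

```python
import tiktoken

def chunk_text(text: str, max_tokens: int = 500, overlap: int = 50) -> list[str]:
    """Split text into token-based chunks with a small overlap between chunks."""
    enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by recent OpenAI models
    tokens = enc.encode(text)
    chunks = []
    step = max_tokens - overlap  # overlap keeps context intact across chunk borders
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        chunks.append(enc.decode(window))
    return chunks
```

The overlap is a common hedge against a sentence being cut at a chunk boundary; tune both parameters against your own documents.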
Vector DB options (2026): Pinecone (managed, most mature), Weaviate (open source), Qdrant (Rust-based, fast), pgvector (Postgres extension), Chroma (good for POCs). Expect roughly $70-500/month for a production deployment, depending on scale.
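For a quick POC, a hedged sketch of the write-and-retrieve steps using Chroma; the collection name, ids, texts, and metadata are placeholders, and Chroma applies its default embedding function when none is specified:

```python
import chromadb

client = chromadb.PersistentClient(path="./rag_db")  # local on-disk store
collection = client.get_or_create_collection("company_docs")

# Write: one entry per chunk, keeping the source document as metadata for citations.
collection.add(
    ids=["handbook-0", "handbook-1"],
    documents=["Refunds are issued within 14 days...", "Shipping takes 2-5 business days..."],
    metadatas=[{"source": "handbook.pdf"}, {"source": "handbook.pdf"}],
)

# Retrieve: the top-K (here K=2) chunks most similar to the question.
results = collection.query(query_texts=["What is the refund policy?"], n_results=2)
print(results["documents"], results["metadatas"])
```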
Embedding models: OpenAI text-embedding-3-large, Cohere embed-v3, Voyage AI voyage-3-large. Each trades off quality against cost; Cohere and Voyage tend to be stronger on Turkish.
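A sketch of the embedding call using the OpenAI Python SDK (v1-style client); the input strings are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.embeddings.create(
    model="text-embedding-3-large",
    input=["first chunk of text", "second chunk of text"],
)
vectors = [item.embedding for item in resp.data]  # one float vector per input chunk
print(len(vectors), len(vectors[0]))  # 2 vectors, 3072 dimensions for this model
```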
RAG vs fine-tuning: RAG is fast to deploy, stays current, and is economical (adding a new document only requires embedding it). Fine-tuning enables deeper domain adaptation but is costly and static (new knowledge means retraining). In practice, RAG covers roughly 95% of enterprise needs.
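To make the pipeline's final step concrete, a hedged sketch that stitches retrieved chunks into the prompt and asks the model to cite sources; the chunk shape, model name, and system prompt are assumptions, not a fixed recipe:

```python
from openai import OpenAI

client = OpenAI()

def answer(question: str, chunks: list[dict]) -> str:
    # Assumed chunk shape from the retrieval step: {"text": ..., "source": ...}.
    context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat-capable model works
        messages=[
            {"role": "system",
             "content": "Answer only from the provided context and cite the [source] "
                        "tags. If the context is insufficient, say you don't know."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```

Constraining the model to the provided context and requiring citations is what delivers the traceability listed under Pros below.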
Use cases
→ Customer support chatbot (RAG over company docs)
→ Internal knowledge base search (Confluence, Notion)
→ Legal contract analysis
→ Healthcare: medical guideline lookup
→ E-commerce: product search and recommendation
Pros
+ Reduces hallucination (sourced answers)
+ Integrates company-specific knowledge
+ New doc = just embed (model unchanged)
+ Audit + traceability (every answer cites source)
Cons
− Vector DB + embedding cost ($70-500/month)
− Chunking strategy is critical (quality-defining)
− Multi-turn conversation context is complex to handle
− 3-6 months to production (POC easy, scale hard)
Planning a project around RAG (Retrieval-Augmented Generation)?
In a 30-minute discovery call, we share a written architecture, cost, and team recommendation tailored to your project.
Start a discovery call