Embedding
Definition
An embedding represents text, images, or audio as a numerical vector (typically 1,024–3,072 dimensions). Closer meanings produce closer vectors: 'dog' and 'cat' land near each other, while 'dog' and 'car' are far apart. Embeddings are the mathematical foundation of RAG, semantic search, and recommendation systems.
Detailed explanation
Embeddings are 'meaning coordinates': each word, sentence, or document becomes a point in a high-dimensional space, and semantically similar items sit close together. 'The weather is nice in Istanbul' and 'Istanbul is sunny today' typically yield a cosine similarity above 0.85, while 'Istanbul is nice' and 'Python code' score much lower, around 0.3.
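The similarity scores above come from cosine similarity between the two vectors. A minimal sketch with toy 3-dimensional vectors (real models emit 1,024–3,072 dimensions; the values here are invented purely for illustration):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" standing in for real model output (hypothetical values)
dog = np.array([0.9, 0.8, 0.1])
cat = np.array([0.8, 0.9, 0.2])
car = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(dog, cat))  # high: related meanings
print(cosine_similarity(dog, car))  # low: unrelated meanings
```

The same formula is what vector databases evaluate at query time, just over millions of stored vectors instead of three.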
Notable models (as of 2026): OpenAI text-embedding-3-large (3,072 dimensions, most widely used), Cohere embed-v3 (1,024 dimensions, multilingual), Voyage AI voyage-3-large (1,024 dimensions, a leader on code and multilingual text), BGE-M3 (open source, self-hostable).
Turkish quality: Cohere's multilingual model and Voyage handle Turkish better; OpenAI is acceptable but weaker on Turkish-specific nuances. Cost: OpenAI charges $0.13 per 1M tokens, so 100K documents × 500 tokens each = 50M tokens, a one-off cost of about $6.50.
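The cost figure above is straightforward arithmetic that is worth parameterizing when comparing providers. A small sketch reproducing the example from the text:

```python
def embedding_cost_usd(num_docs: int, tokens_per_doc: int,
                       price_per_1m_tokens: float) -> float:
    """One-off cost of embedding a corpus at a given per-million-token price."""
    total_tokens = num_docs * tokens_per_doc
    return total_tokens / 1_000_000 * price_per_1m_tokens

# The example from the text: 100K docs x 500 tokens at $0.13 / 1M tokens
print(embedding_cost_usd(100_000, 500, 0.13))  # 6.5
```

Swapping in another provider's price per million tokens gives a like-for-like comparison before committing to a model.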
Use cases
→RAG retrieval (search inside a vector DB)
→Semantic search (beyond keywords)
→Clustering (group similar products or content)
→Anomaly detection (outlier vectors)
→Recommendation (user-product similarity)
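Most of the use cases above reduce to the same operation: embed a query, then rank stored vectors by similarity. A brute-force sketch of that retrieval step, with toy vectors standing in for real model output (all values hypothetical):

```python
import numpy as np

def top_k(query: np.ndarray, corpus: dict[str, np.ndarray], k: int = 2) -> list[str]:
    """Return the k corpus keys most similar to the query (cosine similarity)."""
    def sim(v: np.ndarray) -> float:
        return float(np.dot(query, v) / (np.linalg.norm(query) * np.linalg.norm(v)))
    return sorted(corpus, key=lambda name: sim(corpus[name]), reverse=True)[:k]

# Toy 3-dim vectors; a real system would store model embeddings in a vector DB
corpus = {
    "weather report":  np.array([0.9, 0.1, 0.2]),
    "python tutorial": np.array([0.1, 0.9, 0.3]),
    "climate article": np.array([0.8, 0.2, 0.1]),
}
query = np.array([0.85, 0.15, 0.15])  # e.g. the embedded question "is it sunny?"
print(top_k(query, corpus))
```

Production systems replace the linear scan with an approximate nearest-neighbor index, but the ranking logic is the same.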
Pros
- +Turns meaning into numbers (usable directly in algorithms)
- +Multi-modal (text + image + audio same space)
- +Cross-language (with multilingual models)
- +Compact (1KB-12KB per chunk)
Cons
- −Re-embedding required when the model changes (costly at scale)
- −Storage cost (vector DB)
- −Retrieval quality is capped by embedding quality (bad embeddings = bad retrieval)
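The "1KB-12KB per chunk" figure above falls out of simple arithmetic: bytes per vector = dimensions × bytes per element. A sketch assuming float32 storage (the ~1 KB lower bound would correspond to int8-quantized 1,024-dim vectors):

```python
def vector_bytes(dims: int, bytes_per_elem: int = 4) -> int:
    """Raw storage per embedding vector (float32 = 4 bytes per element)."""
    return dims * bytes_per_elem

print(vector_bytes(1024))               # 4096 bytes, ~4 KB (1,024-dim float32)
print(vector_bytes(3072))               # 12288 bytes, ~12 KB (3,072-dim float32)
print(vector_bytes(1024, bytes_per_elem=1))  # 1024 bytes, ~1 KB (int8 quantized)
```

Multiplying by chunk count gives the raw vector-DB footprint, before index overhead and metadata.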