Imagine shining a flashlight. The direction you point it is what matters — not how bright it is. Cosine similarity measures the angle between two vectors. Vectors pointing the same way = similar meaning. Vectors pointing at right angles = totally different.
📝 Worked Example — 2-D Vectors (Easy Numbers)
distance = 1 − cos(θ). A distance of 0.0 means identical direction. A distance of 1.0 means the vectors are at right angles (totally unrelated); vectors pointing in opposite directions can reach 2.0, the maximum. Smaller is always better — this is the opposite convention from Elasticsearch's similarity score, where bigger is better.
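A minimal, runnable sketch of the 2-D worked example promised above (the vectors are invented for illustration; [3, 4] and [6, 8] point the same way, one just "brighter"):

```python
import math

def cosine_similarity(a, b):
    """cos(θ) = (a · b) / (|a| · |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Same direction, different magnitude ("brightness"): same meaning.
print(cosine_similarity([3, 4], [6, 8]))      # → 1.0
# Right angles: totally different.
print(cosine_similarity([1, 0], [0, 1]))      # → 0.0
# The distance convention used throughout this doc:
print(1 - cosine_similarity([3, 4], [6, 8]))  # → 0.0
```

Note that doubling a vector does not change the score at all, which is exactly the "direction, not brightness" intuition.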
A transformer embedding model reads text and outputs a fixed-length list of numbers — a vector. Think of it as a super-smart translator that converts meaning into a location in a high-dimensional map.
📊 Popular Embedding Models Compared
| Model | Dimensions | Provider | Best For | Context Window |
|---|---|---|---|---|
| nomic-embed-text (open) | 768 | Nomic AI | Long docs, local deploy | 8,192 tokens |
| text-embedding-ada-002 | 1,536 | OpenAI | General purpose RAG | 8,191 tokens |
| text-embedding-3-small | 1,536 | OpenAI | Cost-efficient | 8,191 tokens |
| text-embedding-3-large | 3,072 | OpenAI | Highest accuracy | 8,191 tokens |
| all-MiniLM-L6-v2 (open) | 384 | Sentence-BERT | Fast, small footprint | 512 tokens |
| e5-mistral-7b (open) | 4,096 | Microsoft | State-of-the-art open | 32,768 tokens |
Computing cosine similarity across millions of vectors one by one is O(n·d) — far too slow at query time. HNSW solves this with a multi-layer graph structure inspired by how highway maps work: sparse upper layers for long hops, a dense bottom layer for precision. It finds approximate nearest neighbors in roughly O(log n) time.
🔨 How HNSW is Built (Insertion Algorithm)
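A heavily simplified sketch of the insertion algorithm, assuming the paper's parameter names (M, ef) but cutting the real algorithm's candidate heaps and neighbor pruning for brevity — a toy to show the shape, not a production index:

```python
import math
import random

class TinyHNSW:
    """Toy HNSW: each node gets a random top layer; insertion greedily
    descends from the top, then links to nearby nodes on each layer."""

    def __init__(self, M=4):
        self.M = M            # max neighbors to link per layer
        self.layers = []      # layers[l] = {node_id: [neighbor_ids]}
        self.vectors = {}
        self.entry = None     # entry point for every search

    def _dist(self, a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return 1 - dot / (na * nb)  # cosine distance

    def _greedy(self, q, start, layer):
        """Walk to ever-closer neighbors of q on one layer; stop at a local min."""
        cur, improved = start, True
        while improved:
            improved = False
            for nb in self.layers[layer].get(cur, []):
                if self._dist(q, self.vectors[nb]) < self._dist(q, self.vectors[cur]):
                    cur, improved = nb, True
        return cur

    def insert(self, node_id, vec):
        self.vectors[node_id] = vec
        # 1. Draw a random top layer (geometric distribution, as in the paper).
        mL = 1.0 / math.log(self.M)
        level = int(-math.log(random.random() + 1e-12) * mL)
        while len(self.layers) <= level:
            self.layers.append({})
        if self.entry is None:
            for l in range(level + 1):
                self.layers[l][node_id] = []
            self.entry = node_id
            return
        # 2. Greedy descent through layers above the node's level.
        cur = self.entry
        for l in range(len(self.layers) - 1, level, -1):
            if cur in self.layers[l]:
                cur = self._greedy(vec, cur, l)
        # 3. On each layer <= level, connect to the M closest nodes
        #    (no pruning of the neighbors' own lists, unlike real HNSW).
        for l in range(level, -1, -1):
            candidates = sorted(self.layers[l],
                                key=lambda n: self._dist(vec, self.vectors[n]))
            neighbors = candidates[:self.M]
            self.layers[l][node_id] = list(neighbors)
            for nb in neighbors:
                self.layers[l][nb].append(node_id)

    def search(self, q):
        cur = self.entry
        for l in range(len(self.layers) - 1, -1, -1):
            if cur in self.layers[l]:
                cur = self._greedy(q, cur, l)
        return cur
```

The random-level draw is what creates the highway effect: most nodes live only on the bottom layer, while a few tall nodes act as long-range shortcuts.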
| Method | Search Speed | Accuracy | Memory | Used By |
|---|---|---|---|---|
| Brute Force | O(n·d) 🐢 | 100% exact | Just vectors | Tiny datasets <10k |
| IVF (Flat) | O(√n·d) | ~95% | Centroids + vectors | Faiss, classic Pinecone |
| HNSW ⭐ | O(log n) 🚀 | 95–99% | ~1.5× vectors | Elasticsearch, ChromaDB, Weaviate, Qdrant |
| Product Quantization | O(n) over compressed codes | ~90% | ~0.25× vectors | When RAM is scarce |
Elasticsearch stores vectors in the dense_vector field type. When you set "index": true, ES builds an HNSW index automatically. You control M and ef_construction in the mapping; at query time, num_candidates sets the ef parameter.
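A sketch of what such a mapping might look like, written as the request body you would send (the field name, dims, and tuning values are made up for illustration):

```python
# dense_vector mapping: "index": True turns on HNSW indexing;
# m and ef_construction tune graph connectivity vs. build cost.
mapping = {
    "mappings": {
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 384,                  # must match your embedding model
                "index": True,                # build the HNSW graph
                "similarity": "cosine",
                "index_options": {
                    "type": "hnsw",
                    "m": 16,                  # neighbors per node
                    "ef_construction": 100,   # candidate list size at build time
                },
            }
        }
    }
}
```

You would PUT this body to the index endpoint when creating the index; after that, every indexed document's embedding is inserted into the HNSW graph automatically.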
In ChromaDB, when you call collection.add(), it inserts the vector into its HNSW index. When you call collection.query(), it traverses the graph and returns distance = 1 − cosine.
Before vector search existed, Elasticsearch was already the dominant full-text search engine. It combines two powerful ideas: the inverted index (a backwards book index) and BM25 scoring (a smart relevance formula).
📊 BM25 Scoring Formula
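The formula body did not survive extraction; as a sketch, the standard BM25 score with the usual defaults (k1 = 1.2, b = 0.75) can be written directly in code:

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Standard BM25: sum over query terms of IDF(t) × saturated TF.
    corpus is a list of tokenized documents; doc_terms is one of them."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for t in query_terms:
        df = sum(1 for d in corpus if t in d)            # docs containing t
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))  # rare terms weigh more
        tf = doc_terms.count(t)
        # Saturating TF: the 10th occurrence adds far less than the 1st;
        # the b term penalizes documents longer than average.
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score
```

The two "smart" parts are visible in the loop: IDF rewards rare terms, and the k1/b denominator stops keyword stuffing and long-document bias from dominating the score.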
ChromaDB is a vector-first database — its only job is storing embeddings and finding similar ones fast. It's the most popular choice for local RAG prototypes and Python-native LLM apps.
ChromaDB returns distance = 1 − cosine_similarity. So when you get back results with distances=[0.02, 0.15, 0.41], the smallest number is the most similar. A distance of 0 means identical; a distance of 1 means at right angles (totally unrelated), and opposite directions can reach 2. This is the opposite of Elasticsearch, where a bigger score is better.
Elasticsearch 8.0+ added native kNN vector search. It uses HNSW internally and — uniquely — lets you combine BM25 keyword scoring with cosine similarity in a single query. This is called hybrid search, and it often outperforms either method alone.
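A sketch of what a single hybrid request body might look like, with BM25 and vector scores blended by boosts (the field names, boosts, and placeholder vector are assumptions for illustration):

```python
# One search request combining a BM25 "match" clause with a kNN clause.
# Scores from the two are summed, weighted by their boosts.
hybrid_query = {
    "query": {
        "match": {"body": {"query": "laptop battery life", "boost": 0.3}}
    },
    "knn": {
        "field": "embedding",
        "query_vector": [0.1] * 384,  # stand-in for the embedded query text
        "k": 10,
        "num_candidates": 50,         # ef at query time: candidates per shard
        "boost": 0.7,
    },
    "size": 10,
}
```

num_candidates must be at least k; raising it trades latency for recall, exactly like ef in any HNSW traversal.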
RAG (Retrieval-Augmented Generation) is how modern LLM apps (enterprise chatbots, document Q&A) work. Instead of baking all knowledge into the model, you retrieve relevant chunks at query time and inject them into the LLM's prompt. Two phases: Ingestion and Retrieval.
📥 Phase 1 — Ingestion (Done Once / On New Data)
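The ingestion phase can be sketched end to end; here embed() is a deliberately fake stand-in for any embedding model from the table above, and the character-window chunker and list-backed store are naive placeholders:

```python
def chunk(text, size=200, overlap=50):
    """Split text into overlapping character windows (naive chunker)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    """Stand-in for a real embedding model: deterministic fixed-length vector."""
    vec = [0.0] * 8
    for i, ch in enumerate(text):
        vec[i % 8] += ord(ch)
    norm = sum(x * x for x in vec) ** 0.5 or 1.0
    return [x / norm for x in vec]   # unit length, like most embedding models

vector_store = []  # (chunk_text, vector) pairs; a real app uses ES/Chroma/etc.

def ingest(document):
    """Phase 1: chunk → embed → store. Run once per document."""
    for piece in chunk(document):
        vector_store.append((piece, embed(piece)))
```

The overlap matters: without it, a sentence split across a chunk boundary would be unretrievable by either half.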
🔎 Phase 2 — Retrieval (Done on Every User Query)
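The retrieval phase can likewise be sketched on its own: embed the query, take the top-k most similar chunks, and inject them into the prompt. The tiny store and its 2-D vectors below are made up, and brute-force cosine stands in for an HNSW index:

```python
import math

# Toy store of (chunk_text, vector); real vectors come from the same
# embedding model used at ingestion time.
store = [
    ("Elasticsearch 8 adds native kNN search", [0.9, 0.1]),
    ("ChromaDB returns distance = 1 - cosine", [0.8, 0.3]),
    ("BM25 ranks documents by term statistics", [0.1, 0.9]),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query_vec, k=2):
    """Top-k chunks by cosine similarity (brute force stands in for HNSW)."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, query_vec):
    """Inject the retrieved chunks into the LLM's prompt."""
    context = "\n".join(f"- {c}" for c in retrieve(query_vec))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The LLM never sees the whole corpus — only the handful of chunks whose vectors point the same way as the query's.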
| Tool | Type | Search Method | Best For | Weakness |
|---|---|---|---|---|
| Elasticsearch 8+ ⭐ | hybrid | BM25 + HNSW cosine | Production RAG + keyword + analytics | Complex ops, RAM heavy |
| ChromaDB | vector | HNSW cosine/L2/IP | Local prototyping, Python RAG | Not production-grade at huge scale |
| Pinecone | vector | HNSW / ANN | Managed cloud vector search | Cost, vendor lock-in |
| Weaviate | vector | HNSW + BM25 | GraphQL + vector, multi-modal | Steeper learning curve |
| Qdrant | vector | HNSW cosine | High performance, Rust-based | Smaller ecosystem |
| pgvector | sql | Exact cosine / IVFFlat | Already on PostgreSQL | Slower than dedicated DBs at scale |
| Redis (VSS) | cache | HNSW / FLAT | Low-latency, in-memory | Data size limited by RAM |
| Milvus | vector | HNSW/IVF/ANNOY | Billion-scale vectors | Heavy infrastructure |
| OpenSearch | hybrid | BM25 + k-NN | AWS native, ES alternative | Slightly behind ES features |