Vector Search Overview
Vector search finds the vectors in a collection that are most similar to a query vector in a high-dimensional space. It operates on embeddings, powers semantic search and recommendation systems, and requires efficient indexing to scale.
Vector search compares a query vector to database vectors, uses a distance metric to measure similarity, and returns the most similar vectors. With proper indexing it scales to millions of vectors.
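As a minimal illustration of this process, the sketch below (using randomly generated vectors in place of real embeddings) compares a query against every stored vector with cosine similarity and returns the top matches by brute force:

# Brute-force vector search sketch (random data stands in for real embeddings)
import numpy as np

rng = np.random.default_rng(0)
database = rng.standard_normal((1000, 64)).astype('float32')  # 1000 vectors, 64 dimensions
query = rng.standard_normal(64).astype('float32')

# Cosine similarity between the query and every database vector
norms = np.linalg.norm(database, axis=1) * np.linalg.norm(query)
similarities = database @ query / norms

# Indices of the 5 most similar vectors, best first
top_k = np.argsort(-similarities)[:5]
print("Top matches:", top_k)
print("Similarities:", similarities[top_k])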
The diagram shows the vector search process: the query vector is compared to database vectors, distance metrics measure similarity, and indexes enable fast retrieval.
Similarity Metrics
Similarity metrics measure vector relationships. Cosine similarity measures angle between vectors. Euclidean distance measures straight-line distance. Manhattan distance measures city-block distance. Each suits different use cases.
Cosine similarity is cos(θ) = (A·B) / (||A|| × ||B||). It ranges from -1 to 1. Higher values mean more similar. It ignores vector magnitudes. It works well for embeddings.
# Similarity Metrics
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances, manhattan_distances

vectors = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0]
])

# Cosine similarity
cos_sim = cosine_similarity(vectors)
print("Cosine similarity:")
print(cos_sim)

# Euclidean distance
euc_dist = euclidean_distances(vectors)
print("Euclidean distance:")
print(euc_dist)

# Manhattan distance
man_dist = manhattan_distances(vectors)
print("Manhattan distance:")
print(man_dist)
-- NeuronDB: Vector Similarity Search
CREATE TABLE document_embeddings (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(384)
);

INSERT INTO document_embeddings (content, embedding) VALUES
    ('Machine learning tutorial', ARRAY[0.1, 0.2, ...]::vector(384)),
    ('Deep learning guide', ARRAY[0.15, 0.18, ...]::vector(384));

-- Cosine similarity search
SELECT
    id,
    content,
    1 - (embedding <=> (SELECT embedding FROM document_embeddings WHERE id = 1)) AS similarity
FROM document_embeddings
WHERE id != 1
ORDER BY similarity DESC
LIMIT 5;
Choose metrics based on data characteristics. Cosine works well for normalized embeddings. Euclidean works for spatial data. Manhattan works for sparse data.
The diagram compares similarity metrics. Cosine measures angles. Euclidean measures distances. Each has different properties.
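One useful relationship between these metrics: for L2-normalized vectors, squared Euclidean distance equals 2 - 2 × cosine similarity, so the two metrics produce the same ranking. A quick numerical check with random vectors (illustrative only):

# For L2-normalized vectors, Euclidean distance and cosine similarity rank identically
import numpy as np

rng = np.random.default_rng(1)
vectors = rng.standard_normal((100, 32))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # normalize rows to unit length
query = vectors[0]

cos_sim = vectors @ query                          # cosine similarity (unit-length vectors)
euc_dist = np.linalg.norm(vectors - query, axis=1) # Euclidean distance

# Same ranking, and the identity ||a - b||^2 = 2 - 2*cos(a, b) holds
print(np.array_equal(np.argsort(-cos_sim), np.argsort(euc_dist)))
print(np.allclose(euc_dist**2, 2 - 2 * cos_sim))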
Distance Functions
Distance functions measure vector differences. They enable similarity ranking. Common functions include L2 (Euclidean), L1 (Manhattan), and cosine distance. Each has different computational properties.
L2 distance is ||A - B||₂ = √Σ(Aᵢ - Bᵢ)². It measures straight-line distance. It emphasizes large differences. It works well for dense vectors.
# Distance Functions
import numpy as np

def l2_distance(a, b):
    return np.sqrt(np.sum((a - b)**2))

def l1_distance(a, b):
    return np.sum(np.abs(a - b))

def cosine_distance(a, b):
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Example
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.1, 2.1, 3.1])
print("L2 distance: " + str(l2_distance(a, b)))
print("L1 distance: " + str(l1_distance(a, b)))
print("Cosine distance: " + str(cosine_distance(a, b)))
Together, these functions rank candidates by how far they are from the query and underpin efficient search.
The diagram shows distance functions. L2 measures Euclidean distance. L1 measures Manhattan distance. Cosine distance measures angles. Each function suits different data types.
Indexing Strategies
Indexing enables fast vector search at scale. Exact search is accurate but slow; approximate search is much faster at the cost of occasionally missing true nearest neighbors. Common indexes include HNSW and IVFFlat.
HNSW builds hierarchical navigable small worlds. It creates multi-layer graphs. It enables fast approximate search. It works well for high-dimensional vectors.
# HNSW Indexing
import hnswlib
import numpy as np

# Create index
dim = 128
num_elements = 10000
index = hnswlib.Index(space='cosine', dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)

# Add vectors
data = np.random.randn(num_elements, dim).astype('float32')
index.add_items(data, np.arange(num_elements))

# Search
query = np.random.randn(1, dim).astype('float32')
index.set_ef(50)
labels, distances = index.knn_query(query, k=10)
print("Nearest neighbors: " + str(labels))
-- NeuronDB: Vector Indexing
CREATE INDEX embedding_hnsw_idx ON document_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);

-- Fast similarity search using index
SELECT id, content
FROM document_embeddings
ORDER BY embedding <=> (SELECT embedding FROM document_embeddings WHERE id = 1)
LIMIT 10;
Indexing enables scalable search. HNSW works well for high dimensions. IVFFlat works well for lower dimensions.
Detailed Indexing Algorithms
HNSW (Hierarchical Navigable Small World) builds multi-layer graphs: the bottom layer contains all vectors, while upper layers contain progressively fewer. A search starts at the top layer, greedily navigates toward the query, and descends layer by layer to the bottom, finding approximate nearest neighbors quickly.
Construction parameters include M (connections per node) and ef_construction (candidate-list width during construction). Larger M increases recall but slows construction and enlarges the index; larger ef_construction improves graph quality but increases build time. Typical values are M = 16-32 and ef_construction = 100-200.
IVFFlat (inverted file with flat, uncompressed vector storage) partitions vectors into clusters, creating Voronoi cells around centroids. A search finds the closest centroids and then scans only the vectors in those cells. It works well for lower dimensions.
# Detailed Indexing Comparison
import numpy as np
import time
import hnswlib

try:
    import faiss
except ImportError:
    faiss = None
    print("FAISS not available, using HNSW only")

def compare_indexing_methods(dim=128, num_vectors=100000, num_queries=1000):
    # Generate data
    np.random.seed(42)
    vectors = np.random.randn(num_vectors, dim).astype('float32')
    queries = np.random.randn(num_queries, dim).astype('float32')
    results = {}

    # HNSW
    print("Building HNSW index...")
    start = time.time()
    index_hnsw = hnswlib.Index(space='l2', dim=dim)
    index_hnsw.init_index(max_elements=num_vectors, ef_construction=200, M=16)
    index_hnsw.add_items(vectors, np.arange(num_vectors))
    build_time_hnsw = time.time() - start

    index_hnsw.set_ef(50)
    start = time.time()
    labels_hnsw, distances_hnsw = index_hnsw.knn_query(queries, k=10)
    search_time_hnsw = time.time() - start
    results['HNSW'] = {
        'build_time': build_time_hnsw,
        'search_time': search_time_hnsw,
        'avg_search_time': search_time_hnsw / num_queries
    }

    # IVFFlat (if FAISS is available)
    if faiss is not None:
        print("Building IVFFlat index...")
        start = time.time()
        nlist = 100  # Number of clusters
        quantizer = faiss.IndexFlatL2(dim)
        index_ivf = faiss.IndexIVFFlat(quantizer, dim, nlist)
        index_ivf.train(vectors)
        index_ivf.add(vectors)
        build_time_ivf = time.time() - start

        index_ivf.nprobe = 10  # Search in 10 clusters
        start = time.time()
        distances_ivf, labels_ivf = index_ivf.search(queries, 10)
        search_time_ivf = time.time() - start
        results['IVFFlat'] = {
            'build_time': build_time_ivf,
            'search_time': search_time_ivf,
            'avg_search_time': search_time_ivf / num_queries
        }

    # Exact search baseline
    print("Performing exact search...")
    from sklearn.metrics.pairwise import euclidean_distances
    start = time.time()
    exact_neighbors = []
    for query in queries:
        dists = euclidean_distances([query], vectors)[0]
        exact_neighbors.append(np.argsort(dists)[:10])
    exact_time = time.time() - start
    results['Exact'] = {
        'search_time': exact_time,
        'avg_search_time': exact_time / num_queries
    }

    print("\nIndexing Performance Comparison:")
    for method, metrics in results.items():
        print(f"{method}:")
        if 'build_time' in metrics:
            print(f"  Build time: {metrics['build_time']:.2f}s")
        print(f"  Search time: {metrics['search_time']:.4f}s")
        print(f"  Avg per query: {metrics['avg_search_time']*1000:.2f}ms")

    return results

# Example
# results = compare_indexing_methods(dim=128, num_vectors=10000, num_queries=100)
Index Tuning and Optimization
Tune index parameters for your use case. Higher recall requires more computation. Lower recall enables faster search. Balance accuracy and speed.
For HNSW, increase M for better recall and ef_search for more accurate results; higher values slow search. Typical ef_search is 50-200. In production, start with ef_search = 50 and increase it if recall is insufficient.
For IVFFlat, increase nlist for finer partitioning and nprobe for higher recall. nprobe controls how many clusters are searched: higher values improve recall but slow search. Typical nprobe is 1-10.
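As a rough sketch of this tradeoff (assuming FAISS is available, as in the comparison example above, and using synthetic data), the loop below sweeps nprobe and measures recall against an exact baseline:

# IVFFlat recall vs. nprobe sweep (synthetic data, FAISS assumed available)
import numpy as np
import faiss

dim, num_vectors, num_queries, k = 64, 20000, 100, 10
rng = np.random.default_rng(0)
vectors = rng.standard_normal((num_vectors, dim)).astype('float32')
queries = rng.standard_normal((num_queries, dim)).astype('float32')

# Ground truth from an exact (flat) index
flat = faiss.IndexFlatL2(dim)
flat.add(vectors)
_, true_ids = flat.search(queries, k)

# IVFFlat index with 100 clusters
quantizer = faiss.IndexFlatL2(dim)
ivf = faiss.IndexIVFFlat(quantizer, dim, 100)
ivf.train(vectors)
ivf.add(vectors)

for nprobe in [1, 2, 5, 10, 20]:
    ivf.nprobe = nprobe
    _, ids = ivf.search(queries, k)
    recall = np.mean([len(set(a) & set(b)) / k for a, b in zip(ids, true_ids)])
    print(f"nprobe={nprobe}: recall@{k}={recall:.3f}")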
Monitor index performance. Measure query latency. Measure recall at k. Measure index size. Adjust parameters based on requirements.
# Index Tuning Example
import time
import numpy as np
import hnswlib

def tune_hnsw_index(vectors, queries, true_neighbors, M_values=[8, 16, 32], ef_values=[25, 50, 100]):
    results = []
    for M in M_values:
        for ef in ef_values:
            # Build index
            index = hnswlib.Index(space='l2', dim=vectors.shape[1])
            index.init_index(max_elements=len(vectors), ef_construction=200, M=M)
            index.add_items(vectors, np.arange(len(vectors)))

            # Search
            index.set_ef(ef)
            start = time.time()
            labels, distances = index.knn_query(queries, k=10)
            search_time = time.time() - start

            # Compute recall
            recall = compute_recall(labels, true_neighbors)
            results.append({
                'M': M,
                'ef': ef,
                'search_time': search_time,
                'recall': recall
            })

    # Find best configuration
    best = max(results, key=lambda x: x['recall'] / x['search_time'])
    print(f"Best configuration: M={best['M']}, ef={best['ef']}")
    print(f"Recall: {best['recall']:.3f}, Time: {best['search_time']:.4f}s")
    return results

def compute_recall(predicted, true):
    """Compute recall@10"""
    correct = 0
    total = 0
    for pred, true_nn in zip(predicted, true):
        correct += len(set(pred) & set(true_nn))
        total += len(true_nn)
    return correct / total if total > 0 else 0
The diagram shows indexing structures. HNSW uses hierarchical graphs. IVFFlat uses inverted files. Each enables fast search.
Approximate Nearest Neighbor Search
Approximate search trades accuracy for speed. It finds near-optimal results quickly. It scales to billions of vectors. It works well for many applications.
Approximate methods include locality-sensitive hashing, product quantization, and graph-based methods. Each has different accuracy-speed tradeoffs.
# Approximate Nearest Neighbor (LSH)
# Note: scikit-learn's LSHForest was removed in scikit-learn 0.21;
# FAISS's IndexLSH is used here instead as the locality-sensitive hashing example.
import numpy as np
import faiss

dim = 128
vectors = np.random.randn(10000, dim).astype('float32')
query_vector = np.random.randn(1, dim).astype('float32')

# Build an LSH index with 256-bit hash codes
lsh = faiss.IndexLSH(dim, 256)
lsh.add(vectors)

# Approximate search
distances, indices = lsh.search(query_vector, 10)
print("Approximate neighbors: " + str(indices))
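Product quantization can be sketched in the same way; the example below uses FAISS's IndexPQ (an illustrative choice, mirroring the FAISS usage earlier in this section rather than anything prescribed by the original text) to compress each 128-dimensional vector into 16 one-byte codes before searching:

# Product quantization sketch with FAISS (16 sub-vectors, 8 bits each)
import numpy as np
import faiss

dim = 128
rng = np.random.default_rng(0)
vectors = rng.standard_normal((10000, dim)).astype('float32')
query = rng.standard_normal((1, dim)).astype('float32')

# 16 sub-quantizers x 8 bits: each vector is stored in 16 bytes instead of 512
index = faiss.IndexPQ(dim, 16, 8)
index.train(vectors)   # learn the codebooks
index.add(vectors)

distances, indices = index.search(query, 10)
print("Approximate neighbors:", indices)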
Approximate search makes large-scale retrieval practical by returning good results quickly on large datasets.
Exact vs Approximate Search
Exact search finds the true nearest neighbors and is accurate but slow. Approximate search finds near neighbors quickly, but may miss some of the true nearest neighbors.
Choose based on requirements: use exact search for small datasets or strict accuracy needs, and approximate search for large datasets or latency-sensitive workloads.
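For the exact case on a small dataset, a brute-force k-nearest-neighbor query with scikit-learn is usually enough; a minimal sketch with synthetic data:

# Exact k-nearest-neighbor search with scikit-learn (brute force, small datasets)
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
vectors = rng.standard_normal((5000, 64)).astype('float32')
query = rng.standard_normal((1, 64)).astype('float32')

nn = NearestNeighbors(n_neighbors=10, algorithm='brute', metric='euclidean')
nn.fit(vectors)
distances, indices = nn.kneighbors(query)
print("Exact neighbors:", indices)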
The diagram compares the two search methods: exact search is accurate but slow, while approximate search is fast at the cost of some accuracy.
Performance Optimization
Optimization improves search speed and resource efficiency. Techniques include quantization, pruning, and parallel processing, each of which reduces computation or memory use.
# Performance Optimization
import numpy as np

# Quantization reduces precision
def quantize_vectors(vectors, bits=8):
    min_val = vectors.min()
    max_val = vectors.max()
    scale = (2**bits - 1) / (max_val - min_val)
    quantized = np.round((vectors - min_val) * scale).astype(np.uint8)
    return quantized, min_val, scale

# Example
vectors = np.random.randn(1000, 128).astype('float32')
quantized, min_val, scale = quantize_vectors(vectors, bits=8)
print("Original size: " + str(vectors.nbytes))
print("Quantized size: " + str(quantized.nbytes))
# Quantization reduces memory by 4x (float32 -> uint8)
These optimizations reduce memory usage and speed up search.
Summary
Vector search finds similar vectors efficiently. Similarity metrics measure relationships between vectors, with cosine similarity working well for embeddings. Distance functions enable ranking, and indexing enables scalable search; HNSW works well in high dimensions. Approximate search trades a little accuracy for speed, while exact search is accurate but slow. Optimization techniques such as quantization further improve performance. Together, these pieces power semantic search and recommendation systems.