Advanced · Tutorial 13

Advanced RAG: Hybrid Search and Reranking

NeuronDB Team
2/24/2025
32 min read

Advanced RAG Overview

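Advanced RAG extends basic retrieval-augmented generation (RAG) with hybrid search, reranking, multi-vector representations, temporal search, and query routing. Hybrid search combines semantic and keyword retrieval. Reranking reorders candidates with a more precise model. Multi-vector methods store several embeddings per document. Temporal search takes into account when a document was written.

These techniques aim to improve retrieval quality. Better retrieved context tends to make generated answers more accurate, reduces hallucinations caused by missing or irrelevant context, and makes complex queries tractable.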

Figure: Advanced RAG

The diagram shows the advanced RAG flow: hybrid search merges semantic and keyword results, reranking reorders the merged candidates, and multi-vector representations cover complex documents.

Hybrid Search

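Hybrid search combines semantic search (embedding similarity) with keyword search (exact term matching). Semantic search finds paraphrases and related concepts; keyword search catches exact names, codes, and rare terms that embeddings can blur. Running both can improve recall and precision and lets one system handle both kinds of query.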
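Keyword search is often implemented with BM25, which rewards documents containing rare query terms while penalizing very long documents. The following is a minimal, dependency-free sketch of Okapi BM25, assuming naive lowercase whitespace tokenization and the commonly used defaults k1=1.5 and b=0.75. It returns one score per document; the function name `bm25_scores` is illustrative, not part of NeuronDB or any library.

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25.

    Tokenization is naive (lowercase + whitespace split); a real system
    would also strip punctuation, stem, and drop stopwords.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF with +1 inside the log so it never goes negative
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b)
            norm = freq + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = ["machine learning basics", "deep learning for vision", "cooking pasta at home"]
print(bm25_scores("machine learning", docs))  # first doc scores highest, third scores 0.0
```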

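Common fusion methods are score fusion, reciprocal rank fusion (RRF), and weighted combination. Score fusion normalizes each retriever's scores to a common range and averages them. RRF ignores raw scores and sums 1/(k + rank) over the result lists, so documents ranked highly by several retrievers rise to the top. Weighted combination is score fusion with a configurable weight per retriever, such as the alpha parameter in the code below.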

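```python
# Hybrid Search
import numpy as np

def normalize_scores(scores):
    """Min-max normalize scores to [0, 1] so the two score scales are comparable."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    return (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)

def hybrid_search(query, documents, embeddings, index, alpha=0.5, top_k=10):
    # semantic_search and keyword_search (not shown) are assumed to return one score
    # per document, in document order, so the two arrays line up element by element
    semantic_scores = normalize_scores(semantic_search(query, embeddings, index))
    keyword_scores = normalize_scores(keyword_search(query, documents))
    # Weighted fusion: alpha weights semantic similarity, (1 - alpha) weights keyword match
    hybrid_scores = alpha * semantic_scores + (1 - alpha) * keyword_scores
    # Sort by fused score, highest first, and keep the top_k document indices
    return np.argsort(hybrid_scores)[::-1][:top_k]

# Example
results = hybrid_search("machine learning", documents, embeddings, index, alpha=0.6)
print("Hybrid search results: " + str(results))
```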

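Hybrid search combines the strengths of both methods: semantic matching for paraphrased queries and keyword matching for exact terms. The alpha weight controls the balance; values above 0.5 favor the semantic scores.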
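Reciprocal rank fusion, one of the fusion methods listed above, works on ranks instead of scores, so it needs no normalization step. A minimal sketch, assuming each retriever returns an ordered list of document IDs; k=60 is the value commonly used in practice, and the function name is illustrative.

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_k=10):
    """Fuse several ranked lists of doc IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) for every document it contains,
    with rank starting at 1. Raw retriever scores are ignored.
    """
    fused = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(fused, key=fused.get, reverse=True)[:top_k]

semantic_ranking = ["d3", "d1", "d7"]
keyword_ranking = ["d1", "d9", "d3"]
print(reciprocal_rank_fusion([semantic_ranking, keyword_ranking]))  # → ['d1', 'd3', 'd9', 'd7']
```

Documents found by both retrievers (d1, d3) outrank documents found by only one, even though neither list put d1 and d3 in the same position.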

Figure: Hybrid Search

The diagram shows hybrid search: semantic and keyword results are merged into a single ranked list.

Reranking Strategies

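Reranking reorders an initial candidate list using a more expensive, more accurate model. Because the reranker sees the query and each candidate together, it can model their relationship directly, which mainly improves precision at the top ranks.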

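Common reranking methods are cross-encoders, learning to rank (LTR), and LLM-based reranking. A cross-encoder scores each query-document pair jointly in a single transformer pass. LTR trains a machine learning model on features of each query-document pair. LLM-based reranking prompts a language model to judge relevance.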

```python
# Reranking
import numpy as np
from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def rerank_results(query, documents, top_k=5):
    # Score every (query, document) pair jointly with the cross-encoder
    pairs = [[query, doc] for doc in documents]
    scores = reranker.predict(pairs)
    # Highest score first; keep the top_k documents
    ranked_indices = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in ranked_indices]

# Example
query = "machine learning tutorial"
documents = ["ML guide", "Deep learning", "AI basics"]
reranked = rerank_results(query, documents)
print("Reranked results: " + str(reranked))
```

Reranking trades extra computation for better precision: the cross-encoder runs once per candidate, so it is usually applied only to a short list produced by the first-stage retriever.
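Whether reranking actually improves precision at the top ranks can be checked with precision@k on queries that have known relevant documents. A minimal sketch; the rankings and relevance labels below are made-up illustrations, not measured results.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant (precision@k)."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k

# Hypothetical rankings of the same four candidates before and after reranking
relevant = {"d1", "d2"}
before = ["d4", "d1", "d7", "d2"]
after = ["d1", "d2", "d4", "d7"]
print(precision_at_k(before, relevant, 2), precision_at_k(after, relevant, 2))  # → 0.5 1.0
```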

Figure: Reranking

The diagram shows the reranking process: initial results are retrieved, a cross-encoder rescores the candidates, and the top results are kept for the final output.

Detailed Reranking Strategies

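Cross-encoders process the query and document together as one input, so attention flows between query and document tokens. This captures fine-grained interactions, and cross-encoders are typically more accurate than bi-encoders, which embed query and document separately. The cost is speed: every query-document pair needs its own forward pass, and document representations cannot be precomputed.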

Learning to rank (LTR) trains a model on features of each query-document pair, such as the initial similarity score, document length, and position in the initial ranking. The model learns how to combine these features into a relevance score. LTR requires labeled training data: queries with relevance judgments for their candidate documents.

LLM-based reranking prompts a language model to score each candidate's relevance. It can take broader context and instructions into account and can produce high-quality rankings, but it is the most expensive of the three methods per candidate.
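LLM-based reranking can be sketched as "format a prompt, ask the model for a number, parse it, sort". In this minimal sketch, `complete` is a hypothetical placeholder for whatever function sends a prompt to your language model and returns its text reply; it is not a specific vendor API, and the prompt wording and 0 to 10 scale are assumptions.

```python
import re
from typing import Callable, List, Tuple

PROMPT_TEMPLATE = (
    "Rate how well the passage answers the query on a scale from 0 to 10.\n"
    "Query: {query}\nPassage: {passage}\nAnswer with a single number."
)

def parse_score(reply: str) -> float:
    """Take the first number in the model's reply; unparseable replies score 0."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    return float(match.group()) if match else 0.0

def llm_rerank(query: str, documents: List[str], complete: Callable[[str], str],
               top_k: int = 5) -> List[Tuple[str, float]]:
    """Score each document with one LLM call and return the top_k (doc, score) pairs."""
    scored = []
    for doc in documents:
        prompt = PROMPT_TEMPLATE.format(query=query, passage=doc)
        scored.append((doc, parse_score(complete(prompt))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Stand-in "model" for demonstration: counts query words that appear in the passage
def fake_complete(prompt: str) -> str:
    query = prompt.split("Query: ")[1].split("\n")[0]
    passage = prompt.split("Passage: ")[1].split("\n")[0].lower()
    overlap = sum(word in passage for word in query.lower().split())
    return f"Score: {overlap}"

print(llm_rerank("machine learning tutorial",
                 ["ML guide", "A machine learning tutorial", "Cooking tips"], fake_complete))
```

Each candidate costs one model call, which is why LLM reranking is usually limited to a handful of top candidates.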

```python
# Detailed Reranking Implementation
import numpy as np
from sentence_transformers import CrossEncoder
from sklearn.ensemble import RandomForestRegressor

class RerankingSystem:
    def __init__(self, method='cross_encoder'):
        self.method = method
        if method == 'cross_encoder':
            self.reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
        elif method == 'learned_to_rank':
            # Pointwise learning to rank: regress a relevance label from hand-built features
            self.ltr_model = RandomForestRegressor(n_estimators=100)
            self.feature_names = ['similarity', 'doc_length', 'position', 'query_length']

    @staticmethod
    def _features(query, documents, initial_scores):
        # One feature row per document, in the order given by feature_names
        # (lengths are measured in characters)
        return [
            [initial_scores[i], len(doc), i, len(query)]
            for i, doc in enumerate(documents)
        ]

    def cross_encoder_rerank(self, query, documents, top_k=5):
        # Score every query-document pair jointly
        pairs = [[query, doc] for doc in documents]
        scores = self.reranker.predict(pairs)
        # Rank by score, highest first
        ranked_indices = np.argsort(scores)[::-1][:top_k]
        ranked_docs = [documents[i] for i in ranked_indices]
        return ranked_docs, scores[ranked_indices]

    def learned_to_rank_rerank(self, query, documents, initial_scores, top_k=5):
        # Requires train_ltr_model() to have been called first
        features = self._features(query, documents, initial_scores)
        rerank_scores = self.ltr_model.predict(features)
        ranked_indices = np.argsort(rerank_scores)[::-1][:top_k]
        ranked_docs = [documents[i] for i in ranked_indices]
        return ranked_docs, rerank_scores[ranked_indices]

    def train_ltr_model(self, queries, documents_list, initial_scores_list, relevance_labels):
        """Train the learned-to-rank model.

        The four arguments are parallel lists with one entry per training query:
        the query string, its candidate documents, their initial scores,
        and one relevance label per candidate.
        """
        X_train, y_train = [], []
        for query, docs, scores, labels in zip(
            queries, documents_list, initial_scores_list, relevance_labels
        ):
            X_train.extend(self._features(query, docs, scores))
            y_train.extend(labels)
        self.ltr_model.fit(X_train, y_train)
        return self.ltr_model

# Example
reranker = RerankingSystem(method='cross_encoder')
query = "machine learning tutorial"
documents = ["ML guide", "Deep learning basics", "AI introduction", "Neural networks explained"]
reranked, scores = reranker.cross_encoder_rerank(query, documents, top_k=3)
print("Reranked documents: " + str(reranked))
print("Reranking scores: " + str(scores))
```

Reranking Performance Optimization

In production, keep reranking cost under control: cache scores for frequently repeated query-document pairs, use a cheaper model for large candidate sets, and batch work for multiple queries together so the model runs on full batches.
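Caching at the level of individual query-document pairs means that when a repeated query returns an overlapping candidate set, only the new candidates are scored. A minimal LRU sketch; `score_pairs` is any callable that maps a list of [query, doc] pairs to scores (for example, a sentence-transformers `CrossEncoder`'s `predict` method), and the class name and default size are assumptions.

```python
from collections import OrderedDict

class PairScoreCache:
    """LRU cache of (query, document) -> relevance score."""

    def __init__(self, score_pairs, max_size=10_000):
        self.score_pairs = score_pairs  # callable: list of [query, doc] -> list of scores
        self.max_size = max_size
        self.cache = OrderedDict()

    def scores(self, query, documents):
        # Score only pairs not seen before (dict.fromkeys also drops duplicates)
        missing = [d for d in dict.fromkeys(documents) if (query, d) not in self.cache]
        if missing:
            new_scores = self.score_pairs([[query, d] for d in missing])
            for doc, score in zip(missing, new_scores):
                self.cache[(query, doc)] = float(score)
        result = []
        for doc in documents:
            self.cache.move_to_end((query, doc))  # mark as recently used
            result.append(self.cache[(query, doc)])
        # Evict least recently used entries beyond the size limit
        while len(self.cache) > self.max_size:
            self.cache.popitem(last=False)
        return result

# Toy scorer that records how many pairs it is asked to score
calls = []
def toy_scorer(pairs):
    calls.append(len(pairs))
    return [float(len(doc)) for _, doc in pairs]

cache = PairScoreCache(toy_scorer)
cache.scores("ml", ["a", "bb"])
cache.scores("ml", ["bb", "ccc"])
print(calls)  # → [2, 1]: the second call only scores the new document
```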

Two-stage reranking runs a fast model over all candidates, keeps only the top few, and applies a slower, more accurate model to that filtered set. This balances accuracy and speed.

```python
# Optimized Reranking Pipeline
import numpy as np
from sentence_transformers import CrossEncoder

class OptimizedReranking:
    def __init__(self, filter_k=20):
        self.fast_reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-2-v2')   # faster, smaller
        self.slow_reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-12-v2')  # slower, more accurate
        self.filter_k = filter_k  # candidates passed from stage 1 to stage 2
        self.cache = {}

    def rerank_optimized(self, query, documents, top_k=5, use_cache=True):
        # Key on the query AND the exact candidate list, so a cached ranking
        # is never returned for a different set of documents
        cache_key = (query, tuple(documents), top_k)
        if use_cache and cache_key in self.cache:
            return self.cache[cache_key]
        # Stage 1: the fast model scores every candidate
        pairs = [[query, doc] for doc in documents]
        fast_scores = self.fast_reranker.predict(pairs)
        # Keep only the best filter_k candidates
        top_indices = np.argsort(fast_scores)[::-1][:min(self.filter_k, len(documents))]
        top_docs = [documents[i] for i in top_indices]
        # Stage 2: the slow model rescores the filtered set
        slow_scores = self.slow_reranker.predict([[query, doc] for doc in top_docs])
        final_indices = np.argsort(slow_scores)[::-1][:top_k]
        result = ([top_docs[i] for i in final_indices], slow_scores[final_indices])
        if use_cache:
            self.cache[cache_key] = result
        return result

# Batch processing
def batch_rerank(queries, documents_list, reranker, top_k=5, batch_size=32):
    """Rerank many queries, working through them in chunks of batch_size.

    Each query is still scored with its own predict() calls; batch_size only
    controls how many queries are handled per chunk.
    """
    all_results = []
    for i in range(0, len(queries), batch_size):
        batch_queries = queries[i:i + batch_size]
        batch_docs = documents_list[i:i + batch_size]
        for query, docs in zip(batch_queries, batch_docs):
            all_results.append(reranker.rerank_optimized(query, docs, top_k=top_k))
    return all_results
```
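The batch_rerank function above still calls the model separately for each query. To let the model fill its batches, the (query, document) pairs for many queries can be flattened into a single scoring call and then split back out per query. A minimal sketch; `score_pairs` is an assumed callable over a list of [query, doc] pairs, for example `lambda p: model.predict(p, batch_size=64)` with a sentence-transformers `CrossEncoder`.

```python
def batch_score(queries, documents_list, score_pairs, top_k=5):
    """Rerank many queries with a single call to the scoring model."""
    pairs, owners = [], []
    # Flatten: remember which query each pair belongs to
    for qi, (query, docs) in enumerate(zip(queries, documents_list)):
        for doc in docs:
            pairs.append([query, doc])
            owners.append(qi)
    scores = score_pairs(pairs) if pairs else []
    # Split the flat score list back into per-query candidate lists
    per_query = [[] for _ in queries]
    for (_, doc), qi, score in zip(pairs, owners, scores):
        per_query[qi].append((doc, float(score)))
    # Sort each query's candidates by score, highest first
    return [sorted(cands, key=lambda c: c[1], reverse=True)[:top_k] for cands in per_query]

# Toy scorer: number of words shared by query and document
def overlap_scorer(pairs):
    return [len(set(q.split()) & set(d.split())) for q, d in pairs]

print(batch_score(
    ["deep learning", "pasta recipe"],
    [["cooking", "deep learning intro"], ["deep nets", "easy pasta recipe"]],
    overlap_scorer,
))
```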

Multi-vector Approaches

Multi-vector approaches store several embeddings per document, each capturing a different aspect or granularity. A query can then match a single relevant sentence inside a long document, which improves retrieval coverage for complex documents.

Common variants are sentence-level, chunk-level, and aspect-based embeddings. Sentence-level embeddings capture fine-grained facts, chunk-level embeddings capture surrounding context, and aspect-based embeddings represent specific facets of a document.

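```python
# Multi-vector Approach
import re
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

embedder = SentenceTransformer('all-MiniLM-L6-v2')  # any sentence-embedding model can be used

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def chunk_document(text, chunk_size=100):
    # Non-overlapping chunks of chunk_size words
    words = text.split()
    return [' '.join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def create_multi_vectors(document, aspects=None):
    vectors = {
        'sentences': embedder.encode(split_sentences(document)),  # fine-grained
        'chunks': embedder.encode(chunk_document(document)),      # wider context
    }
    # aspects: optional list of short texts (e.g. title, summary, keywords)
    # produced by an aspect-extraction step that is not shown here
    if aspects:
        vectors['aspects'] = embedder.encode(aspects)
    return vectors

# Search across all vectors
def multi_vector_search(query, multi_vectors, top_k=10):
    """multi_vectors maps doc_id -> the dict returned by create_multi_vectors."""
    query_emb = embedder.encode([query])
    results = []
    for doc_id, vectors in multi_vectors.items():
        # Best-matching vector within each type, then the best type overall
        best = {vec_type: float(cosine_similarity(query_emb, embs)[0].max())
                for vec_type, embs in vectors.items()}
        best_type = max(best, key=best.get)
        results.append((doc_id, best_type, best[best_type]))
    # One entry per document, highest score first
    return sorted(results, key=lambda x: x[2], reverse=True)[:top_k]
```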

Multi-vector approaches improve coverage for long or multi-topic documents, at the cost of storing and searching more vectors per document.

Figure: Multi-Query Retrieval

The diagram shows multi-query retrieval: the original query is rewritten into several variants, each variant retrieves its own results, and the combined results are reranked. This improves coverage and recall.
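The multi-query flow in the diagram can be sketched as follows, assuming two placeholder callables: `generate_variants(query)` returns alternative phrasings (often produced by an LLM) and `retrieve(q)` returns (doc_id, score) pairs from a single retriever. Results are merged by keeping each document's best score across variants; the merged list is then a candidate set for a reranker. Both callables and the max-score merge are illustrative choices, not the tutorial's own implementation.

```python
def multi_query_retrieve(query, generate_variants, retrieve, top_k=10):
    """Retrieve with several phrasings of a query and merge the results."""
    queries = [query] + [v for v in generate_variants(query) if v != query]
    best = {}
    for q in queries:
        for doc_id, score in retrieve(q):
            # Keep each document's best score across all query variants
            best[doc_id] = max(score, best.get(doc_id, float("-inf")))
    merged = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return merged[:top_k]  # candidates for a reranking stage

# Toy generator and retriever backed by a fixed lookup table
def toy_variants(q):
    return ["ML tutorial", "intro to machine learning"]

toy_results = {
    "machine learning tutorial": [("d1", 0.9), ("d2", 0.5)],
    "ML tutorial": [("d3", 0.8), ("d2", 0.7)],
    "intro to machine learning": [("d1", 0.6), ("d4", 0.4)],
}
print(multi_query_retrieve("machine learning tutorial", toy_variants,
                           lambda q: toy_results.get(q, [])))
# → [('d1', 0.9), ('d3', 0.8), ('d2', 0.7), ('d4', 0.4)]
```

Taking the maximum assumes scores from the same retriever are comparable across variants; rank-based fusion such as RRF avoids that assumption.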

Temporal Search Patterns

Temporal search handles time-sensitive information. It uses document timestamps to prioritize recent information or to restrict results to a time window.

Temporal methods include time-weighted scoring, recency boosting, and time-based filtering. Time-weighted scoring combines relevance and recency into one score. Recency boosting raises the scores of recent documents. Time-based filtering keeps only documents within a given time range.

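```python
# Temporal Search
from datetime import datetime
import numpy as np

def normalize(values):
    """Min-max normalize to [0, 1]."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    return (values - values.min()) / span if span > 0 else np.zeros_like(values)

def temporal_search(query, documents, timestamps, alpha=0.7, top_k=10):
    # Relevance: compute_relevance (not shown) returns one score per document
    relevance_scores = normalize(compute_relevance(query, documents))
    # Recency: days since the OLDEST document, so newer documents score higher
    min_time = min(timestamps)
    recency_scores = normalize([(t - min_time).days for t in timestamps])
    # Weighted combination: alpha for relevance, (1 - alpha) for recency
    combined_scores = alpha * relevance_scores + (1 - alpha) * recency_scores
    return np.argsort(combined_scores)[::-1][:top_k]

# Example: one timestamp per document
documents = ["AI news from January", "AI news from February", "AI news from March"]
timestamps = [datetime(2024, 1, 1), datetime(2024, 2, 1), datetime(2024, 3, 1)]
results = temporal_search("AI news", documents, timestamps)
print("Temporal search results: " + str(results))
```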

Temporal search improves relevance for time-dependent topics, such as news, by prioritizing recent information.
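Recency boosting and time-based filtering, the other two temporal methods listed above, can be sketched with the standard library alone. This sketch assumes an exponential decay where a document's recency weight halves every `half_life_days` (30 days here is an arbitrary illustrative choice) and relevance scores already scaled to [0, 1]; the function names are hypothetical.

```python
from datetime import datetime

def recency_boost(timestamps, now, half_life_days=30.0):
    """Exponential decay: recency weight halves every half_life_days (1.0 = brand new)."""
    return [0.5 ** ((now - t).total_seconds() / 86400 / half_life_days) for t in timestamps]

def boosted_scores(relevance, timestamps, now, half_life_days=30.0, weight=0.3):
    """Blend relevance (assumed already in [0, 1]) with the recency boost."""
    boosts = recency_boost(timestamps, now, half_life_days)
    return [(1 - weight) * r + weight * b for r, b in zip(relevance, boosts)]

def time_filter(timestamps, start, end):
    """Indices of documents whose timestamp falls within [start, end]."""
    return [i for i, t in enumerate(timestamps) if start <= t <= end]

now = datetime(2024, 3, 1)
stamps = [datetime(2024, 1, 1), datetime(2024, 2, 1), datetime(2024, 3, 1)]
print(recency_boost(stamps, now))  # Jan 1 is 60 days old → 0.25; Mar 1 → 1.0
print(time_filter(stamps, datetime(2024, 1, 15), datetime(2024, 3, 1)))  # → [1, 2]
```

Unlike min-max normalized recency, the decay gives the same boost to a document of a given age regardless of what else is in the result set.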

Query Routing

Query routing sends each query to the most suitable retriever. It analyzes the query's characteristics and selects a retrieval method, which can improve both efficiency and quality.

Routing can be rule-based (hand-written heuristics), learned (a classifier trained on example queries), or hybrid (a combination of the two).

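```python
# Query Routing
import re

QUESTION_WORDS = {'how', 'why', 'what', 'which', 'when', 'explain', 'compare'}

def has_exact_terms(query):
    # Hypothetical heuristic: quoted phrases, tokens containing digits
    # (error codes, versions), or ALL-CAPS acronyms suggest exact matching
    return bool(re.search(r'"[^"]+"|\b\w*\d\w*\b|\b[A-Z]{2,}\b', query))

def is_meaning_based(query):
    # Hypothetical heuristic: natural-language questions or longer descriptive queries
    words = query.lower().split()
    return bool(words) and (words[0] in QUESTION_WORDS or len(words) >= 5)

def route_query(query):
    has_keywords = has_exact_terms(query)
    is_semantic = is_meaning_based(query)
    if has_keywords and is_semantic:
        return 'hybrid'
    elif has_keywords:
        return 'keyword'
    else:
        return 'semantic'

def routed_search(query, documents, embeddings, index):
    # hybrid_search, keyword_search and semantic_search are retrievers defined elsewhere
    route = route_query(query)
    if route == 'hybrid':
        return hybrid_search(query, documents, embeddings, index)
    elif route == 'keyword':
        return keyword_search(query, documents)
    else:
        return semantic_search(query, embeddings, index)

# Example
results = routed_search("machine learning tutorial", documents, embeddings, index)
print("Routed search results: " + str(results))
```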

Query routing avoids running every retriever on every query, which saves computation while still matching each query with an appropriate retrieval method.

Summary

Advanced RAG extends basic RAG with several techniques. Hybrid search combines semantic and keyword retrieval. Reranking improves result order. Multi-vector approaches handle complex documents. Temporal search handles time-sensitive information. Query routing picks an appropriate retriever for each query. Together, these techniques improve the quality of the context passed to the generator.
