RAG Fundamentals Overview
Retrieval-Augmented Generation (RAG) combines retrieval with generation: it retrieves documents relevant to a query and uses them as context for a language model. Grounding answers in retrieved knowledge improves answer quality and reduces hallucinations.
RAG has three main components: retrieval finds relevant documents, augmentation adds them as context to the prompt, and generation produces an answer conditioned on that context.
The diagram shows the RAG flow: a query triggers retrieval, the retrieved documents provide context, and the generator produces an answer from that context.
RAG Architecture Components
A RAG architecture includes a document store, a retriever, and a generator. The document store holds the knowledge base, the retriever finds relevant documents, and the generator produces answers.
The document store can be a vector database, a traditional database, or a hybrid of the two. The retriever uses semantic search, keyword search, or both. The generator is a language model.
# RAG Architecture
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

class RAGSystem:
    def __init__(self):
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        self.generator = pipeline('text-generation', model='gpt2')
        self.documents = []
        self.embeddings = None

    def add_documents(self, documents):
        self.documents = documents
        self.embeddings = self.embedder.encode(documents)

    def retrieve(self, query, top_k=3):
        # Rank documents by cosine similarity to the query embedding
        query_emb = self.embedder.encode([query])
        similarities = cosine_similarity(query_emb, self.embeddings)[0]
        top_indices = np.argsort(similarities)[::-1][:top_k]
        return [self.documents[i] for i in top_indices]

    def generate(self, query, context):
        prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"
        answer = self.generator(prompt, max_length=100)[0]['generated_text']
        return answer

    def query(self, query):
        context = " ".join(self.retrieve(query))
        return self.generate(query, context)

# Example
rag = RAGSystem()
rag.add_documents(["Machine learning is...", "Deep learning uses..."])
answer = rag.query("What is machine learning?")
print("Answer:", answer)
Together, these components enable knowledge-grounded generation that improves answer quality and reduces hallucinations.
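The brute-force cosine-similarity search above is fine for small corpora; for larger knowledge bases the document store is typically backed by a vector index. A minimal sketch of the same retrieval step using FAISS (an assumption, not part of the system above; the faiss-cpu package, model name, and documents are illustrative):

# Swapping the brute-force search for a FAISS index (illustrative; requires faiss-cpu)
import faiss
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer('all-MiniLM-L6-v2')
documents = ["Machine learning is a subset of AI.", "Deep learning uses neural networks."]

# Build the index over normalized embeddings (inner product == cosine similarity)
embeddings = embedder.encode(documents, normalize_embeddings=True).astype('float32')
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

# Query the index for the top-2 nearest documents
query_emb = embedder.encode(["What is machine learning?"], normalize_embeddings=True).astype('float32')
scores, ids = index.search(query_emb, 2)
print("Retrieved:", [documents[i] for i in ids[0]])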
Detailed RAG Implementation Patterns
Basic RAG uses a single round of retrieval and generation: the query triggers a semantic search, the top-k documents are retrieved, a context is built from them, the prompt combines context and query, and the generator produces the answer.
Advanced RAG adds reranking and filtering: the initial retrieval fetches more candidates than needed, a reranker improves their order, filtering removes irrelevant documents, and only the best documents make it into the final context.
Iterative RAG refines retrieval through feedback: an initial answer is generated, analyzed for gaps, additional retrieval fills those gaps, and the process repeats until the answer is complete.
# Detailed RAG Implementation
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

class AdvancedRAGSystem:
    def __init__(self):
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        self.generator = pipeline('text-generation', model='gpt2')
        self.reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
        self.documents = []
        self.embeddings = None

    def basic_rag(self, query, top_k=3):
        """Basic RAG: retrieve top-k documents, build a context, generate an answer."""
        # Retrieve
        query_emb = self.embedder.encode([query])
        similarities = cosine_similarity(query_emb, self.embeddings)[0]
        top_indices = np.argsort(similarities)[::-1][:top_k]
        retrieved = [self.documents[i] for i in top_indices]

        # Build context
        context = "\n\n".join(f"Document {i+1}: {doc}" for i, doc in enumerate(retrieved))

        # Generate
        prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"
        answer = self.generator(prompt, max_length=200)[0]['generated_text']
        return answer.split("Answer:")[-1].strip()

    def advanced_rag(self, query, retrieve_k=20, rerank_k=5):
        """Advanced RAG: over-retrieve candidates, then rerank with a cross-encoder."""
        # Initial retrieval
        query_emb = self.embedder.encode([query])
        similarities = cosine_similarity(query_emb, self.embeddings)[0]
        top_indices = np.argsort(similarities)[::-1][:retrieve_k]
        candidates = [self.documents[i] for i in top_indices]

        # Rerank with the cross-encoder
        pairs = [[query, doc] for doc in candidates]
        rerank_scores = self.reranker.predict(pairs)
        rerank_indices = np.argsort(rerank_scores)[::-1][:rerank_k]
        final_docs = [candidates[i] for i in rerank_indices]

        # Build context
        context = "\n\n".join(f"Document {i+1}: {doc}" for i, doc in enumerate(final_docs))

        # Generate
        prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"
        answer = self.generator(prompt, max_length=200)[0]['generated_text']
        return answer.split("Answer:")[-1].strip()

    def iterative_rag(self, query, max_iterations=3):
        """Iterative RAG: expand the search query with the current answer and re-retrieve."""
        context = ""
        answer = ""
        for iteration in range(max_iterations):
            # Retrieve based on the query and the current answer
            search_query = f"{query} {answer}" if answer else query
            query_emb = self.embedder.encode([search_query])
            similarities = cosine_similarity(query_emb, self.embeddings)[0]
            top_indices = np.argsort(similarities)[::-1][:5]
            retrieved = [self.documents[i] for i in top_indices]

            # Update context
            new_context = "\n\n".join(f"Doc: {doc}" for doc in retrieved)
            context = context + "\n\n" + new_context if context else new_context

            # Generate answer
            prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"
            answer = self.generator(prompt, max_length=200)[0]['generated_text']
            answer = answer.split("Answer:")[-1].strip()

            # Check whether the answer is complete (simplified placeholder check)
            if len(answer) > 50:
                break
        return answer

# Example
rag = AdvancedRAGSystem()
rag.documents = ["ML is AI subset", "Neural networks have layers", "Deep learning uses many layers"]
rag.embeddings = rag.embedder.encode(rag.documents)

basic_answer = rag.basic_rag("What is machine learning?")
print("Basic RAG answer:", basic_answer)

advanced_answer = rag.advanced_rag("What is machine learning?", retrieve_k=10, rerank_k=3)
print("Advanced RAG answer:", advanced_answer)
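The advanced variant above reranks but keeps a fixed number of documents; the filtering step described earlier can be added by thresholding the cross-encoder scores before building the context. A minimal sketch (the threshold value and sample scores are illustrative):

# Score-threshold filtering on top of reranking (threshold is illustrative)
import numpy as np

def filter_by_score(candidates, rerank_scores, threshold=0.0, max_docs=5):
    # Keep only candidates whose cross-encoder score clears the threshold,
    # then take the best max_docs of those
    order = np.argsort(rerank_scores)[::-1]
    kept = [candidates[i] for i in order if rerank_scores[i] >= threshold]
    return kept[:max_docs]

# Example with made-up scores
docs = ["ML is AI subset", "Neural networks have layers", "Cooking pasta takes 10 minutes"]
scores = np.array([2.1, 0.4, -3.7])
print(filter_by_score(docs, scores, threshold=0.0, max_docs=2))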
RAG Quality Evaluation
Evaluate RAG systems along several dimensions: answer quality measures correctness, answer relevance measures alignment with the question, answer completeness measures information coverage, and context utilization measures how much of the retrieved documents the answer actually uses.
Answer quality can be scored by humans or by automated metrics: BLEU measures n-gram overlap, ROUGE measures overlap with a reference, and BERTScore measures semantic similarity. Human evaluation remains the gold standard.
# RAG Quality Evaluation
from rouge_score import rouge_scorer
from bert_score import score as bert_score

class RAGEvaluator:
    def __init__(self):
        self.rouge_scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

    def evaluate_answer_quality(self, generated_answer, reference_answer):
        """Evaluate answer quality using multiple metrics."""
        metrics = {}

        # ROUGE scores
        rouge_scores = self.rouge_scorer.score(reference_answer, generated_answer)
        metrics['rouge1'] = rouge_scores['rouge1'].fmeasure
        metrics['rouge2'] = rouge_scores['rouge2'].fmeasure
        metrics['rougeL'] = rouge_scores['rougeL'].fmeasure

        # BERTScore
        P, R, F1 = bert_score([generated_answer], [reference_answer], lang='en', verbose=False)
        metrics['bertscore_precision'] = P.item()
        metrics['bertscore_recall'] = R.item()
        metrics['bertscore_f1'] = F1.item()

        # Answer length
        metrics['answer_length'] = len(generated_answer.split())
        metrics['reference_length'] = len(reference_answer.split())
        return metrics

    def evaluate_context_utilization(self, retrieved_docs, generated_answer):
        """Measure how much of the answer's vocabulary comes from the retrieved context."""
        answer_words = set(generated_answer.lower().split())
        doc_words_sets = [set(doc.lower().split()) for doc in retrieved_docs]
        all_doc_words = set().union(*doc_words_sets)

        # Overlap ratio
        overlap = answer_words & all_doc_words
        utilization = len(overlap) / len(answer_words) if answer_words else 0
        return {
            'context_utilization': utilization,
            'overlap_count': len(overlap),
            'answer_word_count': len(answer_words),
            'context_word_count': len(all_doc_words),
        }

    def evaluate_retrieval_quality(self, retrieved_docs, relevant_docs):
        """Evaluate retrieval performance with precision, recall, and F1."""
        retrieved_set = set(retrieved_docs)
        relevant_set = set(relevant_docs)
        precision = len(retrieved_set & relevant_set) / len(retrieved_set) if retrieved_set else 0
        recall = len(retrieved_set & relevant_set) / len(relevant_set) if relevant_set else 0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
        return {'precision': precision, 'recall': recall, 'f1': f1}

# Example
evaluator = RAGEvaluator()
generated = "Machine learning is a subset of artificial intelligence that enables computers to learn."
reference = "Machine learning is a method of data analysis that automates analytical model building."

quality_metrics = evaluator.evaluate_answer_quality(generated, reference)
print("Answer quality metrics:")
for metric, value in quality_metrics.items():
    print(f"{metric}: {value:.4f}")
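The example above exercises only answer quality; continuing it, the context-utilization and retrieval metrics can be used the same way (the document lists below are illustrative):

# Using the context-utilization and retrieval metrics from the evaluator above
retrieved_docs = ["Machine learning is a subset of artificial intelligence.",
                  "Deep learning uses neural networks with many layers."]
relevant_docs = ["Machine learning is a subset of artificial intelligence."]

utilization = evaluator.evaluate_context_utilization(retrieved_docs, generated)
print("Context utilization:", round(utilization['context_utilization'], 4))

retrieval_metrics = evaluator.evaluate_retrieval_quality(retrieved_docs, relevant_docs)
print("Retrieval precision/recall/F1:",
      retrieval_metrics['precision'], retrieval_metrics['recall'], round(retrieval_metrics['f1'], 4))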
The diagram shows the RAG components: the document store provides knowledge, the retriever finds relevant content, and the generator produces answers.
Document Processing Pipeline
Document processing prepares documents for RAG. It covers ingestion, chunking, embedding, and indexing, and each step affects retrieval quality.
The pipeline ingests documents from their sources, splits them into appropriately sized chunks, generates embeddings for the chunks, and indexes the embeddings for fast retrieval.
# Document Processing Pipeline
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer('all-MiniLM-L6-v2')

def chunk_document(doc, chunk_size=500):
    # Simple fixed-size character chunking; real pipelines often split on sentences or tokens
    return [doc[i:i + chunk_size] for i in range(0, len(doc), chunk_size)]

def process_documents(documents):
    # Chunk documents
    chunks = []
    for doc in documents:
        chunks.extend(chunk_document(doc, chunk_size=500))

    # Generate embeddings
    embeddings = embedder.encode(chunks)

    # Index: the embedding matrix itself serves as a brute-force index here;
    # a vector database or FAISS index would replace this in production
    index = embeddings
    return chunks, embeddings, index

# Example
documents = ["Document 1 content...", "Document 2 content..."]
chunks, embeddings, index = process_documents(documents)
print("Processed", len(chunks), "chunks")
Document processing has a direct effect on RAG quality: good processing improves retrieval, which in turn enables accurate generation.
The diagram shows the processing pipeline: documents are chunked, chunks are embedded, and embeddings are indexed.
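Chunking granularity is one of the main levers in this pipeline. A common refinement, not shown above, is fixed-size chunks with overlap so that text split at a boundary still appears intact in at least one chunk; a minimal sketch (sizes are illustrative):

# Fixed-size chunking with overlap (chunk_size and overlap are illustrative)
def chunk_with_overlap(text, chunk_size=500, overlap=100):
    chunks = []
    step = chunk_size - overlap  # assumes overlap < chunk_size
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# Example
text = "word " * 300
print(len(chunk_with_overlap(text, chunk_size=500, overlap=100)), "overlapping chunks")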
Retrieval Strategies
Retrieval strategies find relevant documents using semantic search, keyword search, or a hybrid of the two, and each has different strengths.
Semantic retrieval uses embeddings to find documents that are similar in meaning. Keyword retrieval uses term matching to find documents that share the query's terms. Hybrid retrieval combines both.
# Retrieval Strategies
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def semantic_retrieve(query, documents, embedder, embeddings, top_k=5):
    # Embed the query and rank documents by cosine similarity
    query_emb = embedder.encode([query])
    scores = cosine_similarity(query_emb, embeddings)[0]
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in top_indices]

def keyword_retrieve(query, documents, top_k=5):
    # TF-IDF term matching (BM25 is a common alternative)
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(documents)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in top_indices]

def hybrid_retrieve(query, documents, embedder, embeddings, alpha=0.5, top_k=5):
    # Over-retrieve with both strategies, then fuse by weighted reciprocal rank
    semantic_results = semantic_retrieve(query, documents, embedder, embeddings, top_k * 2)
    keyword_results = keyword_retrieve(query, documents, top_k * 2)
    scores = {}
    for rank, doc in enumerate(semantic_results):
        scores[doc] = scores.get(doc, 0) + alpha / (rank + 1)
    for rank, doc in enumerate(keyword_results):
        scores[doc] = scores.get(doc, 0) + (1 - alpha) / (rank + 1)
    combined = sorted(scores, key=scores.get, reverse=True)
    return combined[:top_k]
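A small usage example for the hybrid strategy above (model name and corpus are illustrative):

# Example: hybrid retrieval over a tiny corpus
from sentence_transformers import SentenceTransformer

documents = ["Machine learning is a subset of AI.",
             "Neural networks are built from layers.",
             "Gradient descent minimizes a loss function."]
embedder = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = embedder.encode(documents)

results = hybrid_retrieve("How do neural networks learn?", documents, embedder, embeddings,
                          alpha=0.5, top_k=2)
print("Hybrid results:", results)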
The choice of strategy affects RAG performance: semantic retrieval suits meaning-based queries, keyword retrieval suits exact matches, and hybrid retrieval handles diverse query types.
The diagram shows the retrieval strategies: semantic search uses embeddings, keyword search uses term matching, and hybrid search combines the two.
Context Building
Context building prepares retrieved documents for generation: it combines multiple documents, formats them consistently, and ensures the result fits within the prompt's length limit.
Context building involves document selection, formatting, and truncation. Selection chooses the most relevant documents, formatting structures them, and truncation keeps the context within the length limit.
# Context Building
def format_document(doc):
    # Minimal formatting; real systems often prepend titles or source metadata
    return doc.strip()

def build_context(retrieved_docs, max_length=1000):
    context_parts = []
    current_length = 0
    for doc in retrieved_docs:
        doc_text = format_document(doc)
        # Stop adding documents once the length budget would be exceeded
        if current_length + len(doc_text) <= max_length:
            context_parts.append(doc_text)
            current_length += len(doc_text)
        else:
            break
    context = "\n\n".join(context_parts)
    return context

# Example
retrieved = ["Doc 1", "Doc 2", "Doc 3"]
context = build_context(retrieved, max_length=500)
print("Context:", context)
Context building affects generation quality: good context supplies exactly the relevant information the model needs to answer well.
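Character counts are only a proxy for what actually constrains the model, which is tokens. A sketch of token-aware truncation using a Hugging Face tokenizer (the GPT-2 tokenizer and token budget here are illustrative):

# Token-aware context truncation (tokenizer choice and budget are illustrative)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gpt2')

def truncate_to_tokens(context, max_tokens=512):
    # Encode, cut to the token budget, and decode back to text
    token_ids = tokenizer.encode(context)
    if len(token_ids) <= max_tokens:
        return context
    return tokenizer.decode(token_ids[:max_tokens])

# Example
long_context = "Machine learning is a subset of AI. " * 200
truncated = truncate_to_tokens(long_context, max_tokens=128)
print(len(tokenizer.encode(truncated)), "tokens after truncation")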
Prompt Construction
Prompt construction creates effective prompts for generation by combining the context, the query, and instructions in a clear structure.
Prompts typically include a context section, a query section, and an instruction section: the context provides background, the query specifies the task, and the instructions guide the generation.
# Prompt Construction
def construct_prompt(query, context, instruction=""):
    prompt = f"""Context:
{context}

Question: {query}

{instruction}

Answer:"""
    return prompt

# Example
query = "What is machine learning?"
context = "Machine learning is a subset of AI..."
prompt = construct_prompt(query, context, "Answer based on the context provided.")
print("Prompt:", prompt)
Prompt construction affects generation quality: clear, well-structured prompts are easier for the model to follow and produce better answers.
Generation Integration
Generation integration uses a language model to produce the final answer: it takes the prompt with its context, generates a coherent response, and keeps the answer grounded in the provided context.
Generation typically uses a decoder model such as GPT, which generates text autoregressively while conditioning on the context to produce a relevant answer.
# Generation Integration
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2')

def generate_answer(prompt, max_length=200):
    output = generator(prompt, max_length=max_length, num_return_sequences=1)
    # Keep only the text after the final "Answer:" marker
    answer = output[0]['generated_text'].split("Answer:")[-1].strip()
    return answer

# Example
prompt = "Context: ML is AI subset.\n\nQuestion: What is ML?\n\nAnswer:"
answer = generate_answer(prompt)
print("Answer:", answer)
Generation integration produces the final answer, using the retrieved context to keep it relevant.
Summary
RAG combines retrieval and generation. The architecture consists of a document store, a retriever, and a generator. Document processing prepares the knowledge base, retrieval strategies find relevant content, context building assembles that content, prompt construction turns it into an effective prompt, and generation integration produces the answer. Together, these pieces enable knowledge-grounded generation.