Session 4: Query Enhancement & Context Augmentation - Bridging the Semantic Gap¶
Learning Outcomes¶
By the end of this session, you will be able to:
- Implement advanced query enhancement techniques including HyDE and multi-query generation
- Build context augmentation systems that improve retrieval accuracy by 30-50%
- Apply semantic gap bridging strategies for better query-document matching
- Design query intelligence pipelines with automatic optimization and feedback loops
- Evaluate and optimize query enhancement systems for production environments
Chapter Overview¶
What you'll learn: Advanced techniques for improving query understanding and context retrieval in RAG systems
Why it matters: The semantic gap between user queries and document content is the #1 cause of poor RAG performance. This session teaches proven techniques that improve retrieval accuracy by 30-50% in production systems.
How it stands out: Unlike basic similarity search, these advanced query enhancement methods bridge the gap between natural language questions and technical document content through intelligent query expansion and context augmentation.
Where you'll apply it: Customer support systems, technical documentation search, enterprise knowledge bases, and any RAG application requiring high precision retrieval.
Learning Path Options¶
- Observer Path (65 minutes): Understand concepts and examine query intelligence patterns
  - Focus: Conceptual understanding with practical examples
  - Best for: Getting oriented and understanding the business value
- 🙋‍♂️ Participant Path (130 minutes): Follow exercises and implement HyDE and context optimization
  - Focus: Hands-on implementation with guided examples
  - Best for: Learning through practical implementation
- 🛠️ Implementer Path (230 minutes): Build custom enhancement systems and explore production patterns
  - Focus: Advanced techniques and production deployment
  - Best for: Technical teams planning real-world deployments
Core Learning Track (130 minutes) - REQUIRED¶
| Section | Concept Load | Time | Skills |
| --- | --- | --- | --- |
| HyDE Implementation | 8 concepts | 25 min | Semantic Gap Bridging |
| Query Enhancement | 9 concepts | 25 min | Multi-Strategy Expansion |
| Context Optimization | 6 concepts | 20 min | Window Management |
| Prompt Engineering | 10 concepts | 30 min | Generation Quality |
| Integration Pipeline | 7 concepts | 30 min | System Architecture |
Optional Deep Dive Modules (Choose Your Adventure)¶
- 🔬 Module A: Advanced Query Understanding (50 min)
- 🏭 Module B: Enterprise Context Systems (50 min)
Core Section (Required - 130 minutes)¶
Learning Outcomes¶
By the end of this session, you will be able to:
- Implement HyDE (Hypothetical Document Embeddings) for semantic gap bridging
- Design query expansion and reformulation strategies using LLMs
- Build multi-query generation systems for comprehensive retrieval
- Optimize context windows and chunking for better generation quality
- Engineer advanced prompts that maximize retrieval-augmented generation effectiveness
Chapter Introduction¶
Building on Vector Search Mastery: The Query Intelligence Layer¶
In Session 3, you mastered vector databases and hybrid search optimization - now we tackle the next critical challenge: the query-document semantic gap. Even the most sophisticated vector search fails when user queries don't linguistically match document content. This session transforms your optimized retrieval system into an intelligent query interpreter that understands intent and bridges semantic mismatches.
The Query Enhancement Challenge:
- Semantic Mismatch: User questions and document language styles differ, creating embedding distance
- Vector Search Limitations: High similarity scores don't guarantee relevant context
- Incomplete Queries: Users provide minimal context, defeating your optimized indices
- Domain-Specific Language: Technical documents use terminology that users don't include in their queries
Advanced Solutions Building on Your Vector Expertise:
- HyDE Integration: Generate hypothetical documents that match your vector space
- Query Decomposition: Break complex questions into your optimized retrieval patterns
- Multi-Query Generation: Leverage multiple search perspectives across your hybrid system
- Context Optimization: Intelligent window sizing that maximizes your retrieval quality
The Evolution from Search to Understanding¶
This session represents a fundamental shift in RAG thinking - from "find similar content" to "understand user intent":
- Session 3 Achievement: Fast, accurate vector similarity search
- Session 4 Goal: Intelligent query understanding that maximizes retrieval success
- Session 5 Preview: Evaluating whether our query enhancements actually improve RAG quality
Your journey from basic chunking (Session 2) → optimized search (Session 3) → query intelligence (Session 4) → quality measurement (Session 5) forms the complete foundation for advanced RAG patterns you'll explore in Sessions 6-9.
Let's transform your high-performance vector system into a true query understanding engine! 🎯
Part 1: HyDE - Hypothetical Document Embeddings (25 minutes)¶
Understanding the Semantic Gap Problem¶
Traditional embedding models often struggle with the query-document mismatch problem:
# Demonstrating the semantic gap
class SemanticGapAnalyzer:
"""Analyze and demonstrate semantic gaps in retrieval."""
def __init__(self, embedding_model):
self.embedding_model = embedding_model
Core Functionality: This analyzer helps identify why traditional embedding models struggle with query-document matching by quantifying the semantic distance between user questions and stored content.
def analyze_query_document_gap(self, query: str, documents: List[str]) -> Dict:
"""Analyze semantic gaps between queries and documents."""
# Embed query and documents
query_embedding = self.embedding_model.encode([query])
doc_embeddings = self.embedding_model.encode(documents)
Embedding Process: We convert both the query and documents into vector representations using the same embedding model to ensure comparable semantic spaces.
# Calculate similarities
similarities = []
for doc_emb in doc_embeddings:
similarity = np.dot(query_embedding[0], doc_emb) / (
np.linalg.norm(query_embedding[0]) * np.linalg.norm(doc_emb)
)
similarities.append(similarity)
Similarity Computation: Using cosine similarity to measure semantic closeness, revealing gaps where relevant documents score poorly due to linguistic differences.
return {
'query': query,
'avg_similarity': np.mean(similarities),
'max_similarity': np.max(similarities),
'min_similarity': np.min(similarities),
'gap_analysis': self._analyze_gap(query, documents, similarities)
}
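The `_analyze_gap` helper referenced in the return value above is not shown in this walkthrough. Here is a minimal sketch, assuming it only needs the already-computed similarity scores; the 0.5 threshold and the term-overlap heuristic are illustrative choices, not part of the original design:

```python
# Sketch of the _analyze_gap helper (intended to sit inside SemanticGapAnalyzer).
# The 0.5 similarity threshold and term-overlap heuristic are illustrative assumptions.
def _analyze_gap(self, query: str, documents: List[str],
                 similarities: List[float]) -> Dict:
    """Flag documents whose embedding similarity to the query is low."""
    query_terms = set(query.lower().split())
    low_similarity = []
    for doc, sim in zip(documents, similarities):
        if sim < 0.5:
            doc_terms = set(doc.lower().split())
            overlap = len(query_terms & doc_terms) / max(len(query_terms), 1)
            low_similarity.append({
                'document_preview': doc[:80],
                'similarity': float(sim),
                'term_overlap': round(overlap, 2),
            })
    return {'low_similarity_docs': low_similarity, 'gap_count': len(low_similarity)}
```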
HyDE Implementation: Building on Vector Search Excellence¶
Why HyDE Enhances Your Optimized Vector System
Remember from Session 3 how vector similarity search relies on embedding space proximity? HyDE exploits this by generating documents that exist in the same semantic space as your indexed content, creating better query-document matching within your optimized vector infrastructure.
The HyDE Process:
1. Query Analysis: Understand the semantic intent of the user question
2. Hypothetical Generation: Create documents that would naturally contain the answer
3. Vector Integration: Embed these hypothetical documents using your embedding models
4. Enhanced Retrieval: Search your vector database using these improved representations
HyDE transforms semantic gaps into vector space advantages:
Step 1: Initialize the HyDE Query Enhancer
Understanding HyDE (Hypothetical Document Embeddings): HyDE solves a fundamental RAG problem - the semantic gap between how users ask questions and how information is stored in documents. Instead of searching with the query directly, HyDE generates hypothetical answers and searches with those.
Why This Approach Works: Questions and answers exist in different semantic spaces. By generating hypothetical documents that would answer the query, we search in the same semantic space as our stored documents, dramatically improving retrieval accuracy.
# Core HyDE setup
from typing import List, Dict, Any, Optional
import openai
from sentence_transformers import SentenceTransformer
import numpy as np
Essential Imports: We need these libraries for LLM communication, embedding generation, and numerical operations that power the HyDE system.
class HyDEQueryEnhancer:
"""Hypothetical Document Embeddings for query enhancement."""
def __init__(self, llm_model, embedding_model, temperature: float = 0.7):
self.llm_model = llm_model
self.embedding_model = embedding_model
self.temperature = temperature
Temperature Parameter: The 0.7 temperature balances creativity and accuracy in hypothetical document generation, ensuring diverse but relevant hypothetical answers.
Step 2: Configure Query Type Templates
Next, set up templates for different types of queries:
# Different HyDE templates for various query types
self.hyde_templates = {
'factual': self._factual_hyde_template,
'procedural': self._procedural_hyde_template,
'analytical': self._analytical_hyde_template,
'creative': self._creative_hyde_template
}
Step 3: Core Enhancement Method
Now implement the main query enhancement workflow:
def enhance_query_with_hyde(self, query: str,
query_type: str = 'factual',
num_hypotheticals: int = 3) -> Dict[str, Any]:
"""Generate hypothetical documents for query enhancement."""
print(f"Generating HyDE for query: {query[:100]}...")
# Step 1: Auto-classify the query type when 'auto' is requested
if query_type == 'auto':
query_type = self._classify_query_type(query)
Query Classification: Automatically determining query type enables us to use appropriate templates for generating hypothetical documents that match the expected answer format.
# Step 2: Generate hypothetical documents
hypothetical_docs = self._generate_hypothetical_documents(
query, query_type, num_hypotheticals
)
Step 4: Create Enhanced Embeddings
Finally, combine the generated hypothetical documents into enhanced embeddings:
# Step 3: Create enhanced embeddings
enhanced_embeddings = self._create_enhanced_embeddings(
query, hypothetical_docs
)
Embedding Integration: Converting hypothetical documents into vector representations and combining them with the original query for improved semantic matching.
# Step 4: Return comprehensive enhancement data
return {
'original_query': query,
'query_type': query_type,
'hypothetical_documents': hypothetical_docs,
'enhanced_embedding': enhanced_embeddings['combined'],
'original_embedding': enhanced_embeddings['original'],
'hyde_embeddings': enhanced_embeddings['hyde_embeddings'],
'confidence_score': self._calculate_enhancement_confidence(
enhanced_embeddings
)
}
Educational Context: Understanding HyDE's Impact
This implementation demonstrates query enhancement working in harmony with your vector infrastructure from Session 3. Notice how:
- Enhanced embeddings leverage your optimized embedding models
- Confidence scoring helps evaluate query improvement quality
- Query classification enables template-based document generation
The result bridges the semantic gap between user intent and document content, dramatically improving retrieval success in your hybrid search systems.
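Before digging into the helper methods, here is a minimal usage sketch. The stub LLM and the commented vector-store call are assumptions about your setup, and it presumes the helper methods sketched later in this section have been added to the class:

```python
# Minimal usage sketch. StubLLM stands in for your real LLM client; the commented
# vector-store call assumes a similarity_search_by_vector interface.
from sentence_transformers import SentenceTransformer

class StubLLM:
    """Placeholder exposing the .predict(prompt, temperature=...) interface used above."""
    def predict(self, prompt: str, temperature: float = 0.7) -> str:
        return "A hypothetical answer used only for illustration."

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
hyde = HyDEQueryEnhancer(llm_model=StubLLM(), embedding_model=embedding_model)

enhancement = hyde.enhance_query_with_hyde(
    "How do I rotate API keys without downtime?",
    query_type="factual",
    num_hypotheticals=2,
)

# Search with the combined embedding instead of the raw query embedding:
# results = vector_store.similarity_search_by_vector(enhancement["enhanced_embedding"], k=5)
```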
Step 1: Query Type Classification - Tailoring HyDE to Query Nature
def _classify_query_type(self, query: str) -> str:
"""Classify query type for appropriate HyDE template selection."""
classification_prompt = f"""
Classify the following query into one of these categories:
1. factual - Questions seeking specific facts or information
2. procedural - Questions asking how to do something or step-by-step processes
3. analytical - Questions requiring analysis, comparison, or interpretation
4. creative - Questions requiring creative or open-ended responses
Query: {query}
Return only the category name:
"""
Classification Strategy: Using the LLM to automatically categorize queries enables appropriate template selection for generating relevant hypothetical documents.
try:
response = self.llm_model.predict(classification_prompt).strip().lower()
if response in self.hyde_templates:
return response
except Exception as e:
print(f"Classification error: {e}")
# Default to factual
return 'factual'
Step 2A: Factual Query Template
Create templates that generate different document types. Start with factual queries:
def _factual_hyde_template(self, query: str) -> str:
"""Template for factual query types."""
return f"""
Write a detailed, informative document that comprehensively answers this question: {query}
The document should:
- Provide specific facts and data points
- Include relevant context and background information
- Use authoritative tone and precise language
- Cover multiple aspects of the topic
- Include examples where relevant
Document:
"""
Step 2B: Procedural Query Template
Handle step-by-step and how-to queries:
def _procedural_hyde_template(self, query: str) -> str:
"""Template for procedural query types."""
return f"""
Write a detailed how-to guide that explains: {query}
The guide should:
- Provide step-by-step instructions
- Include prerequisite requirements
- Mention common pitfalls and how to avoid them
- Explain the reasoning behind each step
- Include tips for success
Guide:
"""
Step 2C: Analytical Query Template
Support analytical and comparative queries:
def _analytical_hyde_template(self, query: str) -> str:
"""Template for analytical query types."""
return f"""
Write an analytical document that thoroughly examines: {query}
The analysis should:
- Present multiple perspectives or approaches
- Compare and contrast different options
- Discuss pros and cons
- Provide evidence-based reasoning
- Draw insightful conclusions
Analysis:
"""
Step 3A: Document Generation Setup
Generate multiple hypothetical documents with variations for better coverage:
def _generate_hypothetical_documents(self, query: str,
query_type: str,
num_docs: int) -> List[Dict]:
"""Generate multiple hypothetical documents with variations."""
base_template = self.hyde_templates[query_type]
hypothetical_docs = []
Document Variation Strategy: Generating multiple hypothetical documents with different temperature settings creates diverse perspectives that increase retrieval coverage.
for i in range(num_docs):
# Add variation to temperature for diversity
varied_temperature = self.temperature + (i * 0.1)
varied_temperature = min(varied_temperature, 1.0)
Step 3B: Document Generation Loop
Generate each hypothetical document with error handling:
try:
# Generate hypothetical document
prompt = base_template(query)
# Use varied temperature for diversity
doc_text = self.llm_model.predict(
prompt,
temperature=varied_temperature
)
Document Generation: Each iteration creates a hypothetical document using the query-appropriate template with temperature variation for diverse outputs.
hypothetical_docs.append({
'document': doc_text.strip(),
'temperature': varied_temperature,
'variation': i + 1,
'word_count': len(doc_text.split()),
'quality_score': self._assess_document_quality(doc_text, query)
})
print(f"Generated hypothetical document {i+1}/{num_docs}")
Metadata Collection: Tracking temperature, quality scores, and variations helps optimize the document generation process and selection.
Step 3C: Quality Ranking
Sort generated documents by quality for best results:
# Sort by quality score
hypothetical_docs.sort(key=lambda x: x['quality_score'], reverse=True)
return hypothetical_docs
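The `_assess_document_quality` helper used in the generation loop is not shown above. A simple heuristic sketch, assuming quality can be approximated from query-term overlap and document length (the weights are illustrative):

```python
# Heuristic sketch of _assess_document_quality (intended to sit inside HyDEQueryEnhancer).
# The 0.6/0.4 weights and 150-word target length are illustrative assumptions.
def _assess_document_quality(self, doc_text: str, query: str) -> float:
    """Score a hypothetical document between 0 and 1."""
    words = doc_text.split()
    if not words:
        return 0.0
    query_terms = set(query.lower().split())
    doc_terms = set(w.lower() for w in words)
    overlap = len(query_terms & doc_terms) / max(len(query_terms), 1)
    # Prefer documents that are neither trivially short nor excessively long.
    length_score = min(len(words) / 150.0, 1.0)
    return 0.6 * overlap + 0.4 * length_score
```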
Step 4A: Create Base Embeddings
Convert query and hypothetical documents into embedding vectors:
def _create_enhanced_embeddings(self, query: str,
hypothetical_docs: List[Dict]) -> Dict:
"""Create enhanced embeddings combining query and hypothetical docs."""
# Original query embedding
original_embedding = self.embedding_model.encode([query])[0]
# Hypothetical document embeddings
hyde_texts = [doc['document'] for doc in hypothetical_docs]
hyde_embeddings = self.embedding_model.encode(hyde_texts)
Step 4B: Weight and Combine Embeddings
Create quality-weighted combination of hypothetical documents:
# Combine embeddings using weighted average
weights = self._calculate_document_weights(hypothetical_docs)
# Weighted combination
weighted_hyde = np.average(hyde_embeddings, axis=0, weights=weights)
Weighted Averaging: Quality-based weighting ensures higher-quality hypothetical documents have more influence on the final embedding representation.
# Combine original query with weighted hypotheticals
# Give more weight to original query to preserve intent
combined_embedding = (0.3 * original_embedding +
0.7 * weighted_hyde)
# Normalize the combined embedding
combined_embedding = combined_embedding / np.linalg.norm(combined_embedding)
Step 4C: Return Comprehensive Results
Provide all embedding variations for analysis and debugging:
return {
'original': original_embedding,
'hyde_embeddings': hyde_embeddings,
'weighted_hyde': weighted_hyde,
'combined': combined_embedding,
'weights': weights
}
Step 4D: Quality-Based Weighting
Calculate document importance based on quality scores:
def _calculate_document_weights(self, hypothetical_docs: List[Dict]) -> List[float]:
"""Calculate weights for hypothetical documents based on quality."""
quality_scores = [doc['quality_score'] for doc in hypothetical_docs]
# Softmax normalization for weights
exp_scores = np.exp(np.array(quality_scores))
weights = exp_scores / np.sum(exp_scores)
return weights.tolist()
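The enhancement result also reports a `confidence_score` via `_calculate_enhancement_confidence`, which is not shown above. One possible sketch, assuming confidence can be approximated by how well the combined embedding preserves the original intent and how much the hypothetical documents agree with each other:

```python
# Sketch of _calculate_enhancement_confidence (intended to sit inside HyDEQueryEnhancer).
# The 50/50 weighting of intent preservation and agreement is an illustrative assumption.
def _calculate_enhancement_confidence(self, enhanced_embeddings: Dict) -> float:
    original = enhanced_embeddings['original']
    combined = enhanced_embeddings['combined']
    hyde_embs = np.asarray(enhanced_embeddings['hyde_embeddings'])

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # How much the combined embedding still resembles the original query intent.
    intent_preservation = cosine(original, combined)
    # How much the hypothetical documents agree with each other.
    centroid = hyde_embs.mean(axis=0)
    agreement = float(np.mean([cosine(emb, centroid) for emb in hyde_embs]))
    return round(0.5 * intent_preservation + 0.5 * agreement, 3)
```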
Part 2: Query Expansion and Reformulation (25 minutes)¶
Beyond HyDE: Multi-Strategy Query Intelligence¶
Building on HyDE's hypothetical document generation, query expansion tackles the problem from a different angle - instead of creating new content, it enriches the original query with related terms and concepts that increase the likelihood of matching relevant documents in your vector database.
The Expansion Strategy:
- HyDE (Part 1): Generate hypothetical documents → embed → search
- Query Expansion (Part 2): Enrich the original query → search with expanded terms
- Combined Power: Use both approaches for maximum retrieval success
This multi-layered approach ensures your optimized vector search from Session 3 captures content through multiple semantic pathways:
Step 5A: Query Expansion Setup¶
Set up the intelligent query expansion system with multiple strategies:
# Advanced query expansion system
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import nltk
from nltk.corpus import wordnet
from collections import defaultdict
Query Expansion Libraries: These imports provide statistical analysis, semantic relationships, and domain-specific term extraction capabilities.
class IntelligentQueryExpander:
"""Advanced query expansion using multiple strategies."""
def __init__(self, llm_model, domain_corpus: Optional[List[str]] = None):
self.llm_model = llm_model
self.domain_corpus = domain_corpus
Step 5B: Configure Expansion Strategies¶
Set up different approaches to query expansion:
# Initialize expansion strategies
self.expansion_strategies = {
'synonym': self._synonym_expansion,
'semantic': self._semantic_expansion,
'contextual': self._contextual_expansion,
'domain_specific': self._domain_specific_expansion
}
Strategy Framework: Multiple expansion approaches ensure comprehensive coverage - from simple synonyms to complex semantic relationships and domain-specific terminology.
# Domain-specific TF-IDF if corpus provided
if domain_corpus:
self.domain_tfidf = TfidfVectorizer(
max_features=10000,
stop_words='english',
ngram_range=(1, 3)
)
self.domain_tfidf.fit(domain_corpus)
Step 5C: Core Expansion Method¶
Implement the main query expansion workflow:
def expand_query(self, query: str,
strategies: List[str] = ['semantic', 'contextual'],
max_expansions: int = 5) -> Dict[str, Any]:
"""Expand query using multiple strategies."""
expansion_results = {}
combined_expansions = set()
Multi-Strategy Coordination: Coordinating different expansion approaches allows leveraging the strengths of each method for comprehensive query enhancement.
# Apply each expansion strategy
for strategy in strategies:
if strategy in self.expansion_strategies:
expansions = self.expansion_strategies[strategy](
query, max_expansions
)
expansion_results[strategy] = expansions
combined_expansions.update(expansions)
Step 5D: Combine and Return Results¶
Create the final expanded query with comprehensive results:
# Create final expanded query
expanded_query = self._create_expanded_query(
query, list(combined_expansions)
)
return {
'original_query': query,
'expansions_by_strategy': expansion_results,
'all_expansions': list(combined_expansions),
'expanded_query': expanded_query,
'expansion_count': len(combined_expansions)
}
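The `_create_expanded_query` helper used above is not shown. A minimal sketch, assuming the simplest reasonable strategy of appending deduplicated expansion terms to the original query (weighting or re-ranking terms first would be a natural refinement):

```python
# Sketch of _create_expanded_query (intended to sit inside IntelligentQueryExpander).
def _create_expanded_query(self, query: str, expansions: List[str]) -> str:
    """Append expansion terms that are not already present in the query."""
    query_lower = query.lower()
    novel_terms = [term for term in expansions if term.lower() not in query_lower]
    if not novel_terms:
        return query
    return f"{query} {' '.join(novel_terms)}"
```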
Step 5: Semantic Expansion Using LLM
def _semantic_expansion(self, query: str, max_expansions: int) -> List[str]:
"""Generate semantic expansions using LLM understanding."""
semantic_prompt = f"""
Given this query, generate {max_expansions} semantically related terms or phrases that would help find relevant information:
Query: {query}
Requirements:
1. Include synonyms and related concepts
2. Add domain-specific terminology if applicable
3. Include both broader and narrower terms
4. Focus on terms likely to appear in relevant documents
Return only the expanded terms, one per line:
"""
Semantic Expansion Strategy: Using LLM understanding to generate related terms creates expansions that capture conceptual relationships beyond simple synonyms.
try:
response = self.llm_model.predict(semantic_prompt)
expansions = [
term.strip()
for term in response.strip().split('\n')
if term.strip() and not term.strip().startswith(('-', '*', '•'))
]
return expansions[:max_expansions]
except Exception as e:
print(f"Semantic expansion error: {e}")
return []
Response Processing: Filtering and cleaning LLM output ensures we get high-quality expansion terms while removing formatting artifacts.
Step 6: Contextual Query Reformulation
def _contextual_expansion(self, query: str, max_expansions: int) -> List[str]:
"""Generate contextual reformulations of the query."""
reformulation_prompt = f"""
Reformulate this query in {max_expansions} different ways that express the same information need:
Original Query: {query}
Create variations that:
1. Use different phrasing and vocabulary
2. Approach the question from different angles
3. Include both specific and general formulations
4. Maintain the original intent and meaning
Reformulations:
"""
Reformulation Strategy: Creating multiple ways to express the same information need increases the likelihood of matching documents with different linguistic styles.
try:
response = self.llm_model.predict(reformulation_prompt)
reformulations = [
reform.strip().rstrip('.')
for reform in response.strip().split('\n')
if reform.strip() and ('?' in reform or len(reform.split()) > 3)
]
return reformulations[:max_expansions]
except Exception as e:
print(f"Contextual expansion error: {e}")
return []
Quality Filtering: Ensuring reformulations are substantive questions rather than fragments improves the quality of query variations.
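The `synonym` strategy registered in Step 5B is not shown above. A sketch using the WordNet corpus imported at the top of this part (assumes you have run `nltk.download('wordnet')` once):

```python
# Sketch of _synonym_expansion (intended to sit inside IntelligentQueryExpander).
# Requires the WordNet corpus: nltk.download('wordnet')
def _synonym_expansion(self, query: str, max_expansions: int) -> List[str]:
    """Collect WordNet synonyms for each query term."""
    synonyms = set()
    for word in query.lower().split():
        for synset in wordnet.synsets(word):
            for lemma in synset.lemmas():
                candidate = lemma.name().replace('_', ' ')
                if candidate != word:
                    synonyms.add(candidate)
    return list(synonyms)[:max_expansions]
```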
Step 6A: Multi-Query Generator Setup¶
Set up the multi-query generation system with different perspectives:
# Multi-query generation system
class MultiQueryGenerator:
"""Generate multiple query perspectives for comprehensive retrieval."""
def __init__(self, llm_model):
self.llm_model = llm_model
Perspective Framework: Multiple query generation approaches ensure comprehensive coverage by viewing the same information need from different angles and specificity levels.
self.query_perspectives = {
'decomposition': self._decompose_complex_query,
'specificity_levels': self._generate_specificity_variants,
'temporal_variants': self._generate_temporal_variants,
'perspective_shifts': self._generate_perspective_variants,
'domain_focused': self._generate_domain_variants
}
Step 6B: Multi-Query Generation Workflow¶
Coordinate multiple perspective generation strategies:
def generate_multi_query_set(self, query: str,
perspectives: List[str] = None,
total_queries: int = 8) -> Dict[str, Any]:
"""Generate comprehensive query set from multiple perspectives."""
if perspectives is None:
perspectives = ['decomposition', 'specificity_levels', 'perspective_shifts']
all_queries = {'original': query}
generation_metadata = {}
Query Distribution Strategy: Intelligently distributing query generation across different perspectives ensures balanced coverage of the information space.
# Distribute query generation across perspectives
queries_per_perspective = total_queries // len(perspectives)
remaining_queries = total_queries % len(perspectives)
Step 6C: Generate Perspective Variants¶
Create queries from each perspective and collect metadata:
for i, perspective in enumerate(perspectives):
num_queries = queries_per_perspective
if i < remaining_queries:
num_queries += 1
generated = self.query_perspectives[perspective](query, num_queries)
all_queries[perspective] = generated
Perspective Execution: Each perspective method generates query variants using its specific approach, whether decomposition, specificity adjustment, or angle shifting.
generation_metadata[perspective] = {
'count': len(generated),
'method': perspective
}
# Flatten and deduplicate
flattened_queries = self._flatten_and_deduplicate(all_queries)
Step 6D: Return Comprehensive Query Set¶
Package all generated query variants with metadata:
return {
'original_query': query,
'query_variants': flattened_queries,
'queries_by_perspective': all_queries,
'generation_metadata': generation_metadata,
'total_variants': len(flattened_queries)
}
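The `_flatten_and_deduplicate` helper used above is not shown. A minimal sketch that merges the original query and every perspective's variants into one list, dropping near-duplicates by normalized text (embedding-based deduplication would be stricter):

```python
# Sketch of _flatten_and_deduplicate (intended to sit inside MultiQueryGenerator).
def _flatten_and_deduplicate(self, all_queries: Dict[str, Any]) -> List[str]:
    seen = set()
    flattened = []
    for value in all_queries.values():
        variants = [value] if isinstance(value, str) else value
        for variant in variants:
            key = ' '.join(variant.lower().split())
            if key and key not in seen:
                seen.add(key)
                flattened.append(variant)
    return flattened
```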
Step 7: Complex Query Decomposition
def _decompose_complex_query(self, query: str, num_queries: int) -> List[str]:
"""Decompose complex queries into simpler sub-questions."""
decomposition_prompt = f"""
Break down this complex query into {num_queries} simpler, focused sub-questions that together would fully answer the original question:
Complex Query: {query}
Requirements:
1. Each sub-question should be independently searchable
2. Sub-questions should cover different aspects of the main query
3. Avoid redundancy between sub-questions
4. Maintain logical flow and completeness
Sub-questions:
"""
Decomposition Strategy: Breaking complex questions into focused sub-questions enables more precise retrieval and comprehensive coverage of multifaceted information needs.
try:
response = self.llm_model.predict(decomposition_prompt)
sub_questions = [
q.strip().rstrip('?') + '?'
for q in response.strip().split('\n')
if q.strip() and ('?' in q or len(q.split()) > 3)
]
Question Processing: Ensuring proper question formatting and filtering maintains quality while standardizing the output format.
# Ensure we have question marks
sub_questions = [
q if q.endswith('?') else q + '?'
for q in sub_questions
]
return sub_questions[:num_queries]
except Exception as e:
print(f"Decomposition error: {e}")
return []
Step 8: Specificity Level Variants
def _generate_specificity_variants(self, query: str, num_queries: int) -> List[str]:
"""Generate queries at different levels of specificity."""
specificity_prompt = f"""
Generate {num_queries} variants of this query at different levels of specificity:
Original Query: {query}
Create variants that range from:
1. Very broad/general versions
2. Medium specificity versions
3. Very specific/detailed versions
Each variant should maintain the core intent but adjust the scope:
"""
Specificity Adjustment: Creating query variants at different granularity levels ensures retrieval of both overview information and detailed specifics.
try:
response = self.llm_model.predict(specificity_prompt)
variants = [
variant.strip()
for variant in response.strip().split('\n')
if variant.strip() and len(variant.split()) > 2
]
return variants[:num_queries]
except Exception as e:
print(f"Specificity variant error: {e}")
return []
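To show how the generated variants are typically consumed, here is a small sketch that runs each variant through a retriever and merges the unique hits. The `retriever.search` interface and the `'id'` field on hits are assumptions you would adapt to your own stack:

```python
from typing import Any, Dict, List

# Sketch: fan a multi-query set out to a retriever and merge unique results.
# `retriever.search(query, k)` returning dicts with an 'id' key is an assumed interface.
def retrieve_with_variants(retriever, query_set: Dict[str, Any], k: int = 5) -> List[Dict]:
    merged, seen_ids = [], set()
    for variant in query_set['query_variants']:
        for hit in retriever.search(variant, k=k):
            if hit['id'] not in seen_ids:
                seen_ids.add(hit['id'])
                merged.append(hit)
    return merged
```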
Part 3: Context Window Optimization (20 minutes)¶
Smart Context Assembly¶
Optimize context windows for maximum information density:
# Context window optimization system
class ContextWindowOptimizer:
"""Optimize context assembly for maximum information density."""
def __init__(self, llm_tokenizer, max_context_tokens: int = 4000):
self.tokenizer = llm_tokenizer
self.max_context_tokens = max_context_tokens
Token Management: Setting up tokenizer and context limits enables precise control over the information density within LLM context windows.
# Context optimization strategies
self.optimization_strategies = {
'relevance_ranking': self._relevance_based_selection,
'diversity_clustering': self._diversity_based_selection,
'hierarchical_summary': self._hierarchical_summarization,
'semantic_compression': self._semantic_compression
}
Strategy Selection: Multiple optimization approaches allow choosing the best method based on the specific characteristics of retrieved content and query requirements.
def optimize_context_window(self, query: str,
retrieved_chunks: List[Dict],
strategy: str = 'relevance_ranking') -> Dict[str, Any]:
"""Optimize context window using specified strategy."""
# Calculate available token budget
query_tokens = len(self.tokenizer.encode(query))
available_tokens = self.max_context_tokens - query_tokens - 200 # Buffer
Budget Calculation: Precise token budgeting ensures context stays within limits while reserving space for the query and response generation.
# Apply optimization strategy
optimized_context = self.optimization_strategies[strategy](
query, retrieved_chunks, available_tokens
)
return {
'optimized_context': optimized_context['context'],
'selected_chunks': optimized_context['chunks'],
'context_tokens': optimized_context['token_count'],
'efficiency_score': optimized_context['efficiency'],
'strategy_used': strategy,
'original_chunk_count': len(retrieved_chunks)
}
Step 9: Relevance-Based Context Selection
def _relevance_based_selection(self, query: str, chunks: List[Dict],
token_budget: int) -> Dict[str, Any]:
"""Select chunks based on relevance scores and token efficiency."""
# Calculate relevance scores and token costs
chunk_analysis = []
for i, chunk in enumerate(chunks):
content = chunk['document'].page_content
tokens = len(self.tokenizer.encode(content))
relevance = 1 - chunk.get('similarity_score', 0.5)  # stored score is a distance, so invert it to get relevance
# Calculate efficiency: relevance per token
efficiency = relevance / tokens if tokens > 0 else 0
Efficiency Calculation: Computing relevance-per-token ratios enables optimal selection of chunks that provide maximum information value within token constraints.
chunk_analysis.append({
'index': i,
'content': content,
'tokens': tokens,
'relevance': relevance,
'efficiency': efficiency,
'metadata': chunk.get('metadata', {})
})
# Sort by efficiency (relevance per token)
chunk_analysis.sort(key=lambda x: x['efficiency'], reverse=True)
Efficiency Ranking: Sorting chunks by efficiency ensures the most valuable content per token is prioritized for inclusion in the context window.
# Select chunks within token budget
selected_chunks = []
total_tokens = 0
for chunk_data in chunk_analysis:
if total_tokens + chunk_data['tokens'] <= token_budget:
selected_chunks.append(chunk_data)
total_tokens += chunk_data['tokens']
else:
break
Budget Management: Greedy selection based on efficiency ensures optimal use of available tokens while staying within limits.
# Assemble context
context_parts = []
for chunk_data in selected_chunks:
source = chunk_data['metadata'].get('source', 'Unknown')
context_parts.append(f"[Source: {source}]\n{chunk_data['content']}")
final_context = '\n\n'.join(context_parts)
return {
'context': final_context,
'chunks': selected_chunks,
'token_count': total_tokens,
'efficiency': np.mean([c['efficiency'] for c in selected_chunks])
}
Step 10: Hierarchical Context Summarization
def _hierarchical_summarization(self, query: str, chunks: List[Dict],
token_budget: int) -> Dict[str, Any]:
"""Create hierarchical summaries when context exceeds budget."""
# Group chunks by source/topic
chunk_groups = self._group_chunks_by_source(chunks)
summarized_chunks = []
total_tokens = 0
Grouping Strategy: Organizing chunks by source or topic enables intelligent summarization that preserves information coherence while reducing token usage.
for group_key, group_chunks in chunk_groups.items():
# Calculate total tokens for this group
group_content = '\n'.join([
chunk['document'].page_content for chunk in group_chunks
])
group_tokens = len(self.tokenizer.encode(group_content))
if group_tokens > token_budget // 4: # Group too large, summarize
summary = self._summarize_chunk_group(query, group_chunks)
summary_tokens = len(self.tokenizer.encode(summary))
Adaptive Summarization: Groups exceeding 25% of token budget are summarized to maintain information while fitting constraints.
if total_tokens + summary_tokens <= token_budget:
summarized_chunks.append({
'content': summary,
'tokens': summary_tokens,
'type': 'summary',
'source_count': len(group_chunks),
'group_key': group_key
})
total_tokens += summary_tokens
Summary Integration: Summarized content is tracked with metadata indicating its condensed nature and original source count.
else:
# Use original chunks if they fit
for chunk in group_chunks:
content = chunk['document'].page_content
chunk_tokens = len(self.tokenizer.encode(content))
if total_tokens + chunk_tokens <= token_budget:
summarized_chunks.append({
'content': content,
'tokens': chunk_tokens,
'type': 'original',
'group_key': group_key,
'metadata': chunk.get('metadata', {})
})
total_tokens += chunk_tokens
Original Content Preservation: Smaller groups remain intact to preserve detailed information when token budget allows.
# Assemble final context
context_parts = []
for chunk in summarized_chunks:
if chunk['type'] == 'summary':
context_parts.append(f"[Summary from {chunk['source_count']} sources]\n{chunk['content']}")
else:
source = chunk['metadata'].get('source', chunk['group_key'])
context_parts.append(f"[Source: {source}]\n{chunk['content']}")
final_context = '\n\n'.join(context_parts)
return {
'context': final_context,
'chunks': summarized_chunks,
'token_count': total_tokens,
'efficiency': total_tokens / token_budget
}
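The grouping and summarization helpers used above are not shown. Sketches follow, with two stated assumptions: chunks are grouped on their `source` metadata field, and the optimizer has an `llm_model` attribute for summarization (the constructor above only takes a tokenizer, so you would attach or pass in an LLM client yourself):

```python
from collections import defaultdict

# Sketch of _group_chunks_by_source (intended to sit inside ContextWindowOptimizer).
def _group_chunks_by_source(self, chunks: List[Dict]) -> Dict[str, List[Dict]]:
    groups = defaultdict(list)
    for chunk in chunks:
        source = chunk.get('metadata', {}).get('source', 'unknown')
        groups[source].append(chunk)
    return dict(groups)

# Sketch of _summarize_chunk_group. Using self.llm_model is an assumption; the
# optimizer shown above is constructed with only a tokenizer.
def _summarize_chunk_group(self, query: str, group_chunks: List[Dict]) -> str:
    combined = '\n'.join(c['document'].page_content for c in group_chunks)
    prompt = (
        f"Summarize the following passages, keeping only information relevant to "
        f"this question: {query}\n\nPassages:\n{combined}\n\nSummary:"
    )
    return self.llm_model.predict(prompt)
```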
Part 4: Advanced Prompt Engineering for RAG (30 minutes)¶
Context-Aware Prompt Templates¶
Design prompts that maximize the effectiveness of retrieved context:
# Advanced RAG prompt engineering
class RAGPromptEngineer:
"""Advanced prompt engineering specifically for RAG systems."""
def __init__(self, llm_model):
self.llm_model = llm_model
Prompt Engineering Foundation: Specialized RAG prompt engineering requires understanding how context and query types interact to produce optimal responses.
# Domain-specific prompt templates
self.prompt_templates = {
'factual_qa': self._factual_qa_template,
'analytical': self._analytical_template,
'procedural': self._procedural_template,
'comparative': self._comparative_template,
'creative': self._creative_template,
'technical': self._technical_template
}
Template Framework: Different query types require specialized prompt structures that guide the LLM toward appropriate response patterns and information organization.
# Prompt optimization techniques
self.optimization_techniques = {
'chain_of_thought': self._add_chain_of_thought,
'source_weighting': self._add_source_weighting,
'confidence_calibration': self._add_confidence_calibration,
'multi_step_reasoning': self._add_multi_step_reasoning
}
Optimization Techniques: Advanced prompt optimization methods enhance response quality through structured reasoning, source attribution, and confidence assessment.
def engineer_rag_prompt(self, query: str, context: str,
query_type: str = 'factual_qa',
optimizations: List[str] = ['confidence_calibration']) -> Dict[str, Any]:
"""Engineer optimized RAG prompt with specified techniques."""
# Start with base template
base_prompt = self.prompt_templates[query_type](query, context)
Template Selection: Choosing the appropriate base template based on query type ensures the prompt structure aligns with the expected response format.
# Apply optimization techniques
optimized_prompt = base_prompt
optimization_metadata = {}
for technique in optimizations:
if technique in self.optimization_techniques:
result = self.optimization_techniques[technique](
optimized_prompt, query, context
)
optimized_prompt = result['prompt']
optimization_metadata[technique] = result['metadata']
Sequential Optimization: Applying optimization techniques iteratively builds up prompt sophistication while tracking the contribution of each enhancement.
return {
'base_prompt': base_prompt,
'optimized_prompt': optimized_prompt,
'optimizations_applied': optimizations,
'optimization_metadata': optimization_metadata,
'query_type': query_type
}
Step 11: Factual QA Template with Source Attribution
def _factual_qa_template(self, query: str, context: str) -> str:
"""Optimized template for factual question answering."""
return f"""You are an expert information analyst. Use the provided context to answer the user's question accurately and comprehensively.
CONTEXT INFORMATION:
{context}
QUESTION: {query}
Context Integration: The template establishes clear boundaries between context and query, ensuring the LLM maintains focus on provided information rather than generating unsupported content.
INSTRUCTIONS:
1. Base your answer strictly on the provided context
2. If the context doesn't contain sufficient information, clearly state this limitation
3. Cite specific sources when making claims
4. Provide direct quotes when appropriate
5. Distinguish between facts and interpretations
6. If there are contradictions in the sources, acknowledge and explain them
Response Guidelines: Specific instructions ensure accurate, well-sourced responses that acknowledge limitations and handle conflicting information appropriately.
ANSWER STRUCTURE:
- Start with a direct answer to the question
- Provide supporting details from the context
- Include relevant quotes with source attribution
- Note any limitations or uncertainties
ANSWER:"""
Step 12: Chain-of-Thought Enhancement
def _add_chain_of_thought(self, base_prompt: str, query: str,
context: str) -> Dict[str, Any]:
"""Add chain-of-thought reasoning to prompt."""
cot_enhancement = """
REASONING PROCESS:
Before providing your final answer, work through your reasoning step-by-step:
1. INFORMATION EXTRACTION: What key information from the context relates to the question?
2. RELEVANCE ASSESSMENT: How does each piece of information address the question?
3. SYNTHESIS: How do you combine the information to form a comprehensive answer?
4. CONFIDENCE EVALUATION: How confident are you in your answer based on the available evidence?
Structured Reasoning: Chain-of-thought prompting guides the LLM through explicit reasoning steps, improving accuracy and providing insight into the decision-making process.
Let me work through this systematically:
STEP 1 - Information Extraction:
[Identify and list key relevant information from context]
STEP 2 - Relevance Assessment:
[Evaluate how each piece of information addresses the question]
STEP 3 - Synthesis:
[Combine information into a coherent answer]
STEP 4 - Confidence Evaluation:
[Assess confidence level and note any limitations]
FINAL ANSWER:
"""
Reasoning Framework: The four-step process ensures systematic analysis of context, leading to more thorough and reliable responses.
enhanced_prompt = base_prompt + cot_enhancement
return {
'prompt': enhanced_prompt,
'metadata': {
'technique': 'chain_of_thought',
'added_tokens': len(cot_enhancement.split()),
'reasoning_steps': 4
}
}
Step 13: Confidence Calibration
Confidence calibration helps RAG systems provide transparent reliability assessments:
def _add_confidence_calibration(self, base_prompt: str, query: str,
context: str) -> Dict[str, Any]:
"""Add confidence calibration to improve answer reliability."""
# Define confidence assessment framework
confidence_enhancement = self._build_confidence_framework()
Confidence Framework: Structured assessment dimensions ensure consistent evaluation of evidence strength, answer completeness, and overall confidence.
# Apply confidence enhancement to base prompt
enhanced_prompt = base_prompt + confidence_enhancement
return {
'prompt': enhanced_prompt,
'metadata': {
'technique': 'confidence_calibration',
'added_tokens': len(confidence_enhancement.split()),
'assessment_dimensions': 3
}
}
Metadata Tracking: Recording technique details and token costs enables optimization of prompt engineering strategies.
Now create the confidence assessment framework:
def _build_confidence_framework(self) -> str:
"""Build structured confidence assessment framework."""
return """
CONFIDENCE ASSESSMENT:
For your final answer, provide a confidence assessment:
1. EVIDENCE STRENGTH: Rate the strength of evidence (Strong/Medium/Weak)
- Strong: Multiple reliable sources with consistent information
- Medium: Some sources with mostly consistent information
- Weak: Limited or inconsistent source information
"""
Evidence Strength Assessment: The first dimension evaluates the reliability and consistency of source information.
# Continue with completeness assessment
completeness_section = """
2. COMPLETENESS: Rate answer completeness (Complete/Partial/Incomplete)
- Complete: Context fully addresses the question
- Partial: Context addresses some aspects but missing key information
- Incomplete: Context has minimal relevant information
"""
Completeness Evaluation: The second dimension assesses how thoroughly the context addresses the user's question.
# Add confidence scoring system
confidence_scoring = """
3. CONFIDENCE SCORE: Provide an overall confidence score (0-100%)
- 90-100%: Highly confident, strong evidence base
- 70-89%: Confident, adequate evidence base
- 50-69%: Moderately confident, limited evidence
- Below 50%: Low confidence, insufficient evidence
Include this assessment at the end of your response:
CONFIDENCE ASSESSMENT:
- Evidence Strength: [Strong/Medium/Weak]
- Completeness: [Complete/Partial/Incomplete]
- Confidence Score: [0-100%]
- Key Limitations: [List any significant limitations or uncertainties]
"""
return base_framework + completeness_section + confidence_scoring
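A minimal usage sketch tying the pieces together: engineer a factual-QA prompt with chain-of-thought and confidence calibration layered on top, then send it to the LLM. The demo LLM class and the sample context string are stand-ins for your own client and retrieved context:

```python
# Usage sketch for RAGPromptEngineer. _DemoLLM and the context string are placeholders.
class _DemoLLM:
    def predict(self, prompt: str, temperature: float = 0.7) -> str:
        return "LLM response placeholder."

llm = _DemoLLM()
prompt_engineer = RAGPromptEngineer(llm_model=llm)

result = prompt_engineer.engineer_rag_prompt(
    query="What retention period applies to audit logs?",
    context="[Source: policy.md]\nAudit logs are retained for 365 days...",
    query_type="factual_qa",
    optimizations=["chain_of_thought", "confidence_calibration"],
)
answer = llm.predict(result["optimized_prompt"])
```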
Dynamic Prompt Adaptation¶
Adapt prompts based on context quality and query complexity. This system analyzes input characteristics and selects optimal prompt strategies:
# Dynamic prompt adaptation system
class DynamicPromptAdapter:
"""Adapt prompts based on context quality and query characteristics."""
def __init__(self, llm_model):
self.llm_model = llm_model
Adaptive Framework: Dynamic prompt adaptation requires analyzing both context quality and query characteristics to select appropriate response strategies.
def adapt_prompt_dynamically(self, query: str, context: str,
retrieval_metadata: Dict) -> Dict[str, Any]:
"""Dynamically adapt prompt based on context and query analysis."""
# Step 1: Analyze input characteristics
context_analysis = self._analyze_context_quality(context, query)
query_analysis = self._analyze_query_characteristics(query)
Input Analysis: Understanding context quality and query complexity enables intelligent selection of appropriate prompt engineering techniques.
# Step 2: Select optimal strategy
prompt_strategy = self._select_prompt_strategy(
context_analysis, query_analysis, retrieval_metadata
)
Strategy Selection: Based on analysis results, choose the most effective prompt structure and enhancement techniques.
# Step 3: Generate adapted prompt
adapted_prompt = self._generate_adapted_prompt(
query, context, prompt_strategy
)
return {
'adapted_prompt': adapted_prompt,
'context_analysis': context_analysis,
'query_analysis': query_analysis,
'strategy': prompt_strategy,
'adaptation_reasoning': self._explain_adaptation_reasoning(
context_analysis, query_analysis, prompt_strategy
)
}
Step 14: Context Quality Assessment
Context quality analysis involves multiple dimensions to determine optimal prompt strategies:
def _analyze_context_quality(self, context: str, query: str) -> Dict[str, Any]:
"""Analyze the quality and characteristics of retrieved context."""
# Collect basic context metrics
basic_metrics = self._collect_basic_metrics(context)
Basic Metrics Collection: Fundamental measurements of context length and diversity provide baseline quality indicators.
Source Analysis: Understanding source count and diversity helps assess the breadth of information available for response generation.
# Analyze source characteristics in the context
source_analysis = self._analyze_context_sources(context)
# Assess relevance using LLM
relevance_score = self._assess_context_relevance(context, query)
# Calculate information density
information_density = self._calculate_information_density(context)
Quality Assessment: LLM-based relevance scoring and information density calculation provide sophisticated quality metrics.
# Generate overall quality classification
quality_level = self._classify_context_quality(
basic_metrics, source_analysis, relevance_score, information_density
)
return {
**basic_metrics,
**source_analysis,
'relevance_score': relevance_score,
'information_density': information_density,
'quality_level': quality_level
}
Comprehensive Assessment: Combining multiple quality dimensions into a unified assessment enables intelligent prompt adaptation.
Now implement the supporting methods:
def _collect_basic_metrics(self, context: str) -> Dict[str, Any]:
"""Collect basic context metrics."""
return {
'length': len(context.split()),
'diversity_score': self._calculate_context_diversity(context)
}
Basic Metrics: Length and diversity scores provide fundamental context characteristics for quality assessment.
def _analyze_context_sources(self, context: str) -> Dict[str, Any]:
"""Analyze source characteristics in context."""
return {
'source_count': context.count('[Source:'),
'source_diversity': self._analyze_source_diversity(context)
}
Source Analysis: Source count and diversity indicate the breadth and reliability of information in the context.
def _assess_context_relevance(self, context: str, query: str) -> float:
"""Use LLM to assess context relevance to query."""
relevance_prompt = f"""
Rate the relevance of this context to the given query on a scale of 0.0 to 1.0:
Query: {query}
Context: {context[:1000]}...
Consider:
1. How directly the context addresses the query
2. The completeness of information provided
3. The accuracy and reliability of information
Return only a number between 0.0 and 1.0:
"""
Relevance Assessment: LLM-based relevance scoring provides intelligent evaluation of context-query alignment beyond simple keyword matching.
try:
response = self.llm_model.predict(relevance_prompt).strip()
relevance_score = float(response)
return max(0.0, min(1.0, relevance_score))
except (ValueError, Exception):
return 0.5 # Default score if assessment fails
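The `_calculate_information_density` helper referenced in the quality analysis is not shown. A simple heuristic sketch, assuming density can be approximated by the ratio of unique words to total words (which penalizes repetitive or padded context):

```python
# Heuristic sketch of _calculate_information_density (intended for DynamicPromptAdapter).
def _calculate_information_density(self, context: str) -> float:
    words = [w.lower() for w in context.split()]
    if not words:
        return 0.0
    return round(len(set(words)) / len(words), 3)
```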
Hands-On Exercise: Build Query Enhancement Pipeline¶
Your Mission¶
Create a comprehensive query enhancement system that combines HyDE, multi-query generation, and context optimization.
Requirements:¶
- HyDE Implementation: Generate hypothetical documents for semantic gap bridging
- Query Expansion: Multi-strategy query expansion and reformulation
- Multi-Query System: Generate query variants from multiple perspectives
- Context Optimization: Smart context window management with token budgeting
- Prompt Engineering: Dynamic prompt adaptation based on context quality
Implementation Framework:¶
The comprehensive system integrates all enhancement techniques into a unified pipeline:
# Complete query enhancement system
class ComprehensiveQueryEnhancer:
"""Complete query enhancement pipeline for RAG systems."""
def __init__(self, llm_model, embedding_model, tokenizer):
# Initialize all enhancement components
self._initialize_components(llm_model, embedding_model, tokenizer)
Component Integration: Initializing all enhancement systems enables coordinated query processing through multiple improvement strategies.
def _initialize_components(self, llm_model, embedding_model, tokenizer):
"""Initialize all query enhancement components."""
self.hyde_enhancer = HyDEQueryEnhancer(llm_model, embedding_model)
self.query_expander = IntelligentQueryExpander(llm_model)
self.multi_query_gen = MultiQueryGenerator(llm_model)
self.context_optimizer = ContextWindowOptimizer(tokenizer)
self.prompt_engineer = RAGPromptEngineer(llm_model)
self.dynamic_adapter = DynamicPromptAdapter(llm_model)
System Architecture: Each component handles a specific aspect of query enhancement, from semantic gap bridging to prompt optimization.
async def comprehensive_enhancement(self, query: str,
vector_store,
enhancement_config: Dict) -> Dict[str, Any]:
"""Run comprehensive query enhancement pipeline."""
results = {'original_query': query}
# Phase 1: Query Enhancement
await self._apply_query_enhancements(query, results, enhancement_config)
Pipeline Coordination: The enhancement pipeline processes queries through multiple phases for comprehensive improvement.
# Phase 2: Enhanced Retrieval
enhanced_retrieval = await self._perform_enhanced_retrieval(
results, vector_store, enhancement_config
)
results['enhanced_retrieval'] = enhanced_retrieval
Enhanced Retrieval: Using improved queries to retrieve better context from the vector store.
# Phase 3: Context and Prompt Optimization
final_results = await self._optimize_context_and_prompts(
query, results, enhanced_retrieval
)
return final_results
Now implement the enhancement phases:
async def _apply_query_enhancements(self, query: str, results: Dict,
config: Dict):
"""Apply configured query enhancement techniques."""
# HyDE Enhancement
if config.get('use_hyde', True):
results['hyde_enhancement'] = self.hyde_enhancer.enhance_query_with_hyde(query)
# Query Expansion
if config.get('use_expansion', True):
results['query_expansion'] = self.query_expander.expand_query(query)
# Multi-Query Generation
if config.get('use_multi_query', True):
results['multi_query'] = self.multi_query_gen.generate_multi_query_set(query)
Selective Enhancement: Configuration-driven enhancement allows optimal performance by enabling only beneficial techniques for specific use cases.
async def _optimize_context_and_prompts(self, query: str, results: Dict,
enhanced_retrieval: Dict) -> Dict:
"""Optimize context window and engineer prompts."""
# Context Optimization
context_result = self.context_optimizer.optimize_context_window(
query, enhanced_retrieval['combined_results']
)
results['optimized_context'] = context_result
# Dynamic Prompt Engineering
prompt_result = self.dynamic_adapter.adapt_prompt_dynamically(
query, context_result['optimized_context'], enhanced_retrieval
)
results['engineered_prompt'] = prompt_result
return results
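The `_perform_enhanced_retrieval` phase referenced in the pipeline is left for you to implement. A sketch follows; it assumes the vector store exposes `similarity_search_by_vector` (for the HyDE embedding) and `similarity_search` (for text variants), so adapt the calls to your store's actual API:

```python
# Sketch of _perform_enhanced_retrieval (intended for ComprehensiveQueryEnhancer).
# The vector-store methods and result shapes are assumptions about your setup.
async def _perform_enhanced_retrieval(self, results: Dict, vector_store,
                                      config: Dict) -> Dict[str, Any]:
    combined, seen = [], set()

    def add_hits(hits):
        for hit in hits:
            key = getattr(hit, 'page_content', str(hit))[:200]
            if key not in seen:
                seen.add(key)
                combined.append({'document': hit,
                                 'metadata': getattr(hit, 'metadata', {})})

    # Retrieve with the HyDE-enhanced embedding when available.
    if 'hyde_enhancement' in results:
        add_hits(vector_store.similarity_search_by_vector(
            results['hyde_enhancement']['enhanced_embedding'], k=config.get('k', 5)))

    # Retrieve with each multi-query variant when available.
    for variant in results.get('multi_query', {}).get('query_variants', []):
        add_hits(vector_store.similarity_search(variant, k=config.get('k', 3)))

    return {'combined_results': combined, 'result_count': len(combined)}
```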
Chapter Summary¶
What You've Built¶
- ✅ HyDE system for semantic gap bridging with multiple hypothetical document strategies
- ✅ Multi-strategy query expansion using semantic, contextual, and domain-specific techniques
- ✅ Multi-query generation from different perspectives and specificity levels
- ✅ Context window optimization with relevance ranking and hierarchical summarization
- ✅ Advanced prompt engineering with dynamic adaptation and confidence calibration
Key Technical Skills Learned¶
- Query Enhancement: HyDE, expansion strategies, and multi-query generation
- Context Optimization: Token budgeting, relevance ranking, and smart summarization
- Prompt Engineering: Template design, chain-of-thought, and confidence calibration
- Dynamic Adaptation: Context-aware prompt selection and quality assessment
- Pipeline Integration: Combining multiple enhancement techniques effectively
Performance Improvements¶
- Retrieval Relevance: 25-35% improvement with HyDE and query expansion
- Context Utilization: 40-50% better information density in optimized contexts
- Answer Quality: 20-30% improvement with engineered prompts and confidence calibration
- User Satisfaction: Significant improvement in handling ambiguous and complex queries
Multiple Choice Test - Session 4¶
Test your understanding of query enhancement and context augmentation:
Question 1: What is the primary purpose of HyDE (Hypothetical Document Embeddings)?
A) To generate multiple query variations
B) To bridge the semantic gap between queries and documents
C) To compress document embeddings
D) To speed up retrieval performance
Question 2: When implementing query decomposition, which approach is most effective for complex questions?
A) Random sentence splitting
B) Breaking questions into answerable sub-questions using LLMs
C) Fixed-length query segments
D) Keyword-based fragmentation
Question 3: What is the key advantage of multi-query generation in RAG systems?
A) Reduced computational cost
B) Faster query processing
C) Comprehensive coverage of different query perspectives
D) Simplified system architecture
Question 4: In context window optimization, what factor is most important for maintaining quality?
A) Maximum token count
B) Processing speed
C) Balance between relevance and information density
D) Number of source documents
Question 5: Which prompt engineering technique is most effective for improving RAG response quality?
A) Longer prompts with more examples
B) Chain-of-thought reasoning with context integration
C) Simple template-based prompts
D) Keyword-heavy prompts
Session 4 Mastery Summary¶
What You've Accomplished: You've transformed your RAG system from simple similarity search to intelligent query understanding by mastering:
- ✅ HyDE Implementation: Bridging semantic gaps through hypothetical document generation
- ✅ Multi-Strategy Query Expansion: Enriching queries with semantic, contextual, and domain-specific terms
- ✅ Query Decomposition: Breaking complex questions into manageable retrieval tasks
- ✅ Context Optimization: Intelligent window sizing and prompt engineering for better generation
- ✅ Integration Mastery: Combining query enhancement with your Session 3 vector optimization
Your RAG Evolution Journey So Far:
- Session 2: Smart document preprocessing and hierarchical chunking
- Session 3: High-performance vector databases and hybrid search optimization
- Session 4: Intelligent query understanding and semantic gap bridging ✅
- Session 5 Next: Measuring and validating whether these enhancements actually work
Critical Bridge to Session 5: Proving Enhancement Value¶
The Essential Question: Do your query enhancements actually improve RAG quality?
You've built sophisticated query intelligence - HyDE, expansion, decomposition, and context optimization. But without proper evaluation, you can't know if these techniques truly enhance user experience or just add complexity.
Session 5 Will Answer:
- Which enhancements provide measurable quality improvements?
- How to quantify the impact of HyDE vs. query expansion vs. context optimization?
- What metrics reveal true RAG performance beyond simple accuracy?
- How to set up A/B tests that prove your enhancements work in production?
Preparation for Evaluation Excellence¶
- Document your baseline: RAG performance before query enhancements
- Create test query sets: Challenging questions that test each enhancement technique
- Track enhancement metrics: Response quality, retrieval success, user satisfaction
- Prepare comparison scenarios: Original queries vs. enhanced queries across diverse topics
The Next Challenge: Transform your sophisticated query enhancement system into a measurably superior RAG experience through rigorous evaluation and quality assessment.
Ready to prove your enhancements deliver real value? Let's master RAG evaluation! 📊
Navigation¶
Previous: Session 3 - Vector Databases & Search Optimization
Optional Deep Dive Modules:
- 🔬 Module A: Advanced Query Understanding - Deep dive into query analysis and understanding systems