Skip to content

🎯📝⚙️ Session 4: Query Enhancement & Context Augmentation

🎯📝⚙️ Learning Path Overview

This session offers three distinct learning paths designed to match your goals and time investment:

Focus: Understanding concepts and architecture

Activities: Query enhancement principles, semantic gap challenges, HyDE concepts

Ideal for: Decision makers, architects, overview learners

Focus: Guided implementation and analysis

Activities: Build working query enhancement systems, implement HyDE and expansion techniques

Ideal for: Developers, technical leads, hands-on learners

Focus: Complete implementation and customization

Activities: Advanced prompt engineering, multi-query systems, complete enhancement pipelines

Ideal for: Senior engineers, architects, specialists

In Sessions 1-3, you built a sophisticated RAG system with intelligent chunking, production-grade vector search, and optimized indexing. But when you deploy to real users, you discover a frustrating pattern: technically perfect retrieval that somehow misses what users actually want. The system finds semantically similar content, but users ask "Why didn't it find the troubleshooting guide when I asked 'My app crashes'?"

This is the semantic gap – the mismatch between how users express their needs and how information is stored in documents. This session bridges that gap with proven enhancement techniques: HyDE (Hypothetical Document Embeddings) that generates better search vectors, multi-query strategies that cast wider nets, and context optimization that maximizes the value of retrieved information. The goal is search that understands intent, not just similarity.

Optional Deep Dive Modules

Agentic RAG

The Semantic Gap Challenge

The root problem isn't with your search technology – it's with human communication patterns:

  • Semantic Mismatch: Users say "My app crashes" while documentation says "Application termination errors"
  • Incomplete Context: Users ask "How to deploy?" without specifying platform, environment, or stack
  • Domain Translation: Users know business problems, documents contain technical solutions
  • Question vs. Answer Language: Queries use problem language, documents use solution language

  • HyDE: Bridge the gap by generating hypothetical answers, then searching in answer-space

  • Query Expansion: Enrich user questions with domain terminology and related concepts
  • Multi-Query Generation: Attack the same information need from multiple angles
  • Context Optimization: Maximize information density while preserving coherence

🎯 Part 1: HyDE - Hypothetical Document Embeddings

Understanding the Semantic Gap Problem

The fundamental issue is that questions and answers live in different semantic neighborhoods. When a user asks "Why is my deployment failing?", they're expressing a problem state. But your documentation contains solution states: "Configure environment variables properly" or "Ensure Docker daemon is running." Traditional embedding models struggle with this query-document mismatch because they're trained to find similar language, not complementary information.

Semantic Gap Analysis Example

Here's how to analyze semantic gaps between queries and documents:

class SemanticGapAnalyzer:
    """Analyze semantic gaps in retrieval."""

    def __init__(self, embedding_model):
        self.embedding_model = embedding_model

    def analyze_query_document_gap(self, query: str,
                                 documents: List[str]) -> Dict:
        """Analyze semantic gaps between queries and documents."""

        # Embed query and documents
        query_embedding = self.embedding_model.encode([query])
        doc_embeddings = self.embedding_model.encode(documents)

Core Process: Converting queries and documents into comparable vector representations reveals semantic distance and matching challenges.

        # Calculate similarities using cosine similarity
        similarities = []
        for doc_emb in doc_embeddings:
            similarity = np.dot(query_embedding[0], doc_emb) / (
                np.linalg.norm(query_embedding[0]) * np.linalg.norm(doc_emb)
            )
            similarities.append(similarity)

        return {
            'query': query,
            'avg_similarity': np.mean(similarities),
            'max_similarity': np.max(similarities),
            'min_similarity': np.min(similarities),
            'gap_analysis': self._analyze_gap(query, documents, similarities)
        }

Analysis Results: The similarity scores reveal how traditional embedding approaches struggle with query-document semantic gaps, highlighting the need for enhancement techniques.

HyDE Implementation: Building on Vector Search Excellence

Why HyDE Transforms Your Vector System

Remember from Session 3 how your optimized HNSW indexing and hybrid search create efficient similarity matching? HyDE leverages this infrastructure by solving the input problem. Instead of searching with questions (which live in problem-space), HyDE generates hypothetical answers (which live in solution-space), then searches with those. Your vector database suddenly becomes exponentially more effective because you're finally searching in the same semantic neighborhood as your stored content.

The HyDE Process

  1. Query Analysis: Understand the semantic intent of the user question
  2. Hypothetical Generation: Create documents that would naturally contain the answer
  3. Vector Integration: Embed these hypothetical documents using your embedding models
  4. Enhanced Retrieval: Search your vector database using these improved representations

HyDE transforms semantic gaps into vector space advantages.

🎯 Basic HyDE Implementation

Understanding HyDE (Hypothetical Document Embeddings): HyDE solves a fundamental RAG problem - the semantic gap between how users ask questions and how information is stored in documents. Instead of searching with the query directly, HyDE generates hypothetical answers and searches with those.

Why This Approach Works: Questions and answers exist in different semantic spaces. By generating hypothetical documents that would answer the query, we search in the same semantic space as our stored documents, dramatically improving retrieval accuracy.

from typing import List, Dict, Any, Optional
import openai
from sentence_transformers import SentenceTransformer
import numpy as np

class HyDEQueryEnhancer:
    """Hypothetical Document Embeddings for query enhancement."""

    def __init__(self, llm_model, embedding_model, temperature: float = 0.7):
        self.llm_model = llm_model
        self.embedding_model = embedding_model
        self.temperature = temperature

Essential Setup: The HyDE enhancer requires an LLM for generating hypothetical documents, an embedding model for vectorization, and a temperature parameter that balances creativity and accuracy in document generation.

        # Different HyDE templates for various query types
        self.hyde_templates = {
            'factual': self._factual_hyde_template,
            'procedural': self._procedural_hyde_template,
            'analytical': self._analytical_hyde_template,
            'creative': self._creative_hyde_template
        }

Template Framework: Different query types require specialized templates to generate appropriate hypothetical documents that match expected answer formats.

def enhance_query_with_hyde(self, query: str,
                           query_type: str = 'factual',
                           num_hypotheticals: int = 3) -> Dict[str, Any]:
    """Generate hypothetical documents for query enhancement."""

    # Classify query type if needed
    if query_type == 'auto':
        query_type = self._classify_query_type(query)

    # Generate hypothetical documents
    hypothetical_docs = self._generate_hypothetical_documents(
        query, query_type, num_hypotheticals
    )

Process Coordination: The main method coordinates query classification, document generation, and embedding creation.

    # Create enhanced embeddings
    enhanced_embeddings = self._create_enhanced_embeddings(
        query, hypothetical_docs
    )

    return {
        'original_query': query,
        'enhanced_embedding': enhanced_embeddings['combined'],
        'confidence_score': self._calculate_enhancement_confidence(
            enhanced_embeddings
        )
    }

Core Enhancement Workflow: The main method coordinates query type classification, hypothetical document generation, and embedding creation for improved semantic matching.

🎯 Understanding HyDE's Impact

This implementation demonstrates query enhancement working in harmony with your vector infrastructure from Session 3. Notice how:

  • Enhanced embeddings leverage your optimized embedding models
  • Confidence scoring helps evaluate query improvement quality
  • Query classification enables template-based document generation

The result bridges the semantic gap between user intent and document content, dramatically improving retrieval success in your hybrid search systems.

📝⚙️ Complete HyDE Implementation

For detailed implementation of all HyDE components including query type templates, document generation, and embedding optimization, see:

📝 Participant Path: HyDE Implementation Guide - Step-by-step practical implementation

🎯 Part 2: Query Expansion and Reformulation - Casting a Wider Net

Building on HyDE's solution-space approach, query expansion tackles the problem from the query side. While HyDE generates hypothetical answers, expansion enriches the original question with related terms, synonyms, and domain-specific language that increase the likelihood of matching relevant documents.

The two approaches are complementary: HyDE bridges semantic gaps, while expansion ensures comprehensive coverage of the search space.

The Expansion Strategy

  • HyDE (Part 1): Generate hypothetical documents → embed → search
  • Query Expansion (Part 2): Enrich original query → search with expanded terms
  • Combined Power: Use both approaches for maximum retrieval success

This multi-layered approach ensures your optimized vector search from Session 3 captures content through multiple semantic pathways.

🎯 Query Expansion Basics

Query expansion enriches the original question with related terms, synonyms, and domain-specific language:

class IntelligentQueryExpander:
    """Advanced query expansion using multiple strategies."""

    def __init__(self, llm_model, domain_corpus: Optional[List[str]] = None):
        self.llm_model = llm_model
        self.domain_corpus = domain_corpus

        # Initialize expansion strategies
        self.expansion_strategies = {
            'synonym': self._synonym_expansion,
            'semantic': self._semantic_expansion,
            'contextual': self._contextual_expansion,
            'domain_specific': self._domain_specific_expansion
        }

Strategy Framework: Multiple expansion approaches ensure comprehensive coverage - from simple synonyms to complex semantic relationships and domain-specific terminology.

def expand_query(self, query: str,
                strategies: List[str] = ['semantic', 'contextual'],
                max_expansions: int = 5) -> Dict[str, Any]:
    """Expand query using multiple strategies."""

    expansion_results = {}
    combined_expansions = set()

    # Apply each expansion strategy
    for strategy in strategies:
        if strategy in self.expansion_strategies:
            expansions = self.expansion_strategies[strategy](
                query, max_expansions
            )
            expansion_results[strategy] = expansions
            combined_expansions.update(expansions)

Strategy Application: Each configured strategy generates expansions that are combined into a comprehensive set.

    # Create final expanded query
    expanded_query = self._create_expanded_query(
        query, list(combined_expansions)
    )

    return {
        'original_query': query,
        'expanded_query': expanded_query,
        'expansion_count': len(combined_expansions)
    }

Multi-Strategy Coordination: Coordinating different expansion approaches allows leveraging the strengths of each method for comprehensive query enhancement.

🎯 Expansion Strategy Examples

Semantic Expansion: Generate conceptually related terms using LLM understanding Contextual Expansion: Create different phrasings of the same information need Domain Expansion: Add specialized terminology from your specific domain

def _semantic_expansion(self, query: str, max_expansions: int) -> List[str]:
    """Generate semantic expansions using LLM understanding."""

    semantic_prompt = f"""
    Generate {max_expansions} semantically related terms for: {query}
    Include synonyms, related concepts, and domain terminology.
    """

    # Process LLM response and return filtered terms
    response = self.llm_model.predict(semantic_prompt)
    return self._filter_expansions(response, max_expansions)

Expansion Processing: LLM-generated terms are filtered and validated to ensure high-quality query enhancement.

🎯 Multi-Query Generation Basics

Generate comprehensive query sets from multiple perspectives:

class MultiQueryGenerator:
    """Generate multiple query perspectives for comprehensive retrieval."""

    def __init__(self, llm_model):
        self.llm_model = llm_model

        self.query_perspectives = {
            'decomposition': self._decompose_complex_query,
            'specificity_levels': self._generate_specificity_variants,
            'perspective_shifts': self._generate_perspective_variants
        }

Perspective Framework: Multiple query generation approaches ensure comprehensive coverage by viewing the same information need from different angles and specificity levels.

def generate_multi_query_set(self, query: str,
                           perspectives: List[str] = None,
                           total_queries: int = 8) -> Dict[str, Any]:
    """Generate comprehensive query set from multiple perspectives."""

    if perspectives is None:
        perspectives = ['decomposition', 'specificity_levels']

    all_queries = {'original': query}

Multi-Perspective Generation: Each perspective method generates different angles on the same information need.

    # Generate variants from each perspective
    for perspective in perspectives:
        generated = self.query_perspectives[perspective](query, 3)
        all_queries[perspective] = generated

    # Flatten and deduplicate
    flattened_queries = self._flatten_and_deduplicate(all_queries)

    return {
        'original_query': query,
        'query_variants': flattened_queries,
        'total_variants': len(flattened_queries)
    }

Query Distribution Strategy: Intelligently distributing query generation across different perspectives ensures balanced coverage of the information space.

📝⚙️ Complete Query Expansion Implementation

For detailed implementation of all query expansion components including:

  • Complex Query Decomposition: Breaking multi-part questions into searchable sub-questions
  • Specificity Level Variants: Creating queries at different granularity levels
  • Domain-Specific Expansion: Using corpus knowledge for specialized terminology
  • Multi-Strategy Coordination: Combining all expansion approaches

📝 Participant Path: Query Expansion Practice - Hands-on implementation guide

🎯 Part 3: Context Window Optimization - Making Every Token Count

You've enhanced your queries and retrieved better content. But LLMs have finite context windows, and production systems need to maximize information density within those constraints. Poor context optimization wastes precious tokens on redundant information while missing critical details that could improve generation quality.

This part focuses on intelligent context assembly that balances relevance, diversity, and coherence within strict token budgets.

🎯 Context Optimization Basics

class ContextWindowOptimizer:
    """Optimize context assembly for maximum information density."""

    def __init__(self, llm_tokenizer, max_context_tokens: int = 4000):
        self.tokenizer = llm_tokenizer
        self.max_context_tokens = max_context_tokens

        # Context optimization strategies
        self.optimization_strategies = {
            'relevance_ranking': self._relevance_based_selection,
            'diversity_clustering': self._diversity_based_selection,
            'hierarchical_summary': self._hierarchical_summarization,
            'semantic_compression': self._semantic_compression
        }

Strategy Selection: Multiple optimization approaches allow choosing the best method based on the specific characteristics of retrieved content and query requirements.

def optimize_context_window(self, query: str,
                           retrieved_chunks: List[Dict],
                           strategy: str = 'relevance_ranking') -> Dict[str, Any]:
    """Optimize context window using specified strategy."""

    # Calculate available token budget
    query_tokens = len(self.tokenizer.encode(query))
    available_tokens = self.max_context_tokens - query_tokens - 200

Budget Management: Token budget calculation ensures context fits within LLM limits while reserving space for query and response.

    # Apply optimization strategy
    optimized_context = self.optimization_strategies[strategy](
        query, retrieved_chunks, available_tokens
    )

    return {
        'optimized_context': optimized_context['context'],
        'context_tokens': optimized_context['token_count'],
        'efficiency_score': optimized_context['efficiency'],
        'strategy_used': strategy
    }

Token Management: Precise token budgeting ensures context stays within limits while reserving space for the query and response generation.

🎯 Core Optimization Strategies

Relevance Ranking: Select chunks based on relevance-per-token efficiency Diversity Clustering: Balance relevance with information diversity Hierarchical Summary: Summarize content groups when budget is exceeded Semantic Compression: Compress verbose content while preserving meaning

def _relevance_based_selection(self, query: str, chunks: List[Dict],
                             token_budget: int) -> Dict[str, Any]:
    """Select chunks based on relevance scores and token efficiency."""

    # Calculate efficiency: relevance per token
    chunk_analysis = []
    for chunk in chunks:
        content = chunk['document'].page_content
        tokens = len(self.tokenizer.encode(content))
        relevance = 1 - chunk.get('similarity_score', 0.5)
        efficiency = relevance / tokens if tokens > 0 else 0

        chunk_analysis.append({
            'content': content,
            'tokens': tokens,
            'efficiency': efficiency
        })

Efficiency Calculation: Computing relevance-per-token ratios enables optimal selection of chunks that provide maximum information value within token constraints.

    # Sort by efficiency and select within budget
    chunk_analysis.sort(key=lambda x: x['efficiency'], reverse=True)
    selected_chunks = []
    total_tokens = 0

    for chunk_data in chunk_analysis:
        if total_tokens + chunk_data['tokens'] <= token_budget:
            selected_chunks.append(chunk_data)
            total_tokens += chunk_data['tokens']

    return {
        'context': self._assemble_context(selected_chunks),
        'chunks': selected_chunks,
        'token_count': total_tokens,
        'efficiency': np.mean([c['efficiency'] for c in selected_chunks])
    }

Efficiency Optimization: Computing relevance-per-token ratios enables optimal selection of chunks that provide maximum information value within token constraints.

📝⚙️ Complete Context Optimization Implementation

For detailed implementation of all context optimization techniques including:

  • Hierarchical Summarization: Intelligent grouping and summarization strategies
  • Semantic Compression: Content compression while preserving meaning
  • Diversity-Based Selection: Balancing relevance with information coverage
  • Strategy Selection Logic: Choosing optimal approaches based on content characteristics

📝 Participant Path: Context Optimization Methods - Practical optimization techniques

🎯 Part 4: Advanced Prompt Engineering for RAG - Getting the Best Response

You have enhanced queries, optimized retrieval, and efficiently packed context. The final step is prompt engineering that maximizes the LLM's ability to use this carefully curated information. Generic prompts waste the intelligence you've built into your retrieval system. RAG-specific prompts extract maximum value from context while providing transparency and reliability assessment.

🎯 RAG Prompt Engineering Basics

class RAGPromptEngineer:
    """Advanced prompt engineering specifically for RAG systems."""

    def __init__(self, llm_model):
        self.llm_model = llm_model

        # Domain-specific prompt templates
        self.prompt_templates = {
            'factual_qa': self._factual_qa_template,
            'analytical': self._analytical_template,
            'procedural': self._procedural_template,
            'technical': self._technical_template
        }

        # Prompt optimization techniques
        self.optimization_techniques = {
            'chain_of_thought': self._add_chain_of_thought,
            'confidence_calibration': self._add_confidence_calibration,
            'multi_step_reasoning': self._add_multi_step_reasoning
        }

Template Framework: Different query types require specialized prompt structures that guide the LLM toward appropriate response patterns and information organization.

def engineer_rag_prompt(self, query: str, context: str,
                       query_type: str = 'factual_qa',
                       optimizations: List[str] = ['confidence_calibration']) -> Dict[str, Any]:
    """Engineer optimized RAG prompt with specified techniques."""

    # Start with base template
    base_prompt = self.prompt_templates[query_type](query, context)

    # Apply optimization techniques
    optimized_prompt = base_prompt
    for technique in optimizations:
        if technique in self.optimization_techniques:
            result = self.optimization_techniques[technique](
                optimized_prompt, query, context
            )
            optimized_prompt = result['prompt']

Sequential Optimization: Applying optimization techniques iteratively builds up prompt sophistication while tracking the contribution of each enhancement.

    return {
        'base_prompt': base_prompt,
        'optimized_prompt': optimized_prompt,
        'optimizations_applied': optimizations,
        'query_type': query_type
    }

Sequential Optimization: Applying optimization techniques iteratively builds up prompt sophistication while tracking the contribution of each enhancement.

🎯 Basic Template Example

def _factual_qa_template(self, query: str, context: str) -> str:
    """Optimized template for factual question answering."""

    return f"""You are an expert information analyst.
    Use the provided context to answer accurately.

CONTEXT INFORMATION:
{context}

QUESTION: {query}

INSTRUCTIONS:
1. Base your answer strictly on the provided context
2. Cite sources using [Source: X] format
3. State limitations if information is insufficient
4. Provide direct quotes when appropriate

ANSWER:"""

Context Integration: The template establishes clear boundaries between context and query, ensuring the LLM maintains focus on provided information rather than generating unsupported content.

🎯 Chain-of-Thought Enhancement Example

def _add_chain_of_thought(self, base_prompt: str, query: str,
                        context: str) -> Dict[str, Any]:
    """Add chain-of-thought reasoning to prompt."""

    cot_enhancement = """
REASONING PROCESS:
Work through your reasoning step-by-step:

1. INFORMATION EXTRACTION: What key information relates to the question?
2. RELEVANCE ASSESSMENT: How does each piece address the question?
3. SYNTHESIS: How do you combine information into a coherent answer?
4. CONFIDENCE EVALUATION: How confident are you based on evidence?
"""

Reasoning Framework: The four-step process ensures systematic analysis of context.

    return {
        'prompt': base_prompt + cot_enhancement + "\nFINAL ANSWER:",
        'metadata': {
            'technique': 'chain_of_thought',
            'reasoning_steps': 4
        }
    }

Structured Reasoning: Chain-of-thought prompting guides the LLM through explicit reasoning steps, improving accuracy and providing insight into the decision-making process.

🎯 Confidence Calibration Example

def _add_confidence_calibration(self, base_prompt: str, query: str,
                              context: str) -> Dict[str, Any]:
    """Add confidence calibration to improve answer reliability."""

    confidence_framework = """
CONFIDENCE ASSESSMENT:
1. EVIDENCE STRENGTH (Strong/Medium/Weak)
2. COMPLETENESS (Complete/Partial/Incomplete)
3. CONFIDENCE SCORE (0-100%)

Include at end:
- Evidence Strength: [Strong/Medium/Weak]
- Confidence Score: [0-100%]
- Key Limitations: [List limitations]
"""

Confidence Assessment: Structured reliability evaluation framework.

    return {
        'prompt': base_prompt + confidence_framework,
        'metadata': {
            'technique': 'confidence_calibration',
            'assessment_dimensions': 3
        }
    }

Confidence Framework: Structured assessment dimensions ensure consistent evaluation of evidence strength, answer completeness, and overall confidence.

⚙️ Advanced Prompt Engineering Systems

For sophisticated prompt engineering including:

  • Dynamic Prompt Adaptation: Intelligent strategy selection based on context quality and query characteristics
  • Multi-Step Reasoning: Complex analytical reasoning frameworks
  • Advanced Confidence Calibration: Comprehensive reliability assessment systems
  • Context Quality Analysis: Sophisticated context evaluation and adaptation

⚙️ Implementer Path: Advanced Prompt Engineering - Production-ready prompt systems

🎯 Complete Query Enhancement Overview

You've learned the four essential components of query enhancement:

  1. 🎯 HyDE: Bridge semantic gaps by generating hypothetical documents
  2. 🎯 Query Expansion: Enrich queries with related terms and reformulations
  3. 🎯 Context Optimization: Maximize information density within token budgets
  4. 🎯 Prompt Engineering: Extract maximum value from enhanced context

🎯 Integration Overview

class ComprehensiveQueryEnhancer:
    """Complete query enhancement pipeline for RAG systems."""

    def __init__(self, llm_model, embedding_model, tokenizer):
        self.hyde_enhancer = HyDEQueryEnhancer(llm_model, embedding_model)
        self.query_expander = IntelligentQueryExpander(llm_model)
        self.context_optimizer = ContextWindowOptimizer(tokenizer)
        self.prompt_engineer = RAGPromptEngineer(llm_model)

System Architecture: Each component handles a specific aspect of query enhancement, from semantic gap bridging to prompt optimization.

def enhance_query_pipeline(self, query: str, vector_store) -> Dict[str, Any]:
    """Run complete query enhancement pipeline."""

    # Phase 1: Query Enhancement
    hyde_result = self.hyde_enhancer.enhance_query_with_hyde(query)
    expansion_result = self.query_expander.expand_query(query)

    # Phase 2: Enhanced Retrieval
    enhanced_results = vector_store.similarity_search_by_vector(
        hyde_result['enhanced_embedding'], k=20
    )

Pipeline Phases: Sequential processing through enhancement, retrieval, optimization, and prompt engineering.

    # Phase 3: Context Optimization
    optimized_context = self.context_optimizer.optimize_context_window(
        query, enhanced_results
    )

    # Phase 4: Prompt Engineering
    final_prompt = self.prompt_engineer.engineer_rag_prompt(
        query, optimized_context['optimized_context']
    )

    return {
        'enhanced_query': hyde_result,
        'optimized_context': optimized_context,
        'engineered_prompt': final_prompt
    }

Pipeline Coordination: The enhancement pipeline processes queries through multiple phases for comprehensive improvement.

⚙️ Complete Enhancement Pipeline Implementation

For the complete, production-ready enhancement pipeline including:

  • Comprehensive Integration: All enhancement techniques working together
  • Configuration Management: Flexible enhancement strategy selection
  • Performance Optimization: Efficient pipeline execution
  • Error Handling: Robust fallback mechanisms

⚙️ Implementer Path: Complete Enhancement Pipeline - Full production implementation

🎯 Session Summary

You've learned the essential components of query enhancement including:

  • HyDE system for semantic gap bridging between questions and documents
  • Multi-strategy query expansion using semantic, contextual, and domain techniques
  • Context window optimization with relevance ranking and intelligent selection
  • Advanced prompt engineering with confidence calibration and structured reasoning

These techniques transform your RAG system from simple similarity search to intelligent query understanding that bridges the semantic gap between user intent and document content.

Next Steps by Learning Path

🎯 Observer Path Complete: You understand the core concepts of query enhancement and how each technique addresses specific challenges in RAG systems.

📝 Participant Path: Continue with practical implementation guides for each technique to build working enhancement systems.

⚙️ Implementer Path: Proceed to advanced systems and complete pipeline implementation for production deployment.

📝 Test Your Knowledge

Test your understanding of query enhancement and context augmentation: 📝 Session 4 Quiz →


Previous: Session 3 - Advanced Patterns →
Next: Session 5 - Type-Safe Development →