Skip to content

Session 4: Production MCP Deployment - Test Solutions

📝 Multiple Choice Test

Question 1: Production Requirements

What is the primary difference between development and production MCP servers?

A) Production servers are slower than development servers
B) Production servers use different protocols
C) Production servers only work with specific LLMs
D) Production servers require observability, scalability, and reliability ✅
Correct Answer: D) Production servers require observability, scalability, and reliability

Explanation: Production environments require comprehensive observability (metrics, logs, traces), scalability to handle varying loads, and reliability through error handling and fault tolerance.


Question 2: Containerization Benefits

What is the main advantage of containerizing MCP servers with Docker?

A) Improved performance
B) Better security by default
C) Automatic scaling capabilities
D) Consistent environments across development and production ✅
Correct Answer: D) Consistent environments across development and production

Explanation: Docker containerization ensures consistent environments across development, staging, and production, eliminating "works on my machine" problems and simplifying deployment.


Question 3: Prometheus Metrics

Which Prometheus metric type is best suited for tracking response times?

A) Histogram ✅
B) Counter
C) Gauge
D) Summary
Correct Answer: A) Histogram

Explanation: Histogram metrics are ideal for tracking response times as they measure distributions and provide percentiles, helping identify performance patterns and outliers.


Question 4: Health Checks

What information should a comprehensive health check endpoint provide?

A) Database connectivity and dependent services status ✅
B) Only HTTP 200 status
C) Server uptime only
D) Current server load only
Correct Answer: A) Database connectivity and dependent services status

Explanation: Health checks should verify all critical dependencies including database connectivity, external service availability, and resource utilization to provide comprehensive system status.


Question 5: Auto-scaling Strategy

What metric is most important for auto-scaling MCP servers?

A) CPU utilization only
B) Network bandwidth only
C) Request rate combined with response time ✅
D) Memory usage only
Correct Answer: C) Request rate combined with response time

Explanation: Request rate combined with response time provides the best indicator of actual user demand and system performance, enabling more accurate scaling decisions than resource metrics alone.


Question 6: Caching Implementation

What type of caching is most effective for MCP server responses?

A) File-based caching
B) In-memory caching only
C) Redis distributed caching with TTL expiration ✅
D) Database-level caching only
Correct Answer: C) Redis distributed caching with TTL expiration

Explanation: Redis distributed caching with TTL expiration provides fast access, data persistence, and automatic cleanup while supporting multiple server instances in production environments.


Question 7: Circuit Breaker Pattern

When should a circuit breaker transition to the "open" state?

A) When the server starts up
B) When response times are slightly elevated
C) When memory usage is high
D) When error rates exceed the configured threshold ✅
Correct Answer: D) When error rates exceed the configured threshold

Explanation: Circuit breakers open when error rates exceed configured thresholds, preventing cascade failures by temporarily stopping requests to failing services until they recover.


Question 8: CI/CD Pipeline

What is the recommended approach for deploying MCP servers through CI/CD?

A) Direct deployment to production
B) Blue-green deployment with health checks ✅
C) Manual deployment verification
D) Rolling updates without testing
Correct Answer: B) Blue-green deployment with health checks

Explanation: Blue-green deployment with health checks ensures zero-downtime deployments by maintaining two identical environments and switching traffic only after verifying the new version's health.


Question 9: Monitoring Strategy

Which monitoring approach provides the most comprehensive observability?

A) The three pillars: metrics, logs, and distributed tracing ✅
B) Logs only
C) Metrics only
D) Health checks only
Correct Answer: A) The three pillars: metrics, logs, and distributed tracing

Explanation: The three pillars of observability (metrics, logs, and distributed tracing) provide comprehensive system visibility, enabling effective troubleshooting and performance optimization.


Question 10: Infrastructure as Code

What is the primary benefit of using Terraform for MCP server infrastructure?

A) Lower costs
B) Faster deployment
C) Reproducible and version-controlled infrastructure ✅
D) Improved security
Correct Answer: C) Reproducible and version-controlled infrastructure

Explanation: Infrastructure as Code with Terraform ensures reproducible, version-controlled infrastructure that can be reviewed, tested, and deployed consistently across environments.


Scoring Guide

  • 10 correct: Expert level - Ready for enterprise production deployments
  • 8-9 correct: Proficient - Strong understanding of production operations
  • 6-7 correct: Competent - Good grasp of deployment concepts
  • 4-5 correct: Developing - Review monitoring and scaling sections
  • Below 4: Beginner - Revisit production fundamentals and containerization

Key Concepts Summary

  1. Production Requirements: Observability, scalability, and reliability are essential
  2. Containerization: Docker provides consistent environments and simplified deployment
  3. Monitoring Stack: Comprehensive observability requires metrics, logs, and tracing
  4. Auto-scaling: Request rate and response time guide scaling decisions
  5. Deployment Strategy: Blue-green deployments with health checks ensure reliability

Practical Exercise Solution

Challenge: Implement a circuit breaker pattern for MCP server resilience.

Complete Circuit Breaker Implementation:

# src/resilience/circuit_breaker.py
import asyncio
import time
from enum import Enum
from typing import Any, Callable, Dict, Optional
from dataclasses import dataclass
import logging

logger = logging.getLogger(__name__)

class CircuitState(Enum):
    """Circuit breaker states."""
    CLOSED = "closed"      # Normal operation
    OPEN = "open"          # Circuit is open, failing fast
    HALF_OPEN = "half_open"  # Testing if service has recovered

@dataclass
class CircuitBreakerConfig:
    """Configuration for circuit breaker."""
    failure_threshold: int = 5          # Failures before opening circuit
    recovery_timeout: int = 60          # Seconds before trying half-open
    success_threshold: int = 3          # Successes in half-open to close
    timeout: float = 30.0              # Request timeout in seconds
    expected_exceptions: tuple = (Exception,)  # Exceptions that count as failures

class CircuitBreakerError(Exception):
    """Raised when circuit breaker is open."""
    pass

class CircuitBreaker:
    """
    Circuit breaker implementation for resilient MCP server operations.

    The circuit breaker prevents cascading failures by monitoring the health
    of downstream services and failing fast when they're unavailable.
    """

    def __init__(self, config: CircuitBreakerConfig):
        self.config = config
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.success_count = 0
        self.last_failure_time = None
        self.next_attempt_time = 0

    async def call(self, func: Callable, *args, **kwargs) -> Any:
        """
        Execute function with circuit breaker protection.

        Args:
            func: Async function to execute
            *args, **kwargs: Arguments to pass to function

        Returns:
            Function result

        Raises:
            CircuitBreakerError: When circuit is open
        """
        current_time = time.time()

        # Check if we should attempt the call
        if not self._should_attempt_call(current_time):
            raise CircuitBreakerError(
                f"Circuit breaker is {self.state.value}. "
                f"Next attempt in {self.next_attempt_time - current_time:.1f}s"
            )

        try:
            # Execute the function with timeout
            result = await asyncio.wait_for(
                func(*args, **kwargs),
                timeout=self.config.timeout
            )

            # Record success
            await self._record_success()
            return result

        except self.config.expected_exceptions as e:
            # Record failure
            await self._record_failure(current_time)
            raise
        except asyncio.TimeoutError as e:
            # Timeout counts as failure
            await self._record_failure(current_time)
            raise CircuitBreakerError(f"Operation timed out after {self.config.timeout}s")

    def _should_attempt_call(self, current_time: float) -> bool:
        """Determine if we should attempt the call based on current state."""
        if self.state == CircuitState.CLOSED:
            return True
        elif self.state == CircuitState.OPEN:
            if current_time >= self.next_attempt_time:
                self.state = CircuitState.HALF_OPEN
                logger.info("Circuit breaker transitioning to HALF_OPEN")
                return True
            return False
        elif self.state == CircuitState.HALF_OPEN:
            return True

        return False

    async def _record_success(self):
        """Record a successful operation."""
        if self.state == CircuitState.HALF_OPEN:
            self.success_count += 1
            logger.debug(f"Circuit breaker success count: {self.success_count}")

            if self.success_count >= self.config.success_threshold:
                await self._close_circuit()
        elif self.state == CircuitState.CLOSED:
            # Reset failure count on success
            self.failure_count = 0

    async def _record_failure(self, current_time: float):
        """Record a failed operation."""
        self.failure_count += 1
        self.last_failure_time = current_time

        logger.warning(f"Circuit breaker failure count: {self.failure_count}")

        if self.state == CircuitState.CLOSED:
            if self.failure_count >= self.config.failure_threshold:
                await self._open_circuit(current_time)
        elif self.state == CircuitState.HALF_OPEN:
            # Any failure in half-open state opens the circuit
            await self._open_circuit(current_time)

    async def _open_circuit(self, current_time: float):
        """Open the circuit breaker."""
        self.state = CircuitState.OPEN
        self.next_attempt_time = current_time + self.config.recovery_timeout
        self.success_count = 0

        logger.error(
            f"Circuit breaker OPENED after {self.failure_count} failures. "
            f"Next attempt at {time.ctime(self.next_attempt_time)}"
        )

    async def _close_circuit(self):
        """Close the circuit breaker."""
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.success_count = 0

        logger.info("Circuit breaker CLOSED - service recovered")

    def get_stats(self) -> Dict[str, Any]:
        """Get current circuit breaker statistics."""
        return {
            "state": self.state.value,
            "failure_count": self.failure_count,
            "success_count": self.success_count,
            "last_failure_time": self.last_failure_time,
            "next_attempt_time": self.next_attempt_time if self.state == CircuitState.OPEN else None
        }

Key Features Implemented:

  1. State Management: Properly manages CLOSED, OPEN, and HALF_OPEN states
  2. Configurable Thresholds: Customizable failure thresholds and recovery timeouts
  3. Timeout Handling: Includes request timeouts as circuit breaker triggers
  4. Fallback Strategies: Multiple levels of fallback (cache, static data, queuing)
  5. Monitoring Integration: Exposes circuit breaker statistics as MCP resources

This circuit breaker implementation provides robust failure handling and enables graceful degradation in production MCP deployments.


Return to Session 4