Session 4: Production MCP Deployment - Test Solutions¶
📝 Multiple Choice Test¶
Question 1: Production Requirements¶
What is the primary difference between development and production MCP servers?
A) Production servers are slower than development servers
B) Production servers use different protocols
C) Production servers only work with specific LLMs
D) Production servers require observability, scalability, and reliability ✅
Correct Answer: D) Production servers require observability, scalability, and reliability
Explanation: Production environments require comprehensive observability (metrics, logs, traces), scalability to handle varying loads, and reliability through error handling and fault tolerance.
Question 2: Containerization Benefits¶
What is the main advantage of containerizing MCP servers with Docker?
A) Improved performance
B) Better security by default
C) Automatic scaling capabilities
D) Consistent environments across development and production ✅
Correct Answer: D) Consistent environments across development and production
Explanation: Docker containerization ensures consistent environments across development, staging, and production, eliminating "works on my machine" problems and simplifying deployment.
Question 3: Prometheus Metrics¶
Which Prometheus metric type is best suited for tracking response times?
A) Histogram ✅
B) Counter
C) Gauge
D) Summary
Correct Answer: A) Histogram
Explanation: Histogram metrics are ideal for tracking response times as they measure distributions and provide percentiles, helping identify performance patterns and outliers.
Question 4: Health Checks¶
What information should a comprehensive health check endpoint provide?
A) Database connectivity and dependent services status ✅
B) Only HTTP 200 status
C) Server uptime only
D) Current server load only
Correct Answer: A) Database connectivity and dependent services status
Explanation: Health checks should verify all critical dependencies including database connectivity, external service availability, and resource utilization to provide comprehensive system status.
Question 5: Auto-scaling Strategy¶
What metric is most important for auto-scaling MCP servers?
A) CPU utilization only
B) Network bandwidth only
C) Request rate combined with response time ✅
D) Memory usage only
Correct Answer: C) Request rate combined with response time
Explanation: Request rate combined with response time provides the best indicator of actual user demand and system performance, enabling more accurate scaling decisions than resource metrics alone.
Question 6: Caching Implementation¶
What type of caching is most effective for MCP server responses?
A) File-based caching
B) In-memory caching only
C) Redis distributed caching with TTL expiration ✅
D) Database-level caching only
Correct Answer: C) Redis distributed caching with TTL expiration
Explanation: Redis distributed caching with TTL expiration provides fast access, data persistence, and automatic cleanup while supporting multiple server instances in production environments.
Question 7: Circuit Breaker Pattern¶
When should a circuit breaker transition to the "open" state?
A) When the server starts up
B) When response times are slightly elevated
C) When memory usage is high
D) When error rates exceed the configured threshold ✅
Correct Answer: D) When error rates exceed the configured threshold
Explanation: Circuit breakers open when error rates exceed configured thresholds, preventing cascade failures by temporarily stopping requests to failing services until they recover.
Question 8: CI/CD Pipeline¶
What is the recommended approach for deploying MCP servers through CI/CD?
A) Direct deployment to production
B) Blue-green deployment with health checks ✅
C) Manual deployment verification
D) Rolling updates without testing
Correct Answer: B) Blue-green deployment with health checks
Explanation: Blue-green deployment with health checks ensures zero-downtime deployments by maintaining two identical environments and switching traffic only after verifying the new version's health.
Question 9: Monitoring Strategy¶
Which monitoring approach provides the most comprehensive observability?
A) The three pillars: metrics, logs, and distributed tracing ✅
B) Logs only
C) Metrics only
D) Health checks only
Correct Answer: A) The three pillars: metrics, logs, and distributed tracing
Explanation: The three pillars of observability (metrics, logs, and distributed tracing) provide comprehensive system visibility, enabling effective troubleshooting and performance optimization.
Question 10: Infrastructure as Code¶
What is the primary benefit of using Terraform for MCP server infrastructure?
A) Lower costs
B) Faster deployment
C) Reproducible and version-controlled infrastructure ✅
D) Improved security
Correct Answer: C) Reproducible and version-controlled infrastructure
Explanation: Infrastructure as Code with Terraform ensures reproducible, version-controlled infrastructure that can be reviewed, tested, and deployed consistently across environments.
Scoring Guide¶
- 10 correct: Expert level - Ready for enterprise production deployments
- 8-9 correct: Proficient - Strong understanding of production operations
- 6-7 correct: Competent - Good grasp of deployment concepts
- 4-5 correct: Developing - Review monitoring and scaling sections
- Below 4: Beginner - Revisit production fundamentals and containerization
Key Concepts Summary¶
- Production Requirements: Observability, scalability, and reliability are essential
- Containerization: Docker provides consistent environments and simplified deployment
- Monitoring Stack: Comprehensive observability requires metrics, logs, and tracing
- Auto-scaling: Request rate and response time guide scaling decisions
- Deployment Strategy: Blue-green deployments with health checks ensure reliability
Practical Exercise Solution¶
Challenge: Implement a circuit breaker pattern for MCP server resilience.
Complete Circuit Breaker Implementation:¶
# src/resilience/circuit_breaker.py
import asyncio
import time
from enum import Enum
from typing import Any, Callable, Dict, Optional
from dataclasses import dataclass
import logging
logger = logging.getLogger(__name__)
class CircuitState(Enum):
"""Circuit breaker states."""
CLOSED = "closed" # Normal operation
OPEN = "open" # Circuit is open, failing fast
HALF_OPEN = "half_open" # Testing if service has recovered
@dataclass
class CircuitBreakerConfig:
"""Configuration for circuit breaker."""
failure_threshold: int = 5 # Failures before opening circuit
recovery_timeout: int = 60 # Seconds before trying half-open
success_threshold: int = 3 # Successes in half-open to close
timeout: float = 30.0 # Request timeout in seconds
expected_exceptions: tuple = (Exception,) # Exceptions that count as failures
class CircuitBreakerError(Exception):
"""Raised when circuit breaker is open."""
pass
class CircuitBreaker:
"""
Circuit breaker implementation for resilient MCP server operations.
The circuit breaker prevents cascading failures by monitoring the health
of downstream services and failing fast when they're unavailable.
"""
def __init__(self, config: CircuitBreakerConfig):
self.config = config
self.state = CircuitState.CLOSED
self.failure_count = 0
self.success_count = 0
self.last_failure_time = None
self.next_attempt_time = 0
async def call(self, func: Callable, *args, **kwargs) -> Any:
"""
Execute function with circuit breaker protection.
Args:
func: Async function to execute
*args, **kwargs: Arguments to pass to function
Returns:
Function result
Raises:
CircuitBreakerError: When circuit is open
"""
current_time = time.time()
# Check if we should attempt the call
if not self._should_attempt_call(current_time):
raise CircuitBreakerError(
f"Circuit breaker is {self.state.value}. "
f"Next attempt in {self.next_attempt_time - current_time:.1f}s"
)
try:
# Execute the function with timeout
result = await asyncio.wait_for(
func(*args, **kwargs),
timeout=self.config.timeout
)
# Record success
await self._record_success()
return result
except self.config.expected_exceptions as e:
# Record failure
await self._record_failure(current_time)
raise
except asyncio.TimeoutError as e:
# Timeout counts as failure
await self._record_failure(current_time)
raise CircuitBreakerError(f"Operation timed out after {self.config.timeout}s")
def _should_attempt_call(self, current_time: float) -> bool:
"""Determine if we should attempt the call based on current state."""
if self.state == CircuitState.CLOSED:
return True
elif self.state == CircuitState.OPEN:
if current_time >= self.next_attempt_time:
self.state = CircuitState.HALF_OPEN
logger.info("Circuit breaker transitioning to HALF_OPEN")
return True
return False
elif self.state == CircuitState.HALF_OPEN:
return True
return False
async def _record_success(self):
"""Record a successful operation."""
if self.state == CircuitState.HALF_OPEN:
self.success_count += 1
logger.debug(f"Circuit breaker success count: {self.success_count}")
if self.success_count >= self.config.success_threshold:
await self._close_circuit()
elif self.state == CircuitState.CLOSED:
# Reset failure count on success
self.failure_count = 0
async def _record_failure(self, current_time: float):
"""Record a failed operation."""
self.failure_count += 1
self.last_failure_time = current_time
logger.warning(f"Circuit breaker failure count: {self.failure_count}")
if self.state == CircuitState.CLOSED:
if self.failure_count >= self.config.failure_threshold:
await self._open_circuit(current_time)
elif self.state == CircuitState.HALF_OPEN:
# Any failure in half-open state opens the circuit
await self._open_circuit(current_time)
async def _open_circuit(self, current_time: float):
"""Open the circuit breaker."""
self.state = CircuitState.OPEN
self.next_attempt_time = current_time + self.config.recovery_timeout
self.success_count = 0
logger.error(
f"Circuit breaker OPENED after {self.failure_count} failures. "
f"Next attempt at {time.ctime(self.next_attempt_time)}"
)
async def _close_circuit(self):
"""Close the circuit breaker."""
self.state = CircuitState.CLOSED
self.failure_count = 0
self.success_count = 0
logger.info("Circuit breaker CLOSED - service recovered")
def get_stats(self) -> Dict[str, Any]:
"""Get current circuit breaker statistics."""
return {
"state": self.state.value,
"failure_count": self.failure_count,
"success_count": self.success_count,
"last_failure_time": self.last_failure_time,
"next_attempt_time": self.next_attempt_time if self.state == CircuitState.OPEN else None
}
Key Features Implemented:¶
- State Management: Properly manages CLOSED, OPEN, and HALF_OPEN states
- Configurable Thresholds: Customizable failure thresholds and recovery timeouts
- Timeout Handling: Includes request timeouts as circuit breaker triggers
- Fallback Strategies: Multiple levels of fallback (cache, static data, queuing)
- Monitoring Integration: Exposes circuit breaker statistics as MCP resources
This circuit breaker implementation provides robust failure handling and enables graceful degradation in production MCP deployments.