Skip to content

Session 8: Agno Production-Ready Agents

Learning Navigation Hub

Total Time Investment: 85 minutes (Core) + 30-230 minutes (Optional) Your Learning Path: Choose your engagement level

Quick Start Guide

  • Observer (50 min): Read production concepts + examine architecture patterns
  • Participant (85 min): Follow deployment exercises + implement monitoring
  • Implementer (150 min): Build production systems + explore enterprise patterns

Session Overview Dashboard

Core Learning Track (85 minutes) - REQUIRED

Section Concept Load Time Skills
Production Architecture 3 concepts 25 min Understanding
Essential Production Patterns 4 concepts 30 min Implementation
Deployment Fundamentals 4 concepts 25 min Application
✅ Production Validation 3 concepts 5 min Verification

Optional Deep Dive Modules (Choose Your Adventure)

  • 🔬 Module A: Advanced Monitoring & Observability (65 min) - Comprehensive monitoring systems
  • 🏭 Module B: Enterprise Scaling & Architecture (70 min) - Kubernetes & auto-scaling
  • 🔧 Module C: Performance Optimization (50 min) - Cost management & caching
  • 🛡️ Module D: Security & Compliance (45 min) - Enterprise security patterns

Code Files: All examples use files in src/session8/ Quick Start: Run cd src/session8 && python agno_foundation.py to see Agno in action


Core Section (Required - 85 minutes)

Part 1: Production Architecture & Mindset (25 minutes)

Cognitive Load: 3 new concepts Learning Mode: Conceptual Understanding

Production vs Development Mindset (10 minutes)

Agno emphasizes production-first thinking from the start. These imports provide built-in monitoring, persistent storage, and enterprise-grade tools that distinguish Agno from development-focused frameworks:

File: src/session8/agno_foundation.py - Core Agno implementation and setup

# Essential Agno imports for production

from agno import Agent, Workflow  
from agno.storage import PostgresStorage
from agno.monitoring import PrometheusExporter
from agno.tools import DuckDuckGo, FileTools

Key Mindset Shift: | Development Focus | Production Focus | |-------------------|------------------|
| Speed of iteration | Reliability & stability | | Feature completeness | Performance & efficiency | | Local testing | Distributed monitoring | | Manual intervention OK | Automated recovery required |

Agno Framework Overview (8 minutes)

Agno's production-ready architecture:

Agno Agent Architecture This diagram showcases Agno's production-first architecture, highlighting built-in monitoring, persistent storage, and reliability features. The architecture emphasizes operational excellence with integrated telemetry, structured logging, and automated recovery mechanisms.

This basic Agno agent setup demonstrates how production features like persistent storage and monitoring are built into the framework from the ground up:

File: src/session8/agno_foundation.py - Basic production agent setup

# Basic Agno agent with production features

from agno import Agent
from agno.storage import PostgresStorage

# Agent with persistent storage

storage = PostgresStorage(
    host="localhost",
    db="production_agents",
    table="agent_sessions"
)

production_agent = Agent(
    name="ProductionAssistant",
    model="gpt-4",
    storage=storage,
    monitoring=True,  # Built-in metrics
    debug_mode=False  # Production optimized
)

Agno's Production Advantages: 1. Built-in Storage: PostgreSQL integration out of the box 2. Monitoring Ready: Prometheus metrics included 3. Multi-Provider: 23+ LLM providers supported 4. Docker Ready: Production deployment containers

Enterprise Agent Architecture (7 minutes)

Building scalable agent systems:

# Enterprise agent configuration

class ProductionConfig:
    # Model configuration
    PRIMARY_MODEL = "gpt-4"
    FALLBACK_MODEL = "gpt-3.5-turbo" 

    # Performance settings
    MAX_RETRIES = 3
    TIMEOUT_SECONDS = 30
    CONCURRENT_REQUESTS = 10

    # Storage configuration  
    DATABASE_URL = "postgresql://user:pass@localhost:5432/agents"

    # Monitoring
    ENABLE_METRICS = True
    METRICS_PORT = 8080

# Create production-ready agent

def create_enterprise_agent():
    return Agent(
        name="EnterpriseAgent",
        model=ProductionConfig.PRIMARY_MODEL,
        storage=PostgresStorage(ProductionConfig.DATABASE_URL),
        tools=[DuckDuckGo(), FileTools()],
        show_tool_calls=False,  # Clean logs for production
        monitoring=ProductionConfig.ENABLE_METRICS
    )

Part 2: Essential Production Patterns (30 minutes)

Cognitive Load: 4 new concepts Learning Mode: Implementation & Practice

Monitoring & Observability (10 minutes)

Built-in monitoring for production systems:

Agno Telemetry & Debugging This visualization demonstrates Agno's comprehensive telemetry and debugging capabilities, showing real-time metrics collection, distributed tracing, error tracking, and performance monitoring dashboards that enable effective production operations and troubleshooting.

Setting up comprehensive monitoring involves configuring structured logging, Prometheus metrics, and agent-level monitoring. This configuration ensures you can track performance, errors, and usage patterns in production:

File: src/session8/structured_outputs.py - Monitoring and telemetry setup

from agno.monitoring import PrometheusExporter
import logging

# Set up production logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# Agent with comprehensive monitoring

monitored_agent = Agent(
    name="MonitoredAgent",
    model="gpt-4",
    monitoring=True,
    tools=[DuckDuckGo()]
)

# Custom monitoring setup

prometheus_exporter = PrometheusExporter(
    agent=monitored_agent,
    port=8080,
    metrics=[
        "request_count",
        "response_time",
        "error_rate", 
        "token_usage"
    ]
)

Error Handling & Recovery (8 minutes)

Production-grade error handling requires implementing retry mechanisms with exponential backoff, proper exception handling, and graceful degradation. This pattern ensures your agents can handle transient failures and maintain service availability:

File: src/session8/performance_resilience.py - Error handling and recovery patterns

import asyncio
from typing import Optional

class RobustAgentWrapper:
    def __init__(self, agent: Agent, max_retries: int = 3):
        self.agent = agent
        self.max_retries = max_retries

    async def run_with_retry(self, message: str) -> Optional[str]:
        """Execute agent with exponential backoff retry."""
        for attempt in range(self.max_retries):
            try:
                response = await self.agent.arun(message)
                return response.content

            except Exception as e:
                wait_time = 2 ** attempt
                logging.warning(f"Attempt {attempt + 1} failed: {e}")

                if attempt == self.max_retries - 1:
                    logging.error(f"All retries failed for: {message}")
                    return None

                await asyncio.sleep(wait_time)

        return None

# Usage

robust_agent = RobustAgentWrapper(monitored_agent)
result = await robust_agent.run_with_retry("Analyze market trends")

Resource Management (7 minutes)

Managing resources in production involves controlling memory usage, managing database connections, and implementing session limits. This resource manager pattern helps prevent resource exhaustion and ensures consistent performance:

File: src/session8/performance_resilience.py - Resource management patterns

from agno.storage import PostgresStorage
from contextlib import asynccontextmanager

class ResourceManager:
    def __init__(self, max_sessions: int = 100):
        self.max_sessions = max_sessions
        self.active_sessions = {}
        self.storage = PostgresStorage()

    @asynccontextmanager
    async def get_agent_session(self, session_id: str):
        """Context manager for agent sessions."""
        if len(self.active_sessions) >= self.max_sessions:
            raise Exception("Maximum sessions reached")

        try:
            # Create agent for session
            agent = Agent(
                name=f"Agent_{session_id}",
                model="gpt-4",
                storage=self.storage,
                session_id=session_id
            )

            self.active_sessions[session_id] = agent
            yield agent

        finally:
            # Cleanup session
            if session_id in self.active_sessions:
                del self.active_sessions[session_id]

            # Save session data
            await self.storage.save_session(session_id)

# Usage

resource_manager = ResourceManager()

async with resource_manager.get_agent_session("user_123") as agent:
    response = await agent.arun("Help with task")
    print(response.content)

Performance Optimization (5 minutes)

Optimizing for production performance involves implementing intelligent caching, connection pooling, and response optimization. This performance-optimized agent demonstrates how to reduce latency and improve throughput:

File: src/session8/performance_resilience.py - Performance optimization patterns

from agno.cache import RedisCache

# Production performance configuration

class PerformanceOptimizedAgent:
    def __init__(self):
        self.cache = RedisCache(
            host="localhost",
            port=6379,
            ttl=3600  # 1 hour cache
        )

        self.agent = Agent(
            name="OptimizedAgent",
            model="gpt-4",
            # Performance settings
            temperature=0.7,
            max_tokens=1000,
            # Caching
            cache=self.cache,
            # Connection pooling
            max_connections=20
        )

    async def run_cached(self, message: str) -> str:
        """Run with intelligent caching."""
        cache_key = f"agent_response_{hash(message)}"

        # Check cache first
        cached_response = await self.cache.get(cache_key)
        if cached_response:
            return cached_response

        # Generate and cache response
        response = await self.agent.arun(message)
        await self.cache.set(cache_key, response.content)

        return response.content

# Usage

optimized_agent = PerformanceOptimizedAgent()
result = await optimized_agent.run_cached("Common query")

Part 3: Deployment Fundamentals (25 minutes)

Cognitive Load: 4 new concepts Learning Mode: Application & Deployment

Docker Deployment (8 minutes)

Containerizing Agno applications requires careful configuration of the container environment, dependencies, and health checks. This Dockerfile demonstrates production-ready containerization with proper security and monitoring:

File: src/session8/Dockerfile - Production container configuration

# Dockerfile for Agno production deployment

FROM python:3.11-slim

WORKDIR /app

# Install dependencies

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code

COPY . .

# Environment variables

ENV PYTHONPATH=/app
ENV AGNO_ENV=production

# Health check

HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD python -c "import requests; requests.get('http://localhost:8000/health')"

# Run application

EXPOSE 8000
CMD ["python", "main.py"]

This production server implementation shows how to expose Agno agents through a REST API with proper error handling, health checks, and request validation:

File: src/session8/agno_foundation.py - Production API server

# main.py - Production server

from fastapi import FastAPI, HTTPException
from agno import Agent
from pydantic import BaseModel
import uvicorn

app = FastAPI(title="Agno Production API")

# Initialize agent

production_agent = Agent(
    name="ProductionAPI",
    model="gpt-4",
    storage=PostgresStorage(),
    monitoring=True
)

class QueryRequest(BaseModel):
    message: str
    session_id: str

@app.post("/query")
async def query_agent(request: QueryRequest):
    try:
        response = await production_agent.arun(
            request.message,
            session_id=request.session_id
        )
        return {"response": response.content}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health_check():
    return {"status": "healthy", "service": "agno-api"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

Basic Scaling (7 minutes)

Simple scaling strategies involve load balancing, service replication, and shared resources like databases and caches. This Docker Compose configuration demonstrates horizontal scaling with multiple service instances:

File: src/session8/docker-compose.yml - Scaling configuration

# docker-compose.yml for basic scaling

version: '3.8'

services:
  agno-api:
    build: .
    ports:
      - "8000-8002:8000"
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/agno
      - REDIS_URL=redis://redis:6379
    depends_on:
      - postgres
      - redis
    deploy:
      replicas: 3

  postgres:
    image: postgres:13
    environment:
      - POSTGRES_DB=agno
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:alpine

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - agno-api

volumes:
  postgres_data:

Health Monitoring (5 minutes)

Production health checks must verify all system dependencies including database connectivity, cache availability, and agent responsiveness. This comprehensive health checker provides detailed system status information:

File: src/session8/team_coordination.py - Health monitoring and system checks

from agno.monitoring import HealthChecker
import asyncio

class ProductionHealthChecker:
    def __init__(self, agent: Agent):
        self.agent = agent
        self.health_checker = HealthChecker(agent)

    async def comprehensive_health_check(self):
        """Comprehensive system health verification."""
        checks = {
            "agent_responsive": False,
            "database_connected": False,
            "cache_available": False,
            "memory_usage": "unknown"
        }

        try:
            # Test agent responsiveness
            response = await self.agent.arun("health check", timeout=5)
            checks["agent_responsive"] = bool(response.content)

            # Test database connection
            if hasattr(self.agent.storage, 'test_connection'):
                checks["database_connected"] = await self.agent.storage.test_connection()

            # Test cache availability
            if hasattr(self.agent, 'cache'):
                checks["cache_available"] = await self.agent.cache.ping()

            # Check memory usage
            import psutil
            memory = psutil.virtual_memory()
            checks["memory_usage"] = f"{memory.percent}%"

        except Exception as e:
            checks["error"] = str(e)

        return checks

# Usage in FastAPI

@app.get("/health/detailed")
async def detailed_health():
    health_checker = ProductionHealthChecker(production_agent)
    return await health_checker.comprehensive_health_check()

Security Essentials (5 minutes)

Basic production security:

from fastapi import FastAPI, HTTPException, Depends
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
import jwt

app = FastAPI()
security = HTTPBearer()

def verify_token(credentials: HTTPAuthorizationCredentials = Depends(security)):
    """Verify JWT token for API access."""
    try:
        payload = jwt.decode(
            credentials.credentials, 
            "your-secret-key",
            algorithms=["HS256"]
        )
        return payload
    except jwt.ExpiredSignatureError:
        raise HTTPException(status_code=401, detail="Token expired")
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")

@app.post("/secure-query")
async def secure_query(
    request: QueryRequest, 
    user_info = Depends(verify_token)
):
    # Log user access
    logging.info(f"User {user_info.get('user_id')} made query")

    # Rate limiting (simplified)
    # In production, use Redis-based rate limiting

    response = await production_agent.arun(request.message)
    return {"response": response.content, "user": user_info["user_id"]}

Part 4: Production Validation (5 minutes)

Cognitive Load: 3 new concepts Learning Mode: Verification & Checklist

Production Readiness Checklist (5 minutes)

Essential verification steps:

class ProductionReadinessChecker:
    def __init__(self, agent: Agent):
        self.agent = agent

    async def validate_production_readiness(self):
        """Comprehensive production readiness assessment."""
        checklist = {
            "✅ Configuration": {
                "environment_variables": self._check_env_vars(),
                "database_configured": self._check_database(),
                "monitoring_enabled": self._check_monitoring()
            },
            "✅ Performance": {
                "response_time": await self._check_response_time(),
                "concurrent_handling": await self._check_concurrency(),
                "resource_limits": self._check_resource_limits()
            },
            "✅ Reliability": {
                "error_handling": self._check_error_handling(),
                "retry_logic": self._check_retry_logic(),
                "graceful_degradation": self._check_degradation()
            },
            "✅ Security": {
                "authentication": self._check_auth(),
                "input_validation": self._check_validation(),
                "rate_limiting": self._check_rate_limits()
            }
        }

        return checklist

    def _check_env_vars(self) -> bool:
        """Check required environment variables."""
        required_vars = ["DATABASE_URL", "API_KEY", "SECRET_KEY"]
        import os
        return all(os.getenv(var) for var in required_vars)

    async def _check_response_time(self) -> str:
        """Measure average response time."""
        import time
        start = time.time()
        await self.agent.arun("test query")
        duration = time.time() - start
        return f"{duration:.2f}s"

# Quick validation

checker = ProductionReadinessChecker(production_agent)
readiness = await checker.validate_production_readiness()

Core Section Validation (5 minutes)

Quick Implementation Exercise

Deploy a basic production Agno agent:

# Challenge: Create a production-ready agent service

def create_production_service():
    # 1. Set up agent with monitoring
    # 2. Add error handling
    # 3. Configure health checks
    # 4. Add basic security
    pass

# Test your implementation

service = create_production_service()
health = await service.check_health()

Self-Assessment Checklist

  • I understand production vs development mindset
  • I can set up monitoring and error handling
  • I can deploy agents with Docker and scaling
  • I understand production security basics
  • I'm ready to explore optional modules or move to next session

Next Session Prerequisites: ✅ Core Section Complete Recommended: Explore advanced modules for enterprise deployment

Choose Your Next Path:

Complete Learning Path Options

Production Focus: Core → Module A → Module B
Performance Focus: Core → Module C → Module A
Security Focus: Core → Module D → Module A
Enterprise Complete: Core → All Modules


LEARNING REINFORCEMENT

Spaced Repetition Schedule

  • Day 1: Complete core production concepts ✅
  • Day 3: Review monitoring patterns (10 min)
  • Week 1: Deploy a test production system (45 min)
  • Week 2: Implement full monitoring stack (60 min)

Cross-Session Integration

Connections to Other Sessions: - Session 6: Build on atomic agents for production - Session 9: Multi-agent patterns in production - Session 10: Enterprise integration patterns

Knowledge Application Projects

  1. Simple: Deploy a monitored agent with Docker
  2. Intermediate: Create auto-scaling agent service
  3. Advanced: Build enterprise multi-agent platform

Progress Tracking

Completion Status

  • Core Section (85 min) - Essential for next session
  • Module A: Monitoring & Observability (65 min)
  • Module B: Enterprise Scaling (70 min)
  • Module C: Performance Optimization (50 min)
  • Module D: Security & Compliance (45 min)

Time Investment Tracking

  • Minimum Path: 85 minutes (Core only)
  • Production Path: 150 minutes (Core + Module A)
  • Enterprise Path: 155 minutes (Core + Module B)
  • Comprehensive Path: 315 minutes (Core + all modules)

Next Steps

  • To Session 9: Complete Core Section
  • For Production Deployment: Add Module A
  • For Enterprise Scale: Add Module B
  • For Security Focus: Add Module D

Multiple Choice Test - Session 8 (15 minutes)

Test your understanding of Agno production-ready agent systems.

Question 1

What is Agno's primary advantage over other agent frameworks?

A) Simplest learning curve
B) Performance optimization and production focus
C) Best documentation
D) Largest community

Question 2

How does Agno achieve better performance than traditional frameworks?

A) Better algorithms
B) Optimized memory usage and faster agent instantiation
C) More CPU cores
D) Cloud-only deployment

Question 3

What is the purpose of circuit breaker patterns in production agents?

A) Improve performance
B) Prevent cascading failures in distributed systems
C) Reduce costs
D) Simplify deployment

Question 4

How should you handle API rate limits in production agent systems?

A) Ignore them
B) Exponential backoff and jitter strategies
C) Faster requests
D) Multiple API keys

Question 5

What makes a health check endpoint effective?

A) Fast response time only
B) Comprehensive dependency and resource checks
C) Simple HTTP 200 response
D) Authentication requirements

Question 6

Which monitoring approach is most suitable for production agents?

A) Log files only
B) Comprehensive metrics with alerting and observability
C) Manual monitoring
D) Error counts only

Question 7

How should production agent configurations be managed?

A) Hard-coded values
B) Environment variables and external config management
C) Database storage
D) Code comments

Question 8

What is the most important aspect of production error handling?

A) Hiding errors from users
B) Graceful degradation with meaningful error responses
C) Immediate system restart
D) Detailed error messages to all users

Question 9

How should you approach scaling production agent systems?

A) Vertical scaling only
B) Horizontal scaling with load balancing and auto-scaling
C) Manual scaling
D) Single instance deployment

Question 10

What security measures are essential for production agents?

A) Password protection only
B) Authentication, authorization, encryption, and audit logging
C) Network isolation only
D) No security needed

🗂️ View Test Solutions →

Previous: Session 7 - First ADK Agent

Optional Deep Dive Modules:

Next: Session 9 - Multi-Agent Patterns →