Centralized Logging Implementation

Module 28: DevOps & Deployment

Introduction to Centralized Logging

In our previous lecture, we explored application monitoring focusing on metrics—quantitative measurements of system behavior. Now, we turn our attention to logging, which provides qualitative insights into application behavior through event-based records.

Imagine trying to diagnose a mysterious car problem. Metrics are like your dashboard gauges—they tell you if the engine is overheating or if you're low on fuel. Logs are like the detailed service history and the mechanic's notes—they tell you what happened, when, and in what sequence. Both are essential for a complete picture.

In modern distributed systems, logs from individual components must be brought together into a centralized system for effective analysis. This is what we call centralized logging.

flowchart TD
    A[Microservice A] --> |logs| C[Centralized Logging System]
    B[Microservice B] --> |logs| C
    D[Legacy Application] --> |logs| C
    E[Frontend Application] --> |logs| C
    F[Database] --> |logs| C
    G[Load Balancer] --> |logs| C
    C --> H[Search & Analysis]
    C --> I[Visualization]
    C --> J[Alerting]

Why Centralized Logging Matters

Centralized logging is not just a convenience—it's a necessity for modern applications. Here's why:

Challenges of Distributed Systems

Benefits of Centralized Logging

A real-world example: A major e-commerce site implemented centralized logging and reduced their mean time to resolution (MTTR) for critical incidents by 45%. What previously took hours to diagnose could now be identified in minutes by correlating logs across services.

The Centralized Logging Architecture

A typical centralized logging system has several key components:

flowchart LR
    subgraph Sources
        A[Application Logs]
        B[System Logs]
        C[Network Logs]
    end
    subgraph Collection
        D[Log Shippers/Agents]
        E[Aggregators/Buffers]
    end
    subgraph Processing
        F[Parsing]
        G[Enrichment]
        H[Transformation]
    end
    subgraph Storage
        I[Indexes]
        J[Archives]
    end
    subgraph Analysis
        K[Search]
        L[Visualization]
        M[Alerting]
    end
    Sources --> Collection
    Collection --> Processing
    Processing --> Storage
    Storage --> Analysis

Log Collection

Collection involves capturing logs from various sources and forwarding them to the central system.
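
The core of a log shipper can be sketched in a few lines of Python. This is only an illustration of the idea, not how Filebeat or Fluentd are implemented; the function names `read_new_lines` and `to_event` and the envelope field names are assumptions made for this example.

```python
import json
import os
import socket
import tempfile
from datetime import datetime, timezone


def read_new_lines(path, offset):
    """Return (lines, new_offset) for complete lines appended after byte `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Only ship complete lines; a partial trailing line waits for the next poll.
    end = data.rfind(b"\n") + 1
    text = data[:end].decode("utf-8", errors="replace")
    lines = [line.rstrip("\r") for line in text.split("\n")[:-1]]
    return lines, offset + end


def to_event(line, source):
    """Wrap a raw line in an envelope that the central system can index."""
    return {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "host": socket.gethostname(),
        "source": source,
        "message": line,
    }


# Demo: the unfinished third line is not shipped yet.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")
    with open(log_path, "w") as f:
        f.write("first line\nsecond line\npartial")
    lines, offset = read_new_lines(log_path, 0)
    events = [to_event(line, log_path) for line in lines]
    print(json.dumps(events, indent=2))
```

Remembering the byte offset between polls is what lets a shipper resume after a restart without resending or losing lines. A real agent would also persist that offset and handle file rotation.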

Log Processing

Processing transforms raw logs into a structured, searchable format:
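
As a rough sketch of the parsing, enrichment, and transformation steps, the Python below turns one access-log line into a structured event. The regular expression is a simplified stand-in for a full "combined" log format parser, and the `process` function and the `level` rule are assumptions made for this example.

```python
import re
from datetime import datetime

# Simplified pattern for the Apache/Nginx "combined" access log format.
ACCESS_LOG = re.compile(
    r'(?P<clientip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def process(raw_line, service):
    """Parse a raw access-log line, then normalize and enrich it."""
    match = ACCESS_LOG.match(raw_line)
    if match is None:
        # Keep unparseable lines instead of dropping them silently.
        return {"message": raw_line, "service": service, "tags": ["parse_failure"]}

    event = match.groupdict()
    # Transformation: convert strings to typed values.
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    event["@timestamp"] = datetime.strptime(
        event.pop("timestamp"), "%d/%b/%Y:%H:%M:%S %z"
    ).isoformat()
    # Enrichment: add context that was not in the raw line.
    event["service"] = service
    event["level"] = "error" if event["status"] >= 500 else "info"
    return event


line = ('203.0.113.7 - alice [11/May/2025:15:23:45 +0000] '
        '"GET /cart HTTP/1.1" 502 1534 "-" "curl/8.0"')
print(process(line, "frontend"))
```

Tagging lines that fail to parse, rather than discarding them, keeps the raw data searchable while making parser gaps visible.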

Log Storage

Storage solutions need to handle high-volume write operations while supporting fast queries:

Log Analysis

Analysis tools help extract insights from collected logs:

Popular Centralized Logging Stacks

The ELK Stack

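The ELK Stack is one of the most popular open-source logging solutions. It consists of Elasticsearch (storage and search), Logstash (processing), and Kibana (visualization), typically with a lightweight shipper such as Filebeat for collection: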

flowchart LR
    A[Application Logs] --> B[Filebeat]
    B --> C[Logstash]
    C --> D[Elasticsearch]
    D --> E[Kibana]

The PLG Stack

A newer alternative using Promtail (collection), Loki (storage and querying), and Grafana (visualization):

flowchart LR
    A[Application Logs] --> B[Promtail]
    B --> C[Loki]
    C --> D[Grafana]

Fluentd + Elasticsearch + Kibana

Popular in Kubernetes environments:

flowchart LR
    A[Application Logs] --> B[Fluentd]
    B --> C[Elasticsearch]
    C --> D[Kibana]

Cloud-based Solutions

Managed services offer logging without the operational overhead:

The choice between these stacks depends on your specific requirements, existing infrastructure, team expertise, and budget. Many organizations use a combination of solutions.

Implementing Structured Logging

Structured logging is the practice of formatting logs as structured data (typically JSON) rather than plain text. This makes logs more machine-readable and easier to parse, search, and analyze.

Benefits of Structured Logging

Structured Logging Implementation

Node.js Example with Winston


const winston = require('winston');

// Define the logger
const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json()
  ),
  defaultMeta: { service: 'user-service' },
  transports: [
    new winston.transports.Console(),
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
    new winston.transports.File({ filename: 'combined.log' })
  ]
});

// Usage examples
logger.info('User logged in', { 
  userId: '12345', 
  username: 'johndoe', 
  loginTime: new Date().toISOString() 
});

logger.error('Payment failed', { 
  userId: '12345', 
  amount: 99.99, 
  errorCode: 'PAYMENT_DECLINED',
  errorMessage: 'Insufficient funds'
});
          

Sample output:


{
  "level": "info",
  "message": "User logged in",
  "service": "user-service",
  "timestamp": "2025-05-11T15:23:45.678Z",
  "userId": "12345",
  "username": "johndoe",
  "loginTime": "2025-05-11T15:23:45.678Z"
}
          

Python Example with structlog


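import logging
import sys

import structlog

# Route structlog output through the standard library logger.
# Without this, the root logger defaults to WARNING and info() calls are dropped.
logging.basicConfig(format="%(message)s", stream=sys.stdout, level=logging.INFO)

# Set up structlog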
structlog.configure(
    processors=[
        structlog.stdlib.add_log_level,
        structlog.stdlib.add_logger_name,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer()
    ],
    context_class=dict,
    logger_factory=structlog.stdlib.LoggerFactory(),
    wrapper_class=structlog.stdlib.BoundLogger,
    cache_logger_on_first_use=True,
)

# Create logger
logger = structlog.get_logger("payment-service")

# Usage examples
logger.info("payment_processed", 
    user_id="67890", 
    payment_id="PMT123456",
    amount=99.99, 
    currency="USD"
)

logger.error("payment_failed",
    user_id="67890",
    payment_id="PMT123457",
    amount=149.99,
    currency="USD",
    error_code="CARD_EXPIRED",
    error_message="Credit card has expired"
)
          

Java Example with Logback and logstash-logback-encoder


import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import net.logstash.logback.argument.StructuredArguments;

public class PaymentService {
    private static final Logger logger = LoggerFactory.getLogger(PaymentService.class);
    
    public void processPayment(String userId, String paymentId, double amount, String currency) {
        // Process payment logic
        
        // Log successful payment
        logger.info("Payment processed successfully", 
            StructuredArguments.kv("event", "payment_processed"),
            StructuredArguments.kv("user_id", userId),
            StructuredArguments.kv("payment_id", paymentId),
            StructuredArguments.kv("amount", amount),
            StructuredArguments.kv("currency", currency)
        );
    }
    
    public void handlePaymentFailure(String userId, String paymentId, double amount, 
                                    String currency, String errorCode, String errorMessage) {
        logger.error("Payment processing failed", 
            StructuredArguments.kv("event", "payment_failed"),
            StructuredArguments.kv("user_id", userId),
            StructuredArguments.kv("payment_id", paymentId),
            StructuredArguments.kv("amount", amount),
            StructuredArguments.kv("currency", currency),
            StructuredArguments.kv("error_code", errorCode),
            StructuredArguments.kv("error_message", errorMessage)
        );
    }
}
          

Distributed Tracing Integration

Distributed tracing adds context to logs by tracking requests as they flow through microservices. It helps answer questions like:

sequenceDiagram
    participant User
    participant API Gateway
    participant Auth Service
    participant Product Service
    participant Cart Service
    participant Logging System
    User->>API Gateway: GET /cart
    API Gateway->>Auth Service: Validate token
    Auth Service-->>API Gateway: Token valid
    API Gateway->>Product Service: Get product details
    Product Service-->>API Gateway: Product details
    API Gateway->>Cart Service: Get cart items
    Cart Service-->>API Gateway: Cart items
    API Gateway-->>User: Cart response
    Note over API Gateway,Logging System: All services send logs with trace ID
    API Gateway->>Logging System: Logs with trace ID
    Auth Service->>Logging System: Logs with trace ID
    Product Service->>Logging System: Logs with trace ID
    Cart Service->>Logging System: Logs with trace ID

Implementing Tracing with OpenTelemetry

OpenTelemetry is an open-source observability framework that provides standardized ways to generate, collect, and export telemetry data (traces, metrics, and logs).


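// Node.js example with OpenTelemetry
// Note: '@opentelemetry/node' and '@opentelemetry/tracing' are the pre-1.0 package
// names; newer releases publish them as '@opentelemetry/sdk-trace-node' and
// '@opentelemetry/sdk-trace-base'.
const { SpanStatusCode } = require('@opentelemetry/api'); // used when marking a span as failed
const { NodeTracerProvider } = require('@opentelemetry/node');
const { SimpleSpanProcessor } = require('@opentelemetry/tracing');
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
const { ExpressInstrumentation } = require('@opentelemetry/instrumentation-express');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');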

// Create a tracer provider
const provider = new NodeTracerProvider();

// Configure span processor and exporter
const exporter = new JaegerExporter({
  serviceName: 'user-service',
});
provider.addSpanProcessor(new SimpleSpanProcessor(exporter));

// Register the provider
provider.register();

// Register instrumentations
registerInstrumentations({
  instrumentations: [
    new HttpInstrumentation(),
    new ExpressInstrumentation(),
  ],
});

// Your Express app
const express = require('express');
const app = express();

// Now your application is instrumented, and traces will be sent to Jaeger
app.get('/users/:id', (req, res) => {
  // Custom spans can be added for specific operations
  const tracer = provider.getTracer('user-service');
  const span = tracer.startSpan('fetch-user-data');
  
  // Add attributes to the span
  span.setAttribute('user.id', req.params.id);
  
  // Simulate database operation
  setTimeout(() => {
    // If there's an error, you can record it
    if (Math.random() > 0.8) {
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: 'Failed to fetch user data',
      });
      res.status(500).json({ error: 'Internal server error' });
    } else {
      res.json({ id: req.params.id, name: 'John Doe' });
    }
    
    // End the span
    span.end();
  }, 100);
});

app.listen(3000);
          

Once logs include trace IDs, you can correlate them across services to follow the path of a request, making troubleshooting much faster, especially in complex microservices architectures.
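
One way to get a trace ID into every log line is sketched below using only the Python standard library. It is a stand-in for illustration: with OpenTelemetry the ID would normally be read from the active span, and the names `current_trace_id`, `TraceIdFilter`, `JsonFormatter`, `build_logger`, and `handle_request` are assumptions made for this example.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the trace ID of the request currently being handled.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


class TraceIdFilter(logging.Filter):
    """Copy the current trace ID onto every log record."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname.lower(),
            "service": record.name,
            "message": record.getMessage(),
            "trace_id": record.trace_id,
        })


def build_logger(service, stream):
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(TraceIdFilter())
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def handle_request(logger, incoming_trace_id=None):
    """Reuse the caller's trace ID if present, otherwise start a new trace."""
    trace_id = incoming_trace_id or uuid.uuid4().hex
    token = current_trace_id.set(trace_id)
    try:
        logger.info("request handled")
    finally:
        current_trace_id.reset(token)
    return trace_id


# Two services logging the same request share one trace ID.
out = io.StringIO()
gateway = build_logger("api-gateway", out)
cart = build_logger("cart-service", out)
trace_id = handle_request(gateway)
handle_request(cart, incoming_trace_id=trace_id)
print(out.getvalue())
```

In a real deployment the trace ID would travel between services in a request header, and searching the central log store for that one ID would return the lines from every service that touched the request.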

Log Management Best Practices

What to Log

Deciding what to log requires balancing detail with performance and storage concerns:

Log Levels and When to Use Them

Using appropriate log levels helps filter logs based on importance:
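
A short Python sketch of level-based filtering follows. The `make_logger` helper and the example messages are assumptions made for illustration; the point is only that records below the configured threshold never reach the output.

```python
import io
import logging


def make_logger(name, stream, level_name):
    """Create a logger that drops records below `level_name`."""
    logger = logging.getLogger(name)
    logger.setLevel(getattr(logging, level_name.upper(), logging.INFO))
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


out = io.StringIO()
log = make_logger("checkout", out, "WARNING")
log.debug("cart contents: %s", ["sku-1", "sku-2"])  # diagnostic detail, dropped
log.info("order placed")                             # normal event, dropped
log.warning("payment retry %d of 3", 2)              # unexpected but handled
log.error("payment provider unreachable")            # operation failed
print(out.getvalue())
```

In practice the threshold would usually come from configuration, for example a `LOG_LEVEL` environment variable as in the Winston example earlier, so that verbosity can be raised during an incident without a code change.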

Log Retention and Rotation

Manage log volume with proper retention policies:
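
Size-based rotation on the application host can be sketched with Python's standard `RotatingFileHandler`. The file name, the 1 KB size limit, and the backup count of 3 are arbitrary values chosen to make the demo rotate quickly, not recommendations.

```python
import logging
import logging.handlers
import os
import shutil
import tempfile


def make_rotating_logger(path, max_bytes, backup_count):
    """Log to `path`, rolling over at `max_bytes` and keeping `backup_count` old files."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger("rotation-demo")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger, handler


log_dir = tempfile.mkdtemp()
logger, handler = make_rotating_logger(os.path.join(log_dir, "app.log"), 1024, 3)
for i in range(200):
    logger.info("event number %d", i)

# Detach and close the handler before inspecting and removing the files.
logger.removeHandler(handler)
handler.close()
rotated_files = sorted(os.listdir(log_dir))
shutil.rmtree(log_dir)
print(rotated_files)
```

The oldest file is deleted on each rollover, so disk usage stays bounded at roughly `max_bytes * (backup_count + 1)`. For time-based policies, the standard library also provides `TimedRotatingFileHandler`.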

Security Considerations

Protect your logs from unauthorized access and tampering:
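
One common safeguard is to mask sensitive values before an event leaves the application. The Python sketch below is a minimal illustration; the key list in `SENSITIVE_KEYS` and the card-number pattern are assumptions for this example and would need to match your own data.

```python
import json
import re

# Hypothetical field names treated as secrets in this example.
SENSITIVE_KEYS = {"password", "token", "authorization", "card_number", "ssn"}
# Runs of 13 to 16 digits are treated as possible card numbers.
CARD_PATTERN = re.compile(r"\b\d{13,16}\b")


def mask_card(match):
    """Replace all but the last four digits with asterisks."""
    digits = match.group()
    return "*" * (len(digits) - 4) + digits[-4:]


def redact(value, key=None):
    """Recursively mask sensitive fields in a log event before it is shipped."""
    if key is not None and key.lower() in SENSITIVE_KEYS:
        return "[REDACTED]"
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return CARD_PATTERN.sub(mask_card, value)
    return value


event = {
    "user_id": "12345",
    "password": "hunter2",
    "message": "charge failed for card 4111111111111111",
    "context": {"token": "abc", "attempts": 2},
}
print(json.dumps(redact(event), indent=2))
```

Redacting at the source means the secret never reaches the shipper, the network, or the central store, which is safer than trying to scrub it after indexing.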

Real-world Implementation: ELK Stack

Let's walk through setting up an ELK stack with Docker Compose:


version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.16.3
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    networks:
      - elk

  logstash:
    image: docker.elastic.co/logstash/logstash:7.16.3
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline
    ports:
      - "5044:5044"
      - "5000:5000/tcp"
      - "5000:5000/udp"
      - "9600:9600"
    depends_on:
      - elasticsearch
    networks:
      - elk

  kibana:
    image: docker.elastic.co/kibana/kibana:7.16.3
    ports:
      - "5601:5601"
    environment:
      ELASTICSEARCH_URL: http://elasticsearch:9200
      ELASTICSEARCH_HOSTS: http://elasticsearch:9200
    depends_on:
      - elasticsearch
    networks:
      - elk

  filebeat:
    image: docker.elastic.co/beats/filebeat:7.16.3
    volumes:
      - ./filebeat/config/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    depends_on:
      - elasticsearch
      - logstash
    networks:
      - elk

networks:
  elk:
    driver: bridge

volumes:
  elasticsearch-data:
          

Logstash configuration example (pipeline/logstash.conf):


input {
  beats {
    port => 5044
  }
  tcp {
    port => 5000
    codec => json
  }
}

filter {
  if [fields][log_type] == "access_log" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    geoip {
      source => "clientip"
    }
  }
  
  if [fields][log_type] == "app_log" {
    json {
      source => "message"
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "%{[fields][log_type]}-%{+YYYY.MM.dd}"
  }
}
          

Filebeat configuration example (config/filebeat.yml):


filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
  fields:
    log_type: access_log

- type: log
  enabled: true
  paths:
    - /var/log/app/*.log
  fields:
    log_type: app_log
  # JSON decoding is left to the Logstash json filter, which parses the "message" field.
  # Decoding here as well would leave that filter with a non-JSON "message" to parse.

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.template.enabled: false
setup.kibana.enabled: false

output.logstash:
  hosts: ["logstash:5044"]
          

Advanced Log Analysis Techniques

Log Query Languages

Each logging platform has its own query language for searching and analyzing logs:

Example Kibana Query Language (KQL) queries:


# Find all logs with an error level
level:error

# Find logs for a specific user
user.id:"12345"

# Find payment errors with a specific code
level:error AND event:payment_failed AND error_code:PAYMENT_DECLINED

# Find slow requests (taking more than 500ms)
response.time > 500

# Complex query with time range
service:"user-service" AND level:error AND @timestamp > "2025-05-10T00:00:00Z"
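
When queries are issued from code rather than typed into Kibana, the same conditions are usually written in the Elasticsearch query DSL. The Python sketch below only builds the request body for the last KQL example; the `build_error_query` helper is an assumption for this example, and sending the body to a cluster's search endpoint is left out.

```python
import json


def build_error_query(service, since_iso, size=50):
    """Build a search body for error logs of one service after a point in time."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                # Filter clauses must all match and do not affect scoring.
                "filter": [
                    {"match_phrase": {"service": service}},
                    {"match": {"level": "error"}},
                    {"range": {"@timestamp": {"gt": since_iso}}},
                ]
            }
        },
    }


body = build_error_query("user-service", "2025-05-10T00:00:00Z")
print(json.dumps(body, indent=2))
```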
          

Log Visualization

Visualization helps identify patterns and anomalies that might not be obvious in raw logs:

Alerting on Logs

Set up alerts to be notified of important events or patterns in your logs:


# Elasticsearch Watcher alert example (simplified)
{
  "trigger": {
    "schedule": {
      "interval": "5m"
    }
  },
  "input": {
    "search": {
      "request": {
        "indices": ["app_log-*"],
        "body": {
          "query": {
            "bool": {
              "must": [
                { "match": { "level": "error" } },
                { "match": { "event": "payment_failed" } },
                { "range": { "@timestamp": { "gte": "now-5m" } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 10
      }
    }
  },
  "actions": {
    "send_email": {
      "email": {
        "to": "alerts@example.com",
        "subject": "Payment Error Alert",
        "body": "There have been {{ctx.payload.hits.total}} payment errors in the last 5 minutes."
      }
    }
  }
}
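
The condition this watch evaluates (more than 10 payment failures within 5 minutes) can be sketched as a sliding-window counter in Python. The `ErrorRateAlert` class is a hypothetical illustration of the logic only; it does not query Elasticsearch or send email.

```python
from collections import deque
from datetime import datetime, timedelta


class ErrorRateAlert:
    """Fire when more than `threshold` matching events occur within `window`."""

    def __init__(self, threshold=10, window=timedelta(minutes=5)):
        self.threshold = threshold
        self.window = window
        self.timestamps = deque()

    def observe(self, event, now):
        """Record one log event; return True if the alert condition holds."""
        if event.get("level") == "error" and event.get("event") == "payment_failed":
            self.timestamps.append(now)
        # Drop matches that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        return len(self.timestamps) > self.threshold


alert = ErrorRateAlert()
start = datetime(2025, 5, 11, 15, 0, 0)
failure = {"level": "error", "event": "payment_failed"}
# Eleven failures, ten seconds apart: only the eleventh crosses the threshold.
results = [alert.observe(failure, start + timedelta(seconds=10 * i)) for i in range(11)]
print(results)
```

A strict "greater than" comparison mirrors the `gt` condition in the watch, so exactly 10 failures in the window does not fire.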
          

Practical Exercise

Let's put these concepts into practice:

  1. Set Up ELK Stack:
    • Use the Docker Compose configuration provided above
    • Configure Logstash to process logs from different sources
    • Set up Filebeat to collect logs from your application
  2. Implement Structured Logging:
    • Choose a language (Node.js, Python, Java) and implement structured logging
    • Ensure logs include consistent fields (timestamp, level, service name, message)
    • Add context-specific fields based on the type of event
  3. Create Kibana Dashboards:
    • Set up index patterns in Kibana
    • Create visualizations for common log metrics
    • Build a dashboard combining multiple visualizations
    • Configure alerts for critical error conditions
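
For step 2, a small check like the following Python sketch can confirm that emitted log lines carry the consistent fields listed above. The `missing_fields` helper and the exact field names are assumptions; adjust `REQUIRED_FIELDS` to whatever your logger actually emits.

```python
import json

# Field names assumed for this exercise; match them to your logger's output.
REQUIRED_FIELDS = ("timestamp", "level", "service", "message")


def missing_fields(log_line):
    """Return the required fields absent from one JSON log line."""
    try:
        event = json.loads(log_line)
    except json.JSONDecodeError:
        # A plain-text line has none of the structured fields.
        return list(REQUIRED_FIELDS)
    if not isinstance(event, dict):
        return list(REQUIRED_FIELDS)
    return [field for field in REQUIRED_FIELDS if field not in event]


good = ('{"level": "info", "message": "User logged in", '
        '"service": "user-service", "timestamp": "2025-05-11T15:23:45.678Z"}')
bad = '{"level": "info", "msg": "User logged in"}'
print(missing_fields(good))
print(missing_fields(bad))
print(missing_fields("User logged in"))
```

Running a check like this in a test suite catches services that drift away from the shared schema before their logs reach the central system.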

Summary

Centralized logging is a critical component of modern application observability. We've covered key concepts including:

Remember that effective logging is not just about collecting data—it's about making that data actionable and insightful. By implementing the strategies we've discussed, you'll be better equipped to troubleshoot issues, understand system behavior, and improve your applications.

In our next lecture, we'll explore how to combine metrics and logs in a comprehensive monitoring solution using Prometheus and Grafana.