Observability with OpenTelemetry: Monitoring Microservices in Production
Implement complete observability in your microservices with OpenTelemetry, Prometheus, and Grafana. Learn to configure distributed traces, custom metrics, and log correlation for production debugging.
This is the fourth article in our microservices series. If you haven't read the previous articles, check out the microservices guide, API Gateway with Kong, and messaging with RabbitMQ.
Why Observability?
In distributed systems, debugging is exponentially harder. A request passes through multiple services, each with its own logs, metrics, and states. Without proper observability, finding the root cause of a problem is like looking for a needle in a haystack.
The Three Pillars of Observability
┌─────────────────────────────────────────────────────────────┐
│ OBSERVABILITY │
├───────────────────┬───────────────────┬───────────────────┤
│ TRACES │ METRICS │ LOGS │
│ │ │ │
│ ┌─────────────┐ │ ┌─────────────┐ │ ┌─────────────┐ │
│ │ Distributed │ │ │ Counters │ │ │ Structured │ │
│ │ Request │ │ │ Histograms │ │ │ JSON │ │
│ │ Latency │ │ │ Gauges │ │ │ Context │ │
│ │ Errors │ │ │ Percentiles│ │ │ TraceID │ │
│ └─────────────┘ │ └─────────────┘ │ └─────────────┘ │
│ │ │ │
│ "What happened │ "How is the │ "Why did it │
│ in this request?│ system behaving?│ happen?" │
│ │ │ │
└───────────────────┴───────────────────┴───────────────────┘
OpenTelemetry: The Industry Standard
OpenTelemetry (OTel) is a CNCF project that provides APIs, SDKs, and tools to collect telemetry (traces, metrics, and logs) in a standardized and vendor-neutral way.
OpenTelemetry Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ APPLICATION │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Auto-instr. │ │ Manual-instr. │ │ Baggage │ │
│ │ (HTTP, gRPC) │ │ (Custom) │ │ (Context) │ │
│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │ │
│ └────────────────────┼────────────────────┘ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ OTel SDK │ │
│ │ ┌───────────────┐ │ │
│ │ │ Processor │ │ │
│ │ │ Sampler │ │ │
│ │ │ Exporter │ │ │
│ │ └───────────────┘ │ │
│ └──────────┬──────────┘ │
└───────────────────────────────┼─────────────────────────────────────┘
│
▼
┌─────────────────────┐
│ OTel Collector │
│ ┌───────────────┐ │
│ │ Receivers │──┼──► OTLP, Jaeger, Zipkin
│ │ Processors │──┼──► Batch, Filter, Transform
│ │ Exporters │──┼──► Jaeger, Prometheus, Loki
│ └───────────────┘ │
└──────────┬──────────┘
│
┌────────────────┼────────────────┐
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Jaeger │ │ Prometheus │ │ Loki │
│ (Traces) │ │ (Metrics) │ │ (Logs) │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└────────────────┼────────────────┘
▼
┌─────────────┐
│ Grafana │
│ (Dashboard) │
└─────────────┘
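The collector stage in the diagram is driven by a single config file that wires receivers, processors, and exporters into pipelines. A minimal sketch of what `docker/otel-collector-config.yaml` could look like for this stack (endpoint addresses and the exporter names are illustrative; the `loki` exporter requires the contrib distribution of the collector):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 1024

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [loki]
```

Each signal type gets its own pipeline, so traces, metrics, and logs can be batched and routed independently while sharing the same OTLP receiver.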
Project Structure
observability-service/
├── src/
│ ├── instrumentation/
│ │ ├── index.ts # Main OTel setup
│ │ ├── tracing.ts # Trace configuration
│ │ ├── metrics.ts # Metrics configuration
│ │ └── logging.ts # Log configuration
│ ├── middleware/
│ │ ├── request-context.ts # Request context
│ │ ├── metrics.middleware.ts # HTTP metrics
│ │ └── logging.middleware.ts # Structured logs
│ ├── utils/
│ │ ├── trace-context.ts # Trace utilities
│ │ ├── custom-metrics.ts # Custom metrics
│ │ └── log-formatter.ts # Log formatting
│ ├── exporters/
│ │ ├── jaeger.ts # Jaeger exporter
│ │ ├── prometheus.ts # Prometheus exporter
│ │ └── loki.ts # Loki exporter
│ └── app.ts
├── docker/
│ ├── otel-collector-config.yaml
│ ├── prometheus.yml
│ ├── loki-config.yaml
│ └── grafana/
│ └── dashboards/
│ └── microservices.json
├── docker-compose.observability.yml
└── package.json
OpenTelemetry SDK Configuration
Installation
# Core OpenTelemetry
npm install @opentelemetry/api @opentelemetry/sdk-node

# Auto-instrumentation
npm install @opentelemetry/auto-instrumentations-node

# Exporters
npm install @opentelemetry/exporter-trace-otlp-http
npm install @opentelemetry/exporter-metrics-otlp-http
npm install @opentelemetry/exporter-logs-otlp-http

# Resources and semantics
npm install @opentelemetry/resources
npm install @opentelemetry/semantic-conventions

Main Setup
// src/instrumentation/index.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { OTLPLogExporter } from '@opentelemetry/exporter-logs-otlp-http';
import { Resource } from '@opentelemetry/resources';
import {
  SEMRESATTRS_SERVICE_NAME,
  SEMRESATTRS_SERVICE_VERSION,
  SEMRESATTRS_DEPLOYMENT_ENVIRONMENT,
} from '@opentelemetry/semantic-conventions';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { BatchLogRecordProcessor } from '@opentelemetry/sdk-logs';
import { diag, DiagConsoleLogger, DiagLogLevel } from '@opentelemetry/api';

// Configure diagnostics for debugging
if (process.env.OTEL_DEBUG === 'true') {
  diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);
}

// Resource configuration (identifies the service)
const resource = new Resource({
  [SEMRESATTRS_SERVICE_NAME]: process.env.SERVICE_NAME || 'unknown-service',
  [SEMRESATTRS_SERVICE_VERSION]: process.env.SERVICE_VERSION || '1.0.0',
  [SEMRESATTRS_DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV || 'development',
  'service.instance.id': process.env.HOSTNAME || 'local',
  'service.namespace': 'microservices',
});

// Exporter configuration
const traceExporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT + '/v1/traces',
  headers: {
    'x-api-key': process.env.OTEL_API_KEY || '',
  },
});

const metricExporter = new OTLPMetricExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT + '/v1/metrics',
});

const logExporter = new OTLPLogExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT + '/v1/logs',
});

// SDK configuration
const sdk = new NodeSDK({
  resource,
  traceExporter,
  metricReader: new PeriodicExportingMetricReader({
    exporter: metricExporter,
    exportIntervalMillis: 15000, // Export metrics every 15s
  }),
  logRecordProcessor: new BatchLogRecordProcessor(logExporter),
  instrumentations: [
    getNodeAutoInstrumentations({
      // Specific configuration per instrumentation
      '@opentelemetry/instrumentation-http': {
        requestHook: (span, request) => {
          span.setAttribute('http.request.id', request.headers['x-request-id'] || '');
        },
        responseHook: (span, response) => {
          span.setAttribute('http.response.content_length', response.headers['content-length'] || 0);
        },
        ignoreIncomingRequestHook: (request) => {
          // Ignore health checks
          return request.url === '/health' || request.url === '/ready';
        },
      },
      '@opentelemetry/instrumentation-express': {
        enabled: true,
      },
      '@opentelemetry/instrumentation-pg': {
        enhancedDatabaseReporting: true,
      },
      '@opentelemetry/instrumentation-redis': {
        enabled: true,
      },
      '@opentelemetry/instrumentation-amqplib': {
        enabled: true, // RabbitMQ
      },
    }),
  ],
});

// Initialization
export async function initTelemetry(): Promise<void> {
  try {
    await sdk.start();
    console.log('OpenTelemetry initialized successfully');

    // Graceful shutdown
    process.on('SIGTERM', async () => {
      try {
        await sdk.shutdown();
        console.log('OpenTelemetry shut down successfully');
      } catch (error) {
        console.error('Error shutting down OpenTelemetry', error);
      }
    });
  } catch (error) {
    console.error('Error initializing OpenTelemetry', error);
    throw error;
  }
}

export { sdk };

Application Entry Point
// src/index.ts
import { initTelemetry } from './instrumentation';

// IMPORTANT: Initialize telemetry first!
async function bootstrap() {
  await initTelemetry();

  // Now import the rest of the application
  const { createApp } = await import('./app');
  const app = await createApp();

  const port = process.env.PORT || 3000;
  app.listen(port, () => {
    console.log(`Server running on port ${port}`);
  });
}

bootstrap().catch(console.error);

Distributed Tracing
Distributed tracing allows you to follow a request through multiple services.
Fundamental Concepts
┌─────────────────────────────────────────────────────────────────┐
│ TRACE │
│ TraceID: abc123 │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ SPAN: API Gateway (Root Span) │ │
│ │ SpanID: span-1, ParentID: null │ │
│ │ Duration: 250ms │ │
│ │ ┌────────────────────────────────────────────────────────┐ │ │
│ │ │ SPAN: User Service │ │ │
│ │ │ SpanID: span-2, ParentID: span-1 │ │ │
│ │ │ Duration: 50ms │ │ │
│ │ └────────────────────────────────────────────────────────┘ │ │
│ │ ┌────────────────────────────────────────────────────────┐ │ │
│ │ │ SPAN: Order Service │ │ │
│ │ │ SpanID: span-3, ParentID: span-1 │ │ │
│ │ │ Duration: 150ms │ │ │
│ │ │ ┌────────────────────────────────────────────────────┐ │ │ │
│ │ │ │ SPAN: Database Query │ │ │ │
│ │ │ │ SpanID: span-4, ParentID: span-3 │ │ │ │
│ │ │ │ Duration: 45ms │ │ │ │
│ │ │ └────────────────────────────────────────────────────┘ │ │ │
│ │ │ ┌────────────────────────────────────────────────────┐ │ │ │
│ │ │ │ SPAN: RabbitMQ Publish │ │ │ │
│ │ │ │ SpanID: span-5, ParentID: span-3 │ │ │ │
│ │ │ │ Duration: 10ms │ │ │ │
│ │ │ └────────────────────────────────────────────────────┘ │ │ │
│ │ └────────────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
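A tracing backend such as Jaeger receives these spans as a flat list and rebuilds the waterfall from the parent/child IDs. A self-contained sketch of that reconstruction, using the span names and durations from the diagram above (the `SpanRecord` shape is a simplification of what OTLP actually exports):

```typescript
// A span record as a tracing backend sees it: trace ID, own ID, parent ID.
interface SpanRecord {
  traceId: string;
  spanId: string;
  parentId: string | null;
  name: string;
  durationMs: number;
}

// The five spans from the diagram, exported as a flat list.
const spans: SpanRecord[] = [
  { traceId: 'abc123', spanId: 'span-1', parentId: null,     name: 'API Gateway',      durationMs: 250 },
  { traceId: 'abc123', spanId: 'span-2', parentId: 'span-1', name: 'User Service',     durationMs: 50 },
  { traceId: 'abc123', spanId: 'span-3', parentId: 'span-1', name: 'Order Service',    durationMs: 150 },
  { traceId: 'abc123', spanId: 'span-4', parentId: 'span-3', name: 'Database Query',   durationMs: 45 },
  { traceId: 'abc123', spanId: 'span-5', parentId: 'span-3', name: 'RabbitMQ Publish', durationMs: 10 },
];

// Find the direct children of a span.
function children(parentId: string): SpanRecord[] {
  return spans.filter((s) => s.parentId === parentId);
}

// Render the tree depth-first, indenting each level like a waterfall view.
function renderTree(span: SpanRecord, depth = 0): string[] {
  const line = `${'  '.repeat(depth)}${span.name} (${span.durationMs}ms)`;
  return [line, ...children(span.spanId).flatMap((c) => renderTree(c, depth + 1))];
}

const root = spans.find((s) => s.parentId === null)!;
console.log(renderTree(root).join('\n'));
```

The root span is the one with no parent; everything else hangs off it, which is exactly why losing context propagation between two services breaks the trace into disconnected fragments.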
Manual Span Instrumentation
// src/utils/trace-context.ts
import { trace, SpanStatusCode, SpanKind, context, propagation } from '@opentelemetry/api';
import type { Span, SpanOptions, Context } from '@opentelemetry/api';

const tracer = trace.getTracer('microservice-tracer', '1.0.0');

// Decorator for automatic tracing
export function Traced(
  spanName?: string,
  options?: SpanOptions
): MethodDecorator {
  return function (
    target: any,
    propertyKey: string | symbol,
    descriptor: PropertyDescriptor
  ) {
    const originalMethod = descriptor.value;
    const name = spanName || `${target.constructor.name}.${String(propertyKey)}`;

    descriptor.value = async function (...args: any[]) {
      return tracer.startActiveSpan(name, options || {}, async (span: Span) => {
        try {
          // Add parameters as attributes (careful with sensitive data!)
          span.setAttribute('method.arguments.count', args.length);

          const result = await originalMethod.apply(this, args);
          span.setStatus({ code: SpanStatusCode.OK });
          return result;
        } catch (error) {
          span.setStatus({
            code: SpanStatusCode.ERROR,
            message: error instanceof Error ? error.message : 'Unknown error',
          });
          span.recordException(error as Error);
          throw error;
        } finally {
          span.end();
        }
      });
    };

    return descriptor;
  };
}

// Create span manually
export function createSpan(
  name: string,
  fn: (span: Span) => Promise<any>,
  options?: SpanOptions
): Promise<any> {
  return tracer.startActiveSpan(name, options || {}, async (span) => {
    try {
      const result = await fn(span);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error) {
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: error instanceof Error ? error.message : 'Unknown error',
      });
      span.recordException(error as Error);
      throw error;
    } finally {
      span.end();
    }
  });
}

// Extract/inject context for propagation
export function extractContext(headers: Record<string, string>): Context {
  return propagation.extract(context.active(), headers);
}

export function injectContext(headers: Record<string, string>): void {
  propagation.inject(context.active(), headers);
}

// Add events to a span
export function addSpanEvent(
  eventName: string,
  attributes?: Record<string, string | number | boolean>
): void {
  const span = trace.getActiveSpan();
  if (span) {
    span.addEvent(eventName, attributes);
  }
}

// Get current trace ID
export function getCurrentTraceId(): string | undefined {
  const span = trace.getActiveSpan();
  return span?.spanContext().traceId;
}

// Get current span ID
export function getCurrentSpanId(): string | undefined {
  const span = trace.getActiveSpan();
  return span?.spanContext().spanId;
}

Usage in Services
// src/services/order.service.ts
import { Traced, createSpan, addSpanEvent } from '../utils/trace-context';
import { trace, SpanKind } from '@opentelemetry/api';

export class OrderService {
  private readonly tracer = trace.getTracer('order-service');

  @Traced('OrderService.createOrder', { kind: SpanKind.INTERNAL })
  async createOrder(orderData: CreateOrderDTO): Promise<Order> {
    addSpanEvent('order.validation.started');

    // Validation
    await this.validateOrder(orderData);
    addSpanEvent('order.validation.completed');

    // Create child span for specific operation
    const order = await createSpan('order.save', async (span) => {
      span.setAttribute('order.items.count', orderData.items.length);
      span.setAttribute('order.total', orderData.total);

      const savedOrder = await this.orderRepository.save(orderData);
      span.setAttribute('order.id', savedOrder.id);
      return savedOrder;
    });

    // Publish event
    await this.publishOrderCreated(order);

    return order;
  }

  @Traced('OrderService.validateOrder')
  private async validateOrder(orderData: CreateOrderDTO): Promise<void> {
    // Validation with automatic spans
    await this.validateStock(orderData.items);
    await this.validatePayment(orderData.paymentMethod);
  }

  private async publishOrderCreated(order: Order): Promise<void> {
    // Span for messaging
    await createSpan(
      'rabbitmq.publish.order_created',
      async (span) => {
        span.setAttribute('messaging.system', 'rabbitmq');
        span.setAttribute('messaging.destination', 'orders.created');
        span.setAttribute('messaging.message_id', order.id);

        await this.messagePublisher.publish('orders.created', {
          orderId: order.id,
          timestamp: new Date().toISOString(),
        });
      },
      { kind: SpanKind.PRODUCER }
    );
  }
}

Context Propagation Between Services
// src/middleware/request-context.ts
import { Request, Response, NextFunction } from 'express';
import { context, propagation, trace } from '@opentelemetry/api';
import { v4 as uuidv4 } from 'uuid';

export interface RequestContext {
  traceId: string;
  spanId: string;
  requestId: string;
  userId?: string;
  correlationId: string;
}

declare global {
  namespace Express {
    interface Request {
      context: RequestContext;
    }
  }
}

export function requestContextMiddleware(
  req: Request,
  res: Response,
  next: NextFunction
): void {
  // Extract propagation context (if exists)
  const extractedContext = propagation.extract(context.active(), req.headers);

  context.with(extractedContext, () => {
    const span = trace.getActiveSpan();
    const spanContext = span?.spanContext();

    // Create request context
    req.context = {
      traceId: spanContext?.traceId || uuidv4().replace(/-/g, ''),
      spanId: spanContext?.spanId || uuidv4().replace(/-/g, '').substring(0, 16),
      requestId: req.headers['x-request-id'] as string || uuidv4(),
      userId: req.headers['x-user-id'] as string,
      correlationId: req.headers['x-correlation-id'] as string || uuidv4(),
    };

    // Add response headers for debugging
    res.setHeader('x-trace-id', req.context.traceId);
    res.setHeader('x-request-id', req.context.requestId);

    // Add attributes to current span
    if (span) {
      span.setAttribute('request.id', req.context.requestId);
      span.setAttribute('correlation.id', req.context.correlationId);
      if (req.context.userId) {
        span.setAttribute('user.id', req.context.userId);
      }
    }

    next();
  });
}

// Helper to propagate context in HTTP calls
export function getTracingHeaders(): Record<string, string> {
  const headers: Record<string, string> = {};
  propagation.inject(context.active(), headers);
  return headers;
}

HTTP Client with Automatic Propagation
// src/utils/http-client.ts
import axios, { AxiosInstance, AxiosRequestConfig } from 'axios';
import { getTracingHeaders, getCurrentTraceId } from './trace-context';

export function createTracedHttpClient(baseURL: string): AxiosInstance {
  const client = axios.create({ baseURL });

  // Interceptor to add tracing headers
  client.interceptors.request.use((config) => {
    const tracingHeaders = getTracingHeaders();
    config.headers = {
      ...config.headers,
      ...tracingHeaders,
      'x-trace-id': getCurrentTraceId(),
    };
    return config;
  });

  // Interceptor for error logging
  client.interceptors.response.use(
    (response) => response,
    (error) => {
      const traceId = getCurrentTraceId();
      console.error(`HTTP Error [trace: ${traceId}]:`, {
        url: error.config?.url,
        method: error.config?.method,
        status: error.response?.status,
        message: error.message,
      });
      throw error;
    }
  );

  return client;
}

Custom Metrics
Metric Types
// src/instrumentation/metrics.ts
import { metrics, ValueType } from '@opentelemetry/api';

const meter = metrics.getMeter('microservice-metrics', '1.0.0');

// Counter - values that only increase
export const httpRequestsTotal = meter.createCounter('http_requests_total', {
  description: 'Total number of HTTP requests',
  unit: '1',
});

// UpDownCounter - values that can increase or decrease
export const activeConnections = meter.createUpDownCounter('active_connections', {
  description: 'Number of active connections',
  unit: '1',
});

// Histogram - value distribution
export const httpRequestDuration = meter.createHistogram('http_request_duration_seconds', {
  description: 'Duration of HTTP requests in seconds',
  unit: 's',
  advice: {
    explicitBucketBoundaries: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
  },
});

// Observable Gauge - current value that is observed
export const memoryUsage = meter.createObservableGauge('process_memory_bytes', {
  description: 'Process memory usage in bytes',
  unit: 'By',
});

memoryUsage.addCallback((result) => {
  const usage = process.memoryUsage();
  result.observe(usage.heapUsed, { type: 'heap_used' });
  result.observe(usage.heapTotal, { type: 'heap_total' });
  result.observe(usage.rss, { type: 'rss' });
  result.observe(usage.external, { type: 'external' });
});

// Observable Counter - cumulative value read via callback
export const cpuUsage = meter.createObservableCounter('process_cpu_seconds_total', {
  description: 'Total CPU time spent in seconds',
  unit: 's',
});

let previousCpuUsage = process.cpuUsage();
cpuUsage.addCallback((result) => {
  const currentCpuUsage = process.cpuUsage(previousCpuUsage);
  result.observe((currentCpuUsage.user + currentCpuUsage.system) / 1e6, {});
  previousCpuUsage = process.cpuUsage();
});

Business Metrics
// src/utils/business-metrics.ts
import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('business-metrics', '1.0.0');

// Order metrics
export const ordersCreated = meter.createCounter('orders_created_total', {
  description: 'Total orders created',
});

export const orderValue = meter.createHistogram('order_value_dollars', {
  description: 'Order value distribution',
  unit: 'USD',
  advice: {
    explicitBucketBoundaries: [10, 25, 50, 100, 250, 500, 1000, 2500, 5000],
  },
});

export const orderProcessingTime = meter.createHistogram('order_processing_duration_seconds', {
  description: 'Time to process an order',
  unit: 's',
});

// User metrics
export const activeUsers = meter.createUpDownCounter('active_users', {
  description: 'Number of currently active users',
});

export const userRegistrations = meter.createCounter('user_registrations_total', {
  description: 'Total user registrations',
});

// Stock metrics
export const stockLevel = meter.createObservableGauge('stock_level', {
  description: 'Current stock level by product',
});

// Payment metrics
export const paymentAttempts = meter.createCounter('payment_attempts_total', {
  description: 'Total payment attempts',
});

export const paymentAmount = meter.createHistogram('payment_amount_dollars', {
  description: 'Payment amount distribution',
  unit: 'USD',
});

// Helper to record order metrics
export function recordOrderMetrics(order: {
  id: string;
  total: number;
  items: number;
  processingTimeMs: number;
  paymentMethod: string;
  region: string;
}) {
  const labels = {
    payment_method: order.paymentMethod,
    region: order.region,
  };

  ordersCreated.add(1, labels);
  orderValue.record(order.total, labels);
  orderProcessingTime.record(order.processingTimeMs / 1000, labels);
}

HTTP Metrics Middleware
// src/middleware/metrics.middleware.ts
import { Request, Response, NextFunction } from 'express';
import { httpRequestsTotal, httpRequestDuration, activeConnections } from '../instrumentation/metrics';

export function metricsMiddleware(
  req: Request,
  res: Response,
  next: NextFunction
): void {
  const startTime = process.hrtime.bigint();

  // Increment active connections
  activeConnections.add(1);

  // Common labels
  const labels = {
    method: req.method,
    route: req.route?.path || req.path,
    host: req.hostname,
  };

  // When response finishes
  res.on('finish', () => {
    const endTime = process.hrtime.bigint();
    const durationSeconds = Number(endTime - startTime) / 1e9;

    const finalLabels = {
      ...labels,
      status_code: res.statusCode.toString(),
      status_class: `${Math.floor(res.statusCode / 100)}xx`,
    };

    // Record metrics
    httpRequestsTotal.add(1, finalLabels);
    httpRequestDuration.record(durationSeconds, finalLabels);
    activeConnections.add(-1);
  });

  // Error/timeout case
  res.on('close', () => {
    if (!res.writableEnded) {
      activeConnections.add(-1);
    }
  });

  next();
}

Structured Logs
Logger Configuration
// src/instrumentation/logging.ts
import { logs, SeverityNumber } from '@opentelemetry/api-logs';
import { trace, context } from '@opentelemetry/api';
import pino from 'pino';

const logger = logs.getLogger('microservice-logger', '1.0.0');

// OpenTelemetry severity levels
const severityMap: Record<string, SeverityNumber> = {
  trace: SeverityNumber.TRACE,
  debug: SeverityNumber.DEBUG,
  info: SeverityNumber.INFO,
  warn: SeverityNumber.WARN,
  error: SeverityNumber.ERROR,
  fatal: SeverityNumber.FATAL,
};

export interface LogContext {
  [key: string]: unknown;
}

export function createLogger(serviceName: string) {
  // Pino for local/console logs
  const pinoLogger = pino({
    level: process.env.LOG_LEVEL || 'info',
    formatters: {
      level: (label) => ({ level: label }),
      bindings: () => ({}),
    },
    timestamp: () => `,"timestamp":"${new Date().toISOString()}"`,
    base: {
      service: serviceName,
      environment: process.env.NODE_ENV,
    },
  });

  return {
    trace: (message: string, ctx?: LogContext) => log('trace', message, ctx),
    debug: (message: string, ctx?: LogContext) => log('debug', message, ctx),
    info: (message: string, ctx?: LogContext) => log('info', message, ctx),
    warn: (message: string, ctx?: LogContext) => log('warn', message, ctx),
    error: (message: string, ctx?: LogContext) => log('error', message, ctx),
    fatal: (message: string, ctx?: LogContext) => log('fatal', message, ctx),
    child: (bindings: Record<string, unknown>) => {
      return createChildLogger(serviceName, bindings);
    },
  };

  function log(level: string, message: string, ctx?: LogContext) {
    // Log to console via Pino
    pinoLogger[level as keyof typeof pinoLogger]({ ...ctx }, message);

    // Log to OpenTelemetry
    const span = trace.getActiveSpan();
    const spanContext = span?.spanContext();

    logger.emit({
      severityNumber: severityMap[level],
      severityText: level.toUpperCase(),
      body: message,
      attributes: {
        'service.name': serviceName,
        'log.level': level,
        ...(spanContext && {
          'trace_id': spanContext.traceId,
          'span_id': spanContext.spanId,
        }),
        ...flattenObject(ctx || {}),
      },
    });
  }
}

function createChildLogger(serviceName: string, bindings: Record<string, unknown>) {
  const parentLogger = createLogger(serviceName);

  return {
    trace: (message: string, ctx?: LogContext) => parentLogger.trace(message, { ...bindings, ...ctx }),
    debug: (message: string, ctx?: LogContext) => parentLogger.debug(message, { ...bindings, ...ctx }),
    info: (message: string, ctx?: LogContext) => parentLogger.info(message, { ...bindings, ...ctx }),
    warn: (message: string, ctx?: LogContext) => parentLogger.warn(message, { ...bindings, ...ctx }),
    error: (message: string, ctx?: LogContext) => parentLogger.error(message, { ...bindings, ...ctx }),
    fatal: (message: string, ctx?: LogContext) => parentLogger.fatal(message, { ...bindings, ...ctx }),
    child: (newBindings: Record<string, unknown>) => createChildLogger(serviceName, { ...bindings, ...newBindings }),
  };
}

// Flatten nested objects for attributes
function flattenObject(
  obj: Record<string, unknown>,
  prefix = ''
): Record<string, string | number | boolean> {
  const result: Record<string, string | number | boolean> = {};

  for (const [key, value] of Object.entries(obj)) {
    const newKey = prefix ? `${prefix}.${key}` : key;

    if (value && typeof value === 'object' && !Array.isArray(value)) {
      Object.assign(result, flattenObject(value as Record<string, unknown>, newKey));
    } else if (typeof value === 'string' || typeof value === 'number' || typeof value === 'boolean') {
      result[newKey] = value;
    } else if (value !== undefined && value !== null) {
      result[newKey] = String(value);
    }
  }

  return result;
}

export const log = createLogger(process.env.SERVICE_NAME || 'unknown-service');

Complete Observability Stack
Docker Compose
# docker-compose.observability.yml
version: '3.8'

services:
  # OpenTelemetry Collector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.91.0
    container_name: otel-collector
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./docker/otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8888:8888"   # Prometheus metrics exposed by the collector
      - "8889:8889"   # Prometheus exporter metrics
      - "13133:13133" # Health check
      - "55679:55679" # zPages
    depends_on:
      - jaeger
      - prometheus
      - loki
    networks:
      - observability

  # Jaeger - Distributed Tracing
  jaeger:
    image: jaegertracing/all-in-one:1.52
    container_name: jaeger
    ports:
      - "16686:16686" # UI
      - "14268:14268" # HTTP collector
      - "14250:14250" # gRPC collector
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    networks:
      - observability

  # Prometheus - Metrics
  prometheus:
    image: prom/prometheus:v2.48.0
    container_name: prometheus
    volumes:
      - ./docker/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      - observability

  # Loki - Log Aggregation
  loki:
    image: grafana/loki:2.9.2
    container_name: loki
    ports:
      - "3100:3100"
    networks:
      - observability

  # Grafana - Visualization
  grafana:
    image: grafana/grafana:10.2.2
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin123
    ports:
      - "3001:3000"
    depends_on:
      - prometheus
      - loki
      - jaeger
    networks:
      - observability

networks:
  observability:
    driver: bridge

Production Checklist
Instrumentation
- OpenTelemetry SDK configured before other imports
- Auto-instrumentation enabled for HTTP, database, messaging
- Custom spans for critical business operations
- Relevant attributes added to spans
- Errors captured and recorded correctly
Metrics
- RED metrics (rate, errors, duration) for all endpoints
- Business metrics defined
- Histograms with appropriate buckets
- Consistent labels across services
- Label cardinality controlled
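Controlling label cardinality usually means normalizing label values before recording: raw URLs with embedded IDs would create one time series per user or order and overwhelm Prometheus. A small sketch of such a normalizer (`normalizeRoute` is a hypothetical helper, not part of the middleware shown earlier):

```typescript
// Replace high-cardinality path segments (numeric IDs, UUIDs) with
// placeholders so the `route` label stays bounded.
function normalizeRoute(path: string): string {
  return path
    .split('/')
    .map((segment) => {
      if (/^\d+$/.test(segment)) return ':id'; // numeric IDs
      if (/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(segment)) {
        return ':uuid'; // UUIDs
      }
      return segment;
    })
    .join('/');
}

console.log(normalizeRoute('/orders/42/items/7')); // → /orders/:id/items/:id
console.log(normalizeRoute('/users/3f2b8c1a-0d4e-4f6a-9b2c-1e5d7a8b9c0d')); // → /users/:uuid
```

With Express, `req.route?.path` already gives the parameterized template for matched routes; a normalizer like this is the fallback for unmatched paths (404s, proxied requests) where only the raw URL is available.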
Logs
- Structured format (JSON)
- Correlation with trace ID
- Appropriate log levels
- Sensitive data masked
- Rotation and retention configured
Alerts
- SLOs defined and monitored
- Alerts for critical metrics
- Runbooks for each alert
- Escalation configured
- Alert tests performed
Infrastructure
- Collector with high availability
- Adequate data retention
- Configuration backup
- Sampling configured for volume
- Adequate resources for stack
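On sampling: head-based samplers keep a deterministic fraction of traces, and because every service derives the same keep/drop decision from the trace ID alone, a trace is either sampled everywhere or nowhere. A simplified sketch of the trace-ID-ratio strategy (this illustrates the idea; in production you would use the SDK's built-in `ParentBased` + `TraceIdRatioBased` samplers rather than rolling your own):

```typescript
// Decide whether to keep a trace from its ID and a sampling ratio.
// Uses the upper 8 hex chars of the 32-char trace ID as a number in
// [0, 2^32) and keeps the trace when it falls below ratio * 2^32.
function shouldSample(traceId: string, ratio: number): boolean {
  const value = parseInt(traceId.substring(0, 8), 16);
  return value < ratio * 0x100000000;
}

// At a 10% ratio, low trace-ID prefixes are kept, high ones dropped —
// and every service in the request path computes the same answer.
console.log(shouldSample('00000001aaaaaaaaaaaaaaaaaaaaaaaa', 0.1)); // → true
console.log(shouldSample('ffffffffaaaaaaaaaaaaaaaaaaaaaaaa', 0.1)); // → false
```

The collector can additionally do tail-based sampling (keeping all error traces, for example), which is where the "sampling configured for volume" item above usually ends up in practice.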
Conclusion
Observability is the foundation for operating microservices in production with confidence. The key points are:
- Three Pillars: Traces, metrics, and logs work together to provide complete visibility
- OpenTelemetry: Vendor-neutral standard that simplifies instrumentation
- Correlation: Trace ID connects logs, metrics, and traces from the same request
- SLOs: Define clear objectives and monitor error budgets
- Smart Alerts: Alert on symptoms, not causes
With this complete series, you have all the tools to build robust microservices:
- Microservices Architecture - Fundamentals and patterns
- API Gateway with Kong - Traffic management
- Messaging with RabbitMQ - Asynchronous communication
- Observability with OpenTelemetry (this article) - Monitoring and debugging