Observability with OpenTelemetry: Monitoring Microservices in Production

This is the fourth article in our series on microservices. If you have not read the earlier articles yet, see the microservices guide, API Gateway with Kong, and messaging with RabbitMQ.

Why Observability?

In distributed systems, debugging is considerably harder than in a monolith. A single request passes through multiple services, each with its own logs, metrics, and state. Without adequate observability, finding the root cause of a problem is like looking for a needle in a haystack.

The Three Pillars of Observability

┌───────────────────────────────────────────────────────────┐
│                       OBSERVABILITY                       │
├───────────────────┬───────────────────┬───────────────────┤
│      TRACES       │      METRICS      │       LOGS        │
│                   │                   │                   │
│  ┌─────────────┐  │  ┌─────────────┐  │  ┌─────────────┐  │
│  │ Distributed │  │  │  Counters   │  │  │ Structured  │  │
│  │   request   │  │  │ Histograms  │  │  │    JSON     │  │
│  │   Latency   │  │  │   Gauges    │  │  │   Context   │  │
│  │   Errors    │  │  │ Percentiles │  │  │   TraceID   │  │
│  └─────────────┘  │  └─────────────┘  │  └─────────────┘  │
│                   │                   │                   │
│ "What happened in │  "How is the      │  "Why did it      │
│  this request?"   │   system          │   happen?"        │
│                   │   behaving?"      │                   │
└───────────────────┴───────────────────┴───────────────────┘

OpenTelemetry: The Industry Standard

OpenTelemetry (OTel) is a CNCF project that provides APIs, SDKs, and tools for collecting telemetry (traces, metrics, and logs) in a standardized, vendor-neutral way.

OpenTelemetry Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         APPLICATION                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐     │
│  │   Auto-instr.   │  │  Manual-instr.  │  │    Baggage      │     │
│  │  (HTTP, gRPC)   │  │   (Custom)      │  │   (Context)     │     │
│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘     │
│           │                    │                    │               │
│           └────────────────────┼────────────────────┘               │
│                                ▼                                    │
│                    ┌─────────────────────┐                         │
│                    │   OTel SDK          │                         │
│                    │  ┌───────────────┐  │                         │
│                    │  │   Processor   │  │                         │
│                    │  │   Sampler     │  │                         │
│                    │  │   Exporter    │  │                         │
│                    │  └───────────────┘  │                         │
│                    └──────────┬──────────┘                         │
└───────────────────────────────┼─────────────────────────────────────┘
                                │
                                ▼
                    ┌─────────────────────┐
                    │   OTel Collector    │
                    │  ┌───────────────┐  │
                    │  │   Receivers   │──┼──► OTLP, Jaeger, Zipkin
                    │  │   Processors  │──┼──► Batch, Filter, Transform
                    │  │   Exporters   │──┼──► Jaeger, Prometheus, Loki
                    │  └───────────────┘  │
                    └──────────┬──────────┘
                               │
              ┌────────────────┼────────────────┐
              ▼                ▼                ▼
      ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
      │   Jaeger    │  │ Prometheus  │  │    Loki     │
      │   (Traces)  │  │  (Metrics)  │  │   (Logs)    │
      └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
             │                │                │
             └────────────────┼────────────────┘
                              ▼
                      ┌─────────────┐
                      │   Grafana   │
                      │ (Dashboard) │
                      └─────────────┘

Project Structure

observability-service/
├── src/
│   ├── instrumentation/
│   │   ├── index.ts              # Main OTel setup
│   │   ├── tracing.ts            # Trace configuration
│   │   ├── metrics.ts            # Metrics configuration
│   │   └── logging.ts            # Log configuration
│   ├── middleware/
│   │   ├── request-context.ts    # Request context
│   │   ├── metrics.middleware.ts # HTTP metrics
│   │   └── logging.middleware.ts # Structured logs
│   ├── utils/
│   │   ├── trace-context.ts      # Trace utilities
│   │   ├── custom-metrics.ts     # Custom metrics
│   │   └── log-formatter.ts      # Log formatting
│   ├── exporters/
│   │   ├── jaeger.ts             # Jaeger exporter
│   │   ├── prometheus.ts         # Prometheus exporter
│   │   └── loki.ts               # Loki exporter
│   └── app.ts
├── docker/
│   ├── otel-collector-config.yaml
│   ├── prometheus.yml
│   ├── loki-config.yaml
│   └── grafana/
│       └── dashboards/
│           └── microservices.json
├── docker-compose.observability.yml
└── package.json

OpenTelemetry SDK Configuration

Installation

bash

# Core OpenTelemetry
npm install @opentelemetry/api @opentelemetry/sdk-node

# Automatic instrumentation
npm install @opentelemetry/auto-instrumentations-node

# Exporters
npm install @opentelemetry/exporter-trace-otlp-http
npm install @opentelemetry/exporter-metrics-otlp-http
npm install @opentelemetry/exporter-logs-otlp-http

# Metrics and logs SDKs (imported directly by the setup code)
npm install @opentelemetry/sdk-metrics @opentelemetry/sdk-logs

# Resources and semantic conventions
npm install @opentelemetry/resources
npm install @opentelemetry/semantic-conventions

Main Setup

typescript

// src/instrumentation/index.ts
import { IncomingMessage } from 'http';
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { OTLPLogExporter } from '@opentelemetry/exporter-logs-otlp-http';
import { Resource } from '@opentelemetry/resources';
import {
  SEMRESATTRS_SERVICE_NAME,
  SEMRESATTRS_SERVICE_VERSION,
  SEMRESATTRS_DEPLOYMENT_ENVIRONMENT,
} from '@opentelemetry/semantic-conventions';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { BatchLogRecordProcessor } from '@opentelemetry/sdk-logs';
import { diag, DiagConsoleLogger, DiagLogLevel } from '@opentelemetry/api';

// Enable diagnostic logging when debugging the SDK itself
if (process.env.OTEL_DEBUG === 'true') {
  diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);
}

// Resource configuration (identifies the service)
const resource = new Resource({
  [SEMRESATTRS_SERVICE_NAME]: process.env.SERVICE_NAME || 'unknown-service',
  [SEMRESATTRS_SERVICE_VERSION]: process.env.SERVICE_VERSION || '1.0.0',
  [SEMRESATTRS_DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV || 'development',
  'service.instance.id': process.env.HOSTNAME || 'local',
  'service.namespace': 'microservices',
});

// Fall back to the local OTLP/HTTP port so the URLs never start with "undefined"
const otlpEndpoint = process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318';

// Exporter configuration
const traceExporter = new OTLPTraceExporter({
  url: otlpEndpoint + '/v1/traces',
  headers: {
    'x-api-key': process.env.OTEL_API_KEY || '',
  },
});

const metricExporter = new OTLPMetricExporter({
  url: otlpEndpoint + '/v1/metrics',
});

const logExporter = new OTLPLogExporter({
  url: otlpEndpoint + '/v1/logs',
});

// SDK configuration
const sdk = new NodeSDK({
  resource,
  traceExporter,
  metricReader: new PeriodicExportingMetricReader({
    exporter: metricExporter,
    exportIntervalMillis: 15000, // Export metrics every 15s
  }),
  logRecordProcessor: new BatchLogRecordProcessor(logExporter),
  instrumentations: [
    getNodeAutoInstrumentations({
      // Per-instrumentation configuration
      '@opentelemetry/instrumentation-http': {
        requestHook: (span, request) => {
          // Incoming requests expose `headers`; outgoing ClientRequest objects use getHeader()
          const requestId = request instanceof IncomingMessage
            ? request.headers['x-request-id']
            : request.getHeader('x-request-id');
          span.setAttribute('http.request.id', String(requestId ?? ''));
        },
        responseHook: (span, response) => {
          // Same distinction: IncomingMessage has `headers`, ServerResponse has getHeader()
          const length = response instanceof IncomingMessage
            ? response.headers['content-length']
            : response.getHeader('content-length');
          span.setAttribute('http.response.content_length', Number(length ?? 0));
        },
        ignoreIncomingRequestHook: (request) => {
          // Skip health checks
          return request.url === '/health' || request.url === '/ready';
        },
      },
      '@opentelemetry/instrumentation-express': {
        enabled: true,
      },
      '@opentelemetry/instrumentation-pg': {
        enhancedDatabaseReporting: true,
      },
      '@opentelemetry/instrumentation-redis': {
        enabled: true,
      },
      '@opentelemetry/instrumentation-amqplib': {
        enabled: true, // RabbitMQ
      },
    }),
  ],
});

// Initialization
export async function initTelemetry(): Promise<void> {
  try {
    await sdk.start();
    console.log('OpenTelemetry initialized successfully');

    // Graceful shutdown: flush pending telemetry before the process exits
    process.on('SIGTERM', async () => {
      try {
        await sdk.shutdown();
        console.log('OpenTelemetry shut down successfully');
      } catch (error) {
        console.error('Error shutting down OpenTelemetry', error);
      }
    });
  } catch (error) {
    console.error('Error initializing OpenTelemetry', error);
    throw error;
  }
}

export { sdk };

Application Entry Point

typescript

// src/index.ts
import { initTelemetry } from './instrumentation';

// IMPORTANT: initialize telemetry first!
// Auto-instrumentation patches modules as they are loaded, so anything
// imported before this point will not be instrumented.
async function bootstrap() {
  await initTelemetry();

  // Only now import the rest of the application
  const { createApp } = await import('./app');
  const app = await createApp();

  const port = process.env.PORT || 3000;
  app.listen(port, () => {
    console.log(`Server running on port ${port}`);
  });
}

bootstrap().catch(console.error);

Distributed Tracing

Distributed tracing lets you follow a single request as it travels across multiple services.

Fundamental Concepts

┌─────────────────────────────────────────────────────────────────┐
│                          TRACE                                  │
│  TraceID: abc123                                                 │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ SPAN: API Gateway (Root Span)                              │ │
│  │ SpanID: span-1, ParentID: null                             │ │
│  │ Duration: 250ms                                             │ │
│  │ ┌────────────────────────────────────────────────────────┐ │ │
│  │ │ SPAN: User Service                                     │ │ │
│  │ │ SpanID: span-2, ParentID: span-1                       │ │ │
│  │ │ Duration: 50ms                                          │ │ │
│  │ └────────────────────────────────────────────────────────┘ │ │
│  │ ┌────────────────────────────────────────────────────────┐ │ │
│  │ │ SPAN: Order Service                                    │ │ │
│  │ │ SpanID: span-3, ParentID: span-1                       │ │ │
│  │ │ Duration: 150ms                                         │ │ │
│  │ │ ┌────────────────────────────────────────────────────┐ │ │ │
│  │ │ │ SPAN: Database Query                               │ │ │ │
│  │ │ │ SpanID: span-4, ParentID: span-3                   │ │ │ │
│  │ │ │ Duration: 45ms                                      │ │ │ │
│  │ │ └────────────────────────────────────────────────────┘ │ │ │
│  │ │ ┌────────────────────────────────────────────────────┐ │ │ │
│  │ │ │ SPAN: RabbitMQ Publish                             │ │ │ │
│  │ │ │ SpanID: span-5, ParentID: span-3                   │ │ │ │
│  │ │ │ Duration: 10ms                                      │ │ │ │
│  │ │ └────────────────────────────────────────────────────┘ │ │ │
│  │ └────────────────────────────────────────────────────────┘ │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
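To make the parent/child relationships in the diagram concrete, here is a minimal sketch that models the five spans as plain records and computes each span's self time (its own duration minus that of its direct children). The `SpanRecord` shape and the `selfTimes` helper are hypothetical names used only for illustration; real OpenTelemetry spans carry more fields, and the calculation assumes child spans run sequentially without overlapping.

```typescript
// Hypothetical, simplified span shape for illustration only.
interface SpanRecord {
  spanId: string;
  parentId: string | null; // null marks the root span
  name: string;
  durationMs: number;
}

// The spans from the diagram above.
const spans: SpanRecord[] = [
  { spanId: 'span-1', parentId: null, name: 'API Gateway', durationMs: 250 },
  { spanId: 'span-2', parentId: 'span-1', name: 'User Service', durationMs: 50 },
  { spanId: 'span-3', parentId: 'span-1', name: 'Order Service', durationMs: 150 },
  { spanId: 'span-4', parentId: 'span-3', name: 'Database Query', durationMs: 45 },
  { spanId: 'span-5', parentId: 'span-3', name: 'RabbitMQ Publish', durationMs: 10 },
];

// Self time = own duration minus the duration of direct children.
// Assumes children run one after another and never overlap.
function selfTimes(all: SpanRecord[]): Map<string, number> {
  const result = new Map<string, number>();
  for (const span of all) {
    result.set(span.spanId, span.durationMs);
  }
  for (const span of all) {
    if (span.parentId !== null && result.has(span.parentId)) {
      result.set(span.parentId, result.get(span.parentId)! - span.durationMs);
    }
  }
  return result;
}

const selfTimeBySpan = selfTimes(spans);
// API Gateway: 250 - (50 + 150) = 50 ms of its own work
// Order Service: 150 - (45 + 10) = 95 ms of its own work
```

Read this way, the trace shows that most of the time not explained by downstream calls is spent inside the Order Service itself, which is where you would look first.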

Manual Span Instrumentation

typescript

// src/utils/trace-context.ts
import { trace, SpanStatusCode, context, propagation } from '@opentelemetry/api';
import type { Span, SpanOptions, Context } from '@opentelemetry/api';

const tracer = trace.getTracer('microservice-tracer', '1.0.0');

// Decorator for automatic tracing of async methods
export function Traced(
  spanName?: string,
  options?: SpanOptions
): MethodDecorator {
  return function (
    target: any,
    propertyKey: string | symbol,
    descriptor: PropertyDescriptor
  ) {
    const originalMethod = descriptor.value;
    const name = spanName || `${target.constructor.name}.${String(propertyKey)}`;

    descriptor.value = async function (...args: any[]) {
      return tracer.startActiveSpan(name, options || {}, async (span: Span) => {
        try {
          // Record only the argument count; argument values may contain sensitive data
          span.setAttribute('method.arguments.count', args.length);

          const result = await originalMethod.apply(this, args);

          span.setStatus({ code: SpanStatusCode.OK });
          return result;
        } catch (error) {
          span.setStatus({
            code: SpanStatusCode.ERROR,
            message: error instanceof Error ? error.message : 'Unknown error',
          });
          span.recordException(error as Error);
          throw error;
        } finally {
          span.end();
        }
      });
    };

    return descriptor;
  };
}

// Create a span manually around an async function
export function createSpan(
  name: string,
  fn: (span: Span) => Promise<any>,
  options?: SpanOptions
): Promise<any> {
  return tracer.startActiveSpan(name, options || {}, async (span) => {
    try {
      const result = await fn(span);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error) {
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: error instanceof Error ? error.message : 'Unknown error',
      });
      span.recordException(error as Error);
      throw error;
    } finally {
      span.end();
    }
  });
}

// Extract/inject context for propagation
export function extractContext(headers: Record<string, string>): Context {
  return propagation.extract(context.active(), headers);
}

export function injectContext(headers: Record<string, string>): void {
  propagation.inject(context.active(), headers);
}

// Add an event to the active span
export function addSpanEvent(
  eventName: string,
  attributes?: Record<string, string | number | boolean>
): void {
  const span = trace.getActiveSpan();
  if (span) {
    span.addEvent(eventName, attributes);
  }
}

// Get the current trace ID
export function getCurrentTraceId(): string | undefined {
  const span = trace.getActiveSpan();
  return span?.spanContext().traceId;
}

// Get the current span ID
export function getCurrentSpanId(): string | undefined {
  const span = trace.getActiveSpan();
  return span?.spanContext().spanId;
}

Usage in Services

typescript

// src/services/order.service.ts
// CreateOrderDTO, Order, orderRepository, messagePublisher, validateStock and
// validatePayment are defined elsewhere in the service (not shown here).
import { Traced, createSpan, addSpanEvent } from '../utils/trace-context';
import { trace, SpanKind } from '@opentelemetry/api';

export class OrderService {
  private readonly tracer = trace.getTracer('order-service');

  @Traced('OrderService.createOrder', { kind: SpanKind.INTERNAL })
  async createOrder(orderData: CreateOrderDTO): Promise<Order> {
    addSpanEvent('order.validation.started');

    // Validation
    await this.validateOrder(orderData);
    addSpanEvent('order.validation.completed');

    // Create a child span for a specific operation
    const order = await createSpan('order.save', async (span) => {
      span.setAttribute('order.items.count', orderData.items.length);
      span.setAttribute('order.total', orderData.total);

      const savedOrder = await this.orderRepository.save(orderData);

      span.setAttribute('order.id', savedOrder.id);
      return savedOrder;
    });

    // Publish the event
    await this.publishOrderCreated(order);

    return order;
  }

  @Traced('OrderService.validateOrder')
  private async validateOrder(orderData: CreateOrderDTO): Promise<void> {
    // Validation, traced automatically by the decorator
    await this.validateStock(orderData.items);
    await this.validatePayment(orderData.paymentMethod);
  }

  private async publishOrderCreated(order: Order): Promise<void> {
    // Span for messaging
    await createSpan(
      'rabbitmq.publish.order_created',
      async (span) => {
        span.setAttribute('messaging.system', 'rabbitmq');
        span.setAttribute('messaging.destination', 'orders.created');
        span.setAttribute('messaging.message_id', order.id);

        await this.messagePublisher.publish('orders.created', {
          orderId: order.id,
          timestamp: new Date().toISOString(),
        });
      },
      { kind: SpanKind.PRODUCER }
    );
  }
}

Context Propagation Between Services

typescript

// src/middleware/request-context.ts
import { Request, Response, NextFunction } from 'express';
import { context, propagation, trace } from '@opentelemetry/api';
import { v4 as uuidv4 } from 'uuid';

export interface RequestContext {
  traceId: string;
  spanId: string;
  requestId: string;
  userId?: string;
  correlationId: string;
}

declare global {
  namespace Express {
    interface Request {
      context: RequestContext;
    }
  }
}

export function requestContextMiddleware(
  req: Request,
  res: Response,
  next: NextFunction
): void {
  // The HTTP auto-instrumentation has already extracted the incoming trace
  // context and started the server span, so the active span is the one to use.
  // Re-extracting the headers here would replace it with a non-recording
  // remote parent, and the attributes set below would be silently dropped.
  const span = trace.getActiveSpan();
  const spanContext = span?.spanContext();

  // Build the request context
  req.context = {
    traceId: spanContext?.traceId || uuidv4().replace(/-/g, ''),
    spanId: spanContext?.spanId || uuidv4().replace(/-/g, '').substring(0, 16),
    requestId: req.headers['x-request-id'] as string || uuidv4(),
    userId: req.headers['x-user-id'] as string,
    correlationId: req.headers['x-correlation-id'] as string || uuidv4(),
  };

  // Add response headers for debugging
  res.setHeader('x-trace-id', req.context.traceId);
  res.setHeader('x-request-id', req.context.requestId);

  // Add attributes to the current span
  if (span) {
    span.setAttribute('request.id', req.context.requestId);
    span.setAttribute('correlation.id', req.context.correlationId);
    if (req.context.userId) {
      span.setAttribute('user.id', req.context.userId);
    }
  }

  next();
}

// Helper to propagate context in outgoing HTTP calls
export function getTracingHeaders(): Record<string, string> {
  const headers: Record<string, string> = {};
  propagation.inject(context.active(), headers);
  return headers;
}
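By default, the OpenTelemetry SDK propagates context using the W3C Trace Context `traceparent` header, which is what `propagation.inject` writes and `propagation.extract` reads. The following dependency-free sketch shows what that header contains. `parseTraceParent` and `formatTraceParent` are hypothetical helpers written for illustration (the SDK has its own propagator implementation), and the sketch only handles version `00` of the format.

```typescript
interface TraceParent {
  traceId: string;  // 32 lowercase hex characters, shared by every span in the trace
  spanId: string;   // 16 lowercase hex characters, the caller's span (the new parent)
  sampled: boolean; // bit 0 of the trace-flags byte
}

// Format: version "-" trace-id "-" parent-id "-" trace-flags
const TRACEPARENT_PATTERN = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceParent(header: string): TraceParent | null {
  const match = TRACEPARENT_PATTERN.exec(header.trim());
  if (!match) return null;
  const traceId = match[1];
  const spanId = match[2];
  const flags = match[3];
  // All-zero IDs are invalid according to the specification
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

function formatTraceParent(tp: TraceParent): string {
  return `00-${tp.traceId}-${tp.spanId}-${tp.sampled ? '01' : '00'}`;
}

// Example header taken from the W3C Trace Context specification
const exampleHeader = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
const parentContext = parseTraceParent(exampleHeader);
```

The receiving service keeps the trace ID, uses the incoming span ID as the parent of its own server span, and sends a new `traceparent` with its own span ID on any outgoing call.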

HTTP Client with Automatic Propagation

typescript

// src/utils/http-client.ts
import axios, { AxiosInstance } from 'axios';
// getTracingHeaders lives in the request-context middleware, not in trace-context
import { getTracingHeaders } from '../middleware/request-context';
import { getCurrentTraceId } from './trace-context';

export function createTracedHttpClient(baseURL: string): AxiosInstance {
  const client = axios.create({ baseURL });

  // Interceptor that adds tracing headers to every outgoing request
  client.interceptors.request.use((config) => {
    const traceId = getCurrentTraceId();
    const tracingHeaders: Record<string, string> = {
      ...getTracingHeaders(),
      // Only set x-trace-id when there is an active trace
      ...(traceId ? { 'x-trace-id': traceId } : {}),
    };

    for (const [key, value] of Object.entries(tracingHeaders)) {
      config.headers[key] = value;
    }

    return config;
  });

  // Interceptor that logs errors together with the trace ID
  client.interceptors.response.use(
    (response) => response,
    (error) => {
      const traceId = getCurrentTraceId();
      console.error(`HTTP Error [trace: ${traceId}]:`, {
        url: error.config?.url,
        method: error.config?.method,
        status: error.response?.status,
        message: error.message,
      });
      throw error;
    }
  );

  return client;
}

Custom Metrics

Metric Types

typescript

// src/instrumentation/metrics.ts
import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('microservice-metrics', '1.0.0');

// Counter - values that only increase
export const httpRequestsTotal = meter.createCounter('http_requests_total', {
  description: 'Total number of HTTP requests',
  unit: '1',
});

// UpDownCounter - values that can increase or decrease
export const activeConnections = meter.createUpDownCounter('active_connections', {
  description: 'Number of active connections',
  unit: '1',
});

// Histogram - distribution of values
export const httpRequestDuration = meter.createHistogram('http_request_duration_seconds', {
  description: 'Duration of HTTP requests in seconds',
  unit: 's',
  advice: {
    explicitBucketBoundaries: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
  },
});

// Observable Gauge - current value, read on each collection
export const memoryUsage = meter.createObservableGauge('process_memory_bytes', {
  description: 'Process memory usage in bytes',
  unit: 'By',
});

memoryUsage.addCallback((result) => {
  const usage = process.memoryUsage();
  result.observe(usage.heapUsed, { type: 'heap_used' });
  result.observe(usage.heapTotal, { type: 'heap_total' });
  result.observe(usage.rss, { type: 'rss' });
  result.observe(usage.external, { type: 'external' });
});

// Observable Counter - monotonic total, read on each collection
export const cpuUsage = meter.createObservableCounter('process_cpu_seconds_total', {
  description: 'Total CPU time spent in seconds',
  unit: 's',
});

cpuUsage.addCallback((result) => {
  // An observable counter reports the cumulative total, not the delta since
  // the last callback, so read the absolute CPU time (in microseconds).
  const { user, system } = process.cpuUsage();
  result.observe((user + system) / 1e6);
});
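To see what the `explicitBucketBoundaries` above mean in practice, the following dependency-free sketch assigns a few sample durations to buckets. It assumes the upper-inclusive convention of the OpenTelemetry explicit-bucket histogram, where N boundaries produce N + 1 buckets and a value equal to a boundary falls into the bucket that boundary closes. `bucketIndex` and `bucketCounts` are hypothetical helper names; the SDK performs this aggregation internally.

```typescript
// Same boundaries as http_request_duration_seconds above (in seconds)
const boundaries = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10];

// Returns the index of the first boundary that is >= value,
// or bounds.length for the overflow bucket (value > last boundary).
function bucketIndex(value: number, bounds: number[]): number {
  for (let i = 0; i < bounds.length; i++) {
    if (value <= bounds[i]) return i;
  }
  return bounds.length;
}

function bucketCounts(values: number[], bounds: number[]): number[] {
  const counts = new Array<number>(bounds.length + 1).fill(0);
  for (const value of values) {
    counts[bucketIndex(value, bounds)] += 1;
  }
  return counts;
}

// Five hypothetical request durations: 3 ms, 40 ms, 50 ms, 300 ms and 12 s
const sampleCounts = bucketCounts([0.003, 0.04, 0.05, 0.3, 12], boundaries);
// 0.04 and 0.05 share the (0.025, 0.05] bucket; 12 lands in the overflow bucket
```

Percentiles are later estimated from these counts, so boundaries should be dense around the latencies you actually care about; a request that takes 12 s is only known to be "above 10 s".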

Business Metrics

typescript

// src/utils/business-metrics.ts
import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('business-metrics', '1.0.0');

// Order metrics
export const ordersCreated = meter.createCounter('orders_created_total', {
  description: 'Total orders created',
});

export const orderValue = meter.createHistogram('order_value_dollars', {
  description: 'Order value distribution',
  unit: 'USD',
  advice: {
    explicitBucketBoundaries: [10, 25, 50, 100, 250, 500, 1000, 2500, 5000],
  },
});

export const orderProcessingTime = meter.createHistogram('order_processing_duration_seconds', {
  description: 'Time to process an order',
  unit: 's',
});

// User metrics
export const activeUsers = meter.createUpDownCounter('active_users', {
  description: 'Number of currently active users',
});

export const userRegistrations = meter.createCounter('user_registrations_total', {
  description: 'Total user registrations',
});

// Stock metrics
export const stockLevel = meter.createObservableGauge('stock_level', {
  description: 'Current stock level by product',
});

// Payment metrics
export const paymentAttempts = meter.createCounter('payment_attempts_total', {
  description: 'Total payment attempts',
});

export const paymentAmount = meter.createHistogram('payment_amount_dollars', {
  description: 'Payment amount distribution',
  unit: 'USD',
});

// Helper to record order metrics
export function recordOrderMetrics(order: {
  id: string;
  total: number;
  items: number;
  processingTimeMs: number;
  paymentMethod: string;
  region: string;
}) {
  // Only low-cardinality attributes are used as labels (never the order ID)
  const labels = {
    payment_method: order.paymentMethod,
    region: order.region,
  };

  ordersCreated.add(1, labels);
  orderValue.record(order.total, labels);
  orderProcessingTime.record(order.processingTimeMs / 1000, labels);
}

Complete Observability Stack

Docker Compose

yaml

# docker-compose.observability.yml
version: '3.8'

services:
  # OpenTelemetry Collector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.91.0
    container_name: otel-collector
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./docker/otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8888:8888"   # Prometheus metrics exposed by the collector
      - "8889:8889"   # Prometheus exporter metrics
      - "13133:13133" # Health check
      - "55679:55679" # zPages
    depends_on:
      - jaeger
      - prometheus
      - loki
    networks:
      - observability

  # Jaeger - Distributed Tracing
  jaeger:
    image: jaegertracing/all-in-one:1.52
    container_name: jaeger
    ports:
      - "16686:16686" # UI
      - "14268:14268" # HTTP collector
      - "14250:14250" # gRPC collector
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    networks:
      - observability

  # Prometheus - Metrics
  prometheus:
    image: prom/prometheus:v2.48.0
    container_name: prometheus
    volumes:
      - ./docker/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      - observability

  # Loki - Log Aggregation
  loki:
    image: grafana/loki:2.9.2
    container_name: loki
    ports:
      - "3100:3100"
    networks:
      - observability

  # Grafana - Visualization
  grafana:
    image: grafana/grafana:10.2.2
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin123
    ports:
      - "3001:3000"
    depends_on:
      - prometheus
      - loki
      - jaeger
    networks:
      - observability

networks:
  observability:
    driver: bridge

Production Checklist

Instrumentation

OpenTelemetry SDK initialized before any other imports
Auto-instrumentation enabled for HTTP, database, and messaging
Custom spans for critical business operations
Relevant attributes added to spans
Errors captured and recorded correctly

Metrics

RED metrics (rate, errors, duration) for all endpoints
Business metrics defined
Histograms with appropriate buckets
Consistent labels across services
Label cardinality kept under control
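As a rough illustration of why label cardinality matters, the sketch below estimates the upper bound on the number of time series a single metric can produce: the product of the number of distinct values of each label. `maxSeries` is a hypothetical helper and the label values are invented for the example.

```typescript
// Upper bound on time series for one metric:
// the product of the number of distinct values per label.
function maxSeries(labelValues: Record<string, string[]>): number {
  return Object.keys(labelValues).reduce(
    (product, label) => product * new Set(labelValues[label]).size,
    1,
  );
}

// Bounded labels, like the ones used for order metrics
const boundedSeries = maxSeries({
  payment_method: ['card', 'paypal', 'transfer'],
  region: ['us', 'eu', 'latam', 'apac'],
}); // 3 * 4 = 12 series

// Adding an unbounded label such as a user ID multiplies everything
const userIds = Array.from({ length: 10000 }, (_, i) => `user-${i}`);
const unboundedSeries = maxSeries({
  payment_method: ['card', 'paypal', 'transfer'],
  region: ['us', 'eu', 'latam', 'apac'],
  user_id: userIds,
}); // 3 * 4 * 10000 = 120000 series
```

Identifiers such as user, order, or request IDs therefore belong in span attributes and logs, not in metric labels.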

Logs

Structured format (JSON)
Correlation with trace ID
Appropriate log levels
Sensitive data masked
Rotation and retention configured
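The first, second, and fourth items can be sketched together in a few lines. This is a minimal, dependency-free illustration and not the project's `log-formatter.ts`: `formatLog`, `maskSensitive`, and the `SENSITIVE_KEYS` list are hypothetical, and in a real service the trace and span IDs would come from the active span (for example via `getCurrentTraceId()`).

```typescript
type LogLevel = 'debug' | 'info' | 'warn' | 'error';

// Hypothetical list of field names to mask; adapt it to your own data model.
const SENSITIVE_KEYS = ['password', 'token', 'authorization', 'cardNumber'];

function maskSensitive(fields: Record<string, unknown>): Record<string, unknown> {
  const masked: Record<string, unknown> = {};
  for (const key of Object.keys(fields)) {
    masked[key] = SENSITIVE_KEYS.indexOf(key) !== -1 ? '***' : fields[key];
  }
  return masked;
}

// Produces one JSON object per log line, carrying the trace correlation IDs.
function formatLog(
  level: LogLevel,
  message: string,
  fields: Record<string, unknown>,
  trace?: { traceId: string; spanId: string },
  now: Date = new Date(),
): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    trace_id: trace?.traceId,
    span_id: trace?.spanId,
    attributes: maskSensitive(fields),
  });
}

const exampleLine = formatLog(
  'info',
  'order created',
  { orderId: 'o-1', password: 'secret' },
  { traceId: '4bf92f3577b34da6a3ce929d0e0e4736', spanId: '00f067aa0ba902b7' },
);
```

With the `trace_id` field present on every line, a log query can jump straight from a trace in Jaeger to the matching log lines in Loki.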

Alerts

Infrastructure

Highly available Collector
Adequate data retention
Configuration backups
Sampling configured for traffic volume
Adequate resources for the stack
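On the sampling item: the usual approach for high-volume services is ratio-based head sampling, where the decision is derived from the trace ID so that every service reaches the same verdict for the same trace. The sketch below is a simplified illustration of that idea, not the SDK's exact algorithm; `shouldSample` is a hypothetical function, and in practice you would configure the SDK's own `TraceIdRatioBasedSampler` rather than write your own.

```typescript
// Simplified ratio sampling: map the trace ID to a number in [0, 2^32)
// and keep the trace if that number falls below ratio * 2^32.
// Assumption: the last 8 hex characters of the trace ID are random.
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  const bucket = parseInt(traceId.slice(-8), 16); // 0 .. 2^32 - 1
  return bucket < ratio * 0x100000000;
}

// The decision depends only on the trace ID and the ratio, so any service
// that sees the same trace ID makes the same choice.
const lowId = 'ffffffffffffffffffffffff00000000';
const highId = '000000000000000000000000ffffffff';
const keepLow = shouldSample(lowId, 0.1);   // true: bucket 0 is below the threshold
const keepHigh = shouldSample(highId, 0.1); // false: bucket 2^32 - 1 is above it
```

A 10% ratio cuts trace storage by roughly a factor of ten, at the cost of losing most individual traces, so it is worth pairing with metrics that still count every request.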

Conclusion

Observability is the foundation for operating microservices in production with confidence. The key points are:

Three Pillars: Traces, metrics, and logs work together to provide full visibility
OpenTelemetry: A vendor-neutral standard that simplifies instrumentation
Correlation: The trace ID connects the logs, metrics, and traces of a single request
SLOs: Define clear objectives and monitor error budgets
Smart Alerts: Alert on symptoms, not causes
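For readers new to error budgets, the arithmetic behind the SLO point is short enough to sketch. The function names and the example numbers below are hypothetical; the only assumption is the usual definition of an error budget as the share of time or requests that the SLO allows to fail.

```typescript
// Allowed downtime for an availability SLO over a rolling window.
function errorBudgetMinutes(sloTarget: number, windowDays: number): number {
  const windowMinutes = windowDays * 24 * 60;
  return (1 - sloTarget) * windowMinutes;
}

// Fraction of the request-based error budget that is still unspent
// (1 = untouched, 0 = exhausted, negative = SLO violated).
function budgetRemaining(
  sloTarget: number,
  totalRequests: number,
  failedRequests: number,
): number {
  const allowedFailures = (1 - sloTarget) * totalRequests;
  if (allowedFailures === 0) return failedRequests === 0 ? 1 : -Infinity;
  return 1 - failedRequests / allowedFailures;
}

// A 99.9% SLO over 30 days allows about 43.2 minutes of downtime.
const monthlyBudget = errorBudgetMinutes(0.999, 30);

// 250 failures out of 1,000,000 requests use a quarter of the ~1000 allowed.
const remaining = budgetRemaining(0.999, 1000000, 250);
```

Alerting on how fast this budget is being consumed is one way to alert on symptoms that users actually experience rather than on individual causes.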

With this series complete, you have all the tools you need to build robust microservices:

Microservices Architecture - Fundamentals and patterns
API Gateway with Kong - Traffic management
Messaging with RabbitMQ - Asynchronous communication
Observability with OpenTelemetry (this article) - Monitoring and debugging


Pedro Farbo