Your application is live. It works. Then, without warning, it starts slowing down. Some users complain. The server looks fine, the database responds, but the checkout takes 8 seconds. You open the logs: hundreds of lines, all mixed, no structure. You have some CPU alerts, but they tell you nothing useful. You have monitoring, but not observability.
We at Meteora Web see this often. A client installed Nagios, set RAM thresholds, and thinks they're covered. Then the first bottleneck hits and they can't trace where it starts. Observability isn't a tool: it's an approach. It stands on three pillars. If you miss one, you only see half the problem.
This guide explains what logs, metrics, and traces are, why you need all three, and how to integrate them into a real application. No abstract theory: PHP examples with Prometheus and OpenTelemetry that you can use today.
The Three Pillars of Observability: Why a Ping Isn't Enough
Observability is the ability to understand a system's internal state from the data it produces. It goes beyond knowing whether the server is alive: it means being able to answer questions like "why is this request slow?" or "which service is dropping packets?"
Three types of data provide the answer:
- Logs: textual records of discrete events.
- Metrics: numeric aggregations over time.
- Traces: complete paths of a request through services.
Each answers a different question. Using them together is like having a diary, a report card, and a map all at once.
Logs: The Event Chronicle
A log tells you what happened and when. Each line is an event: an error, a login, a payment. But without structure, logs are noise.
Common mistake: writing informal messages like "Error in login". It helps a human, but not an analysis system. Use a structured format (JSON, key-value) that allows filtering and aggregation.
Practical example (PHP with Monolog):
<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader; adjust the path to your project

use Monolog\Logger;
use Monolog\Handler\StreamHandler;
use Monolog\Formatter\JsonFormatter;

// Write WARNING and above to a file, one JSON object per line
$log = new Logger('app');
$handler = new StreamHandler('/var/log/app.log', Logger::WARNING);
$handler->setFormatter(new JsonFormatter());
$log->pushHandler($handler);

// The context array becomes searchable fields instead of free text
$log->error('Payment failed', [
    'order_id' => 12345,
    'user_id' => 678,
    'amount' => 49.99,
    'gateway' => 'stripe',
    'error_code' => 'card_declined',
]);
?>
Now you can search all payment errors by gateway or amount, even with tools like Grafana Loki.
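To make the filtering idea concrete, here is a minimal sketch in plain PHP (no Monolog required) that reads JSON log lines and keeps only the records matching given context fields. The helper name `filterLogLines` and the sample lines are hypothetical, and the sketch assumes the context array is serialized under a `context` key, which is how Monolog's JSON output is typically laid out.

```php
<?php
declare(strict_types=1);

/**
 * Keep only the decoded log records whose context matches every given field.
 * Hypothetical helper for illustration; not part of Monolog.
 *
 * @param string[] $lines    Raw log lines, ideally one JSON object per line.
 * @param array    $criteria Context fields that must match exactly.
 * @return array[] Decoded records that satisfy all criteria.
 */
function filterLogLines(array $lines, array $criteria): array
{
    $matches = [];
    foreach ($lines as $line) {
        $record = json_decode($line, true);
        if (!is_array($record)) {
            continue; // unstructured line: nothing to filter on
        }
        $context = is_array($record['context'] ?? null) ? $record['context'] : [];
        foreach ($criteria as $field => $expected) {
            if (!array_key_exists($field, $context) || $context[$field] !== $expected) {
                continue 2; // one criterion failed, move on to the next line
            }
        }
        $matches[] = $record;
    }
    return $matches;
}

// Sample lines, made up for this example. The last one is an informal message.
$sampleLines = [
    '{"message":"Payment failed","context":{"order_id":12345,"gateway":"stripe","error_code":"card_declined"},"level_name":"ERROR"}',
    '{"message":"Payment failed","context":{"order_id":12346,"gateway":"paypal","error_code":"timeout"},"level_name":"ERROR"}',
    'Error in login',
];

$stripeFailures = filterLogLines($sampleLines, ['gateway' => 'stripe']);
echo count($stripeFailures), " matching record(s)\n";
```

The informal "Error in login" line is silently skipped: there is no field to match against, which is exactly why unstructured messages are hard to analyze.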
Metrics: The Numbers That Matter
A metric is a number measured at a point in time: requests per second, memory used, average latency. It tells you how much, and how that value is trending.
Metrics are aggregatable (count, sum, avg, percentiles) and have a timestamp. They drive alert thresholds and reveal patterns over time.
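As a rough illustration of why percentiles matter more than averages, the sketch below computes both from a handful of latency samples. The sample values and the `percentile` helper are made up for this example, and the nearest-rank method used here is only one of several ways to define a percentile; a monitoring backend may use a different estimate.

```php
<?php
declare(strict_types=1);

/**
 * Nearest-rank percentile: the smallest sample such that at least p percent
 * of all samples are less than or equal to it.
 *
 * @param array<int, int|float> $samples
 */
function percentile(array $samples, float $p): float
{
    if ($samples === [] || $p <= 0.0 || $p > 100.0) {
        throw new InvalidArgumentException('Need at least one sample and 0 < p <= 100.');
    }
    sort($samples);
    $rank = (int) ceil($p * count($samples) / 100); // 1-based rank
    return (float) $samples[$rank - 1];
}

// Ten request latencies in milliseconds (hypothetical): nine normal, one 8-second outlier.
$latenciesMs = [120, 95, 110, 8000, 105, 130, 98, 102, 115, 125];

$average = array_sum($latenciesMs) / count($latenciesMs);

printf(
    "avg=%.0f p50=%.0f p90=%.0f p99=%.0f\n",
    $average,
    percentile($latenciesMs, 50),
    percentile($latenciesMs, 90),
    percentile($latenciesMs, 99)
);
// prints: avg=900 p50=110 p90=130 p99=8000
```

The average of 900 ms describes no real request: most users wait about 110 ms and one waits 8 seconds. The p99 exposes that outlier, while the p50 shows the typical experience.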
Practical example (Prometheus client in PHP):
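<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader; adjust the path to your project

use Prometheus\CollectorRegistry;
use Prometheus\Storage\InMemory;
use Prometheus\RenderTextFormat;

// InMemory keeps values only for the current PHP process. Under PHP-FPM, use a shared
// storage adapter (for example Redis or APCu) so counts survive across requests.
$registry = new CollectorRegistry(new InMemory());
$counter = $registry->registerCounter(
    'app',
    'http_requests_total',
    'Total number of HTTP requests',
    ['method', 'endpoint']
);
$counter->incBy(1, ['GET', '/api/orders']);

// Expose /metrics endpoint for Prometheus
$renderer = new RenderTextFormat();
header('Content-Type: ' . RenderTextFormat::MIME_TYPE);
echo $renderer->render($registry->getMetricFamilySamples());
?>
Prometheus scrapes the endpoint at the configured interval (15 seconds is a common setting). With Grafana you build dashboards and alerts. The question answered: is the system getting worse?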
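For readers curious about what the scraper actually receives, here is a minimal sketch that renders a counter by hand in the Prometheus text exposition format. The `renderCounter` helper and the sample values are hypothetical; in a real application the client library's renderer does this work. The metric name assumes the namespace and name are joined with an underscore, giving `app_http_requests_total`.

```php
<?php
declare(strict_types=1);

/**
 * Render one counter in the Prometheus text exposition format.
 * Hypothetical helper for illustration only.
 *
 * @param array<int, array{labels: array<string, string>, value: int|float}> $samples
 */
function renderCounter(string $name, string $help, array $samples): string
{
    $lines = [
        "# HELP {$name} {$help}",
        "# TYPE {$name} counter",
    ];
    foreach ($samples as $sample) {
        $pairs = [];
        foreach ($sample['labels'] as $label => $value) {
            // Escape backslash, double quote, and newline inside label values.
            $escaped = str_replace(["\\", '"', "\n"], ["\\\\", '\\"', '\\n'], $value);
            $pairs[] = "{$label}=\"{$escaped}\"";
        }
        // Omit the braces entirely when a sample has no labels.
        $labelPart = $pairs === [] ? '' : '{' . implode(',', $pairs) . '}';
        $lines[] = "{$name}{$labelPart} {$sample['value']}";
    }
    return implode("\n", $lines) . "\n";
}

echo renderCounter('app_http_requests_total', 'Total number of HTTP requests', [
    ['labels' => ['method' => 'GET', 'endpoint' => '/api/orders'], 'value' => 42],
    ['labels' => ['method' => 'POST', 'endpoint' => '/api/orders'], 'value' => 7],
]);
// Output:
// # HELP app_http_requests_total Total number of HTTP requests
// # TYPE app_http_requests_total counter
// app_http_requests_total{method="GET",endpoint="/api/orders"} 42
// app_http_requests_total{method="POST",endpoint="/api/orders"} 7
```

Each distinct combination of label values becomes its own line, and its own time series once scraped. That is what makes per-endpoint dashboards possible.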
Traces: The Request Journey
A trace follows a request through all the services involved. Each step is a span with its own duration. It tells you where time is lost.
Without traces, in a microservices architecture you can't tell if the bottleneck is in the gateway, service A, or the database.
Practical example (OpenTelemetry in PHP):
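<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader; adjust the path to your project

use OpenTelemetry\API\Globals;
use OpenTelemetry\API\Trace\SpanKind;

// Assumes a tracer provider has been registered globally (for example via SDK autoloading);
// without one, the API falls back to a no-op tracer and nothing is exported.
$tracer = Globals::tracerProvider()->getTracer('app');
$span = $tracer->spanBuilder('process-order')
    ->setSpanKind(SpanKind::KIND_INTERNAL)
    ->startSpan();
try {
    // business logic
    $span->setAttribute('order_id', 12345);
    usleep(50000); // simulate 50 ms of work
} finally {
    $span->end(); // always end the span, even if the business logic throws
}
?>
Export spans to Jaeger or Zipkin. In the trace view you might see, for example, that the request spends 200 ms in the gateway, 150 ms in the order service, and 5 ms in the database. In that case the bottleneck is the gateway, not the DB.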
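The reasoning behind the 200 ms / 150 ms / 5 ms reading can be sketched in a few lines: a span's own ("self") time is its duration minus the time spent in its child spans. The `selfTimes` helper, the span names, and the timestamps below are made up for illustration, and the sketch assumes child spans run one after another without overlapping.

```php
<?php
declare(strict_types=1);

/**
 * Compute self time per span: its duration minus the duration of its direct children.
 * Assumes children of the same parent do not overlap in time.
 *
 * @param array<string, array{parent: ?string, start_ms: int, end_ms: int}> $spans keyed by span name
 * @return array<string, int> self time in milliseconds, slowest first
 */
function selfTimes(array $spans): array
{
    $self = [];
    foreach ($spans as $name => $span) {
        $self[$name] = $span['end_ms'] - $span['start_ms'];
    }
    foreach ($spans as $span) {
        if ($span['parent'] !== null && isset($self[$span['parent']])) {
            // Time spent inside a child is not the parent's own work.
            $self[$span['parent']] -= $span['end_ms'] - $span['start_ms'];
        }
    }
    arsort($self); // sort by self time, descending, keeping span names as keys
    return $self;
}

// One hypothetical trace: the gateway calls the order service, which queries the database.
$trace = [
    'gateway'       => ['parent' => null,            'start_ms' => 0,   'end_ms' => 355],
    'order-service' => ['parent' => 'gateway',       'start_ms' => 100, 'end_ms' => 255],
    'database'      => ['parent' => 'order-service', 'start_ms' => 200, 'end_ms' => 205],
];

foreach (selfTimes($trace) as $name => $ms) {
    echo "{$name}: {$ms} ms\n";
}
// prints:
// gateway: 200 ms
// order-service: 150 ms
// database: 5 ms
```

Tracing UIs such as Jaeger show this as a waterfall. The arithmetic is the same: the widest bar that is not explained by its children is where to look first.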
Integrating the Three Pillars in a Real Application
It's not enough to install three libraries. They must talk to each other. For example, in a structured log you can include a trace ID and a correlation ID to tie events to a specific request. Then, when you see a latency anomaly in a metric, you can open the corresponding trace and search the logs for errors.
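A practical pattern: use OpenTelemetry to generate traces, pass the trace ID to the logger, and link metrics to traces through exemplars where your stack supports them. Avoid using trace_id as a metric label: every request would create a new time series.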
<?php
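$span = OpenTelemetry\API\Globals::tracerProvider()->getTracer('app')->spanBuilder('handle-request')->startSpan();
$traceId = $span->getContext()->getTraceId();
// The handler shown earlier was set to WARNING; it must accept INFO for this line to be written.
$log->info('Request received', ['trace_id' => $traceId]);
// ...
$span->end(); // end the span once the request has been handled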
?>
Common Mistakes and How to Avoid Them
- Unstructured logs: if you use echo "Error" or syslog without a format, analysis tools can't process them. Solution: adopt JSON right away.
- Metrics without context: a counter like http_requests_total without labels (method, endpoint) can't tell you which path is receiving the traffic or failing. Always add labels.
- Tracing only the main service: if you don't propagate context (e.g., via HTTP headers), traces are fragmented. Use automatic propagation middleware (OpenTelemetry has packages for Laravel, Symfony, etc.).
- Ignoring costs: collecting everything costs resources (CPU, storage, transfer). We at Meteora Web recommend starting with what's critical: payment endpoints, login, checkout. Then expand.
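Context propagation via HTTP headers, mentioned in the list above, usually means the W3C Trace Context `traceparent` header: a version, a 32-character trace ID, a 16-character parent span ID, and a flags byte, all in lowercase hex. The sketch below builds and parses that header in plain PHP. The function names are hypothetical, only version `00` is handled, and in practice the OpenTelemetry propagators do this for you.

```php
<?php
declare(strict_types=1);

/** Build a version-00 traceparent header value from existing IDs. */
function buildTraceparent(string $traceId, string $spanId, bool $sampled = true): string
{
    return sprintf('00-%s-%s-%s', $traceId, $spanId, $sampled ? '01' : '00');
}

/**
 * Parse a version-00 traceparent header value.
 *
 * @return array{trace_id: string, parent_span_id: string, sampled: bool}|null null if malformed
 */
function parseTraceparent(string $header): ?array
{
    $pattern = '/^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/';
    if (preg_match($pattern, trim($header), $m) !== 1) {
        return null;
    }
    // All-zero trace or span IDs are invalid.
    if ($m[1] === str_repeat('0', 32) || $m[2] === str_repeat('0', 16)) {
        return null;
    }
    return [
        'trace_id' => $m[1],
        'parent_span_id' => $m[2],
        'sampled' => (hexdec($m[3]) & 1) === 1, // lowest flag bit marks a sampled trace
    ];
}

// Service A: start a trace and attach the header to the outgoing HTTP call.
$traceId = bin2hex(random_bytes(16)); // 32 hex characters
$spanId = bin2hex(random_bytes(8));   // 16 hex characters
$outgoingHeader = buildTraceparent($traceId, $spanId);

// Service B: read the header, reuse the trace ID, and create its own span ID.
$incoming = parseTraceparent($outgoingHeader);
$childSpanId = bin2hex(random_bytes(8));

echo "traceparent: {$outgoingHeader}\n";
echo 'same trace in service B: ', var_export($incoming['trace_id'] === $traceId, true), "\n";
```

If service B ignores the header and starts a fresh trace ID, the backend sees two unrelated traces. That is the fragmentation described above.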
In Summary — What to Do Now
- Structure your logs: switch to JSON with Monolog or any logger that supports formatters. Include context (user ID, order ID, trace ID).
- Add key metrics: install Prometheus and record at least latency (p50/p90/p99), error rate, and throughput for your main APIs.
- Trace critical requests: integrate OpenTelemetry in your backend and send traces to a compatible backend (Jaeger, Tempo). Propagate context across microservices.
- Create a unified dashboard: on Grafana, connect logs (Loki), metrics (Prometheus), and traces (Tempo). Use correlation to jump from a latency chart to a trace and then to logs.
- Measure the cost: track log and metric storage. Set retention policies. Don't collect data you won't use.
Observability is not a one-time project: it's a habit. Every new route, every new service should produce the three pillars from day zero. If you want to dive deeper into managing structured data in production, check our guide on MongoDB vs SQL. For now, start today with a health check endpoint that exposes metrics and traces: you'll see the difference at the first incident.