Prometheus from scratch: scraping, PromQL and alerting. A hands-on guide

Your server crashes at 3 AM and you find out at 9. Your e-commerce site loses conversions for hours and you don't know it. That's the problem Prometheus solves: it is not a pretty dashboard but a system that collects metrics, lets you query them in near real time, and alerts you before the damage is done. At Meteora Web, we've integrated it into several production architectures for clients with demanding stacks. It's what you need to stop chasing problems and start preventing them.

Why Prometheus instead of another tool?

Prometheus is an open-source monitoring and alerting system born at SoundCloud and now a graduated project of the Cloud Native Computing Foundation (CNCF). Its strength is the pull model: Prometheus itself periodically scrapes its targets (exporters, applications) over HTTP. Targets don't push data anywhere; they expose an endpoint, and Prometheus decides what to sample and when. Metrics are labeled, which makes queries extremely flexible. Why should you care? If you run Kubernetes, Prometheus is the de facto standard. Even without K8s, for any Linux stack, web app or API, Prometheus is a solid choice if you want control and scalability rather than a black-box SaaS whose costs grow with volume.

Common initial mistakes

Thinking Prometheus is a dashboard (it's not; Grafana is).
Not understanding the pull model and trying to configure push (Pushgateway exists for edge cases, not as a norm).
Skimping on label design: too many or too few labels make queries useless.

Getting started: installation and first scrape

Download the latest release from the official site. The basic configuration lives in a YAML file, prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

Launch with ./prometheus --config.file=prometheus.yml. Visit http://localhost:9090. You already have data: Prometheus exposes its own metrics. This is the first step toward real monitoring.

Immediate action

Go to /targets in the Prometheus UI and verify the state is UP. If yes, you have a working system. Now add a real target.
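The same check can be scripted. The sketch below is a minimal example, assuming Prometheus is reachable at http://localhost:9090 and using the /api/v1/targets endpoint of its HTTP API. The function names summarize_targets and fetch_targets are illustrative, not part of any library.

```python
import json
from urllib.request import urlopen


def summarize_targets(payload: dict) -> dict:
    """Map each active target's scrape URL to its health ('up', 'down', 'unknown')."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return {t["scrapeUrl"]: t["health"] for t in targets}


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch /api/v1/targets from a running Prometheus (assumed reachable at base_url)."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Usage against a live server:
#   for url, health in summarize_targets(fetch_targets()).items():
#       print(health, url)
```

Keeping the parsing separate from the HTTP call makes it easy to try the logic on a saved response before pointing it at a real server.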

Scraping with exporters: real examples

Prometheus doesn't natively understand every format; it needs an exporter that exposes metrics in its text-based format. For Linux the most common is node_exporter. Download, extract, and run:

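# wget does not expand wildcards in URLs, so pin a version explicitly.
# 1.8.2 is only an example: check the node_exporter releases page for the current one.
VERSION=1.8.2
wget https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz
tar xvf node_exporter-${VERSION}.linux-amd64.tar.gz
cd node_exporter-${VERSION}.linux-amd64
./node_exporter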

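Now node_exporter listens on port 9100. Add it as a second job under the existing scrape_configs key in prometheus.yml (do not repeat the key itself):

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']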

Restart Prometheus or send it a SIGHUP to reload the configuration. Within a minute you'll have metrics such as node_cpu_seconds_total and node_memory_MemAvailable_bytes. This is the baseline: most services (MySQL, Nginx, PostgreSQL, Redis) have their own exporter. Find them on the Prometheus website or GitHub.
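To see what the text-based format looks like, here is a simplified sketch that parses a few sample lines of the kind an exporter serves on /metrics. The sample values are made up, and the parser is deliberately minimal: it ignores timestamps and escaped quotes, which the real format allows. The function name parse_metrics is illustrative.

```python
import re

# Illustrative sample of what an exporter serves on /metrics (values are made up).
SAMPLE = '''\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 56.7
node_memory_MemAvailable_bytes 2.0e+09
'''

LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> list:
    """Parse simple sample lines into (name, labels, value) tuples.

    Simplified: skips comment lines, ignores timestamps and escaped quotes.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

Each parsed tuple corresponds to one time series sample: the metric name, its label set, and the value at scrape time.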

Custom metrics from your application

If you develop in Python, Go, or Java, you can expose metrics directly using the official client libraries. Example in Python with prometheus_client (installed with pip install prometheus-client):

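from prometheus_client import start_http_server, Counter
import time

# Counter with two labels; each label combination becomes its own time series.
c = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint'])

if __name__ == '__main__':
    # Serve the metrics at http://localhost:8000/metrics
    start_http_server(8000)
    while True:
        # Simulated traffic: in a real app, call inc() inside the request handler.
        c.labels(method='GET', endpoint='/api/orders').inc()
        time.sleep(0.5)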

Add a scrape job targeting localhost:8000. The loop above only simulates traffic; call inc() from your real request handlers and every request to your app is counted. The same approach lets you monitor business metrics: orders, page views, errors.

PromQL: the query language

PromQL (Prometheus Query Language) is the analytical core. It is not SQL: queries are expressions over time series that evaluate to instant vectors (one value per series) or range vectors (a window of samples per series). Essential concepts:

Metrics and labels

A time series is identified by a metric name plus a set of labels, for example http_requests_total{method="GET", endpoint="/home"}. Each distinct combination of name and label values is a separate series, and PromQL operates on those series.

Basic operations

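node_memory_MemAvailable_bytes returns the most recent value of each matching series (an instant vector).
rate(node_cpu_seconds_total[5m]) computes the per-second average increase of a counter over a 5-minute window. Use it for CPU time and requests per second.
irate(node_cpu_seconds_total[5m]) uses only the last two samples in the window, so it reacts to rapid changes but is noisy.
increase(node_network_receive_bytes_total[1h]) returns the total increase of a counter over 1 hour.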
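The idea behind rate and increase can be sketched in a few lines of Python. This is a simplified model, not Prometheus's implementation: the real functions also extrapolate to the edges of the window, so their results differ slightly. The sample data is made up.

```python
def increase(samples):
    """Total counter growth across (timestamp, value) samples.

    Any drop in value is treated as a counter reset to zero.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A counter only goes up; a lower value means the process restarted.
        total += cur if cur < prev else cur - prev
    return total


def rate(samples):
    """Average per-second growth between the first and last sample."""
    elapsed = samples[-1][0] - samples[0][0]
    return increase(samples) / elapsed if elapsed > 0 else 0.0


# Counter scraped every 15 s; it resets between t=30 and t=45.
samples = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 20.0), (60, 50.0)]
```

With this data the growth is 30 + 30 + 20 + 30 = 110 over 60 seconds. The reset handling is why rate should be applied to raw counters before any aggregation.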

Filter and aggregate

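node_cpu_seconds_total{job="node",mode="user"} filters by label. To aggregate across series, use an operator with a by clause: sum by (instance) (rate(node_cpu_seconds_total[5m])) adds up the per-core, per-mode rates into one series per instance.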

Practical example: CPU alert query

This query returns the average CPU usage per instance, as a percentage, over the last 5 minutes:

100 - (avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

On its own the query only returns the percentage. To use it in an alert rule, append a comparison such as > 80 so the rule fires when usage exceeds 80%.
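To make the arithmetic behind the expression concrete, here is a small sketch that applies the same formula to hand-written numbers. The instance names and idle rates are hypothetical; in Prometheus the rates come from rate(node_cpu_seconds_total{mode="idle"}[5m]), one per core.

```python
from statistics import mean


def cpu_usage_percent(idle_rates_by_instance: dict) -> dict:
    """Apply 100 - avg(idle rate) * 100 to the per-core idle rates of each instance."""
    return {
        instance: 100 - mean(rates) * 100
        for instance, rates in idle_rates_by_instance.items()
    }


# Hypothetical per-core idle rates (fraction of each second spent idle).
idle = {"web-1:9100": [0.10, 0.20], "db-1:9100": [0.90, 0.70]}
usage = cpu_usage_percent(idle)
```

Here web-1 averages 15% idle, so it is about 85% busy and would cross an 80% threshold, while db-1 at about 20% would not.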

Alerting with Alertmanager

Collecting metrics is useless if you don't act. Prometheus evaluates alerting rules and sends events to Alertmanager, which handles deduplication, grouping, and routing to channels (email, Slack, PagerDuty, Telegram).

Alerting rules

Create a file rules.yml:

groups:
  - name: node_alerts
    rules:
      - alert: HighCpuUsage
        expr: 100 - (avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU usage at {{ $value }}% for over 5 minutes."

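Add rule_files: ['rules.yml'] to prometheus.yml, then restart Prometheus or send it a SIGHUP. You can validate the file beforehand with promtool check rules rules.yml.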
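The for: 5m clause is what keeps short spikes from paging you. The sketch below models that behaviour under simple assumptions: one evaluation every 15 seconds, a fixed threshold, and a hypothetical function name alert_states. It is an illustration of the pending and firing states, not Prometheus's code.

```python
def alert_states(values, threshold=80.0, for_seconds=300, interval=15):
    """Return the alert state after each evaluation of the expression value.

    States: 'inactive', 'pending' (condition true, but not yet for `for_seconds`),
    'firing' (condition true continuously for at least `for_seconds`).
    """
    states = []
    active_since = None  # evaluation time at which the condition first became true
    for i, value in enumerate(values):
        now = i * interval
        if value > threshold:
            if active_since is None:
                active_since = now
            state = "firing" if now - active_since >= for_seconds else "pending"
        else:
            active_since = None  # any false evaluation resets the timer
            state = "inactive"
        states.append(state)
    return states


sustained = alert_states([90.0] * 21)                  # 300 s above the threshold
spike = alert_states([90.0] * 10 + [50.0] + [90.0] * 5)  # interrupted after 150 s
```

The sustained load turns firing at the 300-second mark, while the interrupted one never leaves pending, so no notification is sent for it.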

Configure Alertmanager

Download Alertmanager and configure a receiver, for example Slack via an incoming webhook. Here's an excerpt of alertmanager.yml:

global:
  slack_api_url: 'https://hooks.slack.com/services/...'

route:
  receiver: 'slack'
receivers:
  - name: 'slack'
    slack_configs:
      - channel: '#alerts'
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ .CommonAnnotations.description }}'

Start Alertmanager (it listens on port 9093) and tell Prometheus where to send alerts by adding this to prometheus.yml:

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

Restart or reload Prometheus. Firing alerts now arrive in Slack.
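To test the Slack route without waiting for a real incident, you can post a synthetic alert straight to Alertmanager. The sketch below assumes Alertmanager is reachable at http://localhost:9093 and uses its v2 HTTP API (POST /api/v2/alerts). The alert name, labels, and function names are illustrative choices.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.request import Request, urlopen


def build_test_alert(alertname="TestAlert", severity="warning", minutes=5):
    """Build a one-alert payload for Alertmanager's POST /api/v2/alerts endpoint."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": alertname, "severity": severity, "instance": "manual-test"},
        "annotations": {"description": "Manual test alert, safe to ignore."},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def send_alert(payload, base_url="http://localhost:9093"):
    """POST the payload to a running Alertmanager (assumed reachable at base_url)."""
    request = Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=5) as resp:
        return resp.status


# Usage against a live Alertmanager:
#   send_alert(build_test_alert())
```

The description annotation is set because the Slack template in the excerpt reads .CommonAnnotations.description; the alert resolves on its own once endsAt has passed.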

Best practices and security

We often see misconfigured setups: sensitive metrics exposed, too many alerts, no retention policy. Here's what we do in real projects:

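Retention: the default is 15 days. Set --storage.tsdb.retention.time=30d to keep data longer.
Access: Prometheus ships with no authentication enabled. Recent versions support TLS and basic auth through a web configuration file (--web.config.file); alternatively, put a reverse proxy such as Nginx with basic auth in front of it.
Label cardinality: avoid labels with a unique value per request (e.g. user_id). Every new label value creates a new time series, so memory and disk usage grow without bound.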
Job name: use descriptive names, not job1.
Alert fatigue: add a for: clause (for example for: 5m) so short spikes don't trigger notifications, and route alerts to separate channels by severity.
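The cardinality point is easiest to see with arithmetic. As a rough sketch, the worst-case number of series for one metric is the product of the distinct values of each label. The label counts below are hypothetical.

```python
from math import prod


def worst_case_series(label_value_counts: dict) -> int:
    """Upper bound on time series for one metric: the product of distinct values per label."""
    return prod(label_value_counts.values())


# Hypothetical label sets for http_requests_total.
bounded = {"method": 5, "endpoint": 40, "status": 6}
with_user_id = {**bounded, "user_id": 100_000}
```

The bounded set tops out at 1,200 series. Adding a user_id label with 100,000 values raises the ceiling to 120 million, which is why per-request identifiers belong in logs rather than labels.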

What to do now

Install Prometheus and node_exporter on at least one server or locally. Check targets.
Write two alert rules: one for CPU, one for disk space (node_filesystem_avail_bytes).
Connect Alertmanager to a Slack (or Telegram) channel and verify the notification arrives.
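Explore PromQL: use the expression browser in the Prometheus UI to graph HTTP requests per minute by endpoint, for example sum by (endpoint) (rate(http_requests_total[5m])) * 60.
Schedule a monthly check of label cardinality so the number of time series doesn't grow unnoticed.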
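For the monthly cardinality check, one option is the TSDB status endpoint of the Prometheus HTTP API, which reports series counts per metric name. The sketch below assumes Prometheus at http://localhost:9090 and the seriesCountByMetricName field of /api/v1/status/tsdb. The function names are illustrative.

```python
import json
from urllib.request import urlopen


def top_metrics(payload: dict, limit: int = 5) -> list:
    """Return (metric name, series count) pairs, largest first, from a TSDB status payload."""
    entries = payload.get("data", {}).get("seriesCountByMetricName", [])
    ranked = sorted(entries, key=lambda e: e["value"], reverse=True)
    return [(e["name"], e["value"]) for e in ranked[:limit]]


def fetch_tsdb_status(base_url: str = "http://localhost:9090") -> dict:
    """Fetch /api/v1/status/tsdb from a running Prometheus (assumed reachable at base_url)."""
    with urlopen(f"{base_url}/api/v1/status/tsdb", timeout=5) as resp:
        return json.load(resp)


# Usage against a live server:
#   for name, count in top_metrics(fetch_tsdb_status()):
#       print(count, name)
```

A metric whose series count climbs from one month to the next is usually the one carrying an unbounded label.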

If you need support with complex architectures or custom integrations, we at Meteora Web work with businesses from initial setup through to enterprise monitoring. Your site deserves to be always responsive: it's not just about technology, it's your revenue.