Your application is in production, metrics flow into Prometheus or InfluxDB, but when you open Grafana you see an empty dashboard and don't know where to start. Or you already have some panels, but alerts never fire or, worse, fire at random in the middle of the night. Sound familiar? It does to us, more than once.
At Meteora Web we have been working with monitoring since 2017: servers, applications, e-commerce, everything goes through a well-built dashboard. In our experience, about 90% of Grafana installations are underused: someone sets up a data source, throws in a random chart, and that's it. Grafana can do much more. In this guide we cover how to configure data sources solidly, build panels that actually tell you something (not just colored lines), and set up alerting that really works, without false alarms.
This is an operational guide. Grab a terminal, open Grafana, and follow along. Let's start.
Why Grafana? And why you need a strategy, not just a chart
Grafana is the universal frontend for your metrics. Whether you use Prometheus, InfluxDB, MySQL, Elasticsearch, or CloudWatch, a well-crafted panel turns raw numbers into decisions. But if you don't understand how datasources and queries work, you only get noise.
We always start with a question: “What symptom do I want to see?” High latency? HTTP errors? CPU load? The answer decides which datasource and panel type to use. And remember: a dashboard without alerting is a dashboard without airbags. When the server goes down, you don’t have time to stare at graphs.
Datasources: connecting data to Grafana
Grafana talks to your sources via datasource plugins. The most common: Prometheus, InfluxDB, Loki (for logs), MySQL/PostgreSQL, and Elasticsearch. Each datasource has its own query syntax. Here we cover the two we use most in production: Prometheus and InfluxDB.
Configuring a Prometheus datasource
Assume Prometheus is running at http://prometheus.internal:9090. In Grafana:
- Go to Connections > Data sources > Add data source (Configuration > Data Sources in older versions).
- Choose Prometheus.
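- In URL enter the endpoint: http://prometheus.internal:9090.
- Keep Access = Server (default): requests are made by the Grafana server, so the Grafana host must be able to reach Prometheus.
- Set Scrape interval to match the scrape_interval in your Prometheus configuration (Grafana assumes 15s if you leave it empty).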
- Click Save & Test — you should see a green message.
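Common pitfall: if Grafana and Prometheus are on different machines, check firewalls. And never use localhost if Grafana runs in a container and Prometheus on the host: inside the container, localhost is the container itself. We've fixed cases where the client used http://localhost:9090 from the container, and it never works. Use the host's address, host.docker.internal (available by default on Docker Desktop; on Linux add --add-host=host.docker.internal:host-gateway), or the service name if both containers share a Docker network.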
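To tell a network problem from a Grafana problem, it helps to query Prometheus's readiness endpoint (/-/ready) from the same place Grafana runs, for example a throwaway container attached to Grafana's Docker network. The function below is a minimal standard-library sketch of that check; the function name is our own, and since the Grafana image itself may not ship Python, curl against the same URL does the same job.

```python
import urllib.error
import urllib.request


def check_ready(base_url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Call Prometheus's /-/ready endpoint and report whether it answered HTTP 200."""
    url = base_url.rstrip("/") + "/-/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200, f"HTTP {response.status} from {url}"
    except urllib.error.HTTPError as err:
        # Reachable but not ready yet (Prometheus answers 503 while starting up)
        return False, f"HTTP {err.code} from {url}"
    except OSError as err:
        # URLError (DNS failure, connection refused) and socket timeouts are OSError subclasses
        return False, f"cannot reach {url}: {err}"
```

Call it with exactly the URL configured in the datasource, e.g. check_ready("http://prometheus.internal:9090"). If it succeeds from the host but fails from Grafana's network, the URL (often localhost) is the problem, not Grafana.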
Configuring an InfluxDB datasource (Flux)
InfluxDB 2.x uses the Flux query language. Here is a basic configuration:
URL: http://influxdb.internal:8086
Organization: myorg
Token: your-read-token (a token with read access to the bucket is enough; avoid the admin token)
Default Bucket: app-metrics
Set Query language to Flux, then click Save & test. If you want to use InfluxQL (legacy) with InfluxDB 2.x, you first need DBRP mappings in InfluxDB, which map a database and retention policy to a bucket. We prefer Flux because it's more powerful for time-based aggregations.
Common error: query timeout
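With large datasets (millions of time series), queries can exceed Grafana's timeout. Raise the query timeout in the datasource settings (e.g., 60s) and make long-range queries cheaper: set a larger Min interval so Grafana requests fewer points (a bigger step), and use a range window that fits it (e.g., [5m] instead of [15s]; a window shorter than a couple of scrape intervals leaves rate() without enough samples).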
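The arithmetic behind "fewer points" is simple: a range query returns one sample per step for each series, so 30 days at a 15s step means 172,801 points per series, and Prometheus refuses range queries where (end - start) / step exceeds 11,000. Grafana derives the step roughly from the time range divided by Max data points, bounded below by Min interval. The helpers below (hypothetical names) do the calculation so you can choose a Min interval before the query times out.

```python
# Prometheus rejects range queries where (end - start) / step exceeds this many steps
PROMETHEUS_MAX_RESOLUTION = 11_000


def points_per_series(range_seconds: int, step_seconds: int) -> int:
    """Samples a range query returns for each series: one per step, start and end included."""
    return range_seconds // step_seconds + 1


def exceeds_prometheus_limit(range_seconds: int, step_seconds: int) -> bool:
    """Mirror of the (end - start) / step > 11,000 check in the Prometheus HTTP API."""
    return range_seconds // step_seconds > PROMETHEUS_MAX_RESOLUTION


def min_step_for(range_seconds: int, max_points: int) -> int:
    """Smallest whole-second step that returns at most max_points samples per series."""
    # range // step + 1 <= max_points holds exactly when step > range / max_points
    return range_seconds // max_points + 1


THIRTY_DAYS = 30 * 24 * 3600
# points_per_series(THIRTY_DAYS, 15)  -> 172801, over the Prometheus limit
# points_per_series(THIRTY_DAYS, 300) -> 8641, within it
# min_step_for(THIRTY_DAYS, 1_000)    -> 2593 seconds (about 43 minutes)
```

Multiply the per-series figure by the number of series the query matches to get a feel for why a 15s step over a month is heavy even when it stays under the limit.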
Panels: turning metrics into decisions
Every panel in Grafana has a type (Time series, Stat, Gauge, Table, Bar gauge, etc.) and a query. The right type depends on the data: Time series for trends, Stat for a single value, Gauge for a value measured against thresholds.
Time series: the daily bread of monitoring
For the average CPU utilization of each node (0 to 1):
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
Configure the panel: Visualization = Time series, Legend = {{instance}}, Unit = Percent (0.0-1.0). Add a threshold at 0.8 (80%) in orange and set Show thresholds to As lines to draw it on the graph. With the Time series panel you can also use Transformations (for example Add field from calculation) to compute moving averages.
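node_cpu_seconds_total counts the seconds each CPU spent in each mode, so its per-second rate is the fraction of time spent in that mode, and the rates of all modes of one CPU add up to about 1. That is why 1 minus the average idle rate is the utilization, while averaging the non-idle series directly also divides by the number of busy modes. The sketch below compares both on two hypothetical scrapes of a 2-CPU node that only reports user and system as busy modes (real nodes also report iowait, irq, softirq, steal, and nice, which widens the gap); it ignores counter resets and Prometheus's extrapolation.

```python
# Hypothetical counters from one 2-CPU node, scraped 300 s apart:
# {(cpu, mode): (seconds_at_first_scrape, seconds_at_second_scrape)}
SAMPLES = {
    ("0", "idle"): (1000.0, 1210.0),
    ("0", "user"): (500.0, 560.0),
    ("0", "system"): (200.0, 230.0),
    ("1", "idle"): (1100.0, 1340.0),
    ("1", "user"): (400.0, 440.0),
    ("1", "system"): (150.0, 170.0),
}
WINDOW_SECONDS = 300.0


def rates(samples, window):
    """Per-second rate of each counter over the window (no resets, no extrapolation)."""
    return {key: (end - start) / window for key, (start, end) in samples.items()}


def utilization_from_idle(r):
    """1 - avg(rate(idle)) across CPUs: the share of time the node was busy."""
    idle = [value for (_cpu, mode), value in r.items() if mode == "idle"]
    return 1 - sum(idle) / len(idle)


def avg_of_busy_series(r):
    """avg(rate(mode != idle)): averages over (cpu, mode) pairs, so it also divides by the mode count."""
    busy = [value for (_cpu, mode), value in r.items() if mode != "idle"]
    return sum(busy) / len(busy)
```

Here the node is busy 25% of the time (CPU 0 is idle 70%, CPU 1 is idle 80%), and utilization_from_idle returns 0.25, while avg_of_busy_series returns 0.125 because each CPU's busy time is split across two mode series.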
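Operational tip: don't put all series in one panel. Make $instance a multi-value variable, filter the query with instance=~"$instance", and use the panel's Repeat options on $instance to get one panel per host. We see it often: a single graph with 50 unreadable lines. What we usually do is repeat a whole row per host, so each host's panels stay together section by section.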
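With a multi-value $instance variable, Grafana (as far as we know, for Prometheus datasources) substitutes the selected values into a filter like instance=~"$instance" as a regex alternation such as (a|b), with regex metacharacters escaped. Prometheus anchors label regexes at both ends, so web1 does not accidentally match web10. The sketch below mimics that with Python's re.fullmatch; it uses re.escape as a stand-in for Grafana's own escaping (which differs in detail), and the host names are made up.

```python
import re


def multi_value_regex(values):
    """Join selected variable values into a regex alternation, escaping metacharacters like '.'."""
    return "(" + "|".join(re.escape(value) for value in values) + ")"


def label_matches(pattern, label_value):
    """Prometheus =~ matchers are anchored at both ends, which re.fullmatch reproduces."""
    return re.fullmatch(pattern, label_value) is not None
```

For example, multi_value_regex(["web1:9100", "10.0.0.5:9100"]) matches both hosts but not web10:9100; without escaping, the dots in the IP address would match any character.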
Stat: the number that matters
To show the number of HTTP 5xx requests in the last hour:
sum(increase(http_requests_total{status=~"5.."}[1h]))
Visualization: Stat. Color mode = Background, Thresholds: base green, from 10 orange, from 50 red. Set Graph mode = Area to add a sparkline that shows the trend without taking much space.
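increase() copes with service restarts: when a counter goes down, Prometheus treats it as a reset to zero and counts the new value as growth since the reset. The sketch below shows that logic on hypothetical samples of a single series. It deliberately leaves out the extrapolation to the edges of the [1h] window that Prometheus also performs, which is why a real Stat panel can show non-integer counts.

```python
def counter_increase(samples):
    """Total increase of a counter across consecutive samples, treating any drop as a reset to zero.

    Simplified: Prometheus additionally extrapolates to the edges of the range window.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # Counter reset (e.g. process restart): count from zero up to the new value
            total += current
    return total
```

For the samples [100, 104, 110, 3, 7] the result is 4 + 6 + 3 + 4 = 17: the drop from 110 to 3 counts as 3 new requests, not as -107. The outer sum() then adds these per-series totals across status codes and instances.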
Gauge: speed at a glance
For disk usage percentage:
100 - (avg(node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100)
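Visualization: Gauge, Min = 0, Max = 100, Unit = Percent (0-100). Thresholds: base green, from 80 yellow, from 95 red. With several hosts, avg can hide one nearly full disk; max, or one gauge per host, is safer.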
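Thresholds are an ordered list of steps: a base color, plus steps that take over once the value reaches their number. A minimal model of that lookup with the gauge values above is sketched below; the greater-than-or-equal comparison at each boundary reflects our understanding of Grafana's behavior, so check edge values (exactly 80 or 95) in your version.

```python
import bisect

# (threshold, color) steps sorted by threshold; the base step uses -infinity so it always applies
DISK_THRESHOLDS = [(float("-inf"), "green"), (80.0, "yellow"), (95.0, "red")]


def threshold_color(value, steps):
    """Return the color of the highest step whose threshold the value has reached."""
    bounds = [bound for bound, _color in steps]
    # bisect_right counts the bounds <= value, so a value equal to a bound selects that step
    index = bisect.bisect_right(bounds, value) - 1
    return steps[index][1]
```

The same function covers the Stat panel's steps: with [(float("-inf"), "green"), (10.0, "orange"), (50.0, "red")], a count of 12 is orange.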
Table: granular details
To show top error endpoints:
topk(10, sum by (handler) (rate(http_requests_total{status=~"5.."}[5m])))
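Visualization: Table. Set the query Type to Instant and Format to Table so each handler becomes one row, then sort by Value descending. To color cells, add a field override on the Value column and set its cell type (Cell display mode in older versions) to a colored background.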
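topk(10, sum by (handler) (...)) first adds up the per-series rates for each handler, dropping the instance and status labels, and then keeps the ten largest. The plain-Python equivalent below runs over hypothetical (labels, rate) pairs. Note that in a range query topk picks the top k independently at every step, so a graph can show more than k handlers over time; the Instant query type avoids that in a table.

```python
import heapq
from collections import defaultdict


def top_error_handlers(series, k=10):
    """series: iterable of (labels dict, rate) pairs; returns [(handler, summed_rate)] sorted descending."""
    per_handler = defaultdict(float)
    for labels, rate in series:
        # sum by (handler): every other label (instance, status, ...) is aggregated away
        per_handler[labels.get("handler", "")] += rate
    return heapq.nlargest(k, per_handler.items(), key=lambda item: item[1])


# Hypothetical per-series 5xx rates (requests per second)
SERIES = [
    ({"handler": "/api/orders", "instance": "a", "status": "500"}, 0.5),
    ({"handler": "/api/orders", "instance": "b", "status": "502"}, 0.25),
    ({"handler": "/login", "instance": "a", "status": "500"}, 0.5),
    ({"handler": "/health", "instance": "a", "status": "503"}, 0.125),
]
```

With k=2, /api/orders comes first with 0.75 req/s (two series summed), ahead of /login with 0.5.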
Alerting: don’t get woken up for nothing
Grafana’s unified alerting system lets you create rules based on queries, with labels, notifications, and silences. It’s powerful but needs careful setup. Three principles:
- Require persistence before alerting: set a for duration (pending period) to avoid false positives (e.g., 5 minutes).
- Alert on symptoms, not causes: alert on high 95th percentile latency, not on CPU load (which can be perfectly normal).
- Smart routing: email for critical, Slack for warnings, Telegram for info.
Creating an alert rule in Grafana (PromQL)
Suppose we want an alert when average request latency exceeds 500ms for more than 5 minutes.
- Go to Alerting > Alert rules > New alert rule.
- Rule name: High API latency.
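- Select the Prometheus datasource and write the query. Dividing the summed rates gives the mean latency over all requests; leave the comparison out of the query, otherwise healthy periods return no series and the rule reports No Data instead of Normal:
sum(rate(http_request_duration_seconds_sum[5m])) / sum(rate(http_request_duration_seconds_count[5m]))
- Condition: A IS ABOVE 0.5 (a Threshold expression on query A, set as the alert condition).
- Folder and evaluation group: choose where the rule is stored and how often it is evaluated (e.g., every 1m).
- Pending period (For in older versions): 5m, so the condition must hold for 5 minutes.
- Labels: severity = critical. Grafana has no separate severity field; notification policies route on labels like this one.
- Notifications: choose a contact point (already configured under Contact points), or let notification policies route the alert.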
- Click Save.
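Why divide summed rates instead of averaging per-series ratios? avg(rate(..._sum) / rate(..._count)) gives a nearly idle instance the same weight as a busy one, while sum(rate(..._sum)) / sum(rate(..._count)) is the true mean over every request. The per-instance numbers below are hypothetical, but they show how the two can land on opposite sides of a 0.5 s threshold.

```python
# Hypothetical per-instance rates over [5m]:
# duration_sum_rate = seconds of latency per second, count_rate = requests per second
INSTANCES = {
    "api-1": {"duration_sum_rate": 90.0, "count_rate": 300.0},  # busy, 0.3 s mean
    "api-2": {"duration_sum_rate": 1.6, "count_rate": 2.0},  # nearly idle, 0.8 s mean
}


def avg_of_ratios(instances):
    """avg(rate(sum) / rate(count)): each instance counts once, regardless of traffic."""
    ratios = [i["duration_sum_rate"] / i["count_rate"] for i in instances.values()]
    return sum(ratios) / len(ratios)


def ratio_of_sums(instances):
    """sum(rate(sum)) / sum(rate(count)): mean latency over every request."""
    total_seconds = sum(i["duration_sum_rate"] for i in instances.values())
    total_requests = sum(i["count_rate"] for i in instances.values())
    return total_seconds / total_requests
```

Here avg_of_ratios returns 0.55 and would fire, while ratio_of_sums returns about 0.30: almost every user sees fast responses. If you do want to catch one slow instance, keep by (instance) on both sums instead of averaging.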
Common mistake: skipping the pending period (for) makes alerts fire on a single spike and resolve immediately: useless and noisy. We always set at least 3-5 minutes for non-critical metrics and 1 minute for conditions that need immediate action (e.g., server down).
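The pending period behaves like a small state machine run at each evaluation: the first breach moves the rule to Pending, it becomes Firing only once the condition has held for the whole pending period, and any healthy evaluation sends it back to Normal. The sketch below is a simplified model, not Grafana's code: it assumes a 1-minute evaluation interval and ignores No Data, Error, and keep-firing settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingRule:
    """Simplified alert rule state: Normal -> Pending -> Firing."""

    pending_period_s: int
    state: str = "Normal"
    breach_started_at: Optional[int] = None

    def evaluate(self, now_s: int, breached: bool) -> str:
        if not breached:
            # Any healthy evaluation resets the clock
            self.state, self.breach_started_at = "Normal", None
        elif self.breach_started_at is None:
            # First breach: start the pending clock
            self.breach_started_at = now_s
            self.state = "Firing" if self.pending_period_s == 0 else "Pending"
        elif now_s - self.breach_started_at >= self.pending_period_s:
            self.state = "Firing"
        return self.state


def simulate(values, threshold, pending_period_s, interval_s=60):
    """Evaluate the rule once per interval over a list of hypothetical metric values."""
    rule = PendingRule(pending_period_s)
    return [rule.evaluate(i * interval_s, value > threshold) for i, value in enumerate(values)]
```

With a 5-minute pending period, simulate([0.4, 0.9, 0.4, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7], 0.5, 300) ignores the single 0.9 spike and reaches Firing only at the last evaluation, five minutes after the sustained breach began.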
Setting up Slack notifications
- Go to Alerting > Contact points > Add contact point.
- Type: Slack.
- Enter the Webhook URL (from Slack: Apps > Incoming Webhooks).
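- Text Body (under Optional Slack settings): customize the message with a Go template. The template data exposes the alerts as .Alerts, each with .Status, .Labels, and .Annotations. Example:
{{ range .Alerts }}[{{ .Status }}] {{ .Labels.alertname }}: {{ .Annotations.summary }}
{{ end }}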
- Configure Notification policies to route alerts to the correct contact point by label (e.g., severity = critical).
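Notification policies route on labels, much like Alertmanager: each policy has label matchers and a contact point, the first matching policy wins unless it is set to continue matching, and unmatched alerts fall back to the default policy. The sketch below is a flattened, single-level model of that, using the routing from the three principles (email for critical, Slack for warnings, Telegram for info); the contact point names are hypothetical, and real policies also handle nesting, grouping, and timing.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Policy:
    contact_point: str
    # (label, operator, value) triples; all of them must match
    matchers: list = field(default_factory=list)
    continue_matching: bool = False


def matcher_ok(labels: dict, matcher: tuple) -> bool:
    name, op, value = matcher
    actual = labels.get(name, "")  # a missing label behaves like an empty one
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None  # regex matchers are anchored
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unsupported operator: {op!r}")


def route(labels: dict, policies: list, default_contact_point: str) -> list:
    """Return the contact points an alert with these labels would be sent to."""
    selected = []
    for policy in policies:
        if all(matcher_ok(labels, m) for m in policy.matchers):
            selected.append(policy.contact_point)
            if not policy.continue_matching:
                break
    return selected or [default_contact_point]


# Hypothetical contact point names
POLICIES = [
    Policy("email-oncall", [("severity", "=", "critical")]),
    Policy("slack-warnings", [("severity", "=", "warning")]),
    Policy("telegram-info", [("severity", "=~", "info|debug")]),
]
```

An alert labeled severity = critical goes only to email-oncall; one with no severity label ends up on the default contact point, which is a good reason to make the default something someone actually reads.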
Best practices for effective dashboards
After years building dashboards for clients (and for ourselves), we have a checklist:
- One dashboard per purpose: infrastructure, application, business. Don’t mix.
- Variables: use $env, $instance, $datacenter to filter without duplicating panels.
- Logical order, top to bottom: summary (stat/overview), then details (time series), then tables.
- JSON template: export the dashboard as JSON and version it in Git. We do this in a monitoring/grafana-dashboards repo.
- Public libraries: start from the Grafana Labs dashboard library (e.g., dashboards for Prometheus Node Exporter or Redis), then customize.
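Exporting by hand is easy to forget, so one option is a small script against Grafana's HTTP API: GET /api/dashboards/uid/<uid> returns the dashboard model wrapped in a {"dashboard": ..., "meta": ...} object, authenticated with a service account token in the Authorization: Bearer header. In the sketch below, the Grafana URL, the GRAFANA_TOKEN environment variable, and the one-file-per-UID layout are assumptions; dropping the numeric id and the version counter is our own choice to keep diffs readable and files portable between instances.

```python
import json
import os
import urllib.request
from pathlib import Path

# Assumption: adjust to your Grafana address
GRAFANA_URL = os.environ.get("GRAFANA_URL", "http://grafana.internal:3000")


def fetch_dashboard(uid: str, token: str) -> dict:
    """Fetch one dashboard via Grafana's HTTP API (response holds "dashboard" and "meta")."""
    request = urllib.request.Request(
        f"{GRAFANA_URL}/api/dashboards/uid/{uid}",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def normalize_dashboard(payload: dict) -> str:
    """Keep only the dashboard model, drop instance-specific fields, serialize deterministically."""
    dashboard = dict(payload["dashboard"])  # shallow copy: the API payload stays untouched
    dashboard.pop("id", None)  # numeric id differs between Grafana instances
    dashboard.pop("version", None)  # bumps on every save and only adds diff noise
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"


def export_dashboard(uid: str, out_dir: Path) -> Path:
    """Write <uid>.json into out_dir; GRAFANA_TOKEN is a hypothetical env var with the token."""
    token = os.environ["GRAFANA_TOKEN"]
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{uid}.json"
    out_path.write_text(normalize_dashboard(fetch_dashboard(uid, token)), encoding="utf-8")
    return out_path
```

Run export_dashboard(uid, Path("monitoring/grafana-dashboards")) for each dashboard you care about and commit the result; sort_keys keeps the JSON stable, so Git diffs show only real changes.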
In summary — what to do now
- Check your datasources: verify Grafana can reach Prometheus/InfluxDB. If you have timeouts, increase the timeout or reduce query granularity.
- Replace useless panels with the right ones: use Stat for KPIs, Time series for trends, Gauge for thresholds.
- Configure at least one alert rule with a pending period (for) and a Slack or email notification. Start simple: service uptime.
- Build a dashboard with variables for environment and host. Export it as JSON and put it under version control.
- Read the official Grafana Alerting documentation for a deeper dive into silences and mute timings: Grafana Alerting Docs.
And if you want to see how we integrate Grafana into a complete monitoring and observability architecture, check out our pillar guide on monitoring (English version available).