Nagios Production Monitoring: Setup, Configuration and Best Practices
Why Nagios
Nagios Core remains one of the most widely deployed monitoring frameworks in production environments. Its plugin architecture, passive check support, and distributed monitoring capabilities make it a solid choice for organizations that need full control over their monitoring stack without vendor lock-in.
Core Components
A typical Nagios deployment consists of:
- Nagios Core — the monitoring engine that schedules checks, processes results, and triggers alerts
- NRPE (Nagios Remote Plugin Executor) — runs checks on remote hosts and returns results
- NSCA (Nagios Service Check Acceptor) — receives passive check results from remote systems
- Check plugins — the actual scripts that perform checks (disk, CPU, load, processes, custom)
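To make the plugin contract concrete, here is a minimal sketch of a custom check in bash. It assumes the standard plugin convention of one status line on stdout (optionally followed by `|` and performance data) and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The function name `check_threshold` and the label `disk_used_pct` are hypothetical and chosen only for illustration.

```bash
#!/bin/bash
# Hypothetical helper: compare an integer value against warning/critical thresholds.
# Prints one status line with perfdata (label=value;warn;crit) and returns the
# matching plugin exit code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
check_threshold() {
    local label=$1 value=$2 warn=$3 crit=$4

    # Anything that is not a plain non-negative integer is reported as UNKNOWN.
    if ! [[ $value =~ ^(0|[1-9][0-9]*)$ ]]; then
        echo "UNKNOWN - $label: non-numeric value '$value'"
        return 3
    fi

    if (( value >= crit )); then
        echo "CRITICAL - $label is $value | $label=$value;$warn;$crit"
        return 2
    elif (( value >= warn )); then
        echo "WARNING - $label is $value | $label=$value;$warn;$crit"
        return 1
    fi

    echo "OK - $label is $value | $label=$value;$warn;$crit"
    return 0
}
```

A real plugin would gather the value itself and end with `exit $?` after calling the function, for example `check_threshold disk_used_pct "$used" 80 90; exit $?`, where `used` holds a percentage obtained from a tool such as `df`.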
Basic Server Setup
Install Nagios Core, the standard and contributed plugin packages, and the NRPE check plugin (Debian/Ubuntu package names):
apt install nagios4 monitoring-plugins nagios-plugins-contrib nagios-nrpe-plugin
The main configuration lives in /etc/nagios4/:
| File | Purpose |
|---|---|
| nagios.cfg | Main daemon configuration |
| conf.d/hosts.cfg | Host definitions |
| conf.d/services.cfg | Service definitions |
| conf.d/contacts.cfg | Alert recipients |
| objects/commands.cfg | Check command definitions |
Adding Hosts and Services
Define each monitored server in conf.d/hosts.cfg:
define host {
use linux-server
host_name web-01
alias Production Web Server
address 10.0.1.10
check_command check-host-alive
max_check_attempts 3
notification_interval 120
}
Define which services to check on that host in conf.d/services.cfg:
define service {
use generic-service
host_name web-01
service_description HTTP Response
check_command check_http!-H 10.0.1.10 -u /health -w 3 -c 5
check_interval 5
retry_interval 1
}
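After editing host or service definitions, it is usually worth validating the configuration before reloading, since a syntax error can keep the daemon from starting. The sketch below assumes the Debian/Ubuntu `nagios4` package, where the binary and the systemd unit are both named `nagios4`. The function name `reload_if_valid` is hypothetical, and it needs to run with sufficient privileges.

```bash
#!/bin/bash
# Hypothetical helper: run the built-in config check (-v) and reload the daemon
# only if the check passes. Assumes binary and systemd unit are both "nagios4".
reload_if_valid() {
    local cfg=${1:-/etc/nagios4/nagios.cfg}

    if nagios4 -v "$cfg" >/dev/null; then
        systemctl reload nagios4 && echo "reloaded"
    else
        echo "config check failed; not reloading" >&2
        return 1
    fi
}
```

Calling `reload_if_valid` with no argument checks `/etc/nagios4/nagios.cfg`. To see the full error report when validation fails, run `nagios4 -v /etc/nagios4/nagios.cfg` directly.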
Alert Configuration
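Out of the box, Nagios sends notifications via the notify-host-by-email and notify-service-by-email commands. Configure recipients in conf.d/contacts.cfg. Inheriting from the generic-contact template in the sample configuration supplies the notification periods and options that every contact definition requires:
define contact {
use generic-contact
contact_name ops-team
alias Operations Team
email ops@example.com
service_notification_commands notify-service-by-email
host_notification_commands notify-host-by-email
}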
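For Slack integration, register a script that POSTs to an incoming webhook as a notification command and reference it from the contact. Nagios does not set plain shell variables such as $SERVICESTATE; it exports macros to the script as NAGIOS_* environment variables, and only when enable_environment_macros=1 is set in nagios.cfg:
#!/bin/bash
# /usr/lib/nagios/plugins/notify-slack
# Macros arrive as NAGIOS_* environment variables
# (requires enable_environment_macros=1 in nagios.cfg).
WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK"
MESSAGE="Nagios: $NAGIOS_NOTIFICATIONTYPE - $NAGIOS_HOSTALIAS/$NAGIOS_SERVICEDESC is $NAGIOS_SERVICESTATE"
# MESSAGE is interpolated into the JSON unescaped; quotes or backslashes in it break the payload.
curl -s -X POST -H 'Content-type: application/json' \
  --data "{\"text\":\"$MESSAGE\"}" "$WEBHOOK_URL"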
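The script above builds its JSON payload by string interpolation, so a message containing a double quote, backslash, or newline would produce invalid JSON. One way to guard against that in pure bash is sketched below. `json_escape` is a hypothetical helper, not part of Nagios or Slack tooling, and it covers only the most common characters rather than every control character.

```bash
#!/bin/bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, and tab only.
json_escape() {
    local s=$1
    s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
    s=${s//\"/\\\"}      # double quote
    s=${s//$'\n'/\\n}    # newline
    s=${s//$'\t'/\\t}    # tab
    printf '%s' "$s"
}
```

In the notification script, the payload would then be built as `--data "{\"text\":\"$(json_escape "$MESSAGE")\"}"`. Where `jq` is available, generating the JSON with it is likely the more robust choice.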
Distributed Monitoring with NSCA
For large environments, use a distributed setup: satellite Nagios instances perform local checks and forward results to a central server via NSCA.
On the central server (receiver):
apt install nsca
# configure /etc/nsca.cfg with server_address and password
On each remote satellite:
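# Requires the send_nsca client (packaged as nsca-client on Debian/Ubuntu).
# After a local check completes, send the result to the central server.
# Fields are tab-separated: host, service description, return code, plugin output.
# printf is used because a here-document does not turn \t into a tab character.
printf '%s\t%s\t%s\t%s\n' "web-01" "HTTP Response" 0 "HTTP OK - 0.142s response" |
  /usr/bin/send_nsca -H central.example.com -c /etc/send_nsca.cfg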
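In practice the result line is usually generated from a real plugin run rather than typed by hand. The following sketch wraps any plugin command, captures its exit code and first output line, and prints the tab-separated record that send_nsca reads on stdin. The function name `nsca_result_line` is hypothetical. The host and service names must match definitions on the central server, where passive checks need to be enabled for that service.

```bash
#!/bin/bash
# Hypothetical helper: run a plugin command and print a send_nsca input line:
#   host<TAB>service<TAB>return_code<TAB>plugin_output
nsca_result_line() {
    local host=$1 service=$2
    shift 2

    local output status=0
    # Capture plugin output; remember a non-zero exit code without aborting.
    output=$("$@" 2>&1) || status=$?

    # Anything outside the 0-3 plugin range is reported as UNKNOWN (3).
    (( status > 3 )) && status=3

    # Keep only the first line of output.
    output=${output%%$'\n'*}

    printf '%s\t%s\t%s\t%s\n' "$host" "$service" "$status" "$output"
}
```

A satellite could then pipe the line straight to the client, for example `nsca_result_line web-01 "HTTP Response" /usr/lib/nagios/plugins/check_http -H 10.0.1.10 -u /health | send_nsca -H central.example.com -c /etc/send_nsca.cfg`.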
Best Practices
- Check intervals: Use 5-minute intervals for standard checks, 1-minute for critical services. Avoid sub-minute checks unless absolutely necessary.
- Max check attempts: Set max_check_attempts to at least 3 so that a transient failure is retried before it reaches a hard state and triggers a notification.
- Dependencies: Define service dependencies so that one upstream failure does not generate a flood of downstream alerts (e.g., if a database is down, don't alert on the app that depends on it).
- Passive checks: Prefer passive checks for services that are expensive to probe (log analysis, complex application health).
- Configuration management: Store Nagios configs in Git and deploy with Ansible or a similar tool.
- Log rotation: Monitor the size of the Nagios log and spool directories; both can grow quickly in large deployments.
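The interval and retry settings above together determine how long an outage can last before anyone is notified. As a rough estimate, assuming the default interval_length of 60 seconds (so intervals are in minutes) and ignoring check latency and execution time, the worst case is one full check_interval before the first failing check plus (max_check_attempts - 1) retries. The function name below is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: worst-case minutes between a failure and the hard state
# that triggers a notification.
#   delay = check_interval + (max_check_attempts - 1) * retry_interval
worst_case_alert_delay() {
    local check_interval=$1 retry_interval=$2 max_attempts=$3
    echo $(( check_interval + (max_attempts - 1) * retry_interval ))
}

# Values from the HTTP service definition earlier in this article:
# check_interval 5, retry_interval 1, and max_check_attempts 3.
worst_case_alert_delay 5 1 3   # prints 7
```

With those values, a failure that starts just after a successful check can take up to about 7 minutes to produce an alert. Raising max_check_attempts reduces noise at the cost of a longer delay.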