RabbitMQ default checks and policies
Summary of RabbitMQ default checks and policies
This content describes the default health monitoring checks and policies for RabbitMQ provided by the Agent Client Collector in the Yokohama release (updated January 30, 2025). These checks are available only in a Windows environment, and RabbitMQ discovery must be performed before they are run. They enable ServiceNow customers to monitor RabbitMQ server health, cluster status, consumer and message metrics, network partitions, node status, queue synchronization, and protocol responsiveness.
Default Checks and Events
The default event checks cover various critical health indicators of RabbitMQ servers:
- Server Availability: Checks if the RabbitMQ server is alive using the REST API, triggering alerts if down.
- Cluster Health: Monitors whether cluster nodes are running and alerts if nodes are down.
- Consumers and Messages: Tracks the number of consumers and queued messages, alerting based on configured thresholds.
- Network Partitions: Detects network partition events and triggers alerts accordingly.
- Node Health and Usage: Monitors node running state, resource usage (memory, file descriptors, sockets, processes, disk), and allows thresholds for warnings and critical alerts.
- Queue Drain Time: Estimates time to drain queues based on message exit rates, alerting if critical thresholds are exceeded.
- Queue Synchronization: Verifies synchronization status of mirrored queues with secondary queues.
- Protocol Responsiveness: Checks if RabbitMQ responds to STOMP protocol requests.
Each check is executed via specific scripts with parameters for host, port, virtual host, queue, and customizable warning/critical thresholds to tailor monitoring sensitivity.
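As a rough illustration of how warning and critical thresholds can shape an alert, the sketch below maps a measured value onto a status. It assumes the exit-code convention common to Nagios-style and Sensu-style check plugins (0 for OK, 1 for warning, 2 for critical) and assumes that higher values are worse. The `classify` function and the threshold values are hypothetical and are not taken from the check scripts themselves.

```python
from enum import IntEnum


class Status(IntEnum):
    """Exit codes in the Nagios/Sensu plugin convention (an assumption here)."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


def classify(value: float, warn: float, critical: float) -> Status:
    """Map a measured value onto a status, assuming higher values are worse."""
    if value > critical:
        return Status.CRITICAL
    if value > warn:
        return Status.WARNING
    return Status.OK


if __name__ == "__main__":
    # Hypothetical thresholds for a queued-message count.
    for queued in (50, 250, 900):
        print(queued, classify(queued, warn=200, critical=500).name)
```

A check that alerts when a value drops too low, such as a minimum consumer count, would invert the comparisons.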
Default Metrics
Two primary metric checks provide detailed statistical data:
- Overview Metrics: General RabbitMQ server statistics for performance monitoring.
- Queue Metrics: Detailed metrics on individual queues, optionally filtered by virtual host.
These metrics use parameterized commands that allow specifying host, port, and vhost to retrieve targeted data.
Practical Benefits for ServiceNow Customers
By leveraging these default RabbitMQ checks and policies, customers can:
- Gain comprehensive visibility into RabbitMQ server health and performance within their Windows environments.
- Receive proactive alerts on critical issues like server downtime, cluster failures, resource exhaustion, and messaging bottlenecks.
- Customize alert thresholds to align monitoring with operational requirements.
- Ensure reliable message processing through monitoring of queue synchronization and drain times.
- Monitor protocol-level responsiveness (STOMP) to confirm service availability.
Implementing these checks supports maintaining high availability and performance of RabbitMQ messaging infrastructure managed via ServiceNow.
Agent Client Collector provides the following default checks and policies for RabbitMQ health monitoring. You must perform RabbitMQ discovery before executing the checks. RabbitMQ checks are available only in a Windows environment.
| Type | Check | Description | Command |
|---|---|---|---|
| Event | check-rabbitmq-alive | Verifies whether the RabbitMQ server is alive, using the REST API. If the server is down, an alert triggers. | check-rabbitmq-alive.rb --host {{.labels.params_host}} --port {{.labels.params_port}} -v {{.labels.params_vhost}} |
| Event | check-rabbitmq-cluster-health | Verifies whether the RabbitMQ server's cluster nodes are running. If the nodes are down, an alert triggers. | check-rabbitmq-cluster-health.rb --host {{.labels.params_host}} --port {{.labels.params_port}} |
| Event | check-rabbitmq-consumers | Verifies the number of consumers on the RabbitMQ server and triggers an alert based on the configured threshold. | check-rabbitmq-consumers.rb {{if .labels.params_warn}} --warn {{.labels.params_warn}} {{end}} {{if .labels.params_critical}} --critical {{.labels.params_critical}} {{end}} --host {{.labels.params_host}} --port {{.labels.params_port}} |
| Event | check-rabbitmq-messages | Verifies the total number of messages queued on the RabbitMQ server and triggers an alert based on the threshold. | check-rabbitmq-messages.rb --critical {{.labels.params_critical}} --port {{.labels.params_port}} --warn {{.labels.params_warn}} --host {{.labels.params_host}} |
| Event | check-rabbitmq-network-partitions | Verifies whether a RabbitMQ network partition has occurred and triggers an alert if one is detected. | check-rabbitmq-network-partitions.rb --host {{.labels.params_host}} --port {{.labels.params_port}} |
| Event | check-rabbitmq-node-health | Verifies whether the RabbitMQ server node is in a running state. | |
| Event | check-rabbitmq-node-usage | Checks and displays usage of the RabbitMQ server node. | |
| Event | check-rabbitmq-queue-drain-time | Verifies the time it will take for each queue on the RabbitMQ server to drain, based on the current message exit rate. For example, if a queue holds 1,000 messages but only 1 message exits per second, the estimated drain time is 1,000 seconds, so an alert is generated because the default critical level of 360 seconds has been exceeded. | check-rabbitmq-queue-drain-time.rb --host {{.labels.params_host}} --port {{.labels.params_port}} --warn {{.labels.params_warn}} --critical {{.labels.params_critical}} |
| Event | check-rabbitmq-queues-synchronised | Verifies that all mirrored queues with secondary queues are synchronised. | check-rabbitmq-queues-synchronised.rb --host {{.labels.params_host}} --port {{.labels.params_port}} |
| Event | check-rabbitmq-stomp-alive | Verifies whether the RabbitMQ server is alive and responding to STOMP. | check-rabbitmq-stomp-alive.rb --host {{.labels.params_host}} --queue {{.labels.params_queue}} --port {{.labels.params_port}} |
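The drain-time example in the table above (1,000 messages leaving at 1 message per second) can be worked through as a short calculation. This is a minimal sketch of the arithmetic only, not the logic of check-rabbitmq-queue-drain-time.rb. The function names are hypothetical, and treating a non-empty queue with no exiting messages as never draining is an assumption.

```python
import math

# Default critical level stated in the check description, in seconds.
DEFAULT_CRITICAL_SECONDS = 360


def estimate_drain_seconds(messages: int, exit_rate_per_sec: float) -> float:
    """Estimate the seconds needed to empty a queue at the current exit rate."""
    if messages == 0:
        return 0.0
    if exit_rate_per_sec <= 0:
        # Assumption: a non-empty queue with no exiting messages never drains.
        return math.inf
    return messages / exit_rate_per_sec


def is_critical(messages: int, exit_rate_per_sec: float,
                critical: float = DEFAULT_CRITICAL_SECONDS) -> bool:
    """Return True when the estimated drain time exceeds the critical level."""
    return estimate_drain_seconds(messages, exit_rate_per_sec) > critical


if __name__ == "__main__":
    print(estimate_drain_seconds(1000, 1))  # 1000.0 seconds
    print(is_critical(1000, 1))             # True, because 1000 > 360
```

With the same exit rate, a queue of 300 messages would drain in an estimated 300 seconds and stay under the 360-second critical level.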
| Type | Check | Description | Command |
|---|---|---|---|
| Metric | metrics-rabbitmq-overview | Provides RabbitMQ overview statistics. | metrics-rabbitmq-overview.rb --port {{.labels.params_port}} --host {{.labels.params_host}} |
| Metric | metrics-rabbitmq-queue | Provides RabbitMQ metrics per queue. | metrics-rabbitmq-queue.rb --port {{.labels.params_port}} --host {{.labels.params_host}} {{if .labels.params_vhost}} --vhost {{.labels.params_vhost}} {{end}} |
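The `{{if .labels.params_vhost}} ... {{end}}` placeholders in the commands above suggest that a flag is included only when the corresponding label is set. The sketch below imitates that behavior for the metrics-rabbitmq-queue command. It is an approximation written for illustration, not the agent's actual template engine. The function name and the host value are hypothetical, and port 15672 is used only as an example value (it is the usual RabbitMQ management port).

```python
def render_queue_metrics_args(labels: dict) -> list:
    """Build the metrics-rabbitmq-queue argument list from check labels."""
    args = [
        "metrics-rabbitmq-queue.rb",
        "--port", str(labels["params_port"]),
        "--host", str(labels["params_host"]),
    ]
    # Mirrors {{if .labels.params_vhost}} ... {{end}}: add the flag only
    # when the label is present and non-empty.
    vhost = labels.get("params_vhost")
    if vhost:
        args += ["--vhost", str(vhost)]
    return args


if __name__ == "__main__":
    # Hypothetical label values for illustration.
    base = {"params_host": "rabbit01.example.com", "params_port": 15672}
    print(" ".join(render_queue_metrics_args(base)))
    print(" ".join(render_queue_metrics_args({**base, "params_vhost": "/"})))
```

Leaving `params_vhost` unset omits the `--vhost` flag entirely, which matches the table's description of the virtual host as an optional filter.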