RabbitMQ default checks and policies
Summary of RabbitMQ Default Checks and Policies
ServiceNow’s Agent Client Collector offers a set of default checks and policies designed specifically for monitoring RabbitMQ health and performance within Windows environments. Before running these checks, customers must perform RabbitMQ discovery. These checks enable proactive monitoring by verifying server availability, cluster health, consumer counts, message queues, network partitions, node status, usage, queue synchronization, and protocol responsiveness.
Default Checks and Their Purposes
- check-rabbitmq-alive: Confirms if the RabbitMQ server is up using its REST API and triggers alerts if down.
- check-rabbitmq-cluster-health: Verifies operational status of RabbitMQ cluster nodes and alerts if any node is down.
- check-rabbitmq-consumers: Monitors the number of consumers and alerts based on user-defined warning and critical thresholds.
- check-rabbitmq-messages: Tracks total messages queued and triggers alerts when thresholds are exceeded.
- check-rabbitmq-network-partitions: Detects network partitions in the RabbitMQ cluster and raises alerts accordingly.
- check-rabbitmq-node-health: Checks if RabbitMQ server nodes are running and monitors various resource parameters such as memory, sockets, file descriptors, and alarms with configurable thresholds.
- check-rabbitmq-node-usage: Reports detailed node usage statistics including processor, disk, socket, and memory utilization, with alert thresholds for each metric.
- check-rabbitmq-queue-drain-time: Estimates the time for queues to drain based on current message exit rates; alerts if drain time exceeds configured levels.
- check-rabbitmq-queues-synchronised: Ensures all mirrored queues and their secondary counterparts are synchronized to avoid data inconsistency.
- check-rabbitmq-stomp-alive: Validates RabbitMQ responsiveness to the STOMP protocol to ensure messaging availability.
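
Several of the checks above compare a measured value against warning and critical thresholds. The following is a minimal sketch of that pattern for a queued-message total, not the plugin's actual implementation. The function name `classify_queued_messages`, the `Status` enum, and the strict "greater than" comparison are assumptions made for illustration; the numeric values follow the common 0/1/2 convention for OK, warning, and critical check results.

```python
from enum import Enum


class Status(Enum):
    """Check result, using the conventional 0/1/2 exit codes (assumed here)."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


def classify_queued_messages(total: int, warn: int, critical: int) -> Status:
    """Map a queued-message total onto a status.

    The critical threshold is tested first so that a value exceeding both
    thresholds is reported as CRITICAL rather than WARNING.
    """
    if total > critical:
        return Status.CRITICAL
    if total > warn:
        return Status.WARNING
    return Status.OK
```

Testing the critical threshold before the warning threshold keeps the result unambiguous when the warning level is lower than the critical level, which is the usual configuration for a backlog-style metric.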
Metrics Collection
- metrics-rabbitmq-overview: Collects overall RabbitMQ server statistics for performance and health insights.
- metrics-rabbitmq-queue: Gathers detailed metrics at the queue level, optionally filtered by virtual host, enabling granular performance monitoring.
Practical Implications for ServiceNow Customers
By leveraging these default checks and policies, customers can maintain high availability and performance of their RabbitMQ environments. Alerts based on critical thresholds enable rapid identification and resolution of issues such as server downtime, resource exhaustion, message backlogs, and network problems. The metrics provide actionable data for capacity planning and optimization. Note that these checks operate exclusively in Windows environments, so customers should ensure their RabbitMQ instances meet this requirement.
Agent Client Collector provides the following default checks and policies for RabbitMQ health monitoring. You must perform RabbitMQ discovery before executing the checks. RabbitMQ checks are available only in a Windows environment.
| Type | Check | Description | Command |
|---|---|---|---|
| Event | check-rabbitmq-alive | Verifies whether the RabbitMQ server is alive, using the REST API. If the server is down, an alert triggers. | check-rabbitmq-alive.rb --host {{.labels.params_host}} --port {{.labels.params_port}} -v {{.labels.params_vhost}} |
| Event | check-rabbitmq-cluster-health | Verifies whether the RabbitMQ server's cluster nodes are running. If the nodes are down, an alert triggers. | check-rabbitmq-cluster-health.rb --host {{.labels.params_host}} --port {{.labels.params_port}} |
| Event | check-rabbitmq-consumers | Verifies the number of consumers on the RabbitMQ server and triggers an alert based on the configured threshold. | check-rabbitmq-consumers.rb {{if .labels.params_warn}} --warn {{.labels.params_warn}} {{end}} {{if .labels.params_critical}} --critical {{.labels.params_critical}} {{end}} --host {{.labels.params_host}} --port {{.labels.params_port}} |
| Event | check-rabbitmq-messages | Verifies the total number of messages queued on the RabbitMQ server and triggers an alert based on the threshold. | check-rabbitmq-messages.rb --critical {{.labels.params_critical}} --port {{.labels.params_port}} --warn {{.labels.params_warn}} --host {{.labels.params_host}} |
| Event | check-rabbitmq-network-partitions | Verifies whether a network partition has occurred in the RabbitMQ cluster and triggers an alert if one is detected. | check-rabbitmq-network-partitions.rb --host {{.labels.params_host}} --port {{.labels.params_port}} |
| Event | check-rabbitmq-node-health | Verifies whether the RabbitMQ server node is in a running state. | |
| Event | check-rabbitmq-node-usage | Checks and displays usage of the RabbitMQ server node. | |
| Event | check-rabbitmq-queue-drain-time | Verifies the time it will take for each queue on the RabbitMQ server to drain, based on the current message exit rate. For example, if a queue has 1,000 messages in it but only 1 message exits per second, the estimated drain time is 1,000 seconds, so an alert is generated because the default critical level of 360 seconds has been exceeded. | check-rabbitmq-queue-drain-time.rb --host {{.labels.params_host}} --port {{.labels.params_port}} --warn {{.labels.params_warn}} --critical {{.labels.params_critical}} |
| Event | check-rabbitmq-queues-synchronised | Verifies that all mirrored queues with secondary queues are synchronised. | check-rabbitmq-queues-synchronised.rb --host {{.labels.params_host}} --port {{.labels.params_port}} |
| Event | check-rabbitmq-stomp-alive | Verifies whether the RabbitMQ server is alive and responding to STOMP. | check-rabbitmq-stomp-alive.rb --host {{.labels.params_host}} --queue {{.labels.params_queue}} --port {{.labels.params_port}} |
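
The drain-time example in the table can be worked through in a few lines. This is a sketch of the arithmetic only, assuming the estimate is simply the queued message count divided by the current exit rate; the function names are hypothetical, and the 360-second default is the critical level stated in the check description.

```python
import math


def drain_time_seconds(messages: int, exit_rate_per_second: float) -> float:
    """Estimate how long a queue needs to empty at its current exit rate."""
    if messages == 0:
        return 0.0
    if exit_rate_per_second <= 0:
        # Nothing is leaving the queue, so it never drains at this rate.
        return math.inf
    return messages / exit_rate_per_second


def exceeds_critical(messages: int, exit_rate_per_second: float,
                     critical_seconds: float = 360) -> bool:
    """Return True when the estimated drain time is above the critical level."""
    return drain_time_seconds(messages, exit_rate_per_second) > critical_seconds
```

With 1,000 queued messages and 1 message exiting per second, `drain_time_seconds(1000, 1)` evaluates to 1,000 seconds, so `exceeds_critical(1000, 1)` is true under the 360-second default.
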
| Type | Check | Description | Command |
|---|---|---|---|
| Metric | metrics-rabbitmq-overview | Provides RabbitMQ overview statistics. | metrics-rabbitmq-overview.rb --port {{.labels.params_port}} --host {{.labels.params_host}} |
| Metric | metrics-rabbitmq-queue | Provides RabbitMQ metrics per queue. | metrics-rabbitmq-queue.rb --port {{.labels.params_port}} --host {{.labels.params_host}} {{if .labels.params_vhost}} --vhost {{.labels.params_vhost}} {{end}} |
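
The `{{if ...}} ... {{end}}` blocks in the commands above make a flag optional: the flag appears only when the corresponding label has a value. The sketch below mimics that expansion for the `metrics-rabbitmq-queue` command. It is an illustration, not how Agent Client Collector renders its templates; the function name `build_queue_metrics_command` and the sample label values are assumptions.

```python
import shlex


def build_queue_metrics_command(labels: dict) -> str:
    """Assemble the metrics-rabbitmq-queue command line from check labels.

    params_port and params_host are required; --vhost is appended only when
    params_vhost is present and non-empty, mirroring the {{if}} block.
    """
    args = [
        "metrics-rabbitmq-queue.rb",
        "--port", str(labels["params_port"]),
        "--host", str(labels["params_host"]),
    ]
    vhost = labels.get("params_vhost")
    if vhost:
        args += ["--vhost", str(vhost)]
    # shlex.join quotes any argument that contains shell-special characters.
    return shlex.join(args)


# Sample label values are illustrative only.
print(build_queue_metrics_command({"params_port": 15672, "params_host": "localhost"}))
print(build_queue_metrics_command(
    {"params_port": 15672, "params_host": "localhost", "params_vhost": "prod"}
))
```

An empty or missing `params_vhost` produces the command without `--vhost`, which corresponds to collecting queue metrics without a virtual host filter.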