Apache Kafka default checks and policies
Summary of Apache Kafka default checks and policies
Agent Client Collector offers predefined policies and checks to monitor the health and performance of Apache Kafka deployments on both Windows and Linux platforms. These checks cover key components such as Kafka Zookeeper, topics, brokers, and relevant metrics to ensure Kafka clusters are operating reliably.
Kafka Topic Health Checks
- Zookeeper Status: Detects if the Kafka Zookeeper service is down and raises critical alerts accordingly.
- Topic Replicas: Identifies partitions with unknown replicas, allowing inclusion or exclusion of specific topics and detailed reporting of affected partitions.
- Replication Factor: Verifies if topics meet the expected replication factor, alerting when replication is below or above the specified threshold.
- Topic Leader: Flags partitions with unknown leaders or unpreferred replicas acting as leaders, with options to filter topics and provide detailed partition data.
- Topic Partitions: Checks if topics have fewer partitions than a defined minimum, supporting topic filtering and detailed alerts.
Common flags set the Zookeeper port, specify topic filters with wildcard support, and toggle detailed output, so that monitoring can be tailored to your Kafka environment.
Kafka Broker Health Checks
- Broker Status: Monitors Kafka Broker availability on the host and triggers critical events if the broker is down, with customizable port settings.
Kafka Metrics Collection
- Broker Metrics: Collects performance metrics from Kafka brokers using JMX, such as request rates and leader election stats, with configurable Java executable and JMX port.
- Zookeeper Metrics: Gathers Zookeeper performance data like outstanding requests and latency via the admin server port, aiding in comprehensive cluster health assessment.
Practical Application for ServiceNow Customers
By implementing these default checks and policies within the Agent Client Collector, ServiceNow customers can automate Kafka environment monitoring, receive timely alerts on critical issues, and gain actionable insights into Kafka cluster health. This enables proactive management of Kafka infrastructure, minimizes downtime, and supports service reliability.
Agent Client Collector provides the following policies for Apache Kafka health monitoring. Policies come with the checks specified in the indicated table. Policies and checks are available for both Windows and Linux.
| Check | Description | Usage | Output |
|---|---|---|---|
| kafka.check-zookeeper-status | Raises a critical event if the hosted Kafka Zookeeper is down. | commonchecks check-kafka-zk-status [flags] Where the flags are: -p, --port = Zookeeper Port (default "2181"). Usage example: | Kafka Zookeeper Status OK: Kafka Zookeeper is Up! |
| kafka.check-topic-replicas | Raises a critical event if any topic has partitions with unknown replicas. | commonchecks check-kafka-replicas [flags] Where the flags are: | <topic> has partitions with unknown replicas. Unknown replicas are: {"0":["0"],"1":["0"],"2":["0"]}. <topic> has partitions with unknown replicas. Unknown replicas are: {"0":["0"]}. |
| kafka.check-topic-replication-factor | Raises a critical event if the replication factor of at least one topic is above or below the provided replication factor parameter. | commonchecks check-kafka-rf [flags] Where the flags are: Examples: | TestTopic has replication factor 1, which is less than expected: 2. accMetrics has replication factor 1, which is less than expected: 2. |
| kafka.check-topic-leader | Raises a critical event if any topic has partitions with unknown leaders or an unpreferred replica as leader. | commonchecks check-kafka-leader [flags] Where the flags are: Examples: | <topic> contains, partitions with unpreferred replica as leader.(partitions with unpreferred replicas are [0]). <topic> contains, partitions with unpreferred replica as leader.(partitions with unpreferred replicas are [0]). |
| kafka.check-topic-partitions | Raises a critical event if the number of partitions for a topic is less than the min_partitions parameter. | commonchecks check-kafka-partitions [flags] Where the flags are: | |
| | | Usage example 1: | <topic> has 1 partitions, expected at least 3. <topic> has 1 partitions, expected at least 3. <topic> has 1 partitions, expected at least 3. |
| | | Usage example 2: commonchecks check-kafka-partitions -H localhost -p 2181 -P 3 -i "accMetrics,*Topic" -e "testTopic" | <topic> has 1 partitions, expected at least 3. <topic> has 1 partitions, expected at least 3. |
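
Usage example 2 above passes comma-separated topic patterns with a wildcard to `-i` and `-e`. The following Python sketch illustrates how such include and exclude filters might combine. It is not the `commonchecks` implementation: the function name `select_topics` is hypothetical, and the behavior shown (shell-style wildcards, case-sensitive matching, exclusion taking precedence over inclusion) is an assumption made for illustration.

```python
from fnmatch import fnmatchcase


def select_topics(topics, include="*", exclude=""):
    """Return topics that match any include pattern and no exclude pattern.

    Assumption: patterns are comma-separated, use shell-style wildcards,
    and are matched case-sensitively. The real check may differ.
    """
    include_patterns = [p.strip() for p in include.split(",") if p.strip()]
    exclude_patterns = [p.strip() for p in exclude.split(",") if p.strip()]

    selected = []
    for topic in topics:
        included = any(fnmatchcase(topic, p) for p in include_patterns)
        excluded = any(fnmatchcase(topic, p) for p in exclude_patterns)
        if included and not excluded:
            selected.append(topic)
    return selected


# Hypothetical topic list, filtered with the patterns from usage example 2.
topics = ["accMetrics", "TestTopic", "testTopic", "orders"]
print(select_topics(topics, include="accMetrics,*Topic", exclude="testTopic"))
```

Under these assumptions, `accMetrics` and `TestTopic` are selected, `testTopic` is dropped by the exclude pattern, and `orders` matches no include pattern.
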
| Check | Description | Usage | Output |
|---|---|---|---|
| kafka.check-broker-status | Raises a critical event if the Kafka Broker on the host is down. | commonchecks check-kafka-broker-status [flags] Where the flags are: -p, --port = Kafka Broker port (default "9092"). Usage example: | Kafka Broker Status OK: Kafka Broker ubuntu20:9092 is Up! |
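
When troubleshooting a broker status event, it can help to confirm independently that the broker port accepts connections. The sketch below does a plain TCP reachability test in Python. It assumes that an open port is a reasonable proxy for "broker is up"; the source does not describe how `check-kafka-broker-status` actually decides this, and the helper name `port_is_open` is hypothetical.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: TCP reachability stands in for broker health here.
    The actual check may perform a deeper, protocol-level test.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    host, port = "localhost", 9092  # 9092 is the default port from the table
    state = "Up" if port_is_open(host, port) else "Down"
    print(f"Kafka Broker {host}:{port} is {state}")
```

The same helper can be pointed at the Zookeeper port (default 2181) to cross-check a Zookeeper status event.
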
| Check | Description | Usage | Output |
|---|---|---|---|
| kafka.metrics.broker | Collects Kafka Broker Metrics from the host. | commonchecks metric-kafka-broker [flags] Where the flags are: Usage example: | hostname.Kafka.Broker.ReplicaManager.IsrExpandsPerSec.OneMinuteRate 0.000 hostname.Kafka.Broker.DelayedOperationPurgatory.PurgatorySize.Fetch.Value 627.000 hostname.Kafka.Broker.ControllerStats.UncleanLeaderElectionsPerSec.OneMinuteRate 0.000 hostname.Kafka.Broker.RequestMetrics.RequestsPerSec.Produce.OneMinuteRate 0.000 |
| Check | Description | Usage | Output |
|---|---|---|---|
| kafka.metrics.zookeeper | Collects Zookeeper Metrics from the host. | commonchecks metric-kafka-zookeeper [flags] Where the flag is: Usage example: | hostname.Kafka.Zookeeper.outstanding_requests 2.000 1648183249 hostname.Kafka.Zookeeper.avg_latency 1.05 1648183249 hostname.Kafka.Zookeeper.num_alive_connections 1.000 1648183249 hostname.Kafka.Zookeeper.open_file_descriptor_count 124.000 1648183249 |
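
The sample metric outputs in the tables above appear to follow a whitespace-separated `name value [timestamp]` layout, with a dotted metric path and, for the Zookeeper samples, what looks like a Unix epoch timestamp. The Python sketch below parses one such line. The layout is inferred from the samples rather than from a documented format, and the names `Metric` and `parse_metric_line` are hypothetical.

```python
from typing import NamedTuple, Optional


class Metric(NamedTuple):
    name: str                  # dotted path, e.g. hostname.Kafka.Zookeeper.avg_latency
    value: float
    timestamp: Optional[int]   # assumed epoch seconds; None when absent


def parse_metric_line(line):
    """Parse 'name value [timestamp]' into a Metric.

    Assumption: fields are separated by whitespace and the timestamp is
    optional, as the broker and Zookeeper samples suggest.
    """
    parts = line.split()
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected metric line: {line!r}")
    name = parts[0]
    value = float(parts[1])
    timestamp = int(parts[2]) if len(parts) == 3 else None
    return Metric(name, value, timestamp)


sample = "hostname.Kafka.Zookeeper.avg_latency 1.05 1648183249"
metric = parse_metric_line(sample)
# Print the last path segment, the value, and the timestamp.
print(metric.name.rsplit(".", 1)[-1], metric.value, metric.timestamp)
```

Lines without a timestamp, such as the broker samples, yield a `Metric` whose `timestamp` is `None`.
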