Apache Kafka default checks and policies
Summary of Apache Kafka Default Checks and Policies
Agent Client Collector provides health monitoring policies for Apache Kafka on both Windows and Linux. The policies include checks for Zookeeper and broker status, topic replicas, replication factors, leader assignments, and partition counts, plus collection of broker and Zookeeper metrics.
Key Features
- Kafka Zookeeper Status Check: Monitors the health of the Kafka Zookeeper. A critical event is raised if it is down. Command: commonchecks check-kafka-zk-status -p 2181
- Topic Replicas Check: Identifies topics with unknown replicas, raising critical events as needed. Command: commonchecks check-kafka-replicas -H localhost -p 2181 -i "test" -e "accTopic,offsets" -d
- Replication Factor Check: Alerts if any topic's replication factor deviates from the specified parameter. Command: commonchecks check-kafka-rf -H localhost -p 2181 -r 2
- Topic Leader Check: Raises events for topics with unknown leaders or unpreferred replicas. Command: commonchecks check-kafka-leader -H localhost -p 2181 -d
- Topic Partitions Check: Checks if the number of partitions for a topic falls below a defined minimum. Command: commonchecks check-kafka-partitions -H localhost -p 2181 -P 3
- Broker Status Check: Monitors the status of the Kafka Broker, raising alerts if it is down. Command: commonchecks check-kafka-broker-status -p 9092
- Broker Metrics Collection: Gathers performance metrics from the Kafka Broker. Command: commonchecks metric-kafka-broker -J "/usr/bin/java" -j 9999
- Zookeeper Metrics Collection: Collects metrics from the Zookeeper instance. Command: commonchecks metric-kafka-zookeeper -p 8085
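
As a rough illustration of how the commands listed above might be scripted, the Python sketch below splits each command string into an argument list and, when the tool is present, runs it and prints the exit code and output. It assumes the commonchecks binary is on the PATH of the monitored host, which this page does not state, and it does not interpret exit codes because their mapping to severities is not documented here. The names `CHECKS`, `build_argv`, and `run_check` are hypothetical and exist only for this example.

```python
import shlex
import shutil
import subprocess

# Command strings copied from the list above; the keys are illustrative labels only.
CHECKS = {
    "zookeeper-status": "commonchecks check-kafka-zk-status -p 2181",
    "topic-replicas": 'commonchecks check-kafka-replicas -H localhost -p 2181 -i "test" -e "accTopic,offsets" -d',
    "replication-factor": "commonchecks check-kafka-rf -H localhost -p 2181 -r 2",
    "topic-leader": "commonchecks check-kafka-leader -H localhost -p 2181 -d",
    "topic-partitions": "commonchecks check-kafka-partitions -H localhost -p 2181 -P 3",
    "broker-status": "commonchecks check-kafka-broker-status -p 9092",
    "broker-metrics": 'commonchecks metric-kafka-broker -J "/usr/bin/java" -j 9999',
    "zookeeper-metrics": "commonchecks metric-kafka-zookeeper -p 8085",
}


def build_argv(command):
    """Split a command string into an argument list, honouring shell-style quotes."""
    return shlex.split(command)


def run_check(name):
    """Run one check and return (exit_code, stdout).

    Assumption: commonchecks is installed and reachable via PATH.
    """
    completed = subprocess.run(
        build_argv(CHECKS[name]), capture_output=True, text=True, check=False
    )
    return completed.returncode, completed.stdout.strip()


if __name__ == "__main__":
    if shutil.which("commonchecks") is None:
        print("commonchecks not found on PATH; nothing to run")
    else:
        for name in CHECKS:
            code, output = run_check(name)
            print(f"{name}: exit={code} {output}")
```

Passing an argument list to `subprocess.run` avoids invoking a shell, so quoted values such as "accTopic,offsets" reach the tool as single arguments.
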
Key Outcomes
With these checks in place, ServiceNow customers can monitor the health of their Apache Kafka environments and receive events when a Zookeeper or broker is down, or when a topic's replicas, leaders, replication factor, or partition count is not as expected. Issues can then be identified and resolved sooner.
Agent Client Collector provides the following policies for Apache Kafka health monitoring. Each policy includes the checks listed in its table. Policies and checks are available for both Windows and Linux.

| Check | Description | Usage | Output |
|---|---|---|---|
| kafka.check-zookeeper-status | Raises a critical event if the hosted Kafka Zookeeper is down. | commonchecks check-kafka-zk-status [flags] Where the flags are: -p, --port = Zookeeper Port (default "2181"). Usage example: commonchecks check-kafka-zk-status -p 2181 | Kafka Zookeeper Status OK: Kafka Zookeeper is Up! |
| kafka.check-topic-replicas | Raises a critical event if any topic has partitions with unknown replicas. | commonchecks check-kafka-replicas [flags] Usage example: commonchecks check-kafka-replicas -H localhost -p 2181 -i "test" -e "accTopic,offsets" -d | <topic> has partitions with unknown replicas. Unknown replicas are: {"0":["0"],"1":["0"],"2":["0"]}.<br><topic> has partitions with unknown replicas. Unknown replicas are: {"0":["0"]}. |
| kafka.check-topic-replication-factor | Raises a critical event if the replication factor of at least one topic is above or below the provided replication factor parameter. | commonchecks check-kafka-rf [flags] Example: commonchecks check-kafka-rf -H localhost -p 2181 -r 2 | TestTopic has replication factor 1, which is less than expected: 2.<br>accMetrics has replication factor 1, which is less than expected: 2. |
| kafka.check-topic-leader | Raises a critical event if any topic has partitions with unknown leaders or an unpreferred replica as leader. | commonchecks check-kafka-leader [flags] Example: commonchecks check-kafka-leader -H localhost -p 2181 -d | <topic> contains, partitions with unpreferred replica as leader.(partitions with unpreferred replicas are [0]).<br><topic> contains, partitions with unpreferred replica as leader.(partitions with unpreferred replicas are [0]). |
| kafka.check-topic-partitions | Raises critical events if the number of partitions for a topic is less than the min_partitions parameter. | commonchecks check-kafka-partitions [flags] Usage example 1: commonchecks check-kafka-partitions -H localhost -p 2181 -P 3 | <topic> has 1 partitions, expected at least 3.<br><topic> has 1 partitions, expected at least 3.<br><topic> has 1 partitions, expected at least 3. |
| | | Usage example 2: commonchecks check-kafka-partitions -H localhost -p 2181 -P 3 -i "accMetrics,*Topic" -e "testTopic" | <topic> has 1 partitions, expected at least 3.<br><topic> has 1 partitions, expected at least 3. |
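
The output messages shown in the table above follow fixed sentence patterns, so a script that consumes them could pull out the topic name and the numbers. The Python sketch below does this for the unknown-replicas and partition-count messages. It assumes each message arrives on its own line and that the JSON object in the replicas message maps partition IDs to replica IDs; both are readings of the samples above, not documented behavior. `parse_topic_message` is a hypothetical helper name.

```python
import json
import re

# Patterns modelled on the sample outputs above; the trailing period is optional.
REPLICAS_RE = re.compile(
    r"^(?P<topic>\S+) has partitions with unknown replicas\. "
    r"Unknown replicas are: (?P<replicas>\{.*\})\.?$"
)
PARTITIONS_RE = re.compile(
    r"^(?P<topic>\S+) has (?P<actual>\d+) partitions, "
    r"expected at least (?P<expected>\d+)\.?$"
)


def parse_topic_message(line):
    """Return a dict describing one check message, or None if it is not recognised."""
    line = line.strip()
    match = REPLICAS_RE.match(line)
    if match:
        return {
            "check": "replicas",
            "topic": match.group("topic"),
            # Assumed shape: {partition_id: [replica_id, ...]}
            "unknown_replicas": json.loads(match.group("replicas")),
        }
    match = PARTITIONS_RE.match(line)
    if match:
        return {
            "check": "partitions",
            "topic": match.group("topic"),
            "actual": int(match.group("actual")),
            "expected": int(match.group("expected")),
        }
    return None


if __name__ == "__main__":
    print(parse_topic_message("accMetrics has 1 partitions, expected at least 3."))
```

Returning None for unrecognised lines lets a caller skip status messages such as the Zookeeper "is Up!" line without raising an error.
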

| Check | Description | Usage | Output |
|---|---|---|---|
| kafka.check-broker-status | Raises a critical event if the Kafka Broker on the host is down. | commonchecks check-kafka-broker-status [flags] Where the flags are: -p, --port = Kafka Broker port (default "9092"). Usage example: commonchecks check-kafka-broker-status -p 9092 | Kafka Broker Status OK: Kafka Broker ubuntu20:9092 is Up! |
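
This page does not describe how the broker status check decides that a broker is up. As an independent sanity probe, and not a reproduction of the check itself, the Python sketch below treats a successful TCP connection to the broker port as a sign that something is listening there. The host name, the 2-second timeout, and the helper name `port_is_open` are assumptions made for this example.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # 9092 is the default broker port given in the table above.
    state = "reachable" if port_is_open("localhost", 9092) else "not reachable"
    print(f"Kafka Broker localhost:9092 is {state}")
```

An open port only shows that a listener exists; it does not confirm that the listener is a healthy Kafka broker.
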

| Check | Description | Usage | Output |
|---|---|---|---|
| kafka.metrics.broker | Collects Kafka Broker metrics from the host. | commonchecks metric-kafka-broker [flags] Usage example: commonchecks metric-kafka-broker -J "/usr/bin/java" -j 9999 | hostname.Kafka.Broker.ReplicaManager.IsrExpandsPerSec.OneMinuteRate 0.000<br>hostname.Kafka.Broker.DelayedOperationPurgatory.PurgatorySize.Fetch.Value 627.000<br>hostname.Kafka.Broker.ControllerStats.UncleanLeaderElectionsPerSec.OneMinuteRate 0.000<br>hostname.Kafka.Broker.RequestMetrics.RequestsPerSec.Produce.OneMinuteRate 0.000 |

| Check | Description | Usage | Output |
|---|---|---|---|
| kafka.metrics.zookeeper | Collects Zookeeper metrics from the host. | commonchecks metric-kafka-zookeeper [flags] Usage example: commonchecks metric-kafka-zookeeper -p 8085 | hostname.Kafka.Zookeeper.outstanding_requests 2.000 1648183249<br>hostname.Kafka.Zookeeper.avg_latency 1.05 1648183249<br>hostname.Kafka.Zookeeper.num_alive_connections 1.000 1648183249<br>hostname.Kafka.Zookeeper.open_file_descriptor_count 124.000 1648183249 |
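
The metric samples above appear to be whitespace-separated lines of the form name, value, and (for the Zookeeper sample) an epoch timestamp. Assuming that layout holds, which this page does not state explicitly, the Python sketch below parses one line into a small record. The broker sample shows no timestamp, so the sketch treats that field as optional. `Metric` and `parse_metric_line` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Metric(NamedTuple):
    name: str                  # dotted metric path, e.g. hostname.Kafka.Zookeeper.avg_latency
    value: float               # numeric reading
    timestamp: Optional[int]   # epoch seconds, or None when the line has no third field


def parse_metric_line(line):
    """Parse 'name value [timestamp]' into a Metric; raise ValueError on other shapes."""
    parts = line.split()
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected metric line: {line!r}")
    timestamp = int(parts[2]) if len(parts) == 3 else None
    return Metric(parts[0], float(parts[1]), timestamp)


if __name__ == "__main__":
    print(parse_metric_line("hostname.Kafka.Zookeeper.avg_latency 1.05 1648183249"))
```

Raising on unexpected shapes, rather than skipping them silently, makes a change in the output format visible to whoever runs the script.
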