Apache Kafka default checks and policies

  • Release version: Washington DC
  • Updated February 1, 2024

    Agent Client Collector provides the following policies for Apache Kafka health monitoring. Each policy includes the checks listed in its table below. Policies and checks are available for both Windows and Linux.

    Table 1. Apache Kafka Topic Events

    kafka.check-zookeeper-status
    Description: Raises a critical event if the hosted Kafka Zookeeper is down.
    Usage: commonchecks check-kafka-zk-status [flags]

    Where the flags are:

    • -p, --port = Zookeeper port (default "2181").

    Usage example: commonchecks check-kafka-zk-status -p 2181

    Output:

    Kafka Zookeeper Status OK: Kafka Zookeeper is Up!
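
This page does not say how the Zookeeper status check probes the server. As a hedged illustration only, the Python sketch below sends Zookeeper's `ruok` four-letter-word command, to which a healthy server replies `imok`. The helper name `zookeeper_is_up` is hypothetical and this is not the commonchecks implementation; on recent Zookeeper versions, `ruok` may also have to be allowed through the `4lw.commands.whitelist` setting.

```python
import socket


def zookeeper_is_up(host: str = "localhost", port: int = 2181, timeout: float = 3.0) -> bool:
    """Send the 'ruok' four-letter word and report whether the reply is 'imok'.

    Hypothetical helper for illustration; not the commonchecks implementation.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # Read until 4 bytes arrive or the server closes the connection.
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    break
                reply += chunk
            return reply == b"imok"
    except OSError:
        # Refused connections, timeouts and name-resolution errors all count as down.
        return False
```

For example, `zookeeper_is_up("localhost", 2181)` returns `True` only when the server on port 2181 answers `imok`; any connection error or other reply is treated as down.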

    kafka.check-topic-replicas
    Description: Raises a critical event if any topic has partitions with unknown replicas.
    Usage: commonchecks check-kafka-replicas [flags]

    Where the flags are:

    • -p, --port = Zookeeper port (default "2181").
    • -d, --detailed = Lists unknown replicas in each partition of a topic.
    • -i, --include_list = Comma-separated list of topics to include (the * wildcard is allowed).
    • -e, --exclude_list = Comma-separated list of topics to exclude (the * wildcard is allowed).

    Usage example: commonchecks check-kafka-replicas -H localhost -p 2181 -i "test*" -e "accTopic,*offsets" -d

    Output:

    <topic> has partitions with unknown replicas. Unknown replicas are: {"0":["0"],"1":["0"],"2":["0"]}.

    <topic> has partitions with unknown replicas. Unknown replicas are: {"0":["0"]}.
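
In Kafka, every partition has an assigned list of replica broker IDs. One plausible reading of "unknown replica", assumed here rather than confirmed by this page, is a replica assigned to a broker ID that is not among the currently live brokers. The sketch below computes the partition-to-replicas mapping in the same compact JSON shape as the sample output; the input dictionaries and function names are hypothetical.

```python
import json
from typing import Dict, List, Optional, Set


def unknown_replicas(assignment: Dict[int, List[int]], live_brokers: Set[int]) -> Dict[str, List[str]]:
    """Map partition id -> assigned replica broker ids that are not live.

    Assumption: 'unknown' means the broker id is absent from the live broker set.
    Partitions whose replicas are all live are omitted.
    """
    result: Dict[str, List[str]] = {}
    for partition, replicas in sorted(assignment.items()):
        missing = [str(broker) for broker in replicas if broker not in live_brokers]
        if missing:
            result[str(partition)] = missing
    return result


def replica_message(topic: str, assignment: Dict[int, List[int]], live_brokers: Set[int]) -> Optional[str]:
    """Build a message in the sample output's format, or None if nothing is unknown."""
    unknown = unknown_replicas(assignment, live_brokers)
    if not unknown:
        return None
    # Compact separators reproduce the {"0":["0"]} spacing of the sample output.
    return (f"{topic} has partitions with unknown replicas. "
            f"Unknown replicas are: {json.dumps(unknown, separators=(',', ':'))}.")
```

For instance, a hypothetical topic `testA` with three partitions assigned only to broker 0, while only broker 1 is live, yields the first sample line above.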

    kafka.check-topic-replication-factor
    Description: Raises a critical event if the replication factor of at least one topic is above or below the value of the replication factor parameter (-r).
    Usage: commonchecks check-kafka-rf [flags]

    Where the flags are:

    • -p, --port = Zookeeper port (default "2181").
    • -d, --detailed = Lists unknown replicas in each partition of a topic.
    • -i, --include_list = Comma-separated list of topics to include (the * wildcard is allowed).
    • -e, --exclude_list = Comma-separated list of topics to exclude (the * wildcard is allowed).
    • -r, --replication factor = Expected replication factor for a topic (default 1).

    Usage example: commonchecks check-kafka-rf -H localhost -p 2181 -r 2 -i "accMetrics,*Topic" -e "testTopic"

    Output:

    TestTopic has replication factor 1, which is less than expected: 2.

    accMetrics has replication factor 1, which is less than expected: 2.
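
A topic's replication factor is the number of replicas assigned to each of its partitions. The sketch below compares it with the expected value, assuming (not confirmed by this page) that the topic's factor is taken as the smallest replica count across its partitions. Function names are hypothetical, and the "greater than expected" wording is invented because only the "less than" message appears above.

```python
from typing import Dict, List, Optional


def replication_factor(assignment: Dict[int, List[int]]) -> int:
    """Smallest replica count across a topic's partitions (assumed definition)."""
    return min(len(replicas) for replicas in assignment.values())


def rf_message(topic: str, assignment: Dict[int, List[int]], expected: int = 1) -> Optional[str]:
    """Return a message when the factor differs from `expected` (default 1, like -r)."""
    rf = replication_factor(assignment)
    if rf < expected:
        # Format copied from the sample output above.
        return f"{topic} has replication factor {rf}, which is less than expected: {expected}."
    if rf > expected:
        # Hypothetical wording: the page only shows the "less than" case.
        return f"{topic} has replication factor {rf}, which is greater than expected: {expected}."
    return None
```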

    kafka.check-topic-leader
    Description: Raises a critical event if any topic has partitions with unknown leaders or with an unpreferred replica as leader.
    Usage: commonchecks check-kafka-leader [flags]

    Where the flags are:

    • -p, --port = Zookeeper port (default "2181").
    • -d, --detailed = Lists the partitions in each topic that have unknown leaders or unpreferred replicas as leader.
    • -i, --include_list = Comma-separated list of topics to include (the * wildcard is allowed).
    • -e, --exclude_list = Comma-separated list of topics to exclude (the * wildcard is allowed).

    Usage example: commonchecks check-kafka-leader -H localhost -p 2181 -d -e "*offsets"

    Output:

    <topic> contains, partitions with unpreferred replica as leader.(partitions with unpreferred replicas are [0]).

    <topic> contains, partitions with unpreferred replica as leader.(partitions with unpreferred replicas are [0]).
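
In Kafka, the preferred leader of a partition is the first broker in its assigned replica list, and a leader of -1 means the partition currently has no leader. The sketch below classifies partitions on that basis; the input shape, the treatment of a non-live leader as "unknown", and the function names are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical input shape: partition id -> (current leader broker id, assigned replica list).
# Kafka reports a leader of -1 when a partition has no leader.
PartitionState = Tuple[int, List[int]]


def leader_problems(partitions: Dict[int, PartitionState],
                    live_brokers: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Return (partitions with unknown leaders, partitions led by a non-preferred replica)."""
    live = set(live_brokers)
    unknown_leader: List[int] = []
    unpreferred: List[int] = []
    for pid, (leader, replicas) in sorted(partitions.items()):
        if leader == -1 or leader not in live:
            unknown_leader.append(pid)
        elif replicas and leader != replicas[0]:
            # The preferred leader is the first broker in the assigned replica list.
            unpreferred.append(pid)
    return unknown_leader, unpreferred


def unpreferred_message(topic: str, unpreferred: List[int]) -> str:
    """Mirror the sample output line for partitions led by an unpreferred replica."""
    return (f"{topic} contains, partitions with unpreferred replica as leader."
            f"(partitions with unpreferred replicas are {unpreferred}).")
```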

    kafka.check-topic-partitions
    Description: Raises a critical event if the number of partitions for a topic is less than the min_partitions parameter (-P).
    Usage: commonchecks check-kafka-partitions [flags]

    Where the flags are:

    • -p, --port = Zookeeper port (default "2181").
    • -P, --min_partitions = Minimum number of partitions for a topic (default 1).
    • -i, --include_list = Comma-separated list of topics to include (the * wildcard is allowed).
    • -e, --exclude_list = Comma-separated list of topics to exclude (the * wildcard is allowed).

    Usage example 1: commonchecks check-kafka-partitions -H localhost -p 2181 -P 3

    Output:

    <topic> has 1 partitions, expected at least 3.

    <topic> has 1 partitions, expected at least 3.

    <topic> has 1 partitions, expected at least 3.

    Usage example 2: commonchecks check-kafka-partitions -H localhost -p 2181 -P 3 -i "accMetrics,*Topic" -e "testTopic"

    Output:

    <topic> has 1 partitions, expected at least 3.

    <topic> has 1 partitions, expected at least 3.
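
The partition check reduces to comparing each topic's partition count with the minimum. The sketch below assumes the per-topic counts have already been read from the cluster; the function name is hypothetical, and the message format is copied from the sample output.

```python
from typing import Dict, List


def partition_messages(partition_counts: Dict[str, int], min_partitions: int = 1) -> List[str]:
    """Return one message per topic whose partition count is below min_partitions.

    The default of 1 mirrors the documented default of -P.
    """
    return [
        f"{topic} has {count} partitions, expected at least {min_partitions}."
        for topic, count in sorted(partition_counts.items())
        if count < min_partitions
    ]
```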

    Note:
    Values for the include_list and exclude_list parameters must be enclosed in double quotes. For example: "test1,*topic".
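
The include and exclude lists are comma-separated patterns that may use the * wildcard. One plausible reading is sketched below with Python's `fnmatchcase`; the assumptions that exclusion wins over inclusion and that an empty include list selects every topic are not confirmed by this page, and `fnmatchcase` also honors `?` and `[...]` patterns, which the documented flags may not. In a Unix shell the double quotes matter because an unquoted * can be expanded against file names before commonchecks ever sees it.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def parse_list(value: str) -> List[str]:
    """Split a comma-separated flag value such as "accMetrics,*Topic"."""
    return [part.strip() for part in value.split(",") if part.strip()]


def select_topics(topics: Iterable[str], include: str = "", exclude: str = "") -> List[str]:
    """Keep topics matching any include pattern and no exclude pattern (assumed semantics)."""
    include_patterns = parse_list(include)
    exclude_patterns = parse_list(exclude)
    selected = []
    for topic in topics:
        # Assumption: an empty include list means "all topics".
        if include_patterns and not any(fnmatchcase(topic, p) for p in include_patterns):
            continue
        # Assumption: exclusion takes precedence over inclusion.
        if any(fnmatchcase(topic, p) for p in exclude_patterns):
            continue
        selected.append(topic)
    return selected
```

With `-i "accMetrics,*Topic" -e "testTopic"`, a hypothetical topic `testTopic` matches `*Topic` but is then dropped by the exclude list.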

    Table 2. Apache Kafka Broker Events

    kafka.check-broker-status
    Description: Raises a critical event if the Kafka Broker on the host is down.
    Usage: commonchecks check-kafka-broker-status [flags]

    Where the flags are:

    • -p, --port = Kafka Broker port (default "9092").

    Usage example: commonchecks check-kafka-broker-status -p 9092

    Output:

    Kafka Broker Status OK: Kafka Broker ubuntu20:9092 is Up!
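
A minimal sketch of a broker reachability probe, assuming that "up" means the broker port accepts TCP connections. The helper names are hypothetical, the "Down" wording is invented, and a successful connection only shows that something is listening on the port, not that the broker is healthy.

```python
import socket


def broker_port_open(host: str = "localhost", port: int = 9092, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def broker_status_line(host: str = "localhost", port: int = 9092) -> str:
    """Format a status line; the OK text mirrors the sample output."""
    if broker_port_open(host, port):
        return f"Kafka Broker Status OK: Kafka Broker {host}:{port} is Up!"
    # Hypothetical wording: the page shows only the OK message.
    return f"Kafka Broker Status CRITICAL: Kafka Broker {host}:{port} is Down!"
```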

    Table 3. Apache Kafka Broker Metrics

    kafka.metrics.broker
    Description: Collects Kafka Broker metrics from the host.
    Usage: commonchecks metric-kafka-broker [flags]

    Where the flags are:

    • -J, --javapath = Java executable path (default "java").
    • -j, --jmxport = JMX port (default "9999").

    Usage example: commonchecks metric-kafka-broker -J "/usr/bin/java" -j 9999

    Output:

    hostname.Kafka.Broker.ReplicaManager.IsrExpandsPerSec.OneMinuteRate 0.000

    hostname.Kafka.Broker.DelayedOperationPurgatory.PurgatorySize.Fetch.Value 627.000

    hostname.Kafka.Broker.ControllerStats.UncleanLeaderElectionsPerSec.OneMinuteRate 0.000

    hostname.Kafka.Broker.RequestMetrics.RequestsPerSec.Produce.OneMinuteRate 0.000
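
The metric lines above use a simple `<path> <value>` layout, and the Zookeeper lines in Table 4 add a trailing Unix timestamp, which resembles the Graphite plaintext format. If you post-process this output, a small parser such as the hypothetical one below can split each line; it assumes whitespace-separated fields and metric paths without spaces.

```python
from typing import NamedTuple, Optional


class MetricLine(NamedTuple):
    path: str
    value: float
    timestamp: Optional[int]


def parse_metric_line(line: str) -> MetricLine:
    """Parse '<path> <value> [<unix timestamp>]' into its fields."""
    parts = line.split()
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected metric line: {line!r}")
    # Broker lines carry no timestamp; Zookeeper lines do.
    timestamp = int(parts[2]) if len(parts) == 3 else None
    return MetricLine(parts[0], float(parts[1]), timestamp)
```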

    Table 4. Apache Kafka Zookeeper Metrics

    kafka.metrics.zookeeper
    Description: Collects Zookeeper metrics from the host.
    Usage: commonchecks metric-kafka-zookeeper [flags]

    Where the flag is:

    • -p, --adminserverport = Admin Server port (default "8085").

    Usage example: commonchecks metric-kafka-zookeeper -p 8085

    Output:

    hostname.Kafka.Zookeeper.outstanding_requests 2.000 1648183249

    hostname.Kafka.Zookeeper.avg_latency 1.05 1648183249

    hostname.Kafka.Zookeeper.num_alive_connections 1.000 1648183249

    hostname.Kafka.Zookeeper.open_file_descriptor_count 124.000 1648183249
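
Zookeeper 3.5 and later includes an embedded AdminServer that serves commands over HTTP at `/commands/<name>`, and its `monitor` command returns JSON whose field names match the metric suffixes above. The sketch assumes, without confirmation from this page, that the collector reads that endpoint on the port given by -p; Zookeeper's own default for `admin.serverPort` is 8080, so 8085 suggests a non-default setting. The helper names and the `hostname.Kafka.Zookeeper` prefix handling are hypothetical, and values are printed with three decimals, so `avg_latency` appears as `1.050` rather than `1.05`.

```python
import json
import time
import urllib.request
from typing import List, Optional

# Metric names shown in the sample output; assumed to match AdminServer JSON fields.
METRIC_KEYS = ("outstanding_requests", "avg_latency", "num_alive_connections", "open_file_descriptor_count")


def fetch_monitor(host: str = "localhost", port: int = 8085, timeout: float = 5.0) -> dict:
    """GET the AdminServer 'monitor' command and decode its JSON body."""
    url = f"http://{host}:{port}/commands/monitor"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def to_metric_lines(stats: dict, prefix: str = "hostname.Kafka.Zookeeper",
                    timestamp: Optional[int] = None) -> List[str]:
    """Format selected numeric fields as '<prefix>.<name> <value> <timestamp>' lines."""
    ts = int(time.time()) if timestamp is None else timestamp
    lines = []
    for key in METRIC_KEYS:
        value = stats.get(key)
        # Skip missing or non-numeric fields (bool is excluded explicitly).
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            lines.append(f"{prefix}.{key} {float(value):.3f} {ts}")
    return lines
```

A typical run would be `to_metric_lines(fetch_monitor("localhost", 8085))`, which prints nothing for fields the server does not report.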