Explanation of Monitoring Metrics
Category of Monitoring Metrics | Monitoring Metrics | Explanation |
---|---|---|
Kafka Metrics | Incoming Message Rate (Per/s) | Number of incoming messages per second across all topics, averaged over one minute. |
::: | Incoming Data Rate (B/s) | Incoming data rate, in bytes per second, across all topics, averaged over one minute. |
::: | Outgoing Data Rate (B/s) | Outgoing data rate, in bytes per second, across all topics, averaged over one minute. |
::: | Failed Consumer Requests (Per/s) | Number of failed consumer requests per second, averaged over one minute. |
::: | Failed Producer Requests (Per/s) | Number of failed producer requests per second, averaged over one minute. |
::: | Messages Rejected by Broker (B/s) | Volume of messages rejected by the broker, in bytes per second, averaged over one minute. |
::: | Leader Election Speed (ms) | Time taken to elect a new partition leader when a broker goes down. The cluster should complete leader election as quickly as possible. |
::: | Number of Live Controllers (N) | Number of active controllers in the cluster. At most one controller can be active at any time. This metric corresponds to Number of Management Nodes (N) below. |
::: | Producer Request Response Time (ms) | Average response time of producer requests. |
::: | Producer QPS (Per/s) | Producer requests per second (QPS), averaged over one minute. |
::: | Consumer Request Response Time (ms) | Average response time of consumer requests. |
::: | Consumer QPS (Per/s) | Consumer requests per second (QPS), averaged over one minute. |
::: | Number of Live Kafka Nodes (N) | Number of live nodes in the cluster. It should equal the total number of nodes in the cluster. |
::: | Maximum Message Lag Between Follower and Leader (N) | Maximum number of messages by which a follower replica lags behind its leader replica. |
::: | Total Number of Partitions on This Node (N) | Total number of partitions on this node. |
::: | Total Number of Leader Partitions on This Node (N) | Total number of partitions for which this node hosts the leader replica. |
::: | Total Number of Unreplicated Partitions (N) | Number of partitions waiting for replication (under-replicated partitions). The normal value is 0. |
::: | ISR Shrink Rate (Per/s) | Rate at which in-sync replica sets (ISRs) shrink.<br>If a broker goes down, the ISRs of some partitions shrink.<br>When the broker comes back online, an ISR expands again once the broker's replica has fully caught up.<br>Under normal circumstances, both this value and the ISR Expansion Rate below are 0. |
::: | ISR Expansion Rate (Per/s) | Rate at which ISRs expand. See ISR Shrink Rate for details. |
::: | Number of Management Nodes (N) | Whether the current broker is the controller.<br>Exactly one broker in the cluster reports 1 and all others report 0. If every broker reports 0, the cluster has a problem. |
::: | Total Number of Offline Partitions (N) | Number of offline partitions, that is, partitions that currently have no active leader. |
Node Metrics | CPU Utilization (%) | The CPU utilization of the node. |
::: | Disk Read/Write Throughput (Kb/s) | Disk read and write throughput. |
::: | Disk Read/Write Times (Per/s) | Number of disk read and write operations per second. |
::: | Network Interface In/Out Bandwidth (Kb/s) | Inbound and outbound bandwidth of the network interface. |
::: | Network Interface In/Out Packet Volume (Per/s) | Inbound and outbound packets per second on the network interface. |
::: | Memory Utilization (%) | The memory utilization of the node. |
::: | Data Disk Utilization (%) | Utilization of the node's data disk. |
::: | System Disk Utilization (%) | Utilization of the node's system disk. |
ZooKeeper Metrics | Current Active Connections of zk (N) | Number of currently active connections to ZooKeeper. |
::: | Max Request Delay of zk (ms) | Maximum request latency of ZooKeeper. |
::: | Average Request Delay of zk (ms) | Average request latency of ZooKeeper. |
::: | Min Request Delay of zk (ms) | Minimum request latency of ZooKeeper. |
::: | Total Responses Received by zk (10k) | Total number of responses received by ZooKeeper, in units of 10,000. |
::: | Total Responses Sent by zk (10k) | Total number of responses sent by ZooKeeper, in units of 10,000. |
::: | Pending Connections of zk (N) | Number of pending connections to ZooKeeper. |
::: | Watcher Number (N) | Number of watches. A watch is the mechanism ZooKeeper uses to notify clients when data changes.<br>A watch can be set on each znode, so an application with 10,000 znodes may have 10,000 or more watches in ZooKeeper. |
::: | Number of Znodes (N) | Number of znodes. A znode is a data node in ZooKeeper, similar to a directory or file in a file system. |
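Several Kafka metrics in the table are per-second rates averaged over one minute. The exact averaging method used by the console is not stated here, so the sketch below assumes a simple windowed average computed from cumulative counter readings; `Sample` and `one_minute_rate` are hypothetical names. Kafka's own JMX meters (for example `MessagesInPerSec`) expose an exponentially weighted `OneMinuteRate` instead, so values reported by Kafka itself may differ slightly.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    """One reading of a cumulative counter (hypothetical data source)."""
    timestamp_s: float  # time of the reading, in seconds
    total: float        # cumulative value, e.g. messages or bytes since startup


def one_minute_rate(samples: Sequence[Sample], window_s: float = 60.0) -> float:
    """Average per-second rate over the last `window_s` seconds.

    Assumes `samples` is sorted by timestamp and the counter never resets.
    """
    if len(samples) < 2:
        return 0.0
    latest = samples[-1]
    window_start = latest.timestamp_s - window_s
    # Oldest sample that still falls inside the window (the latest always does).
    first = next(s for s in samples if s.timestamp_s >= window_start)
    elapsed = latest.timestamp_s - first.timestamp_s
    if elapsed <= 0:
        return 0.0
    return (latest.total - first.total) / elapsed


# Counter readings taken every 30 seconds.
readings = [Sample(0, 0), Sample(30, 1_500), Sample(60, 6_000)]
print(one_minute_rate(readings))  # 100.0
```

Using the difference between the oldest and newest readings in the window, rather than averaging instantaneous rates, keeps the result correct even when readings are unevenly spaced.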
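Some metrics in the table have a well-defined healthy value: Number of Live Kafka Nodes should equal the number of cluster nodes, exactly one broker should report 1 for Number of Management Nodes, and Total Number of Unreplicated Partitions should be 0. The sketch below checks these over a snapshot of per-broker values. `BrokerSnapshot` and `check_cluster` are hypothetical names, and how the values are collected is left open. Treating any offline partition as a problem is an added assumption, based on offline partitions having no active leader.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BrokerSnapshot:
    """Per-broker metric values at one point in time (hypothetical structure)."""
    broker_id: int
    is_controller: int       # Number of Management Nodes: 1 on the controller, else 0
    under_replicated: int    # under-replicated partitions reported by this broker
    offline_partitions: int  # offline partitions reported by this broker


def check_cluster(brokers: Sequence[BrokerSnapshot], expected_nodes: int) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: List[str] = []
    # Live nodes should equal the number of nodes in the cluster.
    if len(brokers) != expected_nodes:
        problems.append(f"live nodes: {len(brokers)}, expected {expected_nodes}")
    # Exactly one broker should report itself as the controller.
    controllers = sum(b.is_controller for b in brokers)
    if controllers != 1:
        problems.append(f"controllers: {controllers}, expected 1")
    # Under-replicated partitions should normally be 0.
    under_replicated = sum(b.under_replicated for b in brokers)
    if under_replicated:
        problems.append(f"under-replicated partitions: {under_replicated}")
    # Assumption: any offline partition is treated as a problem.
    offline = sum(b.offline_partitions for b in brokers)
    if offline:
        problems.append(f"offline partitions: {offline}")
    return problems


snapshot = [
    BrokerSnapshot(broker_id=1, is_controller=1, under_replicated=0, offline_partitions=0),
    BrokerSnapshot(broker_id=2, is_controller=0, under_replicated=2, offline_partitions=0),
]
print(check_cluster(snapshot, expected_nodes=3))
# ['live nodes: 2, expected 3', 'under-replicated partitions: 2']
```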
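Maximum Message Lag Between Follower and Leader can be read as the largest gap, in messages, between a leader replica's log end offset and a follower replica's log end offset. The sketch below illustrates that arithmetic only, with partitions keyed by hypothetical `(topic, partition)` tuples; Kafka reports this value itself (for example through the replica fetcher `MaxLag` JMX metric).

```python
from typing import Dict, Tuple

PartitionKey = Tuple[str, int]  # (topic, partition), hypothetical keying


def max_follower_lag(
    leader_end_offsets: Dict[PartitionKey, int],
    follower_end_offsets: Dict[PartitionKey, int],
) -> int:
    """Largest number of messages by which a follower trails its leader.

    Only partitions present in both mappings are compared.
    """
    lags = [
        leader_end_offsets[p] - follower_end_offsets[p]
        for p in follower_end_offsets
        if p in leader_end_offsets
    ]
    # A follower cannot be ahead of its leader; clamp at 0 in case of stale reads.
    return max((max(lag, 0) for lag in lags), default=0)


leaders = {("orders", 0): 1_200, ("orders", 1): 980}
followers = {("orders", 0): 1_150, ("orders", 1): 980}
print(max_follower_lag(leaders, followers))  # 50
```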