FAQs
When creating a Topic, how should I set the number of partitions and the replication factor (replication-factor)?
A replication factor of 3 is generally recommended. The number of partitions can be set according to the total number of data disks across all nodes in the cluster (for example, each node of a standard instance mounts only one disk, while Balanced I/O instances mount 2-8 disks depending on the configuration) and the concurrency required by the client. It is recommended not to set it too high, as an excessive number of partitions will degrade cluster performance.
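For example, a minimal sketch of creating a Topic with the standard Kafka CLI (the topic name, counts, and broker address below are illustrative; on older Kafka versions `--zookeeper <zk-host>:2181` is used instead of `--bootstrap-server`):

```bash
# Create a topic with 6 partitions and a replication factor of 3
kafka-topics.sh --create \
  --bootstrap-server $(hostname):9092 \
  --topic my-events \
  --partitions 6 \
  --replication-factor 3
```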
How many Topics can a UKafka cluster create?
A cluster can support any number of Topics. It is recommended to use a few large Topics rather than many small ones. For example, if you need to distinguish messages by day, add a timestamp to each message instead of automatically creating a new Topic every day (a sketch follows below).
It is recommended that the number of Topics and the total number of partitions not be too large; too many Topics and partitions will seriously degrade cluster performance, and in severe cases may cause cluster monitoring data to time out and fail to be reported.
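As a sketch of the recommendation above (the topic name `my-events` and the payload are illustrative), carry the date inside the message instead of in the topic name:

```bash
# Anti-pattern: a new topic per day (my-events-2024-01-01, my-events-2024-01-02, ...)
# Recommended: one topic, with the day embedded in each message
echo "$(date +%F)|order-created|order-12345" | \
  kafka-console-producer.sh --broker-list $(hostname):9092 --topic my-events
```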
How to increase the replication factor (ReplicationFactor) of a Topic?
Please refer to the official documentation.
When increasing a Topic's replicas, reserve enough space on each node's data disk, since the existing partition data will be copied to the new replicas.
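A sketch of the usual CLI procedure (the topic name, partition and broker IDs are illustrative; newer Kafka versions take `--bootstrap-server` instead of `--zookeeper`): describe the larger replica assignment in a JSON file and apply it with the partition reassignment tool.

```bash
# Assign 3 replicas to each partition of the topic "my-events"
cat > increase-replication.json <<'EOF'
{"version":1,"partitions":[
  {"topic":"my-events","partition":0,"replicas":[0,1,2]},
  {"topic":"my-events","partition":1,"replicas":[1,2,0]}
]}
EOF

kafka-reassign-partitions.sh --zookeeper $(hostname):2181 \
  --reassignment-json-file increase-replication.json --execute

# Verify progress; the new replicas are populated by copying existing data
kafka-reassign-partitions.sh --zookeeper $(hostname):2181 \
  --reassignment-json-file increase-replication.json --verify
```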
How to deal with directories under a node's kafka-logs directory occupying a lot of disk space that is never released?
`__consumer_offsets` is the Topic in which Kafka stores clients' consumption offsets, and it uses the compact cleanup policy (log compaction) by default, so its disk space is only reclaimed by the log cleaner. Set the `log.cleaner.enable` parameter to `true`, and then restart the Kafka service on each node in turn.
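The relevant broker setting, as a sketch (on UKafka it should be changed through the console's parameter configuration management rather than by editing files by hand):

```properties
# server.properties: enable the log cleaner so that the compacted
# __consumer_offsets topic is actually cleaned and its disk space released
log.cleaner.enable=true
```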
I received an alarm that the total number of offline partitions is >= 10.0. What does this metric mean, and how do I handle it without affecting the service?
A partition is the physical unit of a Topic in Kafka; messages are split across partitions to make the service distributed and highly available.
In general, an offline-partitions alarm is caused by a node going down or its service failing. On the UKafka console's “Node Management” page, check the “Associated Topics” information of each node; if it is empty, that node is abnormal.
You can then check server.log under “Node Management” for abnormal entries. If the Kafka service is stuck, you can restart the Kafka service on that node from the console after evaluating the impact.
To avoid business unavailability caused by single-machine failures, keep the replication factor of every Topic at 3 or higher.
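You can also locate problem partitions directly with the Kafka CLI, as in this sketch (the broker address is a placeholder):

```bash
# Partitions that currently have no live leader (i.e. offline)
kafka-topics.sh --bootstrap-server $(hostname):9092 \
  --describe --unavailable-partitions

# Partitions whose in-sync replica set is smaller than the replica list
kafka-topics.sh --bootstrap-server $(hostname):9092 \
  --describe --under-replicated-partitions
```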
How to consume messages larger than 1MB each?
The cluster's default `message.max.bytes` is 1MB. If you need to support larger messages, you can modify `message.max.bytes` and `replica.fetch.max.bytes` through the cluster parameter configuration management. On the consumer side, you also need to modify `fetch.message.max.bytes`.
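A sketch of the three settings with an illustrative 5MB limit (the broker-side values are changed through the cluster parameter configuration management; with the newer consumer API, the analogous settings are `fetch.max.bytes` and `max.partition.fetch.bytes`):

```properties
# Broker side: accept and replicate messages up to 5MB
message.max.bytes=5242880
replica.fetch.max.bytes=5242880

# Consumer side (old consumer API): the fetch buffer must be at least as large
fetch.message.max.bytes=5242880
```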
How to access the UKafka cluster from the Internet?
Please refer to: Internet Access
What should I do if a single node's configuration is insufficient and needs to be upgraded?
Older models do not support vertical upgrades of a single node by default. If you need more resources, you can add nodes horizontally; if you hit a bottleneck in memory or another single-node resource, you can contact us for a backend upgrade.
New models (o.kafka2m.*) support vertical specification upgrades and online disk upgrades. If you need to upgrade from an old model to a new one, please contact us.
How to check the monitoring data of the UKafka cluster?
The cluster monitoring view page provides monitoring of cluster input/output data volume and message counts, Kafka producer and consumer metrics, and ZooKeeper-related metrics, as well as CPU, memory, disk, and network metrics for each Broker, the Kafka service, and ZooKeeper.
The maximum latency of ZooKeeper is very high. Is there a problem?
The ZooKeeper maximum latency (zk_max_latency) is the maximum request latency observed since the cluster was created, so it does not reflect the current state. To understand the current ZooKeeper request latency, watch the average request latency monitoring item instead.
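You can query these counters directly with ZooKeeper's four-letter-word commands, as in this sketch (host and port are placeholders):

```bash
# mntr reports zk_avg_latency / zk_max_latency / zk_min_latency
echo mntr | nc $(hostname) 2181 | grep latency

# srst resets the statistics, so zk_max_latency then reflects only recent traffic
echo srst | nc $(hostname) 2181
```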
Error in Getting Consumer Details
Currently, the console obtains consumer information from ZooKeeper or the Kafka API depending on the consumer type. However, a Kafka client SDK is free to decide how it stores consumer information, so for SDKs that do not store it in the standard way, the displayed consumer information may be wrong. We have not adapted to these SDKs individually; known problematic SDKs include:
- pykafka
- jstorm: does not store consumer group information in the standard way; it manages the mapping between consumer instances and topic partitions, and the corresponding offsets, by itself, storing some information under the `/jstorm` path in ZooKeeper
- flink: version 0.9 manages its Kafka consumption information itself and does not register consumer group information on the Kafka side
When encountering information-fetching errors, you can first run `kafka-consumer-groups.sh --bootstrap-server $(hostname):9092 --describe --group $group` to confirm whether the consumer's information is missing or wrong.
Repartitioning
After adding nodes, the topic partitions on the original machines are not automatically rebalanced onto the new machines; you need to use the partition reassignment tool to balance them.
Official description of the partition reassignment function
The function provided by the console corresponds to the “Automatically migrating data to new machines” part of that description.
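A sketch of the underlying tool usage (the topic name and broker IDs are illustrative): let the tool generate a balanced assignment across the enlarged broker list, save the proposed assignment, and execute it.

```bash
# List the topics to rebalance
cat > topics.json <<'EOF'
{"version":1,"topics":[{"topic":"my-events"}]}
EOF

# Generate a candidate assignment across brokers 0-3, including the new nodes
kafka-reassign-partitions.sh --zookeeper $(hostname):2181 \
  --topics-to-move-json-file topics.json \
  --broker-list "0,1,2,3" --generate

# Save the proposed assignment printed above into reassignment.json, then apply it
kafka-reassign-partitions.sh --zookeeper $(hostname):2181 \
  --reassignment-json-file reassignment.json --execute
```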
Consumer groups are not displayed on the console
The console only displays consumer groups that have live consumers. If your consumer group is not displayed on the console, troubleshoot as follows (a CLI sketch follows the list):
- Confirm there are live consumer instances in the consumer group
- Confirm whether the SDK has registered ConsumerGroup information with Kafka; refer to: Error in Getting Consumer Details
- Confirm whether the client created its consumer instance using a ConsumerGroup
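A sketch of the CLI checks (the broker address and group name are placeholders):

```bash
# Groups that Kafka itself knows about; a group absent here was never registered
kafka-consumer-groups.sh --bootstrap-server $(hostname):9092 --list

# Members of a specific group; an empty member list means no live consumers
kafka-consumer-groups.sh --bootstrap-server $(hostname):9092 \
  --describe --group $group
```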