
Product Features

Terminology Explanation

| Term | Explanation |
| --- | --- |
| Upstream | The upstream data source of the connector |
| Topic | UKafka’s Topic |
| Downstream | The downstream data target of the connector |
| Downstream Identifier | The name of the downstream UHadoop or UES cluster |
| File Splitting Mode | The mode for splitting data files in HDFS; splitting by date or by hour is supported |
| Plugin | A custom plugin for transforming the data stream |
| HDFS Path | The target file path in the downstream HDFS to which the connector transfers data |
| Index | ES’s Index |
| Index Type | ES’s Index Type |
| Processing Unit | The unit of the data transmission stream, currently fixed at 1 core and 2 GB of memory |
| Connector | Abbreviation for Kafka connector |

Upstream and Downstream Support

Currently, the upstream and downstream must be UCloud Global big data products, as detailed below:

| Upstream | Downstream |
| --- | --- |
| UKafka | UHadoop, UES |

Setting Up Connector

Click [Create Connector] on the control panel page to start setting up a new Kafka connector.

When selecting the region and availability zone, note that the Kafka connector must be in the same availability zone as its upstream and downstream products.

Once the upstream and downstream information is complete, you can choose the number of processing units according to your requirements. For the throughput of a processing unit, refer to the performance testing section.

It’s worth noting that the number of effective processing units is capped by the partition count of the upstream Kafka topic: when the number of processing units exceeds this value, the extra processing units sit idle. For example, a topic with 3 partitions can make use of at most 3 processing units.

Add Upstream

When configuring the upstream, you can select an existing UKafka cluster from the drop-down menu. After selecting the UKafka cluster, you can select the specific data source to be exported from the Topic drop-down menu.

Currently, the connector supports Kafka’s built-in ByteArrayDeserializer (org.apache.kafka.common.serialization.ByteArrayDeserializer); other built-in Kafka formats and custom formats will be supported in the future.
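Since the connector consumes raw byte arrays, your message payload can be in any format your producers choose. For reference, below is a minimal sketch of a plain Kafka consumer configured with this same deserializer; the broker address, group id, and topic name are placeholder assumptions, and the sketch only illustrates the deserializer setting, not the connector’s internal code.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
 
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
 
public class ByteArrayConsumerSketch {
 
  public static void main(String[] args) {
    Properties props = new Properties();
    // Placeholder address; substitute your UKafka cluster's broker list.
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "ukafka-broker:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
    // The deserializer the connector supports: messages arrive as raw byte arrays.
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");
 
    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("example-topic"));
      ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
      for (ConsumerRecord<byte[], byte[]> record : records) {
        System.out.println("received " + record.value().length + " bytes");
      }
    }
  }
}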

To create a UKafka cluster, refer to the document Creating a UKafka Cluster.

Add Downstream

1. Add HDFS Downstream

Firstly, choose HDFS as the downstream type and select your UHadoop cluster in the Downstream Identifier field.

To facilitate data management, Kafka connectors allow the data export directory to be configured dynamically, with data segregated according to the system clock (GMT+8) of the machine.

Lastly, specify the final export path of the data. This is an absolute path in the UHadoop filesystem and does not need to include the name of UHadoop’s HDFS filesystem. If file splitting is enabled, the Kafka connector generates the corresponding time directories under this path.
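As a purely illustrative sketch (the exact directory naming scheme may differ), with a hypothetical HDFS path of /data/ukafka and file splitting by hour, exported data would be organized into time directories along the lines of:

/data/ukafka/2024-05-01/00/
/data/ukafka/2024-05-01/01/
/data/ukafka/2024-05-01/02/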

Currently, files are uniformly compressed with gzip; more compression methods will be supported in the future.

2. Add Elasticsearch Downstream

If Elasticsearch is selected as the downstream type, you need to select your UES cluster in the Downstream Identifier field and fill in the name and type of the target index.

Plugin Management (Beta Release)

The custom plugin feature is currently in beta. Contact technical support to turn it on if required.

To better support your business, we allow you to parse custom data formats by writing your own Java code and to dynamically specify the storage path in UHadoop.

The interface definition for the plugin is as follows:

package cn.ucloud.sinker.thirdpart;
 
import java.io.IOException;
 
public interface ThirdpartPlugin {
 
  /**
   * @param body original message
   * @return Converted message
   */
  public byte[] transform(byte[] body) throws IOException;
 
  /**
   * @param body original message
   * @return hdfs directory prefix for this message
   */
  public String extractDirName(byte[] body) throws IOException;
}

Ensure that the implementation class of the interface has a default (no-argument) constructor so that we can construct an instance of it.

When the Kafka connector receives data from the upstream, it calls the transform method on each message received from UKafka to convert the message, and calls the extractDirName method on each message to determine the prefix of the data storage path. If you do not need to customize the storage path, you can return null from extractDirName and it will be ignored.
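As an illustration, here is a minimal sketch of an implementation. The class name, the newline-appending transform, and the routing of messages by a leading tag are all hypothetical choices; only the ThirdpartPlugin interface itself comes from the definition above.

package com.example.plugins; // hypothetical package name
 
import java.io.IOException;
import java.nio.charset.StandardCharsets;
 
import cn.ucloud.sinker.thirdpart.ThirdpartPlugin;
 
// Hypothetical example: treats each message as UTF-8 text of the form "tag|payload".
public class TaggedMessagePlugin implements ThirdpartPlugin {
 
  // The implicit default constructor satisfies the no-argument constructor requirement.
 
  // Appends a newline so that each message occupies its own line in the HDFS file.
  @Override
  public byte[] transform(byte[] body) throws IOException {
    String text = new String(body, StandardCharsets.UTF_8);
    return (text + "\n").getBytes(StandardCharsets.UTF_8);
  }
 
  // Routes each message into a subdirectory named after its tag.
  @Override
  public String extractDirName(byte[] body) throws IOException {
    String text = new String(body, StandardCharsets.UTF_8);
    int sep = text.indexOf('|');
    if (sep <= 0) {
      // Untagged message: return null so the storage path is not customized.
      return null;
    }
    return text.substring(0, sep);
  }
}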

After writing the custom plugin logic and packaging it into a jar, upload the jar to UFile storage in the corresponding region.

The plugin’s UFile address in the form is the plugin’s download address in UFile, which can be obtained by clicking the download button in the UFile file management interface. The entry class is the reference path of the plugin interface’s implementation class, in the format full-package-name.ClassName.
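For example, for the hypothetical TaggedMessagePlugin sketched above, the entry class would be com.example.plugins.TaggedMessagePlugin.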

Monitor Explanation

Through the monitor view tab, you can see the 5 monitor items we provide.

The details of each monitor item are as follows:

| Monitor Item | Description |
| --- | --- |
| Read Message Count | The total number of messages read from the upstream since the node last started; updated every minute. |
| Write Message Count | The total number of messages written to the downstream target since the node last started; updated every minute. |
| Read Messages Per Second | The number of messages read from the upstream per second; updated every minute. |
| Write Messages Per Second | The number of messages written to the downstream target per second; updated every minute. |
| Accumulated Message Count | The amount of unexported data accumulated in Kafka; updated every minute. |
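Though not stated explicitly here, the Accumulated Message Count presumably corresponds to the standard Kafka notion of consumer lag: for each partition of the upstream topic, the difference between the latest offset and the offset the connector has already consumed, summed over all partitions:

accumulated = sum over partitions p of (logEndOffset(p) - consumedOffset(p))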