
Flink Operation Guide

1. Basic Introduction

Apache Flink is a framework and distributed processing engine for stateful computations over bounded and unbounded data streams. Flink is designed to run in all common cluster environments and to perform computations at any scale and at in-memory speed. For more details, see Apache Flink.

2. Job Submission Modes

Flink in UHadoop uses Flink on Yarn mode for deployment. You can SSH into any node of the UHadoop cluster and submit Flink jobs from the command line.

Flink on Yarn provides three ways to submit jobs: Session, Per-Job Cluster, and Application mode, as explained below:

  • Session: creates a long-running Flink cluster; all jobs are submitted to this cluster, and the cluster is not released even after the jobs end. If a job triggers an exception that shuts down a Task Manager, the other jobs running on that Task Manager fail as well.
    Advantages: low resource-allocation overhead when submitting jobs.
    Disadvantages: jobs compete with each other for resources.
    Applicable scenarios: jobs that need short startup times and have relatively short running times.

  • Per-Job Cluster: each time a Flink job is submitted, YARN starts a new Flink cluster for that job and runs the job in it. When the job ends or is cancelled, the cluster belonging to the job is released.
    Advantages: resource isolation between jobs; abnormal behavior of one job does not affect other jobs.
    Disadvantages: higher overhead for starting jobs.
    Applicable scenarios: jobs with longer running times.

  • Application: each time a Flink Application is submitted (an Application contains one or more jobs), YARN starts a new Flink cluster for that Application. When the Application ends or is cancelled, the cluster belonging to it is released. The difference from Per-Job mode is that the main() method of the Application's JAR file executes in the Job Manager inside the cluster; if the submitted JAR file contains multiple jobs, all of them execute in the Application's cluster.
    Advantages: reduces the load on the client when submitting jobs.
    Disadvantages: higher overhead for starting jobs.
    Applicable scenarios: jobs with longer running times.
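
For quick orientation, the three modes map to the following launch commands (the full submission steps are covered in section 3; <job-jar> is a placeholder for your JAR path):

    # Session: start a long-running cluster first, then submit jobs to it
    yarn-session.sh --detached

    # Per-Job Cluster: a dedicated cluster per submitted job
    flink run -t yarn-per-job --detached <job-jar>

    # Application: main() executes on the Job Manager inside the cluster
    flink run-application -t yarn-application <job-jar>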

3. Job Submission

Prerequisite: log in to a node of the cluster via SSH.

3.1 Session Mode

  • Execute the following command to start a Yarn session

    yarn-session.sh --detached
    
  • Execute the following command to submit a job

    flink run /home/hadoop/flink/examples/streaming/TopSpeedWindowing.jar
    

    After successful submission, the Yarn ApplicationId corresponding to the job is returned; it is used in subsequent steps.
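
    If multiple sessions are running, or you submit from a different shell, you can target a specific session explicitly; a minimal sketch, where <applicationId> is the ID returned by yarn-session.sh:

    flink run -t yarn-session -Dyarn.application.id=<applicationId> /home/hadoop/flink/examples/streaming/TopSpeedWindowing.jar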

3.2 Per-Job Mode

  • Execute the following command to submit a job

    flink run -t yarn-per-job --detached /home/hadoop/flink/examples/streaming/TopSpeedWindowing.jar
    

    After successful submission, the Yarn ApplicationId corresponding to the job is returned; it is used in subsequent steps.

    If the following error occurs during job submission:

    Exception in thread "Thread-6" java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.

add classloader.check-leaked-classloader: false to the /home/hadoop/flink/conf/flink-conf.yaml file and resubmit the job.
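
The resulting entry in the configuration file:

    # /home/hadoop/flink/conf/flink-conf.yaml
    classloader.check-leaked-classloader: false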

3.3 Application Mode

  • Execute the following command to submit a job

    flink run-application -t yarn-application /home/hadoop/flink/examples/streaming/TopSpeedWindowing.jar
    

    After successful submission, the Yarn ApplicationId corresponding to the job is returned; it is used in subsequent steps.
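
    In Application mode, the job JAR does not have to live on the client; Flink can fetch it, and optionally the Flink libraries, from remote storage such as HDFS. A sketch, assuming hypothetical paths hdfs:///flink/libs and hdfs:///flink/jobs/TopSpeedWindowing.jar:

    flink run-application -t yarn-application \
        -Dyarn.provided.lib.dirs="hdfs:///flink/libs" \
        hdfs:///flink/jobs/TopSpeedWindowing.jar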

4. Job Operations

4.1 View Jobs

  • Visit the Yarn WebUI (for access methods, refer to the UHadoop Service WebUI Access Guide), select the corresponding application, and click its ID.

  • On the page that opens, click the Tracking URL link.

  • The Flink Web UI opens, where you can view job-related information.

Note: For completed jobs, logs need to be viewed separately, as follows:

  • Visit the Yarn WebUI, select the corresponding application, and click its ID; on the application page, follow the Logs link to view the job's logs.
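
    Aggregated logs for a finished application can also be fetched from the command line with the standard YARN CLI, where <applicationId> is the ID returned at submission:

    yarn logs -applicationId <applicationId>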

4.2 Stop Jobs

You can execute the following command to stop jobs:

flink cancel -t yarn-application -Dyarn.application.id=<applicationId> <jobId>
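
The <jobId> can be looked up with flink list. If the job should take a savepoint before stopping, flink stop can be used instead of cancel; a sketch, assuming a hypothetical savepoint directory hdfs:///flink/savepoints:

# list jobs running in the target cluster to find <jobId>
flink list -t yarn-application -Dyarn.application.id=<applicationId>

# stop gracefully, writing a savepoint first (hypothetical HDFS path)
flink stop -t yarn-application -Dyarn.application.id=<applicationId> --savepointPath hdfs:///flink/savepoints <jobId>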

4.3 Specify Job Configuration

You can specify job configurations in the following ways:

  • Specify in the code: you can set configuration options directly in the code, as shown below:

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    // set low-level key-value options
    Configuration configuration = new Configuration();
    configuration.setString("table.exec.mini-batch.enabled", "true");
    configuration.setString("table.exec.mini-batch.allow-latency", "5 s");
    configuration.setString("table.exec.mini-batch.size", "5000");

    // instantiate a table environment that picks up these options
    // (EnvironmentSettings.Builder#withConfiguration requires Flink 1.15+)
    TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.newInstance()
            .inStreamingMode().withConfiguration(configuration).build());

    ...
    

    For more details, refer to Flink Configuration.

  • Set in the job submission command: you can specify configuration options with -D, as follows:

    flink run-application -t yarn-application -Dclassloader.check-leaked-classloader=false /home/hadoop/flink/examples/streaming/TopSpeedWindowing.jar
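
    Several -D options can be combined in one submission; a sketch with illustrative memory and slot values:

    flink run-application -t yarn-application \
        -Djobmanager.memory.process.size=2048m \
        -Dtaskmanager.memory.process.size=4096m \
        -Dtaskmanager.numberOfTaskSlots=2 \
        /home/hadoop/flink/examples/streaming/TopSpeedWindowing.jar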
    
  • Modify the Flink configuration file /home/hadoop/flink/conf/flink-conf.yaml to change or add configuration items; values set there act as defaults for jobs subsequently submitted from that node.
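
    A sketch of such an edit, with illustrative values for two common options:

    # /home/hadoop/flink/conf/flink-conf.yaml (illustrative values)
    taskmanager.numberOfTaskSlots: 2
    parallelism.default: 2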