Hue Development Guide
Hue is an open-source user interface for Hadoop that makes it easier to run and develop Hive queries, manage files in HDFS, run and develop Pig scripts, and manage tables. The service is started by default; you only need to bind an external IP and open the corresponding port in the firewall. If Hue is not installed, it can be enabled on the cluster's "Service Management" page.
Access address: http://[external IP]:8888
Default username/password: hadoop/hadoop or hue/hue; users can change it after logging in.
1. Configuring Workflow
Hue's workflow function requires Oozie to be installed on the cluster. Oozie can be enabled on the cluster's "Service Management" page.
1.1 Creating a new workflow
In your browser, click 【Workflows】->【Editors】->【Workflows】in turn to enter the Workflow Editor, then click the 【Create】 button on the right side of the page.
1.2 Creating Spark Task
Drag the Spark tag from the action bar into the workflow. Click the 【Settings】 button in the upper right corner; in the pop-up window you can define variables for the workflow and set the Workspace.
Add the input and output variables, and set the Workspace to the HDFS directory /user/admin/workspaces.
After setting, click the cross in the upper right corner to close the pop-up window and return to the previous page. (At this point you also need to put the Jar package required by the Spark job into the corresponding HDFS directory; see the example commands below.)
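As an illustration, assuming the Jar is named oozie-examples.jar in the local working directory and the Workspace is /user/admin/workspaces as configured above, the upload could look like this:

```bash
# Create the lib directory under the workflow Workspace on HDFS
hadoop fs -mkdir -p /user/admin/workspaces/lib

# Upload the Spark job's Jar package into the Workspace lib directory
hadoop fs -put oozie-examples.jar /user/admin/workspaces/lib/
```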
Drag the 【Spark】 icon to the corresponding position and continue configuring it: set the path of the Jar package to lib/oozie-examples.jar, and set the main class.
Note that this is an HDFS path and is relative to the Workspace path (so the actual path is /user/admin/workspaces/lib/oozie-examples.jar).
Since this main class copies HDFS files, two parameters must be passed to its main method at runtime: the source path and the destination path. Continue by adding the parameters.
Here ${input} and ${output} are the variable names set earlier in 【Settings】.
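For reference, the configuration above corresponds roughly to a Spark action like the following in the workflow.xml that Hue generates. This is only a sketch: the schema version, action/node names, and the master setting depend on your Oozie and Spark setup, and org.apache.oozie.example.SparkFileCopy is assumed here as the example main class shipped in oozie-examples.jar.

```xml
<!-- Sketch of the generated Spark action; names and values are illustrative -->
<action name="spark-copy">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>Spark File Copy</name>
        <!-- Assumed example main class from oozie-examples.jar -->
        <class>org.apache.oozie.example.SparkFileCopy</class>
        <!-- Relative to the Workspace: /user/admin/workspaces/lib/oozie-examples.jar -->
        <jar>lib/oozie-examples.jar</jar>
        <!-- The two workflow variables defined in Settings -->
        <arg>${input}</arg>
        <arg>${output}</arg>
    </spark>
    <ok to="End"/>
    <error to="Kill"/>
</action>
```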
When everything is set up, click the 【Save】 button in the upper right corner.
1.3 Creating Hive Task
uhadoop uses HiveServer2, so choose the hive-server2 tag and drag it into the workflow.
Then upload the prepared SQL script to HDFS and configure the parameters for submitting the task; a hedged example is shown below.
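As an illustration, a minimal script could be written locally and uploaded to the Workspace like this (the script name sample.sql and the table it queries are hypothetical):

```bash
# Write a minimal Hive script; the table name is a placeholder
cat > sample.sql <<'EOF'
SHOW TABLES;
SELECT COUNT(*) FROM sample_table;
EOF

# Upload the script to the workflow Workspace on HDFS
hadoop fs -put sample.sql /user/admin/workspaces/
```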
If a UDF is used, it can be specified through the Files parameter.
Finally, save the workflow.
1.4 Creating Sqoop Task
Choose the sqoop1 tag and drag it into the workflow. Then add the Sqoop command to execute (a hedged example follows the notes below).
Note:
- Do not wrap the password in extra quotes; Hue interprets everything after the -p parameter as the password.
- Sqoop distributes tasks to all nodes of the cluster, so make sure every node has read and write access to the target database.
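As an illustration, an import command entered in the Sqoop action might look like the following. The host, database, credentials, table, and target directory are all placeholders, and depending on the Hue/Oozie version the leading `sqoop` keyword is typically omitted from the command field:

```bash
# Hypothetical example; connection string, credentials and table are placeholders
import --connect jdbc:mysql://10.9.1.100:3306/testdb --username sqoop_user --password sqoop_pass --table employee --target-dir /user/admin/sqoop_import -m 1
```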
Finally, save the workflow.
2. Hue Page Permission Control
- Click 【Manage Users】->【Groups】, select the group to modify, set the corresponding permissions, and save.