Zeppelin Development Guide
Introduction
Zeppelin is an interactive data analysis tool in the form of a web notebook; it can query and analyze data online in a variety of languages and tools and generate reports.
Its default execution engine is Spark, and it also provides interpreters such as Hive, HBase, and Markdown to support analysis of additional data sources. The full list of officially supported interpreters is available in the Zeppelin documentation.
In Zeppelin, interactive data exploration tasks are organized and managed as notebooks. The execution engine runs the corresponding code in the notebook for interactive data analysis and presentation.
Installation and Login
Currently, in the u-hadoop cluster, the Zeppelin service is installed in the /home/hadoop/zeppelin directory of the master1 node, and the Zeppelin web interface can be accessed at u-hadoop-xxxxx-master1:29090.
- Start: `service Zeppelin start`
- Restart: `service Zeppelin restart`
- Shut down: `service Zeppelin stop`
- Status check: `service Zeppelin status`
U-hadoop Zeppelin does not allow anonymous access; it uses Apache Shiro for user access control. All usernames and passwords are stored in /home/hadoop/zeppelin/conf/shiro.ini. By default, one administrator user and three ordinary users are configured. The administrator can perform all operations; ordinary users can only create notebooks based on existing interpreters and query data, and cannot create or modify interpreters.
Username | Password | Role |
---|---|---|
admin | password1 | Admin |
user1 | password2 | Normal user |
user2 | password3 | Normal user |
user3 | password4 | Normal user |
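For reference, Shiro keeps these accounts in an ini-style section. A minimal sketch of what the [users] block in shiro.ini might look like is below; the role names and any password-matcher settings here are assumptions, so check the actual file on the cluster:

```ini
# shiro.ini (sketch) — each entry is: username = password, role1[, role2...]
[users]
admin = password1, admin
user1 = password2, role1
user2 = password3, role1
user3 = password4, role1
```

Restarting the Zeppelin service is generally required for shiro.ini changes to take effect.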
After opening the Zeppelin web page, click the button in the upper right corner to log in. Log in with the administrator user first to configure the interpreters:
Admin configures interpreter:
After logging in as the admin user, click the drop-down box in the upper right corner and select “Interpreter” to see the default interpreter list. The administrator can create new interpreters based on existing ones (the hive, mysql, and other data-source tests below require new interpreters created from the jdbc interpreter).
Hive Interpreter Test
Hive Interpreter Configuration
Zeppelin 0.8.1 no longer provides a dedicated Hive interpreter; only a jdbc interpreter is available, and users can create a dedicated Hive interpreter based on it.
Create a new interpreter from the jdbc interpreter group and modify its options to the following configuration.
The main modifications are:
- default.driver: `org.apache.hive.jdbc.HiveDriver`
- default.url: `jdbc:hive2://uhadoop-xxxxxx-master2:10000` (the cluster's hive-server2 JDBC address)
- default.user: `hive`
- zeppelin.jdbc.auth.type: `simple`
Add the following two jars as artifacts under the interpreter's Dependencies (here they are read from local files on the cluster):
/home/hadoop/hive/jdbc/hive-jdbc-2.3.3-standalone.jar
/home/hadoop/lib/lib/hadoop-common-2.6.0-cdh5.13.3.jar
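Before testing in a notebook, it can help to confirm that hive-server2 accepts JDBC connections at all. One way is a quick beeline check from a cluster node where beeline is installed; this is a sketch only — the host and user follow the interpreter settings above and must match your cluster:

```
# Open a JDBC session to hive-server2 as the hive user, list databases, and exit.
# A successful database listing confirms the default.url and driver settings will work.
beeline -u "jdbc:hive2://uhadoop-xxxxxx-master2:10000" -n hive -e "show databases;"
```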
Hive Notebook Creation
Create Query in Hive Notebook
Click to open the /test/hive notebook created in the previous step, create a paragraph in it, and enter the following:
%hive
show tables
Click the run (triangle) button on the right to execute; the output result appears below the paragraph.
A notebook can contain many paragraphs; hover the mouse at the bottom of a paragraph and click the “Add Paragraph” control that pops up to create a new one:
Here, three paragraphs are created in turn, demonstrating the process of creating a table, inserting data, and querying the data.
SQL:
1. `create table test_hive (id int, name string)`
2. `insert into test_hive values (1,'test_ucl'),(2,'test_hive')`
3. `select * from test_hive`
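In the notebook, each statement goes into its own paragraph and is prefixed with the interpreter binding. For example, the first paragraph would read:

```sql
%hive
create table test_hive (id int, name string)
```

The insert and select statements are entered the same way, each in its own paragraph starting with %hive, and run in order.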
HBase Interpreter Test
Reference documentation: https://zeppelin.apache.org/docs/latest/interpreter/hbase.html
The hbase interpreter is built into Zeppelin, so there is no need to configure a new interpreter; simply prefix paragraphs with “%hbase” to run HBase shell statements.
Create a new notebook for the HBase test in the same way as above, then create the following paragraphs and execute them:
paragraph1:
%hbase
create 'test_hbase', 'cf'
paragraph2:
%hbase
list
paragraph3:
%hbase
put 'test_hbase', 'row1', 'cf:a', 'value1'
paragraph4:
%hbase
scan 'test_hbase'
The execution results are shown below each paragraph after it runs.
Use of the Spark Interpreter
Overview:
Reference documentation: https://zeppelin.apache.org/docs/latest/interpreter/spark.html
The Zeppelin spark interpreter group supports the following interpreters:
Name | Class | Description |
---|---|---|
%spark | SparkInterpreter | Creates a SparkContext and provides a Scala environment |
%spark.pyspark | PySparkInterpreter | Provides a Python environment |
%spark.r | SparkRInterpreter | Provides an R environment with SparkR support |
%spark.sql | SparkSQLInterpreter | Provides a SQL environment |
%spark.dep | DepInterpreter | Dependency loader |
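As an illustration of how these bindings work together in one notebook, here is a sketch assuming a Spark 2.x build where Zeppelin exposes the SparkSession as `spark`; the table and column names are made up for the example:

```scala
%spark
// Usually pre-imported by the interpreter; shown here for completeness.
import spark.implicits._

// Build a small DataFrame and register it as a temp view so that
// %spark.sql paragraphs in the same notebook can query it.
val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
df.createOrReplaceTempView("test_spark")
```

A following %spark.sql paragraph could then query the view directly:

```sql
%spark.sql
select * from test_spark where id = 1
```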