Zeppelin Development Guide
Introduction
Zeppelin is an interactive data analysis tool in the form of a web notebook; it can query and analyze data online in a variety of languages and tools and generate reports.
Its default execution engine is Spark, and it also provides interpreters such as Hive, HBase, and Markdown to support analysis of additional data sources. The full list of officially supported interpreters is available in the Zeppelin documentation.
In Zeppelin, interactive data exploration tasks are organized and managed as notebooks. The execution engine runs the corresponding code in the notebook for interactive data analysis and presentation.
Installation and Login
Currently, in the u-hadoop cluster, the Zeppelin service is installed in the /home/hadoop/zeppelin directory of the master1 node, and the Zeppelin web interface can be accessed at u-hadoop-xxxxx-master1:29090.
- Start: `service Zeppelin start`
- Restart: `service Zeppelin restart`
- Shut down: `service Zeppelin stop`
- Status check: `service Zeppelin status`
U-hadoop Zeppelin does not allow anonymous access; it uses Apache Shiro for user access control. All usernames and passwords are stored in /home/hadoop/zeppelin/conf/shiro.ini. By default, one administrator user and three ordinary users are configured. The administrator can perform all operations; ordinary users can only create notebooks based on existing interpreters and query data, and cannot create or modify interpreters.
Username | Password | Role |
---|---|---|
admin | password1 | Admin |
user1 | password2 | Normal user |
user2 | password3 | Normal user |
user3 | password4 | Normal user |
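For reference, Shiro keeps these accounts in an ini-style section. A minimal sketch of what the [users] block in shiro.ini might look like is below; the role names and any password-matcher settings here are assumptions, so check the actual file on the cluster:

```ini
# shiro.ini (sketch) — each entry is: username = password, role1[, role2...]
[users]
admin = password1, admin
user1 = password2, role1
user2 = password3, role1
user3 = password4, role1
```

Restarting the Zeppelin service is generally required for shiro.ini changes to take effect.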
After opening the Zeppelin web page, click the button in the upper right corner to log in. Log in with the administrator user first to configure the interpreters:
Admin configures interpreter:
After logging in as the admin user, click the drop-down box in the upper right corner and select “Interpreter” to see the default interpreter list. The administrator can create new interpreters based on existing ones (the hive, mysql, and other data-source tests below require new interpreters created from the jdbc interpreter).
Hive Interpreter Test
Hive Interpreter Configuration
Zeppelin 0.8.1 no longer provides a dedicated Hive interpreter; only a jdbc interpreter is available, and users can create a dedicated Hive interpreter based on it.
Create a new interpreter from the jdbc interpreter group and modify its options to the following configuration.
The main modifications are:
- default.driver: `org.apache.hive.jdbc.HiveDriver`
- default.url: `jdbc:hive2://uhadoop-xxxxxx-master2:10000` (the cluster's hive-server2 JDBC address)
- default.user: `hive`
- zeppelin.jdbc.auth.type: `simple`
Add the following two jars as artifacts under the interpreter's Dependencies (here they are read from local files on the cluster):
/home/hadoop/hive/jdbc/hive-jdbc-2.3.3-standalone.jar
/home/hadoop/lib/lib/hadoop-common-2.6.0-cdh5.13.3.jar
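Before testing in a notebook, it can help to confirm that hive-server2 accepts JDBC connections at all. One way is a quick beeline check from a cluster node where beeline is installed; this is a sketch only — the host and user follow the interpreter settings above and must match your cluster:

```
# Open a JDBC session to hive-server2 as the hive user, list databases, and exit.
# A successful database listing confirms the default.url and driver settings will work.
beeline -u "jdbc:hive2://uhadoop-xxxxxx-master2:10000" -n hive -e "show databases;"
```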
Hive Notebook Creation
Create Query in Hive Notebook
Click to open the /test/hive notebook created in the previous step, create a paragraph in it, and enter the following:
%hive
show tables
Click the run (triangle) button on the right to execute; the output result appears below the paragraph.
A notebook can contain many paragraphs; hover the mouse at the bottom of a paragraph and click the “Add Paragraph” control that pops up to create a new one:
Here, three paragraphs are created in turn, demonstrating the process of creating a table, inserting data, and querying the data.
SQL:
1. `create table test_hive (id int, name string)`
2. `insert into test_hive values (1,'test_ucl'),(2,'test_hive')`
3. `select * from test_hive`
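In the notebook, each statement goes into its own paragraph and is prefixed with the interpreter binding. For example, the first paragraph would read:

```sql
%hive
create table test_hive (id int, name string)
```

The insert and select statements are entered the same way, each in its own paragraph starting with %hive, and run in order.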
HBase Interpreter Test
Reference documentation: https://zeppelin.apache.org/docs/latest/interpreter/hbase.html
The hbase interpreter is built into Zeppelin, so there is no need to configure a new interpreter; simply prefix paragraphs with “%hbase” to run HBase shell statements.
Create a new notebook for the HBase test in the same way as above, then create the following paragraphs and execute them:
paragraph1:
%hbase
create 'test_hbase', 'cf'
paragraph2:
%hbase
list
paragraph3:
%hbase
put 'test_hbase', 'row1', 'cf:a', 'value1'
paragraph4:
%hbase
scan 'test_hbase'
The execution results are shown below each paragraph after it runs.
Use of the Spark Interpreter
Overview:
Reference documentation: https://zeppelin.apache.org/docs/latest/interpreter/spark.html
The Zeppelin spark interpreter group supports the following interpreters:
Name | Class | Description |
---|---|---|
%spark | SparkInterpreter | Creates a SparkContext and provides a Scala environment |
%spark.pyspark | PySparkInterpreter | Provides a Python environment |
%spark.r | SparkRInterpreter | Provides an R environment with SparkR support |
%spark.sql | SparkSQLInterpreter | Provides a SQL environment |
%spark.dep | DepInterpreter | Dependency loader |
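As an illustration of how these bindings work together in one notebook, here is a sketch assuming a Spark 2.x build where Zeppelin exposes the SparkSession as `spark`; the table and column names are made up for the example:

```scala
%spark
// Usually pre-imported by the interpreter; shown here for completeness.
import spark.implicits._

// Build a small DataFrame and register it as a temp view so that
// %spark.sql paragraphs in the same notebook can query it.
val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
df.createOrReplaceTempView("test_spark")
```

A following %spark.sql paragraph could then query the view directly:

```sql
%spark.sql
select * from test_spark where id = 1
```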