Sqoop2 Development Guide
Sqoop1 is operated mainly through shell scripts, while Sqoop2 introduces a Sqoop server that centrally manages connectors. Access also becomes more diversified: Sqoop2 can be reached through a REST API, a Java API, a Web UI, or the CLI, which makes its architecture and usage completely different from Sqoop1's.
If you select Sqoop2 when creating a cluster, it will be installed on the uhadoop-******-master2 node.
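Since Sqoop2 exposes a REST interface, you can quickly confirm that the server is up by querying its version endpoint. A minimal check, assuming the default server port 12000:
# Query the Sqoop server version over REST (default port 12000)
curl http://uhadoop-******-master2:12000/sqoop/version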
1. Export from MySQL to HDFS
1.1 Creating MySQL data
This example uses UDB as the MySQL data source. Connect to MySQL and create the sqoop database and table:
[root@uhadoop-******-master2 ~]# mysql -h10.13.124.35 -uroot -p'password'
mysql> create database sqoop;
Query OK, 1 row affected (0.00 sec)
mysql> use sqoop
Database changed
mysql> CREATE TABLE `sqoop` (
-> `id` varchar(64) NOT NULL,
-> `value` varchar(64) NOT NULL DEFAULT '',
-> PRIMARY KEY (`id`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into sqoop values ('1', 'hha'),('2', 'zhang'),('3','hehe');
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
1.2 Starting Sqoop2 CLI
Execute the following on the uhadoop-******-master2 node:
/home/hadoop/sqoop2/bin/sqoop.sh client
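By default the client connects to a Sqoop server on localhost. If you start the client on another node, point it at the server first; a sketch, assuming the default port 12000 and webapp name sqoop:
sqoop:000> set server --host uhadoop-******-master2 --port 12000 --webapp sqoop
sqoop:000> show version --all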
Run help to list the available commands:
sqoop:000> help
For information about Sqoop, visit: http://sqoop.apache.org/
Available commands:
exit (\x ) Exit the shell
history (\H ) Display, manage and recall edit-line history
help (\h ) Display this help message
set (\st ) Configure various client options and settings
show (\sh ) Display various objects and configuration options
create (\cr ) Create new object in Sqoop repository
delete (\d ) Delete existing object in Sqoop repository
update (\up ) Update objects in Sqoop repository
clone (\cl ) Create new object based on existing one
start (\sta) Start job
stop (\stp) Stop job
status (\stu) Display status of a job
enable (\en ) Enable object in Sqoop repository
disable (\di ) Disable object in Sqoop repository
For help on a specific command type: help command
1.3 Checking connector
sqoop:000> show connector
In this guide the generic JDBC connector has id 4 and the HDFS connector has id 3; the ids can differ between Sqoop2 versions, so use the values reported by show connector in the steps below.
1.4 Creating mysql-link
sqoop:000> create link -c 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: mysql-link
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://10.13.124.35/sqoop
Username: root
Password: **********
JDBC Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry#
New link was successfully created with validation status OK and persistent id 1
1.5 Creating hdfs-link
sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: hdfs-link
Link configuration
HDFS URI: hdfs://Ucluster/
New link was successfully created with validation status OK and persistent id 2
1.6 Displaying link
sqoop:000> show link
1.7 Creating job
sqoop:000> create job -f 1 -t 2
Creating job for links with from id 1 and to id 2
Please fill following values to create new job object
Name: job-mysql-to-hdfs
From database configuration
Schema name: sqoop
Table name: sqoop
Table SQL statement:
Table column names:
Partition column name: id
Null value allowed for the partition column:
Boundary query:
ToJob configuration
Override null value:
Null value:
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom compression format:
Output directory: /tmp/sqoop
Throttling resources
Extractors: 1
Loaders: 1
New job was successfully created with validation status OK and persistent id 1
View the created job:
sqoop:000> show job
1.8 Starting job
sqoop:000> start job -j 1
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-12-22 14:40:10 CST
Lastly updated by: root
External ID: job_1481968387780_0011
http://uhadoop-penomi-master1:23188/proxy/application_1481968387780_0011/
2016-12-22 14:40:10 CST: BOOTING - Progress is not available
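You can poll the running job from the CLI and, once it finishes, verify the files written to the output directory configured above. A sketch using the /tmp/sqoop directory from step 1.7:
sqoop:000> status job -j 1
# On the uhadoop-******-master2 node, inspect the files written to HDFS
hdfs dfs -cat /tmp/sqoop/*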
2. Export from HDFS to MySQL
This direction reuses the hdfs-link and mysql-link created in the MySQL-to-HDFS export above.
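The job below reads from the /tmp/sqoop-input directory, so it must exist and contain text data that Sqoop2 can parse. A sketch that uploads the two sample rows seen in the final result, assuming comma-separated lines with single-quoted strings (the same layout as Sqoop2's TEXT_FILE output):
# Create the input directory and upload two sample rows from stdin
hdfs dfs -mkdir -p /tmp/sqoop-input
printf "'4','zhen'\n'5','u'\n" | hdfs dfs -put - /tmp/sqoop-input/data.txt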
2.1 Creating job
sqoop:000> create job -f 2 -t 1
Creating job for links with from id 2 and to id 1
Please fill following values to create new job object
Name: job-hdfs-to-mysql
From Job configuration
Input directory: /tmp/sqoop-input
Override null value:
Null value:
To database configuration
Schema name: sqoop
Table name: sqoop
Table SQL statement:
Table column names:
Stage table name:
Should clear stage table:
Throttling resources
Extractors: 1
Loaders: 1
New job was successfully created with validation status OK and persistent id 2
The added job can be viewed with show job.
2.2 Executing job
sqoop:000> start job -j 2
Submission details
Job ID: 2
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-12-22 15:38:00 CST
Lastly updated by: root
External ID: job_1481968387780_0012
http://uhadoop-penomi-master1:23188/proxy/application_1481968387780_0012/
2016-12-22 15:38:00 CST: BOOTING - Progress is not available
2.3 Checking MySQL data
mysql> select * from sqoop;
+----+-------+
| id | value |
+----+-------+
| 1  | hha   |
| 2  | zhang |
| 3  | hehe  |
| 4  | zhen  |
| 5  | u     |
+----+-------+
5 rows in set (0.00 sec)