
Sqoop2 Development Guide

Sqoop1 is generally operated through shell scripts, while Sqoop2 introduces a Sqoop server that provides centralized management of connectors. Its access modes are also diversified: Sqoop2 can be accessed through the REST API, the Java API, a web UI, and a CLI terminal, which is completely different from the architecture and usage of Sqoop1.
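As a minimal sketch of the REST access mode, the server's version endpoint can be queried with curl. This assumes the Sqoop2 server is listening on its default port 12000 (the same Server URL the CLI reports later in this guide):

```shell
# Query the Sqoop2 server's version endpoint; the response is JSON
# describing the server build. Assumes the server runs on the default
# port 12000 on the node where Sqoop2 is installed.
curl http://localhost:12000/sqoop/version
```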

If you select Sqoop2 when creating a cluster, Sqoop2 will be installed on the uhadoop-******-master2 node.

1. Export from MySQL to HDFS

1.1 Creating MySQL data

This example uses UDB as the MySQL data source.

Connect to MySQL, then create the sqoop database and table:

[root@uhadoop-******-master2 ~]# mysql -h10.13.124.35 -uroot -p'password'
mysql> create database sqoop;
Query OK, 1 row affected (0.00 sec)
mysql> use sqoop
Database changed
mysql> CREATE TABLE `sqoop` (
    ->   `id` varchar(64) NOT NULL,
    ->   `value` varchar(64) NOT NULL DEFAULT '',
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into sqoop values ('1', 'hha'),('2', 'zhang'),('3','hehe');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

1.2 Starting Sqoop2 CLI

Execute the following on the uhadoop-******-master2 node:

/home/hadoop/sqoop2/bin/sqoop.sh client

The help command shows the basic usage:

sqoop:000> help
For information about Sqoop, visit: http://sqoop.apache.org/

Available commands:
  exit    (\x  ) Exit the shell
  history (\H  ) Display, manage and recall edit-line history
  help    (\h  ) Display this help message
  set     (\st ) Configure various client options and settings
  show    (\sh ) Display various objects and configuration options
  create  (\cr ) Create new object in Sqoop repository
  delete  (\d  ) Delete existing object in Sqoop repository
  update  (\up ) Update objects in Sqoop repository
  clone   (\cl ) Create new object based on existing one
  start   (\sta) Start job
  stop    (\stp) Stop job
  status  (\stu) Display status of a job
  enable  (\en ) Enable object in Sqoop repository
  disable (\di ) Disable object in Sqoop repository

For help on a specific command type: help command

1.3 Checking connector

sqoop:000> show connector
1.4 Creating MySQL link

sqoop:000> create link -c 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: mysql-link
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://10.13.124.35/sqoop
Username: root
Password: **********
JDBC Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry#
New link was successfully created with validation status OK and persistent id 1

1.5 Creating HDFS link

sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: hdfs-link
Link configuration
HDFS URI: hdfs://Ucluster/
New link was successfully created with validation status OK and persistent id 2

1.6 Checking links

sqoop:000> show link

1.7 Creating job

sqoop:000> create job -f 1 -t 2
Creating job for links with from id 1 and to id 2
Please fill following values to create new job object
Name: job-mysql-to-hdfs

From database configuration

Schema name: sqoop
Table name: sqoop
Table SQL statement:
Table column names:
Partition column name: id
Null value allowed for the partition column:
Boundary query:

ToJob configuration

Override null value:
Null value:
Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0
Compression format:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom compression format:
Output directory: /tmp/sqoop

Throttling resources

Extractors: 1
Loaders: 1
New job was successfully created with validation status OK and persistent id 1

The created job can be listed as below:

sqoop:000> show job

1.8 Starting job

sqoop:000> start job -j 1
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-12-22 14:40:10 CST
Lastly updated by: root
External ID: job_1481968387780_0011
        http://uhadoop-penomi-master1:23188/proxy/application_1481968387780_0011/
2016-12-22 14:40:10 CST: BOOTING - Progress is not available
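Once the job finishes, the exported data lands in the output directory configured above (/tmp/sqoop). A quick verification sketch, assuming the Hadoop client is on the PATH; the exact part-file names depend on the number of extractors:

```shell
# List the files Sqoop2 wrote under the configured output directory
hdfs dfs -ls /tmp/sqoop

# Print the exported records; with TEXT_FILE output each line is a
# quoted-CSV row such as '1','hha'
hdfs dfs -cat /tmp/sqoop/*
```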

2. Export from HDFS to MySQL

Here, the hdfs-link and mysql-link created in section 1 (export from MySQL to HDFS) are reused.

2.1 Creating job

sqoop:000> create job -f 2 -t 1
Creating job for links with from id 2 and to id 1
Please fill following values to create new job object
Name: job-hdfs-to-mysql

From Job configuration

Input directory: /tmp/sqoop-input
Override null value:
Null value:

To database configuration

Schema name: sqoop
Table name: sqoop
Table SQL statement:
Table column names:
Stage table name:
Should clear stage table:

Throttling resources

Extractors: 1
Loaders: 1
New job was successfully created with validation status OK and persistent id 2
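The job reads from /tmp/sqoop-input, so that directory must contain text files in the same quoted-CSV layout produced in section 1. A minimal sketch for preparing sample input; the local file name and the two rows are illustrative (they match the extra rows seen in the query of section 2.3):

```shell
# Create a sample input file in the TEXT_FILE layout ('col','col' per line)
cat > /tmp/sqoop-extra.csv <<'EOF'
'4','zhen'
'5','u'
EOF

# Upload it into the job's input directory on HDFS
hdfs dfs -mkdir -p /tmp/sqoop-input
hdfs dfs -put /tmp/sqoop-extra.csv /tmp/sqoop-input/
```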

The added job can be checked with show job.

2.2 Executing job

sqoop:000> start job -j 2
Submission details
Job ID: 2
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-12-22 15:38:00 CST
Lastly updated by: root
External ID: job_1481968387780_0012
        http://uhadoop-penomi-master1:23188/proxy/application_1481968387780_0012/
2016-12-22 15:38:00 CST: BOOTING - Progress is not available
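While the job is running, its progress can be polled from the same shell with the status command listed in the help output above. A sketch, assuming job id 2:

```shell
sqoop:000> status job -j 2
```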

2.3 Checking MySQL data

mysql> select * from sqoop;
+----+-------+
| id | value |
+----+-------+
| 1  | hha   |
| 2  | zhang |
| 3  | hehe  |
| 4  | zhen  |
| 5  | u     |
+----+-------+
5 rows in set (0.00 sec)