
Sqoop2 Development Guide

Sqoop1 is operated entirely through shell commands, whereas Sqoop2 introduces a Sqoop server that provides centralized management of connectors. Access is also diversified: Sqoop2 can be reached through a REST API, a Java API, a Web UI, or the CLI shell, an architecture and usage model completely different from Sqoop1's.
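The REST API mentioned above also offers a quick way to confirm the Sqoop server is up. A minimal sketch, assuming the server is running locally on its default port 12000:

```shell
# Query the Sqoop2 server's version endpoint over REST
# (host and port are assumptions; adjust for your deployment):
curl http://localhost:12000/sqoop/version
```

A JSON response indicates the server is reachable; the same base URL is what the Java API and CLI shell connect to.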

If you select Sqoop2 when creating a cluster, it will be installed on the uhadoop-******-master2 node.

1. Export from MySQL to HDFS

1.1 Creating MySQL data

This example uses UDB as the MySQL data source.

Connect to MySQL, then create the sqoop database and table:

[root@uhadoop-******-master2 ~]# mysql -h10.13.124.35 -uroot -p'password'

mysql> create database sqoop;
Query OK, 1 row affected (0.00 sec)

mysql> use sqoop
Database changed
mysql> CREATE TABLE `sqoop` (
    ->   `id` varchar(64) DEFAULT NULL,
    ->   `value` varchar(64) NOT NULL DEFAULT '',
    ->     PRIMARY KEY (`id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into sqoop values ('1', 'hha'),('2', 'zhang'),('3','hehe');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

1.2 Starting Sqoop2 CLI

Execute the following on the uhadoop-******-master2 node:

/home/hadoop/sqoop2/bin/sqoop.sh client
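By default the client connects to a Sqoop server on localhost. If you run the client from another machine, point it at the server first; a sketch, assuming the default port 12000:

```shell
# Inside the sqoop:000> shell, set the server the client talks to
# (hostname is an assumption; use your actual master2 node name):
set server --host uhadoop-******-master2 --port 12000 --webapp sqoop
```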

Run help to list the available commands:

sqoop:000> help
For information about Sqoop, visit: http://sqoop.apache.org/

Available commands:
  exit    (\x  ) Exit the shell
  history (\H  ) Display, manage and recall edit-line history
  help    (\h  ) Display this help message
  set     (\st ) Configure various client options and settings
  show    (\sh ) Display various objects and configuration options
  create  (\cr ) Create new object in Sqoop repository
  delete  (\d  ) Delete existing object in Sqoop repository
  update  (\up ) Update objects in Sqoop repository
  clone   (\cl ) Create new object based on existing one
  start   (\sta) Start job
  stop    (\stp) Stop job
  status  (\stu) Display status of a job
  enable  (\en ) Enable object in Sqoop repository
  disable (\di ) Disable object in Sqoop repository

For help on a specific command type: help command

1.3 Checking connectors

sqoop:000> show connector

1.4 Creating mysql-link

sqoop:000> create link -c 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: mysql-link

Link configuration

JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://10.13.124.35/sqoop
Username: root
Password: **********
JDBC Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry#
New link was successfully created with validation status OK and persistent id 1

1.5 Creating hdfs-link

sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: hdfs-link

Link configuration

HDFS URI: hdfs://Ucluster/
New link was successfully created with validation status OK and persistent id 2

1.6 Displaying link

sqoop:000> show link

1.7 Creating job

sqoop:000> create job -f 1 -t 2
Creating job for links with from id 1 and to id 2
Please fill following values to create new job object
Name: job-mysql-to-hdfs

From database configuration

Schema name: sqoop
Table name: sqoop
Table SQL statement:
Table column names:
Partition column name: id
Null value allowed for the partition column:
Boundary query:

ToJob configuration

Override null value:
Null value:
Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0
Compression format:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom compression format:
Output directory: /tmp/sqoop

Throttling resources

Extractors: 1
Loaders: 1
New job was successfully created with validation status OK  and persistent id 1

View the created job:

sqoop:000> show job

1.8 Starting job

sqoop:000> start job -j 1
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-12-22 14:40:10 CST
Lastly updated by: root
External ID: job_1481968387780_0011
    http://uhadoop-penomi-master1:23188/proxy/application_1481968387780_0011/
2016-12-22 14:40:10 CST: BOOTING  - Progress is not available
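While the job runs you can poll its progress from the Sqoop shell, and once it finishes, inspect the exported files on HDFS. A sketch, assuming the output directory configured in the job above:

```shell
# From the Sqoop shell, check job progress:
#   sqoop:000> status job -j 1
# From the node's shell, list and view the output files:
hdfs dfs -ls /tmp/sqoop
hdfs dfs -cat /tmp/sqoop/*
```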

2. Export from HDFS to MySQL

This section reuses the mysql-link and hdfs-link created in section 1.
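The job below reads from /tmp/sqoop-input, so that directory must exist and contain data in the same layout Sqoop2's TEXT_FILE output uses (one row per line, fields comma-separated and single-quoted). A sketch of preparing it; the file name and row values are illustrative:

```shell
# Create a local sample file in the assumed Sqoop2 text format:
cat > /tmp/sqoop-input.txt <<'EOF'
'4','zhen'
'5','u'
EOF
# Upload it to the input directory the job will read from:
hdfs dfs -mkdir -p /tmp/sqoop-input
hdfs dfs -put /tmp/sqoop-input.txt /tmp/sqoop-input/
```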

2.1 Creating job

sqoop:000> create job -f 2 -t 1
Creating job for links with from id 2 and to id 1
Please fill following values to create new job object
Name: job-hdfs-to-mysql

From Job configuration

Input directory: /tmp/sqoop-input
Override null value:
Null value:

To database configuration

Schema name: sqoop
Table name: sqoop
Table SQL statement:
Table column names:
Stage table name:
Should clear stage table:

Throttling resources

Extractors: 1
Loaders: 1
New job was successfully created with validation status OK  and persistent id 2

The added job can likewise be viewed with show job.

2.2 Executing job

sqoop:000> start job -j 2
Submission details
Job ID: 2
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2016-12-22 15:38:00 CST
Lastly updated by: root
External ID: job_1481968387780_0012
    http://uhadoop-penomi-master1:23188/proxy/application_1481968387780_0012/
2016-12-22 15:38:00 CST: BOOTING  - Progress is not available

2.3 Checking MySQL data

mysql> select * from sqoop;
+----+--------+
| id | value  |
+----+--------+
| 1  | hha    |
| 2  | zhang  |
| 3  | hehe   |
| 4  | zhen   |
| 5  | u      |
+----+--------+
5 rows in set (0.00 sec)