
HBase Development Guide

HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. It supports real-time access through its key/value storage model and batch analysis through MapReduce.

1. HBase shell

The HBase shell is a simple command-line interface for interacting with HBase. Its basic usage is introduced below:

1.1 Start shell

[root@uhadoop-******-master1 ~]# hbase shell
2016-12-06 11:00:08,624 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.0-cdh5.8.0, rUnknown, Tue Jul 12 16:11:18 PDT 2016

1.2 Create a table and insert three rows of data

hbase(main):004:0> create 'test_tbl', 'cf'
0 row(s) in 1.2280 seconds

=> Hbase::Table - test_tbl
hbase(main):005:0> list 'test_tbl'
TABLE
test_tbl
1 row(s) in 0.0180 seconds

=> ["test_tbl"]
hbase(main):006:0> put 'test_tbl', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1780 seconds

hbase(main):007:0> put 'test_tbl', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0060 seconds

hbase(main):008:0> put 'test_tbl', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0060 seconds

1.3 Read all data

hbase(main):009:0> scan 'test_tbl'
ROW                                            COLUMN+CELL
 row1                                          column=cf:a, timestamp=1480993562401, value=value1
 row2                                          column=cf:b, timestamp=1480993575088, value=value2
 row3                                          column=cf:c, timestamp=1480993587152, value=value3
3 row(s) in 0.0610 seconds

1.4 Get a single row of data

hbase(main):010:0> get 'test_tbl', 'row1'
COLUMN                                         CELL
 cf:a                                          timestamp=1480993562401, value=value1
1 row(s) in 0.0090 seconds

1.5 Delete table

hbase(main):011:0> disable 'test_tbl'
0 row(s) in 2.3550 seconds

hbase(main):012:0> drop 'test_tbl'
0 row(s) in 1.4980 seconds

1.6 Exit HBase shell

hbase(main):013:0> exit

2. Enable LZO compression for HBase

HBase on UHadoop clusters is configured to support LZO compression by default; you only need to set the COMPRESSION property of the table.

2.1 Create LZO table

hbase(main):001:0> create 'test-lzo', {NAME=>'cf', COMPRESSION=>'lzo'}
0 row(s) in 4.5140 seconds
=> Hbase::Table - test-lzo

hbase(main):002:0> desc 'test-lzo'
Table test-lzo is ENABLED
test-lzo
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'LZO', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELE
TED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.1470 seconds

2.2 Add COMPRESSION=LZO property to an existing table

  • Suppose the test-nolzo table has already been created; if not, you can create it with the following statement
create 'test-nolzo', {NAME=>'cf'} 
  • To add the COMPRESSION property, you need to first disable the table, then modify the COMPRESSION property to LZO, and finally enable the table.
hbase(main):002:0> disable 'test-nolzo'
0 row(s) in 2.4030 seconds

hbase(main):003:0> alter 'test-nolzo',NAME => 'cf', COMPRESSION => 'LZO'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9730 seconds

hbase(main):004:0> enable 'test-nolzo'
0 row(s) in 1.2140 seconds
  • Modifying COMPRESSION does not compress the existing data; you can execute the following command to force a major compaction of the existing data
hbase(main):005:0> major_compact 'test-nolzo'
0 row(s) in 0.5060 seconds

Note:

For tables with a large amount of data, this operation may take a long time.
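
The COMPRESSION property can also be set from the Java client (see section 4 below for how to compile and run Java code on the cluster). The following is only a minimal sketch against the HBase 1.x client API; the class name CreateLzoTable and the table name test-lzo-java are illustrative and not part of the steps above.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression;

public class CreateLzoTable {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // Describe a table with one LZO-compressed column family,
            // equivalent to: create 'test-lzo', {NAME=>'cf', COMPRESSION=>'lzo'}
            HTableDescriptor table = new HTableDescriptor(TableName.valueOf("test-lzo-java"));
            HColumnDescriptor cf = new HColumnDescriptor("cf");
            cf.setCompressionType(Compression.Algorithm.LZO);
            table.addFamily(cf);
            if (!admin.tableExists(table.getTableName())) {
                admin.createTable(table);
            }
        }
    }
}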

3. Using Hive to read HBase data

Please refer to [Hive Application Development](/docs/uhadoop/developer/hivedev#hive external table reading hbase data)

4. HBase Application Development

4.1 Using Java to operate HBase (create a table, insert data, and display data)

This example requires you to first log in to the master1 node of the UHadoop cluster; the following operations are performed on the master1 node by default.

4.1.1 Write the Java code

mkdir  -p /data/hbase-example
cd /data/hbase-example
touch HbaseJob.java

The code of HbaseJob.java is as follows

import java.util.ArrayList;
import java.util.List;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
 
public class HbaseJob {
    static Configuration conf=null;
    static{
        conf=HBaseConfiguration.create();//hbase's configuration information
    }
    public static void main(String[] args)throws Exception {
        HbaseJob t=new HbaseJob();
        t.createTable("person", new String[]{"name","age"});
        t.insertRow("person", "1", "age", "hehe", "100");
        t.insertRow("person", "2", "age", "haha", "101");
        t.showAll("person");
    }
 
    /***
     * create a table
     * and specify the column families
     * */
    public void createTable(String tableName, String cols[])throws Exception{
        HBaseAdmin admin=new HBaseAdmin(conf);//client administration class
        if(admin.tableExists(tableName)){
            System.out.println("This table already exists.......");
        }else{
            HTableDescriptor table=new HTableDescriptor(tableName);
            for(String c:cols){
                HColumnDescriptor col=new HColumnDescriptor(c);//column family name
                table.addFamily(col);//add the column family to the table
            }
            admin.createTable(table);//create the table
            System.out.println("Creating table succeeded!");
        }
        admin.close();//close the admin whether or not the table existed
    }
 
    public  void insertRow(String tableName, String row,
            String columnFamily, String column, String value) throws Exception {
        HTable table = new HTable(conf, tableName);
        Put put = new Put(Bytes.toBytes(row));
        put.add(Bytes.toBytes(columnFamily), Bytes.toBytes(column),
                Bytes.toBytes(value));
        table.put(put);
        table.close();//close the table
        System.out.println("Inserting a piece of data succeeded!");
    }
 
    public void showAll(String tableName)throws Exception{
        HTable h=new HTable(conf, tableName);
        Scan scan=new Scan();
        ResultScanner scanner=h.getScanner(scan);
        for(Result r:scanner){
            System.out.println("====");
            for(KeyValue k:r.raw()){
                System.out.println("Row number:  "+Bytes.toStringBinary(k.getRow()));
                System.out.println("Timestamp:  "+k.getTimestamp());
                System.out.println("Column Family:  "+Bytes.toStringBinary(k.getFamily()));
                System.out.println("Column:  "+Bytes.toStringBinary(k.getQualifier()));
                String ss=  Bytes.toString(k.getValue());
                System.out.println("Value:  "+ss);
            }
        }
        h.close();
    }
}

4.1.2 Create the compile-and-run script

  • Create the script file in the code directory
cd /data/hbase-example
touch hbase-test.sh
  • The code of hbase-test.sh is as follows:
#!/bin/bash
HBASE_HOME=/home/hadoop/hbase
CLASSPATH=.:$HBASE_HOME/conf/hbase-site.xml

for i in ${HBASE_HOME}/lib/*.jar ;
do
      CLASSPATH=$CLASSPATH:$i
done
#compile the program
javac -cp $CLASSPATH HbaseJob.java
#execute the program
java -cp $CLASSPATH HbaseJob

4.1.3 Execute the HBase program

Execute the script in the directory where it was created:

sh hbase-test.sh

The execution result is as follows:

Creating table succeeded!
Inserting a piece of data succeeded!
Inserting a piece of data succeeded!
====
Row key:  1
Timestamp:  1480991139173
Column Family:  age
Column:  hehe
Value:  100
====
Row key:  2
Timestamp:  1480991139240
Column Family:  age
Column:  haha
Value:  101
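
The HBaseAdmin and HTable constructors used in HbaseJob.java still work on this HBase version, but newer client releases deprecate them in favor of the Connection API. Purely as an illustrative sketch (the class name ConnectionExample and the inserted row/column values are made up for the example), the same put-and-scan flow against the person table created above could look like this:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ConnectionExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("person"))) {
            // Insert one cell: row "3", column family "age", qualifier "xixi", value "102".
            Put put = new Put(Bytes.toBytes("3"));
            put.addColumn(Bytes.toBytes("age"), Bytes.toBytes("xixi"), Bytes.toBytes("102"));
            table.put(put);

            // Scan the whole table and print every cell.
            try (ResultScanner scanner = table.getScanner(new Scan())) {
                for (Result r : scanner) {
                    for (Cell cell : r.rawCells()) {
                        System.out.println(Bytes.toString(CellUtil.cloneRow(cell)) + "  "
                                + Bytes.toString(CellUtil.cloneFamily(cell)) + ":"
                                + Bytes.toString(CellUtil.cloneQualifier(cell)) + " = "
                                + Bytes.toString(CellUtil.cloneValue(cell)));
                    }
                }
            }
        }
    }
}

It can be compiled and run with the same hbase-test.sh approach shown above, replacing HbaseJob with ConnectionExample.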

5. HBase Daily Operation and Maintenance

The following operations must be performed on the UHadoop cluster master node as the hadoop user; otherwise you will get an insufficient-permission error.

  • Check HBase region status and consistency
hbase hbck
  • Repair inconsistent regions
hbase hbck -repair
  • Fix region holes in HDFS
hbase hbck -fixHdfsHoles
  • Fix the meta table (regenerates meta entries from the regioninfo files on HDFS)
hbase hbck -fixMeta
  • Fix region assignments (assigns the regions recorded in the meta table to region servers)
hbase hbck -fixAssignments
  • Enable automatic region balancing

This needs to be turned on in the HBase shell:

[root@uhadoop-******-master1 ~]# hbase shell
2016-12-06 11:00:08,624 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.0-cdh5.8.0, rUnknown, Tue Jul 12 16:11:18 PDT 2016
hbase(main):001:0> balance_switch true