Star Schema Benchmark

Download and Build Tools


[root@xxxxx test]# git clone https://github.com/electrum/ssb-dbgen.git
[root@xxxxx test]# cd ssb-dbgen
[root@xxxxx ssb-dbgen]# make

Generate Data

Generate 600 Million Records


[root@xxxxx ssb-dbgen]# ./dbgen -s 100 -T c
[root@xxxxx ssb-dbgen]# ./dbgen -s 100 -T l
[root@xxxxx ssb-dbgen]# ./dbgen -s 100 -T p
[root@xxxxx ssb-dbgen]# ./dbgen -s 100 -T s
[root@xxxxx ssb-dbgen]# ./dbgen -s 100 -T d

Table Name	Number of Rows	Size	Description
lineorder	600 Million (600037902)	67.1G	Product Order Table
customer	3 Million (3000000)	317M	Customer Table
part	1.4 Million (1400000)	135M	Parts Table
supplier	200 Thousand (200000)	19M	Supplier Table
date	2556	272K	Date Table
lineorder_flat	600 Million (600037902)	228G	Flattened Wide Table (Single Table)
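
The five commands above differ only in the -T value, so they can be wrapped in a small loop. The following is a minimal sketch, not part of ssb-dbgen: the function name ssb_generate and the DBGEN override variable are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Sketch: generate all five SSB tables for one scale factor.
# ssb_generate and DBGEN are hypothetical names used only in this example.
ssb_generate() {
    sf=$1
    # DBGEN lets the binary be swapped out (e.g. for a dry run); defaults to ./dbgen.
    dbgen=${DBGEN:-./dbgen}
    # c = customer, l = lineorder, p = part, s = supplier, d = date
    for t in c l p s d; do
        # $dbgen is intentionally unquoted so an override may contain arguments.
        $dbgen -s "$sf" -T "$t" || return 1
    done
}

# Usage (run from the ssb-dbgen directory):
#   ssb_generate 100
# Dry run that only prints the commands:
#   DBGEN="echo ./dbgen"; ssb_generate 100
```

Stopping at the first failing table (the `|| return 1`) is a design choice for this sketch, so that a failed run is noticed before the remaining tables are generated.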

Generate 3 Billion Records


[root@xxxxx ssb-dbgen]# ./dbgen -s 500 -T c
[root@xxxxx ssb-dbgen]# ./dbgen -s 500 -T l
[root@xxxxx ssb-dbgen]# ./dbgen -s 500 -T p
[root@xxxxx ssb-dbgen]# ./dbgen -s 500 -T s
[root@xxxxx ssb-dbgen]# ./dbgen -s 500 -T d

Table Name	Number of Rows	Size	Description
lineorder	3 Billion (3000028242)	347G	Product Order Table
customer	15 Million (15000000)	1.6G	Customer Table
part	1.8 Million (1800000)	173M	Parts Table
supplier	1 Million (1000000)	94M	Supplier Table
date	2556	272K	Date Table
lineorder_flat	3 Billion (3000346799)	1.2T	Flattened Wide Table (Single Table)

Generate 6 Billion Records


[root@xxxxx ssb-dbgen]# ./dbgen -s 1000 -T c
[root@xxxxx ssb-dbgen]# ./dbgen -s 1000 -T l
[root@xxxxx ssb-dbgen]# ./dbgen -s 1000 -T p
[root@xxxxx ssb-dbgen]# ./dbgen -s 1000 -T s
[root@xxxxx ssb-dbgen]# ./dbgen -s 1000 -T d

Table Name	Number of Rows	Size	Description
lineorder	6 Billion (5999989709)	688G	Product Order Table
customer	30 Million (30000000)	3.2G	Customer Table
part	2 Million (2000000)	193M	Parts Table
supplier	2 Million (2000000)	188M	Supplier Table
date	2556	272K	Date Table
lineorder_flat	6 Billion (5999989709)	2.3T	Flattened Wide Table (Single Table)
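
The row counts in the three tables above follow from the scale factor passed to -s. The sketch below estimates them for an integer scale factor. The sizing rules it uses (6,000,000 x SF for lineorder, 30,000 x SF for customer, 200,000 x floor(1 + log2(SF)) for part, 2,000 x SF for supplier, a fixed 2556 rows for date) are an assumption taken from the SSB specification rather than from this document; they agree with the tables above, except that the real lineorder count is only approximately 6,000,000 x SF. The function name ssb_rows is hypothetical.

```shell
#!/bin/sh
# Sketch: estimate SSB row counts for an integer scale factor (SF >= 1).
# Assumption: sizing rules from the SSB specification; ssb_rows is a
# hypothetical helper, not part of ssb-dbgen.
ssb_rows() {
    sf=$1
    # Compute k = floor(log2(sf)) by integer doubling (avoids floating point).
    k=0
    p=1
    while [ $((p * 2)) -le "$sf" ]; do
        p=$((p * 2))
        k=$((k + 1))
    done
    echo "lineorder $((6000000 * sf))"      # approximate; dbgen output varies slightly
    echo "customer $((30000 * sf))"
    echo "part $((200000 * (1 + k)))"
    echo "supplier $((2000 * sf))"
    echo "date 2556"                        # fixed size, independent of SF
}

# Usage:
#   ssb_rows 100
```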

Special Notes:

When generating a large amount of data, dbgen can split its output into multiple files through the -C parameter. The value given to -C is both the number of threads and the number of split files. More threads generate data faster, so a higher value is recommended when the data volume is large and the machine has enough cores. For example, the following commands generate 1.75 trillion-level data (scale factor 17500) with -C set to 32. The earlier examples can also be run with -C as needed.


[root@xxxxx ssb-dbgen]# ./dbgen -C 32 -s 17500 -T c
[root@xxxxx ssb-dbgen]# ./dbgen -C 32 -s 17500 -T l
[root@xxxxx ssb-dbgen]# ./dbgen -C 32 -s 17500 -T p
[root@xxxxx ssb-dbgen]# ./dbgen -C 32 -s 17500 -T s
[root@xxxxx ssb-dbgen]# ./dbgen -C 32 -s 17500 -T d
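
After a split run it can be useful to check the total row count of a table against the expected figure. The sketch below sums the line counts of all files belonging to one table. It assumes, and this is not confirmed by the text above, that dbgen writes one row per line into files named <table>.tbl (single file) or <table>.tbl.<n> (split files); adjust the pattern if the output is named differently. The function name ssb_count_rows is hypothetical.

```shell
#!/bin/sh
# Sketch: count generated rows for one table across split files.
# Assumption: files are named <table>.tbl or <table>.tbl.<n>, one row per line.
# ssb_count_rows is a hypothetical helper, not part of ssb-dbgen.
ssb_count_rows() {
    table=$1
    dir=${2:-.}
    total=0
    for f in "$dir/$table.tbl" "$dir/$table.tbl".*; do
        # Skip patterns that matched nothing (the glob stays literal in that case).
        [ -f "$f" ] || continue
        n=$(wc -l < "$f")
        total=$((total + n))
    done
    echo "$total"
}

# Usage (run from the directory holding the generated files):
#   ssb_count_rows lineorder
```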