Star Schema Benchmark
Download and Build Tools
[root@xxxxx test]# git clone https://github.com/electrum/ssb-dbgen.git
[root@xxxxx test]# cd ssb-dbgen
[root@xxxxx ssb-dbgen]# make
Generate Data
Generate 600 Million Records
[root@xxxxx ssb-dbgen]# ./dbgen -s 100 -T c
[root@xxxxx ssb-dbgen]# ./dbgen -s 100 -T l
[root@xxxxx ssb-dbgen]# ./dbgen -s 100 -T p
[root@xxxxx ssb-dbgen]# ./dbgen -s 100 -T s
[root@xxxxx ssb-dbgen]# ./dbgen -s 100 -T d
Table Name | Number of Rows | Size | Description |
---|---|---|---|
lineorder | 600 Million (600037902) | 67.1G | Product Order Table |
customer | 3 Million (3000000) | 317M | Customer Table |
part | 1.4 Million (1400000) | 135M | Parts Table |
supplier | 200 Thousand (200000) | 19M | Supplier Table |
date | 2556 | 272K | Date Table |
lineorder_flat | 600 Million (600037902) | 228G | Flattened Wide Table (Single Table) |
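The five commands above differ only in the table code passed to -T, so they can be wrapped in a loop. The following is a minimal sketch, not part of ssb-dbgen: the function name `gen_all_tables` and the `DRY_RUN` variable are hypothetical helpers introduced here, and the script assumes it runs from the directory that contains the built `dbgen` binary.

```shell
#!/bin/sh
# Generate every SSB table for one scale factor.
# gen_all_tables and DRY_RUN are illustrative names, not dbgen features.
# Set DRY_RUN=1 to print the commands instead of running them.
gen_all_tables() {
    scale="$1"
    # Table codes taken from the commands above: c, l, p, s, d.
    for table in c l p s d; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "./dbgen -s $scale -T $table"
        else
            # Stop at the first failing table so a partial run is noticed.
            ./dbgen -s "$scale" -T "$table" || return 1
        fi
    done
}
```

Calling `gen_all_tables 100` would run the same five commands as above; setting `DRY_RUN=1` first only prints them, which is useful for checking the scale factor before a long run.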
Generate 3 Billion Records
[root@xxxxx ssb-dbgen]# ./dbgen -s 500 -T c
[root@xxxxx ssb-dbgen]# ./dbgen -s 500 -T l
[root@xxxxx ssb-dbgen]# ./dbgen -s 500 -T p
[root@xxxxx ssb-dbgen]# ./dbgen -s 500 -T s
[root@xxxxx ssb-dbgen]# ./dbgen -s 500 -T d
Table Name | Number of Rows | Size | Description |
---|---|---|---|
lineorder | 3 Billion (3000028242) | 347G | Product Order Table |
customer | 15 Million (15000000) | 1.6G | Customer Table |
part | 1.8 Million (1800000) | 173M | Parts Table |
supplier | 1 Million (1000000) | 94M | Supplier Table |
date | 2556 | 272K | Date Table |
lineorder_flat | 3 Billion (3000346799) | 1.2T | Flattened Wide Table (Single Table) |
Generate 6 Billion Records
[root@xxxxx ssb-dbgen]# ./dbgen -s 1000 -T c
[root@xxxxx ssb-dbgen]# ./dbgen -s 1000 -T l
[root@xxxxx ssb-dbgen]# ./dbgen -s 1000 -T p
[root@xxxxx ssb-dbgen]# ./dbgen -s 1000 -T s
[root@xxxxx ssb-dbgen]# ./dbgen -s 1000 -T d
Table Name | Number of Rows | Size | Description |
---|---|---|---|
lineorder | 6 Billion (5999989709) | 688G | Product Order Table |
customer | 30 Million (30000000) | 3.2G | Customer Table |
part | 2 Million (2000000) | 193M | Parts Table |
supplier | 2 Million (2000000) | 188M | Supplier Table |
date | 2556 | 272K | Date Table |
lineorder_flat | 6 Billion (5999989709) | 2.3T | Flattened Wide Table (Single Table) |
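The row counts in the three tables follow a pattern: customer and supplier grow linearly with the scale factor, part grows logarithmically, and date is fixed. The sketch below estimates the counts before generating anything. It assumes the sizing formulas from the SSB specification (30,000 × SF customers, 2,000 × SF suppliers, 200,000 × floor(1 + log2 SF) parts, roughly 6,000,000 × SF lineorder rows); these formulas are not stated in this guide, and the function name `ssb_rows` is hypothetical. The lineorder figure is only approximate, as the actual counts above show.

```shell
#!/bin/sh
# Estimate SSB row counts for an integer scale factor (SF >= 1).
# The formulas are an assumption based on the SSB specification,
# not something reported by dbgen itself.
ssb_rows() {
    sf="$1"
    # floor(1 + log2(SF)) equals the number of binary digits in SF.
    bits=0
    n="$sf"
    while [ "$n" -gt 0 ]; do
        bits=$((bits + 1))
        n=$((n / 2))
    done
    echo "customer $((30000 * sf))"
    echo "part $((200000 * bits))"
    echo "supplier $((2000 * sf))"
    echo "date 2556"
    # Actual lineorder counts deviate slightly from this estimate.
    echo "lineorder_approx $((6000000 * sf))"
}
```

For scale factors 100, 500, and 1000, the customer, part, and supplier estimates match the tables above exactly, which makes the function a quick sanity check on a finished load.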
Special Notes:
When generating a large data set, dbgen can split its output into multiple files through the -C parameter. The value of -C sets the number of threads, and it is also the number of split files produced. More threads generate data faster, so use a higher value when the data volume is large and the machine has cores to spare. For example, the following commands generate 1.75-trillion-level data (scale factor 17500) with -C set to 32. The -C parameter can also be added to any of the examples above as needed.
[root@xxxxx ssb-dbgen]# ./dbgen -C 32 -s 17500 -T c
[root@xxxxx ssb-dbgen]# ./dbgen -C 32 -s 17500 -T l
[root@xxxxx ssb-dbgen]# ./dbgen -C 32 -s 17500 -T p
[root@xxxxx ssb-dbgen]# ./dbgen -C 32 -s 17500 -T s
[root@xxxxx ssb-dbgen]# ./dbgen -C 32 -s 17500 -T d
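After a split run, the total row count of a table has to be summed across its files before it can be compared with the expected number of rows. The sketch below does that with `wc -l`. The file naming is an assumption that should be checked against the actual output directory: it expects `<table>.tbl` for a single file and `<table>.tbl.<n>` for split files. The function name `count_rows` is hypothetical.

```shell
#!/bin/sh
# Sum the line counts of all data files for one table, split or not.
# Assumed naming: <table>.tbl for a single file, <table>.tbl.<n> for chunks.
count_rows() {
    dir="$1"
    table="$2"
    total=0
    for f in "$dir/$table".tbl "$dir/$table".tbl.*; do
        [ -f "$f" ] || continue   # skip patterns that matched no file
        lines=$(wc -l < "$f")
        total=$((total + lines))
    done
    echo "$total"
}
```

For example, `count_rows . customer` prints the combined line count of every customer file in the current directory, or 0 if none exist.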