Table Engines
Table engines (i.e., table types) determine:
- How and where data is stored, and where it is written to and read from
- What queries are supported and how they are supported
- Concurrent data access
- Use of indexes (if present)
- Whether multi-threaded requests can be executed
- Data replication parameters
The cloud data warehouse UClickHouse supports table engines grouped into four major categories: MergeTree, Log, Integrations, and Special.
See the table below:
| Series | Description | Table Engine | Features |
| --- | --- | --- | --- |
| MergeTree | MergeTree series engines are suitable for high-load tasks, supporting fast writes of large volumes of data and subsequent data processing. They are highly versatile and powerful. Engines in this series share support for data replication, partitioning, and data sampling. | MergeTree | Used for inserting massive amounts of data into a table. Data is written quickly as successive data parts, and these parts are merged in the background according to certain rules. |
| | | ReplacingMergeTree | Solves the issue of duplicate primary keys in the MergeTree engine by deleting rows with the same primary key value. |
| | | CollapsingMergeTree | Adds a marker column `Sign` in the table creation statement to address the limitations of ReplacingMergeTree. |
| | | VersionedCollapsingMergeTree | Adds a `Version` column in the table creation statement to address out-of-order writes that prevent normal collapsing (deletion). |
| | | SummingMergeTree | Pre-aggregates by primary key, merging all rows with the same primary key into one, significantly reducing storage space and improving aggregation performance. |
| | | AggregatingMergeTree | A pre-aggregating engine meant to enhance aggregation performance, allowing various aggregation functions to be specified. |
| | | GraphiteMergeTree | Stores and summarizes Graphite data, reducing storage space and improving the query efficiency of Graphite data. |
| Log | Log series engines are suitable for scenarios where data in small tables (around 1 million rows) is written quickly and read in full later. | TinyLog | Does not support concurrent reads of data files; has a simple format and poor query performance; suitable for temporarily storing intermediate data. |
| | | StripeLog | Supports concurrent reads of data files; stores all columns in one large file, reducing the number of files; offers better query performance than TinyLog. |
| | | Log | Supports concurrent reads of data files; each column is stored in a separate file; offers better query performance than TinyLog. |
| Integrations | Integration series engines are suitable for importing external data into the UClickHouse cloud data warehouse, or for using external data sources directly within UClickHouse. | Kafka | Imports data from Kafka topics directly into the UClickHouse cloud data warehouse. |
| | | MySQL | Uses MySQL as the storage engine, allowing SELECT and other operations on MySQL tables directly within UClickHouse. |
| | | JDBC | Reads data sources through a specified JDBC connection string. |
| | | ODBC | Reads data sources through a specified ODBC connection string. |
| | | HDFS | Reads data files in specific formats directly from HDFS. |
| Special | Special series engines are suitable for specific functional scenarios. | Distributed | Does not store data itself, but supports distributed querying across multiple servers. |
| | | MaterializedView | Used for creating materialized views. |
| | | Dictionary | Exposes dictionary data as a UClickHouse table. |
| | | Merge | Does not store data itself, but can read data from any number of other tables simultaneously. |
| | | File | Uses local files directly as data storage. |
| | | Null | Written data is discarded; reads return an empty result. |
| | | Set | Data is always stored in RAM. |
| | | Join | Data is always stored in RAM. |
| | | URL | Used for managing data on remote HTTP/HTTPS servers. |
| | | View | Does not store data itself; only stores the specified SELECT query. |
| | | Memory | Data is stored in memory and lost on restart. Offers excellent query performance; suitable for small tables (fewer than about 100 million rows) that do not require persistence. Commonly used for temporary tables in UClickHouse. |
| | | Buffer | Keeps an in-memory buffer for a target table and flushes it to disk once configured conditions are met. |
For more detailed information on table engines, please refer to the official documentation: Table Engine Documentation.
MergeTree
The MergeTree table engine is primarily used for massive data analysis, supporting data partitioning, ordered storage, primary key indexing, sparse indexing, and data TTL, among other features. The MergeTree table engine supports all SQL syntax in the UClickHouse cloud data warehouse; however, some features may differ from standard SQL. For example, in the MergeTree table engine the primary key mainly serves to accelerate queries rather than to enforce uniqueness: even after Compaction, rows with the same primary key continue to coexist.
This type of engine includes:
- MergeTree
- ReplacingMergeTree
- SummingMergeTree
- AggregatingMergeTree
- CollapsingMergeTree
- VersionedCollapsingMergeTree
- GraphiteMergeTree
For more detailed information on the table engine, refer to the official documentation: MergeTree.
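The write path just described (each INSERT quickly produces a sorted data part; background merges later combine parts) can be sketched in Python. The part and merge model below is purely illustrative, not UClickHouse internals; the row values are hypothetical.

```python
import heapq

# Each INSERT produces an immutable "part": rows sorted by the ORDER BY key.
def make_part(rows, key):
    return sorted(rows, key=key)

# Background Compaction merges several sorted parts into one larger sorted part.
def merge_parts(parts, key):
    return list(heapq.merge(*parts, key=key))

# Two inserts -> two parts, ordered by (LO_ORDERDATE, LO_ORDERKEY).
key = lambda r: (r["LO_ORDERDATE"], r["LO_ORDERKEY"])
p1 = make_part([{"LO_ORDERDATE": "1996-01-02", "LO_ORDERKEY": 2},
                {"LO_ORDERDATE": "1993-10-14", "LO_ORDERKEY": 3}], key)
p2 = make_part([{"LO_ORDERDATE": "1996-01-02", "LO_ORDERKEY": 1}], key)

merged = merge_parts([p1, p2], key)
print([r["LO_ORDERKEY"] for r in merged])  # [3, 1, 2]: rows come out in key order
```

Note that the merge only reorders rows; it never deduplicates them, which is why duplicate primary keys survive Compaction in plain MergeTree.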
Example:

- Create a table `lineorder_local`, specify the MergeTree engine, partition by the year of `LO_ORDERDATE`, and sort by `(LO_ORDERDATE, LO_ORDERKEY)`.

  ```sql
  CREATE TABLE lineorder_local ON CLUSTER ck_cluster
  (
      LO_ORDERKEY UInt32,
      LO_LINENUMBER UInt8,
      LO_CUSTKEY UInt32,
      LO_PARTKEY UInt32,
      LO_SUPPKEY UInt32,
      LO_ORDERDATE Date,
      LO_ORDERPRIORITY LowCardinality(String),
      LO_SHIPPRIORITY UInt8,
      LO_QUANTITY UInt8,
      LO_EXTENDEDPRICE UInt32,
      LO_ORDTOTALPRICE UInt32,
      LO_DISCOUNT UInt8,
      LO_REVENUE UInt32,
      LO_SUPPLYCOST UInt32,
      LO_TAX UInt8,
      LO_COMMITDATE Date,
      LO_SHIPMODE LowCardinality(String)
  )
  ENGINE = MergeTree()
  PRIMARY KEY (LO_ORDERDATE, LO_ORDERKEY)
  PARTITION BY toYear(LO_ORDERDATE)
  ORDER BY (LO_ORDERDATE, LO_ORDERKEY);
  ```
- Write data with duplicate primary keys.

  Refer to Quick Start -> Prepare Data and Import.

  Prepare the data and save it as a lineorder.csv file:

  ```csv
  1,1,7381,155190,828,"1996-01-02","5-LOW",0,17,2116823,17366547,4,2032150,74711,2,"1996-02-12","TRUCK",
  1,2,7381,67310,163,"1996-01-02","5-LOW",0,36,4598316,17366547,9,4184467,76638,6,"1996-02-28","MAIL",
  1,3,7381,63700,71,"1996-01-02","5-LOW",0,8,1330960,17366547,10,1197864,99822,2,"1996-03-05","REG AIR",
  1,4,7381,2132,943,"1996-01-02","5-LOW",0,28,2895564,17366547,9,2634963,62047,6,"1996-03-30","AIR",
  1,5,7381,24027,1625,"1996-01-02","5-LOW",0,24,2282448,17366547,10,2054203,57061,4,"1996-03-14","FOB",
  1,6,7381,15635,1368,"1996-01-02","5-LOW",0,32,4962016,17366547,7,4614674,93037,2,"1996-02-07","MAIL",
  2,1,15601,106170,1066,"1996-12-01","1-URGENT",0,38,4469446,4692918,0,4469446,70570,5,"1997-01-14","RAIL",
  3,1,24664,4297,1959,"1993-10-14","5-LOW",0,45,5405805,19384625,6,5081456,72077,0,"1994-01-04","AIR",
  3,2,24664,19036,1667,"1993-10-14","5-LOW",0,49,4679647,19384625,10,4211682,57301,0,"1993-12-20","RAIL",
  3,3,24664,128449,1409,"1993-10-14","5-LOW",0,27,3989088,19384625,6,3749742,88646,7,"1993-11-22","SHIP",
  ```

  Import the data:

  ```shell
  clickhouse-client --host=<node IP> --port=9000 --user=admin --password=<password set at cluster creation> --database=<database name> --query="INSERT INTO lineorder_local FORMAT CSV" < /data/lineorder.csv
  ```
- Query the data.

  ```sql
  select * from lineorder_local;
  ```

  The query results are as follows:
- Since the MergeTree series table engines use a structure similar to an LSM tree, much of the storage-layer processing happens only during Compaction, so you need to execute the optimize statement to force a background Compaction.

  ```sql
  optimize table lineorder_local final;
  ```
- Query the data again.

  ```sql
  select * from lineorder_local;
  ```

  The query results are as follows; the rows with duplicate primary keys still exist.
ReplacingMergeTree
To solve the issue of duplicate primary keys in the MergeTree table engine, UClickHouse provides the ReplacingMergeTree table engine, which deletes duplicate entries with the same primary key value.
Although the ReplacingMergeTree table engine can deduplicate by primary key, it still has many limitations. It is therefore mostly used to ensure that data is eventually deduplicated; it cannot guarantee that primary keys will be unique at query time. The main limitations are as follows.
- In distributed scenarios, data with the same primary key may be distributed to different nodes and deduplication may not be possible across different shards.
- Before a complete optimize operation, primary key deduplication might not be effective, for example, some data may be deduplicated but another part may still have duplicate primary keys.
- Optimize is a background task, its exact execution time cannot be predicted.
- Manually executing optimize in massive data scenarios requires a significant amount of time, failing to meet instant business query needs.
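The eventual-deduplication behavior described above can be sketched in Python. The part and merge model is illustrative only (not engine internals), and the rows are hypothetical: duplicates written to separate parts survive until those parts are merged, at which point the later row replaces the earlier one.

```python
# Sketch of ReplacingMergeTree semantics: deduplication happens only when parts
# are merged, so duplicate keys can still be visible to queries before Compaction.
def replacing_merge(parts, key):
    latest = {}
    for part in parts:                  # parts are processed in write order
        for row in part:
            latest[key(row)] = row      # a later row replaces an earlier one with the same key
    return list(latest.values())

key = lambda r: (r["LO_ORDERDATE"], r["LO_ORDERKEY"])
part_a = [{"LO_ORDERDATE": "1996-01-02", "LO_ORDERKEY": 1, "LO_QUANTITY": 17}]
part_b = [{"LO_ORDERDATE": "1996-01-02", "LO_ORDERKEY": 1, "LO_QUANTITY": 36}]

# Before a merge, a query over both parts still sees two rows with the same key.
print(len(part_a) + len(part_b))                     # 2
# After the merge, one row per primary key remains.
print(len(replacing_merge([part_a, part_b], key)))   # 1
```

This is also why, in distributed deployments, rows with the same key on different shards are never deduplicated: they never participate in the same merge.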
For more information on the ReplacingMergeTree table engine, please refer to the official documentation: ReplacingMergeTree.
Example:

- Create table `lineorder_local_replacing`.

  ```sql
  CREATE TABLE lineorder_local_replacing ON CLUSTER ck_cluster
  (
      LO_ORDERKEY UInt32,
      LO_LINENUMBER UInt8,
      LO_CUSTKEY UInt32,
      LO_PARTKEY UInt32,
      LO_SUPPKEY UInt32,
      LO_ORDERDATE Date,
      LO_ORDERPRIORITY LowCardinality(String),
      LO_SHIPPRIORITY UInt8,
      LO_QUANTITY UInt8,
      LO_EXTENDEDPRICE UInt32,
      LO_ORDTOTALPRICE UInt32,
      LO_DISCOUNT UInt8,
      LO_REVENUE UInt32,
      LO_SUPPLYCOST UInt32,
      LO_TAX UInt8,
      LO_COMMITDATE Date,
      LO_SHIPMODE LowCardinality(String)
  )
  ENGINE = ReplacingMergeTree()
  PRIMARY KEY (LO_ORDERDATE, LO_ORDERKEY)
  PARTITION BY toYear(LO_ORDERDATE)
  ORDER BY (LO_ORDERDATE, LO_ORDERKEY);
  ```
- Write data with duplicate primary keys.

  Refer to Quick Start -> Prepare Data and Import.

  Prepare the data and save it as a lineorder.csv file:

  ```csv
  1,1,7381,155190,828,"1996-01-02","5-LOW",0,17,2116823,17366547,4,2032150,74711,2,"1996-02-12","TRUCK",
  1,2,7381,67310,163,"1996-01-02","5-LOW",0,36,4598316,17366547,9,4184467,76638,6,"1996-02-28","MAIL",
  1,3,7381,63700,71,"1996-01-02","5-LOW",0,8,1330960,17366547,10,1197864,99822,2,"1996-03-05","REG AIR",
  1,4,7381,2132,943,"1996-01-02","5-LOW",0,28,2895564,17366547,9,2634963,62047,6,"1996-03-30","AIR",
  1,5,7381,24027,1625,"1996-01-02","5-LOW",0,24,2282448,17366547,10,2054203,57061,4,"1996-03-14","FOB",
  1,6,7381,15635,1368,"1996-01-02","5-LOW",0,32,4962016,17366547,7,4614674,93037,2,"1996-02-07","MAIL",
  2,1,15601,106170,1066,"1996-12-01","1-URGENT",0,38,4469446,4692918,0,4469446,70570,5,"1997-01-14","RAIL",
  3,1,24664,4297,1959,"1993-10-14","5-LOW",0,45,5405805,19384625,6,5081456,72077,0,"1994-01-04","AIR",
  3,2,24664,19036,1667,"1993-10-14","5-LOW",0,49,4679647,19384625,10,4211682,57301,0,"1993-12-20","RAIL",
  3,3,24664,128449,1409,"1993-10-14","5-LOW",0,27,3989088,19384625,6,3749742,88646,7,"1993-11-22","SHIP",
  ```

  Import the data:

  ```shell
  clickhouse-client --host=<node IP> --port=9000 --user=admin --password=<password set at cluster creation> --database=<database name> --query="INSERT INTO lineorder_local_replacing FORMAT CSV" < /data/lineorder.csv
  ```
- Query the data.

  ```sql
  select * from lineorder_local_replacing;
  ```

  The query results are as follows; the duplicate data has been removed:
CollapsingMergeTree
The CollapsingMergeTree table engine addresses the functional limitations of the ReplacingMergeTree table engine. It requires a marker column `Sign` in the table creation statement, which divides rows into two types: rows with `Sign=1` are state rows, used to add a state, while rows with `Sign=-1` are cancel rows, used to delete a state.

Note: Although the CollapsingMergeTree table engine solves the problem of deleting data with the same primary key, when states change constantly and writes happen in multi-threaded parallel mode, state rows and cancel rows may end up out of order, causing the collapse (deletion) to fail.

During background Compaction, rows with the same primary key but opposite `Sign` values are collapsed (deleted); in data that has not yet undergone Compaction, state rows and cancel rows still coexist. Therefore, to actually collapse (delete) rows by primary key, the business layer needs to perform the following operations.
- Record the original value of the state row, or query the database for the original state row before executing a delete-state operation.

  Specific reason: to execute a delete-state operation, a cancel row must be inserted, and the cancel row must contain data identical to the original state row (except for the `Sign` column).
- In aggregate calculations such as `count()` and `sum(col)`, duplicated data may appear. To get correct results, the business layer needs to rewrite the SQL, changing `count()` to `sum(Sign)` and `sum(col)` to `sum(col * Sign)`.

  The specific reasons are as follows.

  - The Compaction timing is unpredictable; at query time, state rows and cancel rows may not yet have been collapsed (deleted).
  - UClickHouse cannot ensure that rows with the same primary key reside on the same node, and data on different nodes cannot be collapsed (deleted).
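A minimal Python sketch shows why the rewritten aggregates stay correct whether or not Compaction has run: an uncollapsed state/cancel pair contributes 0 to both `sum(Sign)` and `sum(col * Sign)`. The row model is illustrative, not engine internals.

```python
rows = [
    # (LO_ORDERKEY, LO_LINENUMBER, Sign)
    (1, 5,  1),   # state row
    (1, 5, -1),   # cancel row for the same primary key, not yet collapsed
    (2, 6,  1),   # live state row
]

def collapsed_sum(rows, orderkey):
    group = [r for r in rows if r[0] == orderkey]
    signs = sum(sign for _, _, sign in group)          # like sum(Sign): net live row count
    total = sum(col * sign for _, col, sign in group)  # like sum(col * Sign)
    return signs, total

print(collapsed_sum(rows, 1))  # (0, 0): the state/cancel pair cancels out
print(collapsed_sum(rows, 2))  # (1, 6): only the live row contributes
```

A plain `count()` or `sum(col)` over the same rows would instead count the uncollapsed pair, which is exactly the duplication the SQL rewrite avoids.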
For more information on the CollapsingMergeTree table engine, please refer to the official documentation: CollapsingMergeTree.
Example:

- Create table `lineorder_local_collapsing`.

  ```sql
  CREATE TABLE lineorder_local_collapsing
  (
      LO_ORDERKEY UInt64,
      LO_LINENUMBER UInt8,
      LO_CUSTKEY UInt8,
      Sign Int8
  )
  ENGINE = CollapsingMergeTree(Sign)
  ORDER BY LO_ORDERKEY;
  ```
- Insert a state row (`Sign=1`).

  ```sql
  INSERT INTO lineorder_local_collapsing VALUES (1, 5, 146, 1);
  ```

  Note: If you insert a cancel row first and then a state row, the rows may end up out of order, making it impossible to collapse (delete) the primary key even after forcing a background Compaction.
- Insert a cancel row (`Sign=-1`) whose values are identical to the state row just inserted (except for `Sign`). At the same time, insert a state row for a new primary key.

  ```sql
  INSERT INTO lineorder_local_collapsing VALUES (1, 5, 146, -1), (2, 6, 185, 1);
  ```
- Query the data.

  ```sql
  SELECT * FROM lineorder_local_collapsing;
  ```

  The query result is as follows:
- If you need to aggregate on a specific column, take `sum(col)` as an example. To get correct results, the SQL statement needs to be rewritten as follows.

  ```sql
  SELECT
      LO_ORDERKEY,
      sum(LO_LINENUMBER * Sign) AS LO_LINENUMBERS,
      sum(LO_CUSTKEY * Sign) AS LO_CUSTKEYS
  FROM lineorder_local_collapsing
  GROUP BY LO_ORDERKEY
  HAVING sum(Sign) > 0;
  ```

  After execution, the result is as follows:
- Since the MergeTree series table engines use a structure similar to an LSM tree, much of the storage-layer processing happens only during Compaction, so you need to execute the optimize statement to force a background Compaction.

  ```sql
  optimize table lineorder_local_collapsing final;
  ```
- Query the data again.

  ```sql
  SELECT * FROM lineorder_local_collapsing;
  ```

  The query result is as follows:
VersionedCollapsingMergeTree
To solve the problem of disordered writes in CollapsingMergeTree table engines causing unsuccessful collapsing (deletion), the ClickHouse cloud database provides the VersionedCollapsingMergeTree table engine, which adds a column Version
in the table creation statement to record the relationship between state rows and cancel rows in disordered scenarios. During background Compaction, rows with the same primary key, same Version
, and opposite Sign
are collapsed (deleted). Similar to the CollapsingMergeTree table engine, when performing aggregate calculations like count()
and sum(col)
, the business layer needs to rewrite the SQL, changing count()
and sum(col)
to sum(Sign)
and sum(col * Sign)
respectively.
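The version-aware matching described above can be sketched in Python: a state row and a cancel row with the same primary key and the same `Version` cancel each other even if the cancel row was written first. This is an illustrative model, not engine internals.

```python
from collections import Counter

# Sketch of VersionedCollapsingMergeTree collapsing: pair up rows by
# (primary key, Version) regardless of the order in which they arrived.
def versioned_collapse(rows):
    net = Counter()
    for key, version, sign in rows:     # rows as (key, Version, Sign)
        net[(key, version)] += sign     # net count of live state rows per (key, Version)
    # keep only (key, Version) groups with a positive net sign
    return [kv for kv, n in net.items() if n > 0]

rows = [
    (1, 1, -1),  # cancel row written first (out of order)
    (1, 1,  1),  # matching state row: same primary key, same Version
    (2, 2,  1),  # unrelated live state row
]
print(versioned_collapse(rows))  # [(2, 2)]: the out-of-order pair still collapses
```

Without the `Version` column, plain CollapsingMergeTree relies on row order to pair state and cancel rows, which is what fails in the out-of-order case.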
For more information on the VersionedCollapsingMergeTree table engine, please refer to the official documentation: VersionedCollapsingMergeTree.
Example:

- Create table `lineorder_local_versioned`.

  ```sql
  CREATE TABLE lineorder_local_versioned
  (
      LO_ORDERKEY UInt64,
      LO_LINENUMBER UInt8,
      LO_CUSTKEY UInt8,
      Sign Int8,
      Version UInt8
  )
  ENGINE = VersionedCollapsingMergeTree(Sign, Version)
  ORDER BY LO_ORDERKEY;
  ```
- Insert a cancel row (`Sign=-1`).

  ```sql
  INSERT INTO lineorder_local_versioned VALUES (1, 3, 222, -1, 1);
  ```
- Insert a state row (`Sign=1`, `Version=1`) with the other column values identical to the cancel row just inserted. At the same time, insert a state row for a new primary key.

  ```sql
  INSERT INTO lineorder_local_versioned VALUES (1, 3, 222, 1, 1), (2, 6, 345, 1, 2);
  ```
- Query the data.

  ```sql
  SELECT * FROM lineorder_local_versioned;
  ```

  The query result is as follows:
- If you need to aggregate on a specific column, take `sum(col)` as an example. To get correct results, the SQL statement needs to be rewritten as follows.

  ```sql
  SELECT
      LO_ORDERKEY,
      sum(LO_LINENUMBER * Sign) AS LO_LINENUMBERS,
      sum(LO_CUSTKEY * Sign) AS LO_CUSTKEYS
  FROM lineorder_local_versioned
  GROUP BY LO_ORDERKEY
  HAVING sum(Sign) > 0;
  ```

  After execution, the query result is as follows:
- Since the MergeTree series table engines use a structure similar to an LSM tree, much of the storage-layer processing happens only during Compaction, so you need to execute the optimize statement to force a background Compaction.

  ```sql
  optimize table lineorder_local_versioned final;
  ```
- Query the data again.

  ```sql
  SELECT * FROM lineorder_local_versioned;
  ```

  The query result is as follows:
SummingMergeTree
The SummingMergeTree table engine is used for pre-aggregating primary key columns by merging all rows with the same primary key into a single row, significantly reducing storage space usage and improving aggregation computation performance.
When using the SummingMergeTree table engine, please note the following points:
- UClickHouse pre-aggregates by primary key only during background Compaction, and the timing of Compaction cannot be predicted, so some data may already be pre-aggregated while other data is not. Therefore, `GROUP BY` clauses still need to be used when performing aggregation calculations.
- During pre-aggregation, UClickHouse pre-aggregates all columns other than the primary key columns. If a column is summable (e.g., a numeric type), `sum` is applied directly; if it is not (e.g., a String type), a random value is chosen.
- It is usually recommended to use the SummingMergeTree table engine together with the MergeTree table engine: the MergeTree table stores the complete data, while the SummingMergeTree table stores the pre-aggregated results.
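The pre-aggregation step can be sketched in Python: during a merge, rows sharing a primary key collapse into a single row whose numeric non-key columns are summed. This is an illustrative model of the merge-time behavior, not engine internals, and it also shows why `GROUP BY` is still needed before Compaction.

```python
# Sketch of SummingMergeTree pre-aggregation during a merge.
def summing_merge(rows):
    sums = {}
    for key, value in rows:
        sums[key] = sums.get(key, 0) + value  # numeric non-key columns are summed per key
    return sorted(sums.items())

# Rows as written (same shape as the (key, value) table used in the example below).
rows = [(1, 1), (1, 2), (2, 1)]

# Before Compaction a query may still see all three raw rows, so GROUP BY is required.
print(len(rows))           # 3
# After the merge, one pre-aggregated row per key remains.
print(summing_merge(rows))  # [(1, 3), (2, 1)]
```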
For more information on the SummingMergeTree table engine, please refer to the official documentation: SummingMergeTree.
Example:

- Create table `lineorder_local_summing`.

  ```sql
  CREATE TABLE lineorder_local_summing
  (
      key UInt32,
      value UInt32
  )
  ENGINE = SummingMergeTree()
  ORDER BY key;
  ```
- Write data.

  ```sql
  INSERT INTO lineorder_local_summing VALUES (1,1),(1,2),(2,1);
  ```
- Query the data.

  ```sql
  select * from lineorder_local_summing;
  ```

  The query result is as follows:
AggregatingMergeTree
The AggregatingMergeTree table engine is also a type of pre-aggregating engine used to enhance the performance of aggregation calculations. The difference from SummingMergeTree is that SummingMergeTree is used to sum
aggregate non-primary key columns, whereas AggregatingMergeTree allows specifying various aggregation functions. The syntax for AggregatingMergeTree is more complex and needs to be used in conjunction with materialized views or the special data type AggregateFunction
in the ClickHouse cloud database.
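The idea behind the `-State`/`-Merge` function pairs used with AggregatingMergeTree can be sketched in Python: instead of raw rows, the table stores a mergeable partial state per group, and states are combined at query time. Here the partial state for `uniq` is modeled as a plain set; this is an illustrative analogy, not the engine's actual state format.

```python
# Partial-state model for uniq(): a state is a set of values seen so far.
def uniq_state(values):
    return set(values)              # analogous to uniqState(): a mergeable partial state

def uniq_merge(states):
    combined = set()
    for state in states:
        combined |= state           # analogous to uniqMerge(): combine partial states
    return len(combined)            # final aggregate value

# Two inserts produce two partial states for the same group.
s1 = uniq_state([1, 2, 3])
s2 = uniq_state([3, 4])
print(uniq_merge([s1, s2]))  # 4 distinct values across both states
```

Because states merge associatively, background Compaction can combine them in any order without changing the final result, which is what makes pre-aggregation safe.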
For more information on the AggregatingMergeTree table engine, please see the official documentation: AggregatingMergeTree.
Example:

- In Conjunction with Materialized Views

- Create table `lineorder`.

  ```sql
  CREATE TABLE lineorder
  (
      LO_ORDERKEY UInt64,
      LO_LINENUMBER UInt8,
      LO_ORDERDATE Date,
      SIGN Int8
  )
  ENGINE = CollapsingMergeTree(SIGN)
  ORDER BY LO_ORDERKEY;
  ```
- Create a materialized view `lineorder_view` for `lineorder`, and use the `sumState` and `uniqState` functions to pre-aggregate the detail table.

  ```sql
  CREATE MATERIALIZED VIEW lineorder_view
  ENGINE = AggregatingMergeTree()
  PARTITION BY toYYYYMM(LO_ORDERDATE)
  ORDER BY (LO_LINENUMBER, LO_ORDERDATE)
  AS SELECT
      LO_LINENUMBER,
      LO_ORDERDATE,
      sumState(SIGN) AS SIGNS,
      uniqState(LO_ORDERKEY) AS ORDERKEYS
  FROM lineorder
  GROUP BY LO_LINENUMBER, LO_ORDERDATE;
  ```
- Write data to `lineorder`.

  ```sql
  INSERT INTO lineorder VALUES (0, 0, '2019-11-11', 1);
  INSERT INTO lineorder VALUES (1, 1, '2020-11-12', 1);
  ```
- Use the aggregation functions `sumMerge` and `uniqMerge` to aggregate the materialized view and query the aggregated data.

  ```sql
  SELECT
      LO_ORDERDATE,
      sumMerge(SIGNS) AS SIGNS,
      uniqMerge(ORDERKEYS) AS ORDERKEYS
  FROM lineorder_view
  GROUP BY LO_ORDERDATE
  ORDER BY LO_ORDERDATE;
  ```

  Note: The functions `sum` and `uniq` can no longer be used here; otherwise the SQL fails with an error such as: Illegal type AggregateFunction(sum, Int8) of argument for aggregate function sum…

  The query results are as follows:
- In Conjunction with the Special Data Type AggregateFunction

- Create table `lineorder`.

  ```sql
  CREATE TABLE lineorder
  (
      LO_ORDERKEY UInt8,
      LO_ORDERDATE Date,
      LO_LINENUMBER UInt64
  )
  ENGINE = MergeTree()
  PARTITION BY toYYYYMM(LO_ORDERDATE)
  ORDER BY (LO_ORDERKEY, LO_ORDERDATE);
  ```
- Write data to the detail table `lineorder`.

  ```sql
  INSERT INTO lineorder VALUES (0, '2019-11-11', 1);
  INSERT INTO lineorder VALUES (1, '2020-11-12', 1);
  ```
- Create the aggregation table `lineorder_agg`, where the `LO_LINENUMBER` column's type is `AggregateFunction`.

  ```sql
  CREATE TABLE lineorder_agg
  (
      LO_ORDERKEY UInt8,
      LO_ORDERDATE Date,
      LO_LINENUMBER AggregateFunction(uniq, UInt64)
  )
  ENGINE = AggregatingMergeTree()
  PARTITION BY toYYYYMM(LO_ORDERDATE)
  ORDER BY (LO_ORDERKEY, LO_ORDERDATE);
  ```
- Use the aggregation function `uniqState` to insert data from the detail table into the aggregation table.

  ```sql
  INSERT INTO lineorder_agg
  SELECT LO_ORDERKEY, LO_ORDERDATE, uniqState(LO_LINENUMBER)
  FROM lineorder
  GROUP BY LO_ORDERKEY, LO_ORDERDATE;
  ```

  Note: You cannot use `INSERT INTO lineorder_agg VALUES(1, '2019-11-12', 1);` to insert data into the aggregation table; otherwise the SQL fails with an error such as: Cannot convert UInt64 to AggregateFunction(uniq, UInt64)…
Use the aggregation function
uniqMerge
for aggregating the aggregation table and querying the aggregate data.SELECT uniqMerge(LO_LINENUMBER) AS LINENUMBER FROM lineorder_agg GROUP BY LO_ORDERKEY, LO_ORDERDATE;
The query results are as follows:
Log
Lightweight engines with minimal functionality. This type of engine is best when you need to quickly write many small tables (up to about 1 million rows) and read them later as a whole.
This type of engine includes:
- TinyLog
- StripeLog
- Log
Integration Engines
These engines are for integrating with other data storage and processing systems.
This type of engine includes:
- Kafka
- MySQL
- ODBC
- JDBC
- HDFS
Engines for Other Specific Functions
This type of engine includes:
- Distributed
- MaterializedView
- Dictionary
- Merge
- File
- Null
- Set
- Join
- URL
- View
- Memory
- Buffer
Virtual Columns
- Virtual columns are part of the table engine definition and are defined in the corresponding table engine's source code.
- Virtual columns cannot be specified in CREATE TABLE and are not included in the results of SHOW CREATE TABLE or DESCRIBE TABLE queries. Virtual columns are read-only, so data cannot be written to them.
- To query the data in a virtual column, its name must be included explicitly in the SELECT query; `SELECT *` does not return data from virtual columns.
- If the created table has a column with the same name as a virtual column, the virtual column becomes inaccessible. To avoid such name conflicts, virtual column names are typically prefixed with an underscore.