
Table Engines

Table engines (i.e., table types) determine:

  • How and where data is stored, where data is written to and read from
  • What queries are supported and how they are supported
  • Concurrent data access
  • Use of indexes (if present)
  • Whether multi-threaded requests can be executed
  • Data replication parameters

The cloud data warehouse UClickHouse supports table engines grouped into four major categories: MergeTree, Log, Integrations, and Special.
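A table's engine is chosen in the ENGINE clause of CREATE TABLE. As a minimal illustration (the table and column names here are hypothetical, not part of this guide's dataset):

```sql
-- Minimal sketch: the ENGINE clause selects the table engine at creation time.
CREATE TABLE events
(
    ts  DateTime,
    msg String
)
ENGINE = MergeTree()
ORDER BY ts;
```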

Each series and its engines are summarized below:

MergeTree series

MergeTree series engines are suitable for tasks under heavy load, supporting fast writes of large volumes of data and subsequent data processing. They are highly versatile and powerful, and share support for data replication, partitioning, and data sampling.

  • MergeTree: Used for inserting massive amounts of data into a table. Data is quickly written as successive data parts, which are then merged in the background according to certain rules.
  • ReplacingMergeTree: Solves the duplicate-primary-key issue of the MergeTree engine by deleting rows with the same primary key value.
  • CollapsingMergeTree: Adds a marker column Sign to the table creation statement to address the limitations of ReplacingMergeTree, namely:
    • In distributed scenarios, data with the same primary key may be distributed across different nodes, and deduplication cannot cross shards.
    • Before a complete optimize operation, primary key deduplication may be incomplete; for instance, some data may be deduplicated while another portion still has duplicate primary keys.
    • optimize is a background task; its exact execution time cannot be predicted.
    • Manually executing optimize over massive data takes a significant amount of time and cannot meet a business's need for real-time queries.
  • VersionedCollapsingMergeTree: Adds a Version column to the table creation statement so that out-of-order writes no longer prevent normal collapsing (deletion).
  • SummingMergeTree: Pre-aggregates by primary key, merging all rows with the same primary key into one, significantly reducing storage space and improving aggregation performance.
  • AggregatingMergeTree: A pre-aggregating engine meant to enhance aggregation performance; arbitrary aggregate functions can be specified.
  • GraphiteMergeTree: Stores and rolls up Graphite data, reducing storage space and improving the query efficiency of Graphite data.

Log series

Log series engines are suitable for scenarios where data in small tables (around 1 million rows) is written quickly and later read in its entirety. Their common characteristics:

  • Data is appended to the disk.
  • delete and update are not supported.
  • Indexes are not supported.
  • Writes are not atomic.
  • insert blocks select operations.

The engines in this series:

  • TinyLog: Does not support concurrent reads of data files; simple format and poor query performance; suitable for temporarily storing intermediate data.
  • StripeLog: Supports concurrent reads of data files; stores all columns in one large file, reducing the number of files; better query performance than TinyLog.
  • Log: Supports concurrent reads of data files; each column is stored in a separate file; better query performance than TinyLog.

Integrations series

Integration series engines are suitable for importing external data into the UClickHouse cloud data warehouse, or for querying external data sources directly from within UClickHouse.

  • Kafka: Imports data from Kafka topics directly into the UClickHouse cloud data warehouse.
  • MySQL: Uses MySQL as the storage engine, allowing select and other operations on MySQL tables directly within UClickHouse.
  • JDBC: Reads data sources through a specified JDBC connection string.
  • ODBC: Reads data sources through a specified ODBC connection string.
  • HDFS: Directly reads data files in specific formats from HDFS.

Special series

Special series engines are suited to specific functional scenarios.

  • Distributed: Does not store data itself but supports distributed querying across multiple servers.
  • MaterializedView: Used for creating materialized views.
  • Dictionary: Exposes dictionary data as a UClickHouse table.
  • Merge: Does not store data itself but can read from any number of other tables simultaneously.
  • File: Uses a local file directly as data storage.
  • Null: Written data is discarded; reads return an empty result.
  • Set: A data set that always resides in RAM.
  • Join: Prepared join data that always resides in RAM.
  • URL: Used for managing data on remote HTTP/HTTPS servers.
  • View: Does not store data itself; stores only the specified SELECT query.
  • Memory: Stores data in memory, so data is lost on restart. Offers excellent query performance and suits small tables (fewer than about 100 million rows) that require no persistence; commonly used for temporary tables in UClickHouse.
  • Buffer: Keeps an in-memory buffer for a target table and flushes it to disk once configured conditions are reached.

For more detailed information on table engines, please refer to the official Table Engines documentation.

MergeTree

The MergeTree table engine is primarily used for massive data analysis, supporting data partitioning, ordered storage, primary key indexing, sparse indexing, data TTL, and more. It supports all SQL syntax in the UClickHouse cloud data warehouse, though some features may differ from standard SQL. Note that in the MergeTree table engine, the primary key's main function is to accelerate queries, not to deduplicate: even after Compaction, rows with the same primary key continue to coexist.

This type of engine includes:

  • MergeTree
  • ReplacingMergeTree
  • SummingMergeTree
  • AggregatingMergeTree
  • CollapsingMergeTree
  • VersionedCollapsingMergeTree
  • GraphiteMergeTree

For more detailed information on the table engine, refer to the official documentation MergeTree.

Example:

  1. Create a table lineorder_local, specify MergeTree, partition by LO_ORDERDATE, and sort by (LO_ORDERDATE, LO_ORDERKEY).

    CREATE TABLE lineorder_local on cluster ck_cluster
    (
        LO_ORDERKEY UInt32,
        LO_LINENUMBER UInt8,
        LO_CUSTKEY UInt32,
        LO_PARTKEY UInt32,
        LO_SUPPKEY UInt32,
        LO_ORDERDATE Date,
        LO_ORDERPRIORITY LowCardinality(String),
        LO_SHIPPRIORITY UInt8,
        LO_QUANTITY UInt8,
        LO_EXTENDEDPRICE UInt32,
        LO_ORDTOTALPRICE UInt32,
        LO_DISCOUNT UInt8,
        LO_REVENUE UInt32,
        LO_SUPPLYCOST UInt32,
        LO_TAX UInt8,
        LO_COMMITDATE Date,
        LO_SHIPMODE LowCardinality(String)
    )
    ENGINE = MergeTree()
    PRIMARY KEY (LO_ORDERDATE, LO_ORDERKEY)
    PARTITION BY toYear(LO_ORDERDATE)
    ORDER BY (LO_ORDERDATE, LO_ORDERKEY);
  2. Write data with duplicate primary keys.

    Refer to Quick Start -> Prepare Data and Import

    Prepare the data and save it as a lineorder.csv file:

    1,1,7381,155190,828,"1996-01-02","5-LOW",0,17,2116823,17366547,4,2032150,74711,2,"1996-02-12","TRUCK",
    1,2,7381,67310,163,"1996-01-02","5-LOW",0,36,4598316,17366547,9,4184467,76638,6,"1996-02-28","MAIL",
    1,3,7381,63700,71,"1996-01-02","5-LOW",0,8,1330960,17366547,10,1197864,99822,2,"1996-03-05","REG AIR",
    1,4,7381,2132,943,"1996-01-02","5-LOW",0,28,2895564,17366547,9,2634963,62047,6,"1996-03-30","AIR",
    1,5,7381,24027,1625,"1996-01-02","5-LOW",0,24,2282448,17366547,10,2054203,57061,4,"1996-03-14","FOB",
    1,6,7381,15635,1368,"1996-01-02","5-LOW",0,32,4962016,17366547,7,4614674,93037,2,"1996-02-07","MAIL",
    2,1,15601,106170,1066,"1996-12-01","1-URGENT",0,38,4469446,4692918,0,4469446,70570,5,"1997-01-14","RAIL",
    3,1,24664,4297,1959,"1993-10-14","5-LOW",0,45,5405805,19384625,6,5081456,72077,0,"1994-01-04","AIR",
    3,2,24664,19036,1667,"1993-10-14","5-LOW",0,49,4679647,19384625,10,4211682,57301,0,"1993-12-20","RAIL",
    3,3,24664,128449,1409,"1993-10-14","5-LOW",0,27,3989088,19384625,6,3749742,88646,7,"1993-11-22","SHIP",

    Import data:

    clickhouse-client --host=<node IP> --port=9000 --user=admin --password=<password set at cluster creation> --database=<database name> --query="INSERT INTO lineorder_local FORMAT CSV" < /data/lineorder.csv
  3. Query data.

    select * from lineorder_local;

    The query results are as follows:

  4. Since the MergeTree series table engine uses a structure similar to LSM Tree, many storage layer processing logics will only happen during Compaction, so you need to execute the optimize statement to force background Compaction.

    optimize table lineorder_local final;
  5. Query the data again.

    select * from lineorder_local;

    The query results are as follows, and the data with duplicate primary keys still exists.

ReplacingMergeTree

To solve the issue of duplicate primary keys in the MergeTree table engine, UClickHouse provides the ReplacingMergeTree table engine, which deletes duplicate entries with the same primary key value.

Although the ReplacingMergeTree table engine provides primary key deduplication, many limitations remain, so it is usually used only to ensure that data is eventually deduplicated; it cannot guarantee that no duplicate primary keys appear at query time. The main limitations are as follows.

  • In distributed scenarios, data with the same primary key may be distributed to different nodes and deduplication may not be possible across different shards.
  • Before a complete optimize operation, primary key deduplication might not be effective, for example, some data may be deduplicated but another part may still have duplicate primary keys.
  • Optimize is a background task, its exact execution time cannot be predicted.
  • Manually executing optimize in massive data scenarios requires a significant amount of time, failing to meet instant business query needs.
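When a query must see fully deduplicated data regardless of Compaction timing, standard ClickHouse offers the FINAL modifier, which applies merge semantics at query time at a performance cost. A sketch against the lineorder_local_replacing table from the example that follows:

```sql
-- Sketch: FINAL merges rows with the same sorting key at query time,
-- so duplicates are removed even before background Compaction runs.
-- This is standard ClickHouse syntax; it carries a query-time cost.
SELECT * FROM lineorder_local_replacing FINAL;
```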

For more information on the ReplacingMergeTree table engine, please refer to the official documentation ReplacingMergeTree.

Example:

  1. Create table lineorder_local_replacing.

    CREATE TABLE lineorder_local_replacing on cluster ck_cluster
    (
        LO_ORDERKEY UInt32,
        LO_LINENUMBER UInt8,
        LO_CUSTKEY UInt32,
        LO_PARTKEY UInt32,
        LO_SUPPKEY UInt32,
        LO_ORDERDATE Date,
        LO_ORDERPRIORITY LowCardinality(String),
        LO_SHIPPRIORITY UInt8,
        LO_QUANTITY UInt8,
        LO_EXTENDEDPRICE UInt32,
        LO_ORDTOTALPRICE UInt32,
        LO_DISCOUNT UInt8,
        LO_REVENUE UInt32,
        LO_SUPPLYCOST UInt32,
        LO_TAX UInt8,
        LO_COMMITDATE Date,
        LO_SHIPMODE LowCardinality(String)
    )
    ENGINE = ReplacingMergeTree()
    PRIMARY KEY (LO_ORDERDATE, LO_ORDERKEY)
    PARTITION BY toYear(LO_ORDERDATE)
    ORDER BY (LO_ORDERDATE, LO_ORDERKEY);
  2. Write data with duplicate primary keys.

    Refer to Quick Start -> Prepare Data and Import

    Prepare the data and save it as a lineorder.csv file:

    1,1,7381,155190,828,"1996-01-02","5-LOW",0,17,2116823,17366547,4,2032150,74711,2,"1996-02-12","TRUCK",
    1,2,7381,67310,163,"1996-01-02","5-LOW",0,36,4598316,17366547,9,4184467,76638,6,"1996-02-28","MAIL",
    1,3,7381,63700,71,"1996-01-02","5-LOW",0,8,1330960,17366547,10,1197864,99822,2,"1996-03-05","REG AIR",
    1,4,7381,2132,943,"1996-01-02","5-LOW",0,28,2895564,17366547,9,2634963,62047,6,"1996-03-30","AIR",
    1,5,7381,24027,1625,"1996-01-02","5-LOW",0,24,2282448,17366547,10,2054203,57061,4,"1996-03-14","FOB",
    1,6,7381,15635,1368,"1996-01-02","5-LOW",0,32,4962016,17366547,7,4614674,93037,2,"1996-02-07","MAIL",
    2,1,15601,106170,1066,"1996-12-01","1-URGENT",0,38,4469446,4692918,0,4469446,70570,5,"1997-01-14","RAIL",
    3,1,24664,4297,1959,"1993-10-14","5-LOW",0,45,5405805,19384625,6,5081456,72077,0,"1994-01-04","AIR",
    3,2,24664,19036,1667,"1993-10-14","5-LOW",0,49,4679647,19384625,10,4211682,57301,0,"1993-12-20","RAIL",
    3,3,24664,128449,1409,"1993-10-14","5-LOW",0,27,3989088,19384625,6,3749742,88646,7,"1993-11-22","SHIP",

    Import data:

    clickhouse-client --host=<node IP> --port=9000 --user=admin --password=<password set at cluster creation> --database=<database name> --query="INSERT INTO lineorder_local_replacing FORMAT CSV" < /data/lineorder.csv
  3. Query data.

    select * from lineorder_local_replacing;

    The query results are as follows; the rows with duplicate primary keys have been removed:

CollapsingMergeTree

The CollapsingMergeTree table engine addresses the functional limitations of the ReplacingMergeTree table engine. It requires a marker column Sign in the table creation statement, which divides rows into two types: rows with Sign = 1 are state rows, used to add a state, and rows with Sign = -1 are cancel rows, used to delete a state.

Note:

Although the CollapsingMergeTree table engine makes it possible to delete data with the same primary key promptly, when states change continuously and writes occur in parallel across multiple threads, state rows and cancel rows can arrive out of order, which prevents a successful collapse (deletion).

During background Compaction, rows with the same primary key but opposite Sign are collapsed (deleted); for data that has not yet undergone Compaction, state rows and cancel rows still coexist. Therefore, to achieve collapsing (deletion) by primary key, the business layer needs to perform the following operations.

  • Record the original value of the state row, or query the database to get the original state row value before executing the delete state operation.

    Specific reason: When executing the delete state operation, a cancel row needs to be inserted, and the cancel row must contain data identical to the original state row (except for the Sign column).

  • In aggregate calculations like count() and sum(col), duplicated data might occur. To get correct results, the business layer needs to rewrite the SQL, changing count() and sum(col) to sum(Sign) and sum(col * Sign) respectively.

    The specific reasons are as follows.

    • The Compaction timing is unpredictable. When querying, state rows and cancel rows may not have been collapsed (deleted).
    • UClickHouse cannot ensure that rows with the same primary key reside on the same node, and data on different nodes cannot be collapsed (deleted).

For more information on the CollapsingMergeTree table engine, please refer to the official documentation CollapsingMergeTree.

Example:

  1. Create table lineorder_local_collapsing.

    CREATE TABLE lineorder_local_collapsing
    (
        LO_ORDERKEY UInt64,
        LO_LINENUMBER UInt8,
        LO_CUSTKEY UInt8,
        Sign Int8
    )
    ENGINE = CollapsingMergeTree(Sign)
    ORDER BY LO_ORDERKEY;
  2. Insert state row Sign=1.

    INSERT INTO lineorder_local_collapsing VALUES (1, 5, 146, 1);

    Note: If you insert a cancel row first and then a state row, it may lead to disordered positions, making it impossible to collapse (delete) the primary key even after forcing background Compaction.

  3. Insert cancel row Sign=-1, with values identical to the inserted state row (except Sign). Meanwhile, insert a new state row with the same primary key.

    INSERT INTO lineorder_local_collapsing VALUES (1, 5, 146, -1), (2, 6, 185, 1);
  4. Query data.

    SELECT * FROM lineorder_local_collapsing;

    The query result is as follows:

  5. If you need to aggregate on a specific column, take sum(col) as an example. To get correct results, the SQL statement needs to be rewritten as follows.

    SELECT LO_ORDERKEY, sum(LO_LINENUMBER * Sign) AS LO_LINENUMBERS, sum(LO_CUSTKEY * Sign) AS LO_CUSTKEYS
    FROM lineorder_local_collapsing
    GROUP BY LO_ORDERKEY
    HAVING sum(Sign) > 0;

    After execution, the result is as follows:

  6. Since the MergeTree series table engine uses a structure similar to LSM Tree, many storage layer processing logics will only happen during Compaction, so you need to execute the optimize statement to force background Compaction.

    optimize table lineorder_local_collapsing final;
  7. Query the data again.

    SELECT * FROM lineorder_local_collapsing;

    The query result is as follows:

VersionedCollapsingMergeTree

To solve the problem that out-of-order writes prevent the CollapsingMergeTree table engine from collapsing (deleting) rows successfully, UClickHouse provides the VersionedCollapsingMergeTree table engine. It adds a Version column in the table creation statement to record the correspondence between state rows and cancel rows in out-of-order scenarios. During background Compaction, rows with the same primary key, the same Version, and opposite Sign are collapsed (deleted). As with the CollapsingMergeTree table engine, for aggregate calculations such as count() and sum(col), the business layer needs to rewrite the SQL, changing count() to sum(Sign) and sum(col) to sum(col * Sign).

Note:

For more information on the VersionedCollapsingMergeTree table engine, please refer to the official documentation VersionedCollapsingMergeTree.

Example:

  1. Create table lineorder_local_versioned.

    CREATE TABLE lineorder_local_versioned
    (
        LO_ORDERKEY UInt64,
        LO_LINENUMBER UInt8,
        LO_CUSTKEY UInt8,
        Sign Int8,
        Version UInt8
    )
    ENGINE = VersionedCollapsingMergeTree(Sign, Version)
    ORDER BY LO_ORDERKEY;
  2. Insert cancel row Sign=-1.

    INSERT INTO lineorder_local_versioned VALUES (1, 3, 222, -1, 1);
  3. Insert state row Sign=1, Version=1, with other column values identical to the inserted cancel row. Meanwhile, insert a new state row with the same primary key.

    INSERT INTO lineorder_local_versioned VALUES (1, 3, 222, 1, 1),(2, 6, 345, 1, 2);
  4. Query data.

    SELECT * FROM lineorder_local_versioned;

    The query result is as follows:

  5. If you need to aggregate on a specific column, take sum(col) as an example. To get correct results, the SQL statement needs to be rewritten as follows.

    SELECT LO_ORDERKEY, sum(LO_LINENUMBER * Sign) AS LO_LINENUMBERS, sum(LO_CUSTKEY * Sign) AS LO_CUSTKEYS
    FROM lineorder_local_versioned
    GROUP BY LO_ORDERKEY
    HAVING sum(Sign) > 0;

    After execution, the query result is as follows:

  6. Since the MergeTree series table engine uses a structure similar to LSM Tree, many storage layer processing logics will only happen during Compaction, so you need to execute the optimize statement to force background Compaction.

    optimize table lineorder_local_versioned final;
  7. Query the data again.

    SELECT * FROM lineorder_local_versioned;

    The query result is as follows:

SummingMergeTree

The SummingMergeTree table engine is used for pre-aggregating primary key columns by merging all rows with the same primary key into a single row, significantly reducing storage space usage and improving aggregation computation performance.

When using the SummingMergeTree table engine, please note the following points:

  • UClickHouse only pre-aggregates primary key columns during background Compaction, and the timing of Compaction execution cannot be predicted, so there may be situations where some data has been pre-aggregated, and some data has not. Thus, GROUP BY clauses still need to be used when performing aggregation calculations.
  • During pre-aggregation, UClickHouse pre-aggregates all columns other than the primary key columns. If these columns are aggregatable (e.g., numeric types), sum is directly applied. If non-aggregatable (e.g., String types), a random value will be chosen.
  • It is usually recommended to use SummingMergeTree table engines in combination with MergeTree table engines. MergeTree table engines store complete data, and SummingMergeTree table engines store pre-aggregated results.
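Because some rows may not yet have been merged, queries should still aggregate explicitly rather than rely on pre-aggregation alone. A sketch against the lineorder_local_summing table from the example below:

```sql
-- Sketch: GROUP BY guarantees correct totals whether or not background
-- Compaction has already merged rows sharing the same key.
SELECT key, sum(value) AS value
FROM lineorder_local_summing
GROUP BY key;
```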

Note:

For more information on the SummingMergeTree table engine, please refer to the official documentation SummingMergeTree.

Example:

  1. Create table lineorder_local_summing.

    CREATE TABLE lineorder_local_summing ( key UInt32, value UInt32 ) ENGINE = SummingMergeTree() ORDER BY key;
  2. Write data.

    INSERT INTO lineorder_local_summing VALUES(1,1),(1,2),(2,1);
  3. Query data.

    select * from lineorder_local_summing;

    The query result is as follows:

AggregatingMergeTree

The AggregatingMergeTree table engine is also a pre-aggregating engine used to enhance the performance of aggregation calculations. The difference from SummingMergeTree is that SummingMergeTree applies sum aggregation to non-primary-key columns, whereas AggregatingMergeTree lets you specify arbitrary aggregate functions. Its syntax is more complex, and it must be used together with materialized views or UClickHouse's special data type AggregateFunction.

Note:

For more information on the AggregatingMergeTree table engine, please see the official documentation AggregatingMergeTree.

Example:

  • In Conjunction with Materialized Views
  1. Create lineorder.

    CREATE TABLE lineorder ( LO_ORDERKEY UInt64, LO_LINENUMBER UInt8, LO_ORDERDATE Date, SIGN Int8 ) ENGINE = CollapsingMergeTree(SIGN) ORDER BY LO_ORDERKEY;
  2. Create a materialized view lineorder_view for lineorder, and use sumState and uniqState functions for pre-aggregating the detail table.

    CREATE MATERIALIZED VIEW lineorder_view
    ENGINE = AggregatingMergeTree()
    PARTITION BY toYYYYMM(LO_ORDERDATE)
    ORDER BY (LO_LINENUMBER, LO_ORDERDATE)
    AS SELECT
        LO_LINENUMBER,
        LO_ORDERDATE,
        sumState(SIGN) AS SIGNS,
        uniqState(LO_ORDERKEY) AS ORDERKEYS
    FROM lineorder
    GROUP BY LO_LINENUMBER, LO_ORDERDATE;
  3. Write data to lineorder.

    INSERT INTO lineorder VALUES(0, 0, '2019-11-11', 1);
    INSERT INTO lineorder VALUES(1, 1, '2020-11-12', 1);
  4. Use the aggregation functions sumMerge and uniqMerge for aggregating the materialized view and query the aggregate data.

    SELECT LO_ORDERDATE, sumMerge(SIGNS) AS SIGNS, uniqMerge(ORDERKEYS) AS ORDERKEYS
    FROM lineorder_view
    GROUP BY LO_ORDERDATE
    ORDER BY LO_ORDERDATE;

    Note: Functions sum and uniq cannot be used anymore; otherwise, the SQL error will be: Illegal type AggregateFunction(sum, Int8) of argument for aggregate function sum…

    The query results are as follows:

  • In Conjunction with the Special Data Type AggregateFunction
  1. Create lineorder.

    CREATE TABLE lineorder
    (
        LO_ORDERKEY UInt8,
        LO_ORDERDATE Date,
        LO_LINENUMBER UInt64
    )
    ENGINE = MergeTree()
    PARTITION BY toYYYYMM(LO_ORDERDATE)
    ORDER BY (LO_ORDERKEY, LO_ORDERDATE);
  2. Write data to the detail table lineorder.

    INSERT INTO lineorder VALUES(0, '2019-11-11', 1);
    INSERT INTO lineorder VALUES(1, '2020-11-12', 1);
  3. Create the aggregation table lineorder_agg, where the LO_LINENUMBER column's type is AggregateFunction(uniq, UInt64).

    CREATE TABLE lineorder_agg
    (
        LO_ORDERKEY UInt8,
        LO_ORDERDATE Date,
        LO_LINENUMBER AggregateFunction(uniq, UInt64)
    )
    ENGINE = AggregatingMergeTree()
    PARTITION BY toYYYYMM(LO_ORDERDATE)
    ORDER BY (LO_ORDERKEY, LO_ORDERDATE);
  4. Use the aggregation function uniqState to insert data from the detail table into the aggregation table.

    INSERT INTO lineorder_agg select LO_ORDERKEY, LO_ORDERDATE, uniqState(LO_LINENUMBER) from lineorder group by LO_ORDERKEY, LO_ORDERDATE;

    Note: You cannot use INSERT INTO lineorder_agg VALUES(1, '2019-11-12', 1); to insert data into the aggregation table, or the SQL error will be: Cannot convert UInt64 to AggregateFunction(uniq, UInt64)…

  5. Use the aggregation function uniqMerge for aggregating the aggregation table and querying the aggregate data.

    SELECT uniqMerge(LO_LINENUMBER) AS LINENUMBER FROM lineorder_agg GROUP BY LO_ORDERKEY, LO_ORDERDATE;

    The query results are as follows:

Log

Lightweight engines with minimal functionality. This type of engine is best when you need to quickly write many small tables (up to about 1 million rows) and read them later as a whole.

This type of engine includes:

  • TinyLog
  • StripeLog
  • Log
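For instance, a TinyLog table can hold intermediate data that is written once and later read back in full (the table and column names below are hypothetical):

```sql
-- Sketch: TinyLog suits small, write-once, read-all intermediate data.
CREATE TABLE tmp_intermediate
(
    id      UInt32,
    payload String
)
ENGINE = TinyLog;

INSERT INTO tmp_intermediate VALUES (1, 'a'), (2, 'b');
SELECT * FROM tmp_intermediate;
```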

Integration Engines

These engines are for integrating with other data storage and processing systems.

This type of engine includes:

  • Kafka
  • MySQL
  • ODBC
  • JDBC
  • HDFS
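As an illustration, the MySQL engine maps a remote MySQL table into UClickHouse; the host, database, table, and credentials below are placeholders, not values from this guide:

```sql
-- Sketch: each SELECT against this table is forwarded to the remote
-- MySQL server; all connection parameters here are placeholders.
CREATE TABLE mysql_orders
(
    id     UInt32,
    amount UInt32
)
ENGINE = MySQL('mysql-host:3306', 'shop', 'orders', 'mysql_user', 'mysql_password');

SELECT count() FROM mysql_orders;
```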

Engines for Other Specific Functions

This type of engine includes:

  • Distributed
  • MaterializedView
  • Dictionary
  • Merge
  • File
  • Null
  • Set
  • Join
  • URL
  • View
  • Memory
  • Buffer
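For example, a Distributed table can front the lineorder_local table used earlier in this guide; this sketch assumes the cluster name ck_cluster from the earlier examples and the default database:

```sql
-- Sketch: a Distributed table stores no data itself; it fans queries out
-- to lineorder_local on every shard of ck_cluster and shards inserts
-- randomly via rand().
CREATE TABLE lineorder_all ON CLUSTER ck_cluster
AS lineorder_local
ENGINE = Distributed(ck_cluster, default, lineorder_local, rand());
```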

Virtual Columns

  • Virtual columns are part of a table engine's definition and are declared in the corresponding table engine's source code.
  • Virtual columns cannot be specified in CREATE TABLE and will not be included in the results of the SHOW CREATE TABLE and DESCRIBE TABLE queries. Virtual columns are read-only, so data cannot be written into them.
  • When querying data in virtual columns, the virtual column name must be included in the SELECT query. SELECT * will not return data from virtual columns.
  • If there is a column with the same name as a virtual column in the created table, the virtual column will no longer be accessible. To avoid such column name conflicts, virtual column names are typically prefixed with an underscore.
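As an illustration of the last two points, MergeTree tables expose a virtual column _part holding the name of the data part each row came from; it appears only when selected by name:

```sql
-- Sketch: _part is a MergeTree virtual column; SELECT * would not return it.
SELECT _part, LO_ORDERKEY
FROM lineorder_local
LIMIT 5;
```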