Data Backup
UES data backup uses the snapshot API of Elasticsearch (hereinafter referred to as ES) together with the elasticsearch-repository-ufile plugin (hereinafter referred to as the US3 plugin). Snapshots are generated according to a predefined backup plan and saved in US3 object storage, which provides a backup of UES index data.
Glossary
Snapshot: A snapshot is a backup of a running ES cluster taken at a specific point in time. A snapshot can cover individual indexes or all indexes. Snapshots are incremental: each new snapshot stores only the data that is not already present in earlier snapshots.
Repository: A repository is a storage space for storing snapshot files. A repository must be created before snapshot backup operations can be performed. Through the installation of plugins, ES can support the use of different types of storage systems as snapshot repositories.
Object Storage US3: Object Storage US3 is an unstructured file cloud storage service provided by UCloud Global. By installing the US3 plugin, UES uses US3 as a repository for storing snapshots.
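The incremental mechanism described in the Snapshot entry can be pictured as a set difference over data files. The following is a minimal sketch of that idea only, not the actual implementation used by ES or the US3 plugin; the function name and the file names are hypothetical.

```python
def files_to_upload(index_files: set, repository_files: set) -> set:
    """Return the files a new snapshot must copy: those not yet in the repository."""
    return index_files - repository_files


# Hypothetical file names, for illustration only.
repository = set()

first = files_to_upload({"seg_1", "seg_2"}, repository)
repository |= first  # the first snapshot copies every file

second = files_to_upload({"seg_1", "seg_2", "seg_3"}, repository)
repository |= second  # the second snapshot copies only the new file

print(sorted(first))
print(sorted(second))
```

In this sketch the second snapshot uploads a single file, which is why later snapshots are usually much smaller and faster than the first one.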
Prerequisites
1. Create US3 Storage Space
Log in to the US3 console and create a storage space to hold the UES snapshot data. Requirements: (1) The region of the storage space must match the region of the UES cluster you want to back up. (2) For the space type, select “Private Space” so that the data in the storage space can be accessed only with key authorization.
2. Create US3 Token
On the US3 token management page, create a token that authorizes access to the above storage space. Set a reasonable validity period and grant the upload, download, delete, and file list permissions on all files.
3. Install US3 Plugin for the UES Cluster to be Backed Up
Log in to the UES console, go to the Plugin Management page of the target UES cluster, and install the US3 plugin. For details, see the Plugin Management section of the documentation.
Procedures
To open the data backup function, log in to the UES console, click the “Details” button of the target cluster in the cluster list to open the cluster details page, and switch to the “Data Backup” tab.
1. Register Repository
In the “Repository Management” sub-page, click the “Register Repository” button. In the pop-up dialog box, fill in the required information as prompted and click Confirm. The system will create a repository for storing the cluster's snapshots.
Parameter | Description |
---|---|
US3 Storage Space | Used to store the snapshots of the UES cluster. Note: After the repository is created, the US3 storage space bound to it must not be deleted. Otherwise, the generated snapshot files will be lost and new snapshots cannot be generated. |
US3 Token | Used to authorize access to the above storage space. It must have upload, download, delete, and file list permissions for the storage space. Note: After the repository is created, the US3 token bound to it must not be deleted, and none of the required permissions may be revoked. Also make sure that the US3 token has not expired. Otherwise, operations such as snapshot generation and snapshot recovery will fail. |
Repository Name | Used to identify a repository. Multiple repositories with the same name are not allowed within a UES cluster. |
Base Path | The path of the snapshot file in the storage space, which is ues_backup by default. The generated snapshot file will be stored in this directory in the US3 storage space. |
Read-Only | Controls whether the UES cluster can write to the repository or only read from it. Note: When repositories pointing to the same path (US3 storage space + base path) are registered in multiple UES clusters, only one cluster should have write permission, and the others should be set to read-only to ensure the integrity and consistency of the snapshot files. |
Snapshot Compression | Once compression is enabled, the index mapping and settings files in the snapshot will be compressed, but the data files will not be compressed. |
Chunk Size | Used to limit the size of a single file chunk during the snapshot process. |
Max Speed of Snapshot Generation | Used to limit the maximum speed of snapshot generation on each node. |
Max Speed of Snapshot Restoration | Used to limit the maximum speed of snapshot recovery on each node. |
Snapshot Auto-Cleanup | If snapshot auto-cleanup is enabled, the system will regularly delete the expired snapshot files in the repository according to the set time length. |
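
Several parameters in the table above resemble the settings that ES's built-in repository types accept when a repository is registered (`readonly`, `compress`, `chunk_size`, `max_snapshot_bytes_per_sec`, `max_restore_bytes_per_sec`). The sketch below assembles such a request body. It is an illustration under assumptions: this page does not document the repository type name or the settings the US3 plugin expects, so `repo_type` is a placeholder supplied by the caller, the US3 storage space and token settings are left out entirely, and the function name is hypothetical.

```python
import json
from typing import Optional


def build_repository_body(repo_type: str,
                          base_path: str = "ues_backup",
                          readonly: bool = False,
                          compress: bool = False,
                          chunk_size: Optional[str] = None,
                          max_snapshot_bytes_per_sec: Optional[str] = None,
                          max_restore_bytes_per_sec: Optional[str] = None) -> dict:
    """Assemble a repository registration body from the console parameters.

    Assumption: the setting names follow ES's built-in repository types;
    the US3 plugin may use different names.
    """
    settings = {
        "base_path": base_path,   # Base Path
        "readonly": readonly,     # Read-Only
        "compress": compress,     # Snapshot Compression
    }
    optional = {
        "chunk_size": chunk_size,                                  # Chunk Size
        "max_snapshot_bytes_per_sec": max_snapshot_bytes_per_sec,  # Max Speed of Snapshot Generation
        "max_restore_bytes_per_sec": max_restore_bytes_per_sec,    # Max Speed of Snapshot Restoration
    }
    # Leave unset limits out of the body instead of sending null values.
    settings.update({key: value for key, value in optional.items() if value is not None})
    return {"type": repo_type, "settings": settings}


# "PLACEHOLDER_TYPE" is not a real repository type name.
body = build_repository_body("PLACEHOLDER_TYPE", compress=True, max_snapshot_bytes_per_sec="40mb")
print(json.dumps(body, indent=2))
```

Registering through the console remains the documented path; the sketch only shows how the dialog fields could correspond to a single settings object.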
2. Set Backup Rule
In the “Backup Policy” sub-page, click the “New Rule” button. In the pop-up dialog box, fill in the required information as prompted and click Confirm. The system will create a rule that automatically generates snapshots for the cluster.
Parameter | Description |
---|---|
Rule Name | Used to identify a backup rule. |
Snapshot Name | Specifies the name of the snapshot generated by this backup rule. The system will add a unique identifier to the snapshot name to distinguish snapshots generated at different times. |
Repository | Choose a repository to store the snapshot. You can only choose repositories that were set to have write permission when registering the repository. |
Execution Plan | Sets the frequency and time of automatic backups. Currently, automatic backups at a specified time every day are supported. |
Backup Index | By default, all indexes are backed up. To back up specific indexes, turn off this switch and enter the index names, separated by commas. Index expressions are supported. |
Ignore Unavailable Indexes | Controls how unavailable indexes are handled during the snapshot backup. When this option is on and an unavailable index is encountered, the snapshot is still generated without that index. When this option is off and an unavailable index is encountered, the snapshot backup fails. |
Allow Partial Indexes | Controls how indexes with unavailable primary shards are handled during the snapshot backup. When this option is on, indexes with unavailable primary shards can still be included in the snapshot. When this option is off and such an index is encountered, the snapshot backup fails. |
Include Global State | Used to control whether to save the global state of the cluster in the snapshot. |
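
The last four parameters in the table correspond closely to fields of the ES create snapshot API (`indices`, `ignore_unavailable`, `partial`, `include_global_state`). The sketch below builds such a request body from the rule parameters. It assumes the console switches map one-to-one to these fields, which this page does not confirm; the function name and default values are hypothetical.

```python
from typing import List, Optional


def build_snapshot_body(indices: Optional[List[str]] = None,
                        ignore_unavailable: bool = False,
                        allow_partial: bool = False,
                        include_global_state: bool = True) -> dict:
    """Assemble a create-snapshot request body from backup rule parameters."""
    body = {
        "ignore_unavailable": ignore_unavailable,      # Ignore Unavailable Indexes
        "partial": allow_partial,                      # Allow Partial Indexes
        "include_global_state": include_global_state,  # Include Global State
    }
    if indices:
        # Backup Index switched off: send the listed names or index expressions.
        body["indices"] = ",".join(indices)
    # When "indices" is omitted, the API snapshots all indexes.
    return body


print(build_snapshot_body())
print(build_snapshot_body(indices=["index_*", "-index_1"], ignore_unavailable=True))
```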
3. Execute Backup Rule
There are two methods of executing backup rules: automatic scheduled execution and manual execution.
(1) Automatic Scheduled Execution
After a backup rule is set, the system automatically performs the snapshot backup at the scheduled time, using the parameters configured in the rule.
(2) Manual Execution
In the “Backup Policy” sub-page, click the “Execute Now” button on the right side of a rule to run a snapshot backup operation immediately according to the parameters set in the rule.
4. Manage Generated Snapshots
In the “Snapshot Management” sub-page, you can view or delete the generated snapshots.
(1) View Snapshot Details
Click the “Details” button on the right side of a snapshot to view its detailed information, including the snapshot version, backup status, and included indexes.
(2) Delete Snapshot
Click the “Delete” button on the right side of a snapshot to delete this snapshot from the repository. If the snapshot auto-cleanup function was enabled when registering the repository, the system will regularly delete the expired snapshot files in this repository according to the set time length.
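The auto-cleanup behavior can be sketched as a filter over snapshot timestamps. This is an illustration only: it assumes that a snapshot expires once its creation time is older than the configured retention period, which this page does not spell out, and the function name and snapshot names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expired_snapshots(snapshots: dict, retention_days: int, now: datetime) -> list:
    """Return the names of snapshots created more than retention_days before now."""
    cutoff = now - timedelta(days=retention_days)
    return sorted(name for name, created in snapshots.items() if created < cutoff)


now = datetime(2020, 2, 20, 12, 0, tzinfo=timezone.utc)
snapshots = {
    "daily-a": datetime(2020, 2, 10, 2, 0, tzinfo=timezone.utc),
    "daily-b": datetime(2020, 2, 18, 2, 0, tzinfo=timezone.utc),
}
print(expired_snapshots(snapshots, retention_days=7, now=now))  # ['daily-a']
```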
Index Expressions
Index expressions are used to specify multiple indexes. Common types of expressions include:
(1) Multiple index names separated by commas
For example: index_1,index_2,index_3 specifies 3 indexes.
(2) Index name with wildcard *
For example: index_* can represent all indexes starting with index_, such as index_1, index_2020.02.02, etc.
(3) When using the wildcard *, use - to exclude unnecessary indexes.
For example: index_*,-index_1 represents all indexes starting with index_, excluding index_1.
(4) Index name based on date math expression
<static_name{date_math_expr{date_format|time_zone}}>
In the above expression, the meanings of the fields are as follows:
static_name: the static part of the index name
date_math_expr: dynamic date expression
date_format: date format (optional), default is YYYY.MM.dd
time_zone: time zone (optional), default is UTC
Example (the angle brackets <> at both ends of the expression are essential):
Assume the current time is noon on February 20, 2020 (UTC).
Expression | Value of the Expression |
---|---|
<index_{now/d}> | index_2020.02.20 |
<index_{now-1d}> | index_2020.02.19 |
<index_{now/M}> | index_2020.02.01 |
<index_{now/M{YYYY.MM}}> | index_2020.02 |
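
The four expression types above can be approximated in a few lines of Python. The sketch below is a simplified model for checking what an expression selects, not ES's own resolver. It supports only `now`, whole-day offsets such as `-1d`, rounding with `/d` and `/M`, and the pattern tokens `YYYY`, `MM`, and `dd`; time zones are not supported and UTC is assumed. The function names `resolve_date_math` and `resolve_indices` are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

_EXPR = re.compile(r"^<([^{}<>]*)\{(now(?:[+-]\d+d|/[dM])*)(?:\{([^{}|]+)\})?\}>$")
_OP = re.compile(r"([+-]\d+d|/[dM])")


def resolve_date_math(expression: str, now: datetime) -> str:
    """Resolve <static_name{date_math_expr{date_format}}> to a concrete index name."""
    match = _EXPR.match(expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    static_name, math, date_format = match.groups()
    value = now
    for op in _OP.findall(math):
        if op == "/d":    # round down to the start of the day
            value = value.replace(hour=0, minute=0, second=0, microsecond=0)
        elif op == "/M":  # round down to the start of the month
            value = value.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        else:             # whole-day offset such as "-1d" or "+2d"
            value += timedelta(days=int(op[:-1]))
    pattern = date_format or "YYYY.MM.dd"
    for token, directive in (("YYYY", "%Y"), ("MM", "%m"), ("dd", "%d")):
        pattern = pattern.replace(token, directive)
    return static_name + value.strftime(pattern)


def _wildcard_match(pattern: str, name: str) -> bool:
    """Match a name against a pattern in which * stands for any characters."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def resolve_indices(expression: str, existing: list, now: datetime) -> list:
    """Apply a comma-separated index expression to a list of existing index names."""
    selected = []
    for part in (p.strip() for p in expression.split(",")):
        if not part:
            continue
        exclude = part.startswith("-")
        pattern = part[1:] if exclude else part
        if pattern.startswith("<"):
            pattern = resolve_date_math(pattern, now)
        if exclude:
            selected = [n for n in selected if not _wildcard_match(pattern, n)]
        else:
            selected += [n for n in existing
                         if _wildcard_match(pattern, n) and n not in selected]
    return selected


now = datetime(2020, 2, 20, 12, 0, tzinfo=timezone.utc)
existing = ["index_1", "index_2", "index_2020.02.20", "logs"]
print(resolve_date_math("<index_{now/M{YYYY.MM}}>", now))       # index_2020.02
print(resolve_indices("index_*,-index_1", existing, now))       # ['index_2', 'index_2020.02.20']
```

With the assumed current time of noon on February 20, 2020 (UTC), the sketch reproduces the four values in the table above.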
Reference Information
For more information about creating snapshots, restoring data from snapshots, querying snapshot information, etc., please refer to the official Elasticsearch documentation: