Fault Recovery
Recovery from yellow state
The yellow state indicates the presence of unassigned replica shards.
Query index status
curl -s -XGET 'http://<host>:9200/_cat/indices?v'
curl -s -XGET 'http://<host>:9200/_cluster/health?level=indices'
Query unassigned shards
curl -s -XGET 'http://<host>:9200/_cat/shards?v' | grep UNASSIGNED
curl -s -XGET 'http://<host>:9200/_cluster/health?level=shards'
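The grep above matches whole lines. To see why each shard is unassigned, the columns can be narrowed with the `h` parameter of `_cat/shards` and filtered on the state column. The following is a minimal sketch; the helper name `list_unassigned` is hypothetical, and it assumes index names contain no whitespace.

```shell
# Hypothetical helper: print "index shard prirep reason" for every unassigned shard.
# Reads from stdin the output of a request such as:
#   curl -s 'http://<host>:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason'
list_unassigned() {
  # Column 4 is the shard state, column 5 the unassigned reason (empty for started shards).
  awk '$4 == "UNASSIGNED" { print $1, $2, $3, $5 }'
}

# Usage against a live cluster (not executed here):
#   curl -s 'http://<host>:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | list_unassigned
```

The `prirep` column shows `p` for a primary and `r` for a replica, so a yellow cluster should list only `r` rows.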
* Unreasonable index replica setting
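If the number of index replicas is greater than or equal to the number of data nodes, some replicas can never be assigned, because a replica is not placed on the same node as another copy of the same shard. The cluster then stays in a yellow state. Adjust the number of replicas to rectify the cluster status.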
curl -XPUT \
http://<host>:9200/unassigned_index/_settings \
-H 'Content-Type: application/json' \
-d '{
"index": {
"number_of_replicas": replicasCount
}
}'
# unassigned_index is the index of the unassigned shard
# replicasCount is the new number of index replicas
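When choosing a value for replicasCount, it may help to cap it at what the cluster can actually host. The sketch below assumes you already know the number of data nodes; the helper name `clamp_replicas` is hypothetical and not part of any Elasticsearch tooling.

```shell
# Hypothetical helper: clamp a requested replica count to what the cluster can host.
# With N data nodes, at most N - 1 replicas of each shard can be assigned,
# because two copies of the same shard are never placed on one node.
clamp_replicas() {
  data_nodes=$1
  requested=$2
  max=$(( data_nodes - 1 ))
  if [ "$max" -lt 0 ]; then
    max=0
  fi
  if [ "$requested" -gt "$max" ]; then
    echo "$max"
  else
    echo "$requested"
  fi
}

# Example: 3 data nodes, 5 replicas requested
#   clamp_replicas 3 5
```

The printed value can then be used as replicasCount in the settings request above.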
Under normal circumstances, unassigned replica shards will be automatically assigned and the cluster status will recover to green. Under special circumstances, it might be necessary to manually assign unassigned replica shards.
curl -XPOST \
http://<host>:9200/_cluster/reroute \
-H 'Content-Type: application/json' \
-d '{
"commands": [{
"allocate_replica": {
"index": "unassigned_index",
"shard": num,
"node": "nodeName"
}
}]
}'
# unassigned_index is the index of the unassigned shard
# num is the sequence number of the unassigned shard
# nodeName is the node name, or can be the node ID, such as kVWViI1PQt2Bk2rP7PlrbQ
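If several shards need manual assignment, generating the request body avoids hand-editing JSON each time. This is a minimal sketch; the helper name `allocate_replica_body` is hypothetical, and it assumes the index and node names contain no characters that need JSON escaping.

```shell
# Hypothetical helper: print a reroute request body for one allocate_replica command.
# Arguments: index name, shard number, target node name (or node ID).
allocate_replica_body() {
  printf '{"commands":[{"allocate_replica":{"index":"%s","shard":%d,"node":"%s"}}]}\n' \
    "$1" "$2" "$3"
}

# Usage against a live cluster (not executed here):
#   curl -XPOST 'http://<host>:9200/_cluster/reroute' \
#     -H 'Content-Type: application/json' \
#     -d "$(allocate_replica_body unassigned_index 0 nodeName)"
```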
The cluster will attempt to allocate a shard at most index.allocation.max_retries times in a row (default is 5) before giving up and leaving the shard unallocated. This situation might be caused by structural problems, such as an analyzer referring to a stop word file that does not exist on all nodes. Once the problem has been resolved, allocation can be retried manually by calling the Reroute API with the retry_failed parameter, which attempts a single retry round for these shards.
curl -XPOST 'http://<host>:9200/_cluster/reroute?retry_failed=true'
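After a retry, you may want to wait until the cluster reports green before moving on. The sketch below polls a status command a fixed number of times. The helper names `wait_for_green` and `fetch_status` are hypothetical, and the status is assumed to come from `_cat/health?h=status`, which prints only the status word.

```shell
# Hypothetical helper: poll a status command until it prints "green".
# Arguments: command name that prints the status, number of attempts, delay in seconds.
# Returns 0 once green is seen, 1 if all attempts are used up.
wait_for_green() {
  fetch_cmd=$1
  attempts=$2
  delay=$3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$($fetch_cmd | tr -d '[:space:]')
    if [ "$status" = "green" ]; then
      return 0
    fi
    i=$(( i + 1 ))
    sleep "$delay"
  done
  return 1
}

# Usage against a live cluster (not executed here):
#   fetch_status() { curl -s 'http://<host>:9200/_cat/health?h=status'; }
#   wait_for_green fetch_status 10 5 && echo "cluster is green"
```

Passing the status command as an argument keeps the polling logic independent of how the status is obtained.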
More severe failures can leave primary shards unassigned. To allocate them manually, refer to Reroute.
* Node disk usage exceeds threshold
For the impact of node disk usage on shard allocation, refer to Disk-based Shard Allocation.
If high disk usage is causing the cluster to have unassigned shards, consider relaxing the disk usage policy for temporary relief, or increasing the number of nodes.
Additionally, if it is certain that some historical index data is no longer needed, the cluster status can be restored by deleting those indices.
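To confirm that disk usage is the cause, the per-node usage reported by `_cat/allocation` can be compared against a threshold. This is a minimal sketch; the helper name `nodes_over_disk` is hypothetical, the threshold of 85 is only an example value, and node names are assumed to contain no whitespace.

```shell
# Hypothetical helper: print "node percent" for nodes at or above a disk usage threshold.
# Argument: threshold in percent. Reads from stdin the output of a request such as:
#   curl -s 'http://<host>:9200/_cat/allocation?h=node,disk.percent'
nodes_over_disk() {
  # Rows without a numeric percentage (such as the UNASSIGNED row) evaluate to 0 and are skipped.
  awk -v t="$1" '$2 + 0 >= t + 0 { print $1, $2 }'
}

# Usage against a live cluster (not executed here):
#   curl -s 'http://<host>:9200/_cat/allocation?h=node,disk.percent' | nodes_over_disk 85
```

Nodes listed by this filter are candidates for cleanup or for the policy changes described above.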