Cluster Usage
After creating the cluster, how do I use it?
- If you have purchased a UHost with a public IP in the same availability zone, you can log in to the UHost and then connect to any node’s internal IP in the cluster via ssh.
For more details, please see: My host only has an internal IP, how can I access this machine?
- Bind an external network IP to the cluster's master node and connect to that IP via ssh. HDFS, Hive, HBase, and Spark commands can be run directly on the cluster nodes (see the example below).
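For example, a minimal sketch (the addresses below are placeholders, not values from your cluster):
# From a UHost in the same availability zone, connect to a node's internal IP:
ssh root@10.9.0.10
# Or, from your own machine, connect to the external network IP bound to master1:
ssh root@106.75.xx.xx
# Once on a cluster node, the Hadoop ecosystem commands work directly:
hadoop fs -ls /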
If the job I submitted fails, what do I need to do to view the job logs?
- Access the job details at port 23188 on the external network IP bound to master1/master2 (port 23188 must be opened in the external firewall bound to that IP)
- View job details through the Hue page
- By default, job logs are stored on HDFS under /var/log/hadoop-yarn/apps/[submituser]/logs (submituser is the username that submitted the job). You can view the HDFS files via the web UI, or download the job logs to your local machine with the hdfs command for viewing (see the example below)
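A rough sketch of pulling the logs down from HDFS (submituser and the application ID are placeholders):
# List the aggregated logs for a user, then copy one application's logs locally:
hadoop fs -ls /var/log/hadoop-yarn/apps/submituser/logs
hadoop fs -get /var/log/hadoop-yarn/apps/submituser/logs/application_1234567890123_0001 ./
# If log aggregation is enabled, the yarn CLI can also print them directly:
yarn logs -applicationId application_1234567890123_0001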
When submitting a job from the client, an UnknownHostException appears. How do I deal with this?
Cluster nodes communicate with one another by hostname by default. You need to copy the uhadoop host entries from /etc/hosts on master1 into /etc/hosts on the submitting client and any other nodes that are missing them.
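A rough sketch of what this looks like (hostnames are illustrative, and <client> is a placeholder for the machine that submits jobs):
# On master1, list the uhadoop host entries:
grep uhadoop /etc/hosts
# Append them to /etc/hosts on the submitting client:
grep uhadoop /etc/hosts | ssh root@<client> 'cat >> /etc/hosts'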
I received a disk alarm notification that disk usage is greater than 95%. How do I handle this?
First, confirm whether the alarm is for the system disk or a data disk. For a system disk alarm, run du --max-depth=1 -h in the / directory; for a data disk alarm, run it in the /data directory. Work down layer by layer to find which files occupy the space, then clear them (see the example below).
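For example (a sketch; sort and head are only there to make the output easier to read):
# System disk alarm: find the largest top-level directories under /
cd / && du --max-depth=1 -h 2>/dev/null | sort -hr | head
# Data disk alarm: run the same check under /data, then drill into the
# largest directory and repeat until the offending files are found
cd /data && du --max-depth=1 -h 2>/dev/null | sort -hr | head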
My HDFS is full: I received an HDFS usage warning notice that HDFS usage is greater than 80%. What should I do?
First, log in to the master1 node, switch to the hadoop user, and execute hadoop fs -du -h / to see which directories occupy the most space.
If it is the /var directory, continue with hadoop fs -du -h /var/log to narrow it down. If it is /var/log/hadoop-yarn/, you can adjust the following two parameters to your own requirements; a cluster restart is required:
yarn.log-aggregation.retain-seconds (retention period; recommended value: 2592000, i.e., 30 days)
yarn.log-aggregation.retain-check-interval-seconds (check interval; recommended value: 86400, i.e., 1 day)
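To confirm the values currently in effect on a node, something like the following can be used (the configuration path is an assumption and may differ on your cluster):
grep -A 1 "yarn.log-aggregation.retain" /etc/hadoop/conf/yarn-site.xml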
If it is the /var/log/spark directory, this is usually an accumulation of Spark task logs, which cannot yet be cleared automatically. It can be handled as follows:
Create a clear-spark-logs.sh file under /etc/cron.daily/ with the script below and set its permissions: chmod 777 /etc/cron.daily/clear-spark-logs.sh
For the first run, clean up manually by executing sh /etc/cron.daily/clear-spark-logs.sh; after that, cron runs it automatically every day.
#!/bin/bash
# Clean up Spark job logs on HDFS that are older than $days days (default: 30)
days=30
cutoff=$(date -d "-${days} days" +%F)

# "hadoop fs -ls" prints: permissions replication owner group size date time path.
# Select paths whose modification date (field 6) is older than the cutoff.
old_file_list=$(hadoop fs -ls /var/log/spark/ | awk -v cutoff="$cutoff" '$6 != "" && $6 < cutoff {print $8}')

for file in $old_file_list
do
    su -s /bin/bash hadoop -c "hadoop fs -rm -r $file"
done
I received a node memory alarm notification that memory usage is greater than 95%. How should I deal with this?
First, confirm whether it is a Master node or Core node warning.
1. If it is a Master node, log into that node and use the ps command to find the processes that use the most memory (see the sketch after this list). If a process was started by your own business and is no longer needed, you can kill it directly. If it is a process belonging to the hadoop cluster, it is recommended to upgrade the Master node to solve the problem thoroughly.
2. If it is a Core node and you have not installed any services yourself, DataNode, HBase-RegionServer, and Yarn generally use the most memory. This can be resolved through the following steps:
- If HBase is not used, shut down the unused services via the “Cluster Service Management” page under UHadoop in the management console;
- If memory has not recovered after shutting them down, reduce the memory allocated to Yarn on the node by lowering the Hadoop parameter yarn.nodemanager.resource.memory-mb (this can be modified via the “Parameter Configuration” function under the Hadoop tab of UHadoop in the management console);
- If the business needs more memory resources, add Core/Task nodes as appropriate;
- If a single node's CPU or memory is not enough, upgrade the node specifications.
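For the Master-node case in step 1, a quick sketch of spotting the heaviest memory consumers with ps:
# Print the header plus the ten processes using the most memory on this node
ps aux --sort=-%mem | head -n 11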
If the configuration of a single node, such as CPU, MEM or disk, is not enough, how do I upgrade?
Upgrading the Master: please provide the cluster ID, the original Master machine type, the target Master machine type, and an available operation window via ticket. Non-standard upgrades are currently supported, and both Masters must be upgraded at the same time. During the upgrade the Master node needs to restart, which stops the service for about 1 minute. Please ensure the cluster is not in arrears and that there is sufficient balance in your account to cover the upgrade cost.
If it is a Core node, vertical upgrades of single nodes are not supported by default; if you need to expand resources, add nodes horizontally. If a single node has a bottleneck in memory or other resources, we can perform a backend upgrade for you, which is completed by replacing low-configuration nodes with high-configuration nodes. During this process a large amount of data needs to be migrated, which may affect the cluster's business, so this operation is not recommended for clusters holding large amounts of data.
I want to close a certain service on a certain machine, or to disable service boot startup, what should I do?
- Stop the service:
Under normal circumstances, you can stop a running service process with service <service-name> stop. If the service cannot be stopped this way, you can forcefully stop the process with the kill command when no tasks are running. (A combined sketch of all three steps follows after this list.)
- Disable automatic restart of the service:
Change the value of startmonitor in /etc/default/static_conf.json to 0.
- Stop the service and disable startup at boot:
Perform the steps above and back up /etc/default/services; after backing it up, delete the related service entries from the original file.
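A minimal sketch of the whole sequence, assuming a hypothetical service name hbase-regionserver and that the file layout matches your cluster:
# Stop the running service (the service name here is hypothetical):
service hbase-regionserver stop
# If it will not stop and no tasks are running, find the process and kill it:
# ps aux | grep regionserver ; kill <pid>

# Disable the automatic restart by the monitor (set startmonitor to 0):
vi /etc/default/static_conf.json    # change the startmonitor value to 0

# Disable startup at boot: back up /etc/default/services, then remove the
# related service entries from the original file:
cp /etc/default/services /etc/default/services.bak
vi /etc/default/services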