Cluster Creation

To create a cluster, click the “Create Cluster” button on the list page to open the creation page.

Basic Configuration

  • Cluster Framework: Select the cluster framework that matches your application scenario.

    • Hadoop Framework: Deploys both HDFS and YARN in the same cluster. Suitable when storage and computation share one cluster.

    • HDFS Framework: Deploys only HDFS. Used for storage clusters, with dedicated HDFS node types.

    • Calculation Framework: Deploys only YARN; HDFS is not deployed.

    The HDFS and Calculation frameworks suit an architecture that separates storage from computation. An HDFS cluster can serve as the storage cluster for several independent calculation clusters. The relationship between calculation clusters and a storage cluster (Hadoop Framework or HDFS Framework) is many-to-one. You can see the connected clusters on the cluster details page.

  • Distribution: Distributions are named uhadoop [ version number ]. Each distribution bundles a set of big data ecosystem software, such as HBase, Spark, and Hive.

  • Framework Version: The version of Hadoop in the cluster. Different distributions provide different framework versions.

  • Cluster Types: The cluster type determines which software is installed on the cluster. Software not selected here can be added later through cluster management after the cluster is created.

  • Independent Metadata Management: If independent metadata management is selected, UHadoop stores Hive metadata in a separate UDB MySQL database. Multiple UHadoop clusters in the same project can access the same metadata database. See Metadata Management for details.
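The framework choices and the many-to-one relationship between calculation clusters and a storage cluster can be pictured with a small model. The following is only an illustrative sketch: the names `Framework`, `Cluster`, `link_storage`, and `connected_clusters` are hypothetical and are not part of any UHadoop API or SDK.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Framework(Enum):
    HADOOP = "Hadoop"
    HDFS = "HDFS"
    CALCULATION = "Calculation"


# Components each framework deploys, as described in the list above.
COMPONENTS = {
    Framework.HADOOP: frozenset({"HDFS", "YARN"}),
    Framework.HDFS: frozenset({"HDFS"}),
    Framework.CALCULATION: frozenset({"YARN"}),
}


@dataclass(frozen=True)
class Cluster:
    """Hypothetical stand-in for a cluster; not a real UHadoop object."""
    name: str
    framework: Framework


def link_storage(links: Dict[str, str], compute: Cluster, storage: Cluster) -> None:
    """Record that `compute` uses `storage` (many compute clusters to one storage cluster)."""
    if compute.framework is not Framework.CALCULATION:
        raise ValueError(f"{compute.name} is not a Calculation Framework cluster")
    if "HDFS" not in COMPONENTS[storage.framework]:
        raise ValueError(f"{storage.name} does not deploy HDFS")
    if compute.name in links:
        # Many-to-one: each calculation cluster has at most one storage cluster.
        raise ValueError(f"{compute.name} is already linked to {links[compute.name]}")
    links[compute.name] = storage.name


def connected_clusters(links: Dict[str, str], storage: Cluster) -> List[str]:
    """Names of the calculation clusters linked to `storage`, sorted."""
    return sorted(c for c, s in links.items() if s == storage.name)
```

For example, two calculation clusters can be linked to the same HDFS Framework cluster, while an attempt to link an HDFS Framework cluster as the compute side raises `ValueError`.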

Node Configuration

A cluster has three types of nodes: Master, Core, and Task.

  • Master Node: The management node, responsible for coordinating services across the whole cluster. Each cluster has exactly two Master nodes, one primary and one backup, to ensure high availability. Besides the management components of the base services (such as Hadoop, Hive, and HBase), some plugins (such as Hue, Oozie, Sqoop2, and Airflow) are also installed on the Master nodes.

  • Core Node: Stores data (HDFS) and runs tasks. Because Core nodes store data, a cluster must have at least 3 of them (the default file replica count is 3); you can add more Core nodes based on your business needs. The Core1 node also serves as a management node and hosts services such as Zookeeper, so operate on it with care.

  • Task Node: Executes tasks only. Task nodes do not store data, and you can add or remove them dynamically while the cluster is running.

Configure the nodes according to your actual scenario. Nodes support dynamic scaling, vertical scaling, and storage expansion, and the cluster specification can also be adjusted dynamically while the cluster is in use.
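The node-count rules above (two Master nodes, at least three Core nodes) can be sketched as a simple pre-check before submitting a configuration. This is a minimal illustration, not UHadoop's actual validation: the function name `validate_node_counts` is hypothetical, and allowing zero Task nodes is an assumption that this page does not state explicitly.

```python
from typing import List

MASTER_COUNT = 2      # one primary and one backup Master node
MIN_CORE_COUNT = 3    # matches the default file replica count of 3


def validate_node_counts(master: int, core: int, task: int = 0) -> List[str]:
    """Return a list of problems with the requested node counts (empty if none)."""
    errors = []
    if master != MASTER_COUNT:
        errors.append(f"a cluster needs exactly {MASTER_COUNT} Master nodes, got {master}")
    if core < MIN_CORE_COUNT:
        errors.append(f"a cluster needs at least {MIN_CORE_COUNT} Core nodes, got {core}")
    if task < 0:
        # Assumption: Task nodes are optional, so any non-negative count is accepted.
        errors.append(f"Task node count cannot be negative, got {task}")
    return errors
```

Calling `validate_node_counts(2, 3)` returns an empty list, while `validate_node_counts(2, 2, 1)` reports that the Core node count is too low.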