Product Architecture
1. HDFS
By default, HDFS is deployed in HA (High Availability) mode, with two NameNodes, one on master1 and one on master2. DataNodes run on all Core nodes; Task nodes do not run a DataNode.
2. YARN
YARN is also deployed in HA mode by default, with two ResourceManagers, one on master1 and one on master2. NodeManagers run on all Core and Task nodes.
3. Hive
Hive currently supports only the ‘On YARN’ mode. Two Hive MetaStore instances are deployed, one on master1 and one on master2, and the metadata database can be a cloud database or a local MySQL instance. This layout keeps the Hive service from failing when a single master node goes down. You can connect to Hive through the Hive CLI or Beeline.
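As an illustration of the Beeline connection path, the sketch below assembles a Beeline command line in Python. The `jdbc:hive2://` URL form and the `-u` and `-n` options are standard Beeline usage, and 10000 is the stock HiveServer2 port. Everything else is an assumption made for this example: the helper name `beeline_command`, the `hadoop` user, and the choice of master1 as the host. This document does not state which node HiveServer2 listens on, so check your cluster before using these values.

```python
from typing import List


def beeline_command(host: str, port: int = 10000,
                    database: str = "default",
                    user: str = "hadoop") -> List[str]:
    """Build an argument list that launches Beeline against HiveServer2.

    host, port, database and user are illustrative defaults (assumptions),
    not values confirmed by this document.
    """
    # Standard HiveServer2 JDBC URL: jdbc:hive2://<host>:<port>/<database>
    url = f"jdbc:hive2://{host}:{port}/{database}"
    # -u passes the JDBC URL, -n passes the user name to connect as
    return ["beeline", "-u", url, "-n", user]


if __name__ == "__main__":
    # Assumed host: master1. Replace it with the node that runs HiveServer2.
    print(" ".join(beeline_command("master1")))
```

The function only builds the argument list; passing it to `subprocess.run` on a node where Beeline is installed would open the session.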
4. HBase
By default, HBase is deployed in HA mode, with two HMasters, one on master1 and one on master2. HRegionServers run on all Core nodes.
5. Spark
Spark runs in the ‘On YARN’ mode. For details, see the Spark Development Guide.
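The placement rules described for each component above can be summarized as a small lookup table. The Python sketch below is only a reading aid, not part of the product. The daemon-to-role mapping is transcribed from this section, while the function names and the node names `core1`, `core2` and `task1` are hypothetical, as is the assumption that a node's role can be read from its name prefix.

```python
from typing import Dict, List

# Daemon placement by node role, transcribed from the component descriptions.
PLACEMENT: Dict[str, List[str]] = {
    "master": ["NameNode", "ResourceManager", "Hive MetaStore", "HMaster"],
    "core": ["DataNode", "NodeManager", "HRegionServer"],
    "task": ["NodeManager"],
}


def daemons_for(node_name: str) -> List[str]:
    """Return the daemons expected on a node.

    Assumption: the node role is the name prefix (master1, core1, task1).
    """
    for role, daemons in PLACEMENT.items():
        if node_name.startswith(role):
            return list(daemons)
    raise ValueError(f"unknown node role for {node_name!r}")


def nodes_running(daemon: str, nodes: List[str]) -> List[str]:
    """Return the nodes from `nodes` that are expected to run `daemon`."""
    return [n for n in nodes if daemon in daemons_for(n)]


if __name__ == "__main__":
    cluster = ["master1", "master2", "core1", "core2", "task1"]  # hypothetical
    for name in ("NameNode", "DataNode", "NodeManager"):
        print(name, "->", nodes_running(name, cluster))
```

Read this way, the table makes the storage and compute split visible: Task nodes contribute only YARN compute capacity through NodeManager, while Core nodes also hold HDFS and HBase data.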