
What is Prometheus

About Prometheus

Prometheus is an open source system monitoring and alerting framework. It was created at SoundCloud in 2012, and its design was inspired by Google’s Borgmon monitoring system. It evolved into a community-driven open source project and was publicly released in 2015. In 2016, Prometheus joined the Cloud Native Computing Foundation (CNCF) as its second hosted project, after Kubernetes. It is widely used in Kubernetes cluster monitoring systems and is fast becoming the standard solution for Kubernetes cluster monitoring.

Advantages of Prometheus

Powerful multidimensional data model:
- Time series are identified by a metric name and a set of key-value pairs (labels).
- Every metric can carry multiple dimension labels.
- The data model is flexible: dimensions do not have to be encoded into a dot-separated metric name.
- Data can be aggregated, sliced, and diced along any dimension.
- Sample values are double-precision floating point numbers, and label values may contain any Unicode characters.
Flexible and powerful query language (PromQL): A single query can multiply, add, and join multiple metrics, compute quantiles, and so on.
Easy to manage: Prometheus server is a single standalone binary that works with local storage and does not depend on distributed storage.
Efficient: Each sample takes only about 3.5 bytes on average, and a single Prometheus server can handle millions of time series.
Dynamic target discovery: Monitoring targets can be found through service discovery or static configuration.
Uses the pull model to collect time series data, which keeps problematic servers from pushing bad metrics.
Supports pushing time series data through a push gateway, which the Prometheus server then pulls from.
Supports a variety of visualization interfaces.
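
The multidimensional data model described above can be illustrated with a minimal Python sketch. It is not how Prometheus is implemented; the metric name `http_requests_total`, the labels, the values, and the helper functions `series_key` and `sum_by` are all made up for illustration. The grouping step is roughly analogous to the PromQL expression `sum by (method) (http_requests_total)`.

```python
from collections import defaultdict

# Each sample belongs to a series identified by a metric name plus a set of
# key-value labels. The names and values below are hypothetical.
samples = [
    ("http_requests_total", {"method": "GET", "status": "200"}, 120.0),
    ("http_requests_total", {"method": "GET", "status": "500"}, 3.0),
    ("http_requests_total", {"method": "POST", "status": "200"}, 45.0),
]


def series_key(name, labels):
    """Build a hashable identity for a series: name plus sorted label pairs."""
    return (name, tuple(sorted(labels.items())))


def sum_by(samples, name, label):
    """Sum the values of one metric, grouped by a single label."""
    totals = defaultdict(float)
    for metric, labels, value in samples:
        if metric == name:
            totals[labels.get(label, "")] += value
    return dict(totals)


by_method = sum_by(samples, "http_requests_total", "method")
print(by_method)  # {'GET': 123.0, 'POST': 45.0}
```

Because the dimensions live in labels rather than in the metric name, the same samples could just as easily be grouped by `status` without renaming anything.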

Prometheus Architecture and Components

Figure: Prometheus architecture (source: official Prometheus documentation)

The figure above shows the architecture of Prometheus, including its core modules and the components in its ecosystem. Each is introduced briefly below:

Prometheus Server: Responsible for collecting and storing time series data.
Client Library: Instruments a service that needs monitoring, generating the corresponding metrics and exposing them to the Prometheus server. When the Prometheus server pulls, the library returns the current state of the metrics.
Push Gateway: Mainly used for short-lived jobs. Because these jobs exist only briefly, they may disappear before Prometheus pulls from them. Such jobs therefore push their metrics to the Push Gateway, and the Prometheus server pulls them from the gateway. This method is mainly used for service-level metrics. For machine-level metrics, node exporter is recommended.
Exporters: Used to expose the metrics of existing third-party services to Prometheus.
Alertmanager: After receiving alerts from the Prometheus server, Alertmanager deduplicates and groups them, routes them to the configured receivers, and sends out the notifications.
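
To make the "expose metrics, then get pulled" idea concrete, here is a rough Python sketch of the text that a client library or exporter returns when the server pulls. It renders one metric in the Prometheus text exposition format. The function `render_metric` and the metric `jobs_processed_total` are hypothetical, and the sketch skips details such as escaping special characters in label values; a real service would normally use an official client library instead.

```python
def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            # Labels are written as key="value" pairs inside curly braces.
            pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical counter with one labelled sample.
text = render_metric(
    "jobs_processed_total",
    "Jobs processed.",
    "counter",
    [({"queue": "email"}, 7)],
)
print(text, end="")
```

The printed body consists of a `# HELP` line, a `# TYPE` line, and one sample line of the form `jobs_processed_total{queue="email"} 7`.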

Working Principle

As the diagram above shows, the main modules of Prometheus are the Prometheus server, exporters, Pushgateway, PromQL, Alertmanager, and the graphical interface. The basic workflow is:

Prometheus server periodically pulls metrics from configured jobs or exporters, from the Pushgateway, or from other Prometheus servers.
Prometheus server stores the collected metrics locally and runs the defined alert.rules, recording new time series or pushing alerts to Alertmanager.
Alertmanager processes the received alerts according to its configuration file and sends out notifications.
The collected data is visualized in the graphical interface.
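
The workflow above can be sketched as a toy loop in Python. This is only an illustration under simplifying assumptions: targets are plain callables rather than HTTP endpoints, the rule is a fixed threshold, and the names `scrape`, `evaluate_rule`, `notify`, `HighErrorRate`, `app-1`, and `app-2` are all hypothetical.

```python
def scrape(targets):
    """Pull the current value from every target (here: plain callables)."""
    return {name: read() for name, read in targets.items()}


def evaluate_rule(storage, threshold):
    """Return an alert for every series whose latest value exceeds threshold."""
    return [
        {"alertname": "HighErrorRate", "instance": name}
        for name, values in storage.items()
        if values and values[-1] > threshold
    ]


def notify(alerts, already_sent):
    """Drop alerts that were already sent, as an alert manager would deduplicate."""
    fresh = []
    for alert in alerts:
        key = (alert["alertname"], alert["instance"])
        if key not in already_sent:
            already_sent.add(key)
            fresh.append(alert)
    return fresh


# Hypothetical targets reporting an error ratio.
targets = {"app-1": lambda: 0.02, "app-2": lambda: 0.30}
storage = {}        # series name -> list of collected values
sent = set()        # alerts that have already produced a notification
notifications = []

for _ in range(2):  # two scrape cycles
    for name, value in scrape(targets).items():
        storage.setdefault(name, []).append(value)
    notifications.extend(notify(evaluate_rule(storage, 0.1), sent))

print(notifications)  # [{'alertname': 'HighErrorRate', 'instance': 'app-2'}]
```

Although `app-2` breaches the threshold in both cycles, only one notification is produced, which mirrors the deduplication step performed on the alerting side.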

The core of Prometheus’ operation is the pull model: it collects metrics from the monitored objects and stores them in a time series database (TSDB) for subsequent time-based retrieval. By default this is Prometheus’ own local TSDB; external systems such as OpenTSDB or InfluxDB can be connected as remote storage.
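
As a rough illustration of storing samples for time-based retrieval, the following Python sketch keeps one series in memory and answers range queries with a binary search. It is a deliberately simplified stand-in, not a description of how any real TSDB stores data; the class `TinySeries` and its methods are hypothetical.

```python
import bisect


class TinySeries:
    """Append-only list of (timestamp, value) samples for one series."""

    def __init__(self):
        self.timestamps = []
        self.values = []

    def append(self, ts, value):
        # Samples are expected in time order, so appending keeps the list sorted.
        if self.timestamps and ts <= self.timestamps[-1]:
            raise ValueError("samples must arrive in increasing time order")
        self.timestamps.append(ts)
        self.values.append(value)

    def query_range(self, start, end):
        """Return samples with start <= timestamp <= end."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))


series = TinySeries()
for ts, value in [(0, 1.0), (15, 2.0), (30, 4.0), (45, 7.0)]:
    series.append(ts, value)

window = series.query_range(15, 30)
print(window)  # [(15, 2.0), (30, 4.0)]
```

Keeping timestamps sorted is what makes a time window cheap to look up: the query only needs to find the two boundaries instead of scanning every sample.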

Applicable Scenarios

Prometheus is well suited to recording purely numeric time series data. It fits both machine-centric monitoring of servers and other hardware and the monitoring of highly dynamic service architectures, which are common in microservices. Its multi-dimensional data collection and its query language are very powerful. Prometheus is designed for reliability: when a service fails, it helps you quickly locate and diagnose the problem, and its deployment does not depend heavily on other hardware or services.

Prometheus emphasizes reliability: even when something has failed, you can still view whatever statistics are available about the system. If you need 100% accuracy, for example for per-request billing, Prometheus is not a good choice, because the collected data may not be detailed or complete enough.

In summary, Prometheus is a very good choice for business scenarios that require high availability, but it is not the best choice for business scenarios that require fully precise and complete data.