Understanding Metric Types of Prometheus

Introduction: Prometheus is an open-source monitoring and alerting system that is widely adopted in industry. It uses a multidimensional data model, in which each time series is identified by a metric name and a set of key-value labels.
In this blog post, I will focus on what a metric is and on the metric types that Prometheus provides.

Metric: A time series is uniquely identified by its metric, which is the combination of a metric name and its dimensions. Each dimension is a label, that is, a label name together with its value.

For example, a time series with the metric name http_total_requests and the labels method="POST" and status_code="200" is written as:
http_total_requests{method="POST", status_code="200"}
This is the notation Prometheus uses to identify a metric.
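To make the notation concrete, here is a small Python sketch that renders a metric name and a dictionary of labels as a series identifier. The helper name format_series is hypothetical and not part of any Prometheus client library; the escaping follows the text exposition format, and labels are emitted in dictionary order.

```python
def format_series(name, labels):
    """Render a metric name and its labels in Prometheus series notation (hypothetical helper)."""
    def escape(value):
        # The text exposition format escapes backslash, double quote and newline in label values.
        return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    if not labels:
        return name
    pairs = ", ".join(f'{key}="{escape(value)}"' for key, value in labels.items())
    return f"{name}{{{pairs}}}"


print(format_series("http_total_requests", {"method": "POST", "status_code": "200"}))
# http_total_requests{method="POST", status_code="200"}
```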

Metric Types:

Prometheus offers four core metric types: counter, gauge, histogram, and summary. They are described one by one below.

Counter:

A counter is a cumulative value that only increases, or is reset to zero when the counter is initialized or the process restarts. It is suited to measuring, for example, the number of requests served or errors encountered.
Values that can decrease as well as increase should not be measured with counters. For example, memory in use and the number of running processes cannot be measured with counters; use a gauge instead.
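The following Python sketch illustrates why a counter restart is harmless: whoever reads the counter can detect a drop to a lower value and treat it as a reset. The Counter class and the increase helper are hypothetical simplifications; PromQL's rate() and increase() apply similar reset handling, but they also extrapolate to the edges of the query window, which this sketch does not.

```python
class Counter:
    """Minimal counter: it only goes up, and drops back to zero only on a reset."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount

    def reset(self):
        # Simulates a process restart: the counter starts again from zero.
        self.value = 0.0


def increase(samples):
    """Total increase over a list of scraped counter values.

    A drop between two consecutive samples is treated as a counter reset,
    so the new sample counts as growth since zero. No extrapolation is done.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


requests = Counter()
scrapes = []
for _ in range(3):
    requests.inc()
    scrapes.append(requests.value)
requests.reset()  # the application restarts
requests.inc(2)
scrapes.append(requests.value)
print(scrapes, increase(scrapes))  # [1.0, 2.0, 3.0, 2.0] 4.0
```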

Gauge:

A gauge is a value that can go up and down over time. Examples of gauges include heap memory in use and CPU usage.
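By contrast, a gauge needs set, increase and decrease operations. This hypothetical Python class mirrors the kind of operations client libraries offer for gauges, but it is only an in-memory sketch with no registry or HTTP endpoint, and the heap reading is a made-up value.

```python
class Gauge:
    """Minimal gauge: a value that can be set, increased or decreased."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


in_flight = Gauge()
in_flight.inc()  # a request starts
in_flight.inc()  # another request starts
in_flight.dec()  # one request finishes
heap_bytes = Gauge()
heap_bytes.set(512 * 1024 * 1024)  # hypothetical heap reading
print(in_flight.value, heap_bytes.value)  # 1.0 536870912.0
```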

Histogram:

A histogram samples observations (for example, request durations) and counts them in configurable buckets. A histogram consists of the following combination of series:
i. Buckets: Each bucket is a cumulative counter of the observations that are less than or equal to its upper bound, so a bucket only needs an upper bound. The format for a bucket is <basename>_bucket{le="<upper_bound>"}
ii. Sum of the observations: The sum of all observed values. The format for the sum is <basename>_sum
iii. Count of the observations: The number of events that have been observed. The format for the count is <basename>_count
The value of <basename>_bucket{le="+Inf"} must be the same as the <basename>_count value, because the +Inf bucket counts every observation.
The PromQL function histogram_quantile() calculates quantiles from a histogram's buckets.
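A minimal Python sketch of how a client might maintain these series: each observation increments every bucket whose upper bound is greater than or equal to the value, and also updates the sum and count. The Histogram class and its expose method are illustrative assumptions rather than a client library API; the output only mimics the text exposition format.

```python
import math


class Histogram:
    """Minimal histogram with cumulative buckets, a sum and a count."""

    def __init__(self, upper_bounds):
        # Bucket upper bounds are sorted; the +Inf bucket is always appended last.
        self.bounds = sorted(upper_bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # Cumulative buckets: every bucket whose bound is >= value is incremented.
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.bucket_counts[i] += 1
        self.sum += value
        self.count += 1

    def expose(self, basename):
        lines = []
        for bound, count in zip(self.bounds, self.bucket_counts):
            le = "+Inf" if math.isinf(bound) else repr(bound)
            lines.append(f'{basename}_bucket{{le="{le}"}} {count}')
        lines.append(f"{basename}_sum {self.sum}")
        lines.append(f"{basename}_count {self.count}")
        return lines


h = Histogram([0.1, 0.5, 1.0])
for latency in (0.05, 0.3, 0.3, 0.7, 2.0):
    h.observe(latency)
print("\n".join(h.expose("requests_time_seconds")))
```

Because the +Inf bucket receives every observation, its value always equals the count, which is the invariant described for <basename>_bucket{le="+Inf"} and <basename>_count.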

Example:

requests_time_seconds_sum{app="projectx"} 5.366133242442994e+07
requests_time_seconds_bucket{app="projectx",le="0.005"} 2343340162
requests_time_seconds_bucket{app="projectx",le="0.01"} 3210191267
requests_time_seconds_bucket{app="projectx",le="0.025"} 3670131549
requests_time_seconds_bucket{app="projectx",le="0.05"} 3784098107
requests_time_seconds_bucket{app="projectx",le="0.1"} 3877150771
requests_time_seconds_bucket{app="projectx",le="0.25"} 3950479685
requests_time_seconds_bucket{app="projectx",le="0.5"} 3970164416
requests_time_seconds_bucket{app="projectx",le="1"} 3972385603
requests_time_seconds_bucket{app="projectx",le="2.5"} 3973394069
requests_time_seconds_bucket{app="projectx",le="5"} 3973827253
requests_time_seconds_bucket{app="projectx",le="10"} 3973860780
requests_time_seconds_bucket{app="projectx",le="15"} 3973861209
requests_time_seconds_bucket{app="projectx",le="+Inf"} 3973861256
requests_time_seconds_count{app="projectx"} 3973861256
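The bucket values in this example can be fed to a simplified Python re-implementation of the linear interpolation that histogram_quantile() applies: find the bucket whose cumulative count first reaches the requested rank, then interpolate between that bucket's lower and upper bounds, assuming observations are spread uniformly inside it. This is a sketch for understanding, assuming buckets sorted by upper bound, ending in +Inf, with a positive lowest bound; the real PromQL function handles further edge cases that are omitted here.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch of PromQL's linear interpolation. Assumes buckets are
    sorted by upper bound, end with +Inf, and have a positive lowest bound.
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    if len(buckets) < 2:
        return math.nan
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The rank lies beyond the last finite bound: return that bound.
        return buckets[-2][0]
    lower, below = (0.0, 0) if i == 0 else buckets[i - 1]
    in_bucket = count - below
    if in_bucket == 0:
        return lower
    # Assume observations are spread uniformly within the bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket


# Buckets from the requests_time_seconds example above.
example = [
    (0.005, 2343340162), (0.01, 3210191267), (0.025, 3670131549),
    (0.05, 3784098107), (0.1, 3877150771), (0.25, 3950479685),
    (0.5, 3970164416), (1, 3972385603), (2.5, 3973394069),
    (5, 3973827253), (10, 3973860780), (15, 3973861209),
    (math.inf, 3973861256),
]
p95 = histogram_quantile(0.95, example)
print(f"estimated 95th percentile: {p95:.4f}s")
```

For this data the 95th percentile rank falls in the le="0.05" bucket, so the estimate lies between 25 ms and 50 ms; its precision is limited by how the bucket boundaries were chosen.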

Summary:

Like a histogram, a summary exposes the count and sum of observations. Instead of buckets, however, it calculates configurable quantiles on the client side, typically over a sliding time window. It exposes the <basename>_count and <basename>_sum time series along with <basename>{quantile="<φ>"}, which holds the φ-quantile (0 ≤ φ ≤ 1) of the observed events.
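To contrast with the histogram, this Python sketch computes quantiles on the client side over a sliding window, while _sum and _count keep accumulating. The Summary class, the nearest-rank quantile method, and the 600-second default window are assumptions for illustration; real client libraries use streaming approximation algorithms instead of sorting every retained observation. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
import math
from collections import deque


class Summary:
    """Minimal summary: windowed quantiles plus cumulative sum and count."""

    def __init__(self, quantiles=(0.5, 0.9, 0.99), max_age=600.0):
        self.quantiles = quantiles
        self.max_age = max_age  # seconds an observation stays in the window
        self.window = deque()   # (timestamp, value) pairs, oldest first
        self.sum = 0.0
        self.count = 0

    def observe(self, value, now):
        self.window.append((now, value))
        self.sum += value
        self.count += 1

    def expose(self, basename, now):
        # Drop observations older than max_age; sum and count are not affected.
        while self.window and now - self.window[0][0] > self.max_age:
            self.window.popleft()
        values = sorted(v for _, v in self.window)
        lines = []
        for q in self.quantiles:
            if values:
                # Nearest-rank method: smallest value with at least q of the data at or below it.
                idx = max(0, math.ceil(q * len(values)) - 1)
                result = values[idx]
            else:
                result = "NaN"
            lines.append(f'{basename}{{quantile="{q}"}} {result}')
        lines.append(f"{basename}_sum {self.sum}")
        lines.append(f"{basename}_count {self.count}")
        return lines


s = Summary(quantiles=(0.5, 0.9))
for i in range(1, 101):
    s.observe(i / 1000, now=0.0)  # latencies from 1 ms to 100 ms
print("\n".join(s.expose("requests_time_seconds", now=10.0)))
```

Because the quantiles are precomputed per process, they cannot be meaningfully aggregated across instances, whereas histogram buckets can be summed before histogram_quantile() is applied.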

If an application exposes its internal state, counters are a good first choice. Alerting on the output of the histogram_quantile() function helps you monitor your SLAs.

Some of my recommendations for adopting metrics:
1. Study the data you can expose and decide which metric type fits each value.
2. If you are exposing metrics for the first time, start with the most important ones, for example queries per second and response time. Make sure the instrumentation does not add latency overhead in latency-sensitive applications.
3. Expose metrics with the right dimensions. For example, differentiate status code classes (2xx, 3xx, 4xx, 5xx) with a label on the metric.
4. Exposing the right data helps reduce the time Prometheus spends on aggregation and other queries.
5. Implement histograms and summaries for your application metrics.
6. Add alerting with specific criteria.
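Recommendation 3 can be sketched in Python as a counter family keyed by label values, where each raw status code is reduced to its class before it becomes a label. The names LabeledCounter and status_class are hypothetical; the idea is that a bounded label value such as status_code="5xx" keeps the number of series small while still separating errors from successes.

```python
from collections import defaultdict


def status_class(code):
    """Map a raw HTTP status code such as 404 to its class label, e.g. "4xx"."""
    return f"{code // 100}xx"


class LabeledCounter:
    """Minimal counter family: one independent counter per set of label values."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self.values = defaultdict(float)

    def inc(self, *label_values, amount=1.0):
        if len(label_values) != len(self.label_names):
            raise ValueError("wrong number of label values")
        if amount < 0:
            raise ValueError("counters can only increase")
        self.values[label_values] += amount

    def expose(self):
        lines = []
        for label_values, value in sorted(self.values.items()):
            labels = ", ".join(
                f'{k}="{v}"' for k, v in zip(self.label_names, label_values)
            )
            lines.append(f"{self.name}{{{labels}}} {value}")
        return lines


requests = LabeledCounter("http_total_requests", ["method", "status_code"])
for method, code in [("GET", 200), ("GET", 204), ("POST", 500), ("GET", 404)]:
    requests.inc(method, status_class(code))
print("\n".join(requests.expose()))
```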
