July 2022

Prometheus Study Notes

Introduction

Overview

  1. Prometheus is an open-source systems monitoring and alerting toolkit.
  2. Prometheus collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.
  3. Features

    1. a multi-dimensional data model with time series data identified by metric name and key/value pairs
    2. PromQL, a flexible query language to leverage this dimensionality
    3. no reliance on distributed storage; single server nodes are autonomous
    4. time series collection happens via a pull model over HTTP
    5. pushing time series is supported via an intermediary gateway
    6. targets are discovered via service discovery or static configuration
    7. multiple modes of graphing and dashboarding support
  4. Components

    1. the main Prometheus server which scrapes and stores time series data
    2. client libraries for instrumenting application code
    3. a push gateway for supporting short-lived jobs
    4. special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
    5. an alertmanager to handle alerts
    6. various support tools
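A short-lived batch job may exit before Prometheus ever scrapes it, so it pushes its metrics to the push gateway, which Prometheus then scrapes like any other target. A minimal sketch with `urllib.request`, assuming a Pushgateway at the hypothetical address `http://pushgateway.example:9091` and its `/metrics/job/<job_name>` push path; the job name and metric are made up, and the request is only built, not sent.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway: str, job: str, payload: str) -> urllib.request.Request:
    """Build (but do not send) a push of text-format metrics to a Pushgateway.

    With POST, only metrics with the same names inside the job's group are replaced;
    PUT would replace the whole group.
    """
    url = f"{gateway.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


req = build_push_request(
    "http://pushgateway.example:9091",
    "nightly_backup",
    "backup_last_success_timestamp_seconds 1656633600\n",
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```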
  5. Prometheus configuration file: prometheus.yml

    1. global.scrape_interval
    2. global.evaluation_interval
    3. rule_files: []
    4. scrape_configs: [{job_name: "", static_configs: [{targets: []}]}]
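The four settings above fit together as in the following sketch, which builds a minimal prometheus.yml as a Python dict and prints it as JSON. JSON is valid YAML flow syntax, so the output should load, although hand-written YAML is the usual practice. The job name, target address and interval values are illustrative placeholders.

```python
import json

# Minimal prometheus.yml content; values are illustrative placeholders.
config = {
    "global": {
        "scrape_interval": "15s",      # how often targets are scraped
        "evaluation_interval": "15s",  # how often rules are evaluated
    },
    "rule_files": [],  # paths to rule files, e.g. ["rules/example.rules.yml"] (hypothetical)
    "scrape_configs": [
        {
            "job_name": "prometheus",  # becomes the `job` label of the scraped series
            "static_configs": [{"targets": ["localhost:9090"]}],  # a list of target groups
        }
    ],
}

text = json.dumps(config, indent=2)
print(text)
```

Note that `static_configs` is a list of target groups, each holding its own `targets` list.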
  6. Prometheus server UI

    1. status page: http://<host>:9090/
    2. self metrics page: http://<host>:9090/metrics
    3. expression browser: http://<host>:9090/graph
  7. Glossary

    1. The Alertmanager takes in alerts, aggregates them into groups, de-duplicates, applies silences, throttles, and then sends out notifications to email, Pagerduty, Slack etc.
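A simplified sketch of the first two Alertmanager steps named above, de-duplication and grouping, assuming alerts are plain label dicts and a hypothetical `group_by` list of label names. Silences, throttling and the actual sending of notifications are not modelled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # label set of one alert


def group_alerts(alerts: List[Alert], group_by: List[str]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Drop exact duplicates (same label set), then group by the `group_by` labels."""
    seen = set()
    groups: Dict[Tuple[str, ...], List[Alert]] = defaultdict(list)
    for alert in alerts:
        identity = tuple(sorted(alert.items()))
        if identity in seen:
            continue  # duplicate, e.g. the same alert sent by two Prometheus replicas
        seen.add(identity)
        key = tuple(alert.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"alertname": "InstanceDown", "instance": "a:9100", "cluster": "eu"},
    {"alertname": "InstanceDown", "instance": "b:9100", "cluster": "eu"},
    {"alertname": "InstanceDown", "instance": "a:9100", "cluster": "eu"},  # duplicate
]
groups = group_alerts(alerts, ["alertname", "cluster"])
# One group ("InstanceDown", "eu") with two distinct alerts, i.e. one notification.
```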

Concepts

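  1. Data model

    1. <metric_name>{<label_name>=<label_value>, ...}
    2. Metric names must match the regex [a-zA-Z_:][a-zA-Z0-9_:]*, i.e. letters, digits, underscores and colons (colons are reserved for user-defined recording rules).
    3. Dimensions are defined through labels.
    4. time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions.
    5. Changing any label value, including adding or removing a label, creates a new time series.
    6. Label names must match the regex [a-zA-Z_][a-zA-Z0-9_]*, i.e. letters, digits and underscores; label names beginning with two underscores (__) are reserved for internal use.
    7. Label values may contain any Unicode characters.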
    8. A label with an empty label value is considered equivalent to a label that does not exist
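The naming rules above can be checked with the two regular expressions, as in the sketch below. It also formats a series in the `<metric_name>{<label_name>=<label_value>, ...}` notation and drops empty-valued labels, since those are equivalent to absent labels. The helper names are hypothetical, the regexes reflect the rules as noted here, and label values are not escaped.

```python
import re
from typing import Dict

METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def valid_metric_name(name: str) -> bool:
    return METRIC_NAME_RE.fullmatch(name) is not None


def valid_label_name(name: str, allow_reserved: bool = False) -> bool:
    # Names starting with "__" are reserved for internal use.
    if name.startswith("__") and not allow_reserved:
        return False
    return LABEL_NAME_RE.fullmatch(name) is not None


def series_id(name: str, labels: Dict[str, str]) -> str:
    """Format a series as <metric_name>{<label_name>="<label_value>", ...}."""
    if not valid_metric_name(name):
        raise ValueError(f"invalid metric name: {name!r}")
    # An empty label value is the same as the label not being present.
    kept = {k: v for k, v in labels.items() if v != ""}
    for k in kept:
        if not valid_label_name(k):
            raise ValueError(f"invalid label name: {k!r}")
    body = ", ".join(f'{k}="{v}"' for k, v in sorted(kept.items()))
    return f"{name}{{{body}}}"


print(series_id("api_http_requests_total",
                {"method": "POST", "handler": "/messages", "env": ""}))
```

Because the identity is the full sorted label set, changing any label value yields a different string, which mirrors "a new time series".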
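  2. Metric types

    1. Counter: a monotonically increasing counter; it resets to 0 when the process restarts and then keeps increasing.
    2. Gauge: a value that can go up and down.
    3. Histogram: samples observations into configurable buckets; a histogram with basename <basename> exposes:
      1. <basename>_bucket{le="<upper inclusive bound>"}: cumulative counters of observations per bucket
      2. <basename>_sum: total sum of all observed values
      3. <basename>_count: number of observations, equal to <basename>_bucket{le="+Inf"}
    4. Summary: similar to a histogram (it also exposes <basename>_sum and <basename>_count), but calculates configurable quantiles over a sliding time window.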
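To make the histogram series concrete, the sketch below computes what a histogram with hypothetical bucket bounds would expose after a few observations: cumulative `_bucket{le=...}` counters, `_sum` and `_count`, with `_count` equal to the `+Inf` bucket. It only mimics the exposed series; it is not how any client library is implemented, and the `le` formatting uses Python's `repr`.

```python
import math
from typing import Dict, List


def histogram_series(basename: str, bounds: List[float], observations: List[float]) -> Dict[str, float]:
    """Return the series a histogram would expose (cumulative buckets, _sum, _count)."""
    upper_bounds = sorted(bounds) + [math.inf]
    series: Dict[str, float] = {}
    for ub in upper_bounds:
        le = "+Inf" if math.isinf(ub) else repr(ub)
        # Buckets are cumulative: each counts every observation <= its upper bound.
        series[f'{basename}_bucket{{le="{le}"}}'] = sum(1 for x in observations if x <= ub)
    series[f"{basename}_sum"] = sum(observations)
    series[f"{basename}_count"] = len(observations)
    return series


s = histogram_series("request_duration_seconds", [0.1, 0.5, 1.0],
                     [0.05, 0.2, 0.3, 0.7, 2.0])
for key, value in s.items():
    print(key, value)
```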
  3. Jobs & Instances

    1. When Prometheus scrapes a target, it automatically attaches labels to the scraped time series that identify the target: job: <job_name> (the configured job the target belongs to) and instance: <host>:<port> (the part of the target's URL that was scraped).
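The two automatic labels can be illustrated with a hypothetical helper that takes a job name and a target URL and returns the labels a scrape would attach. Conflicting `job`/`instance` labels already present in the scraped data are renamed to `exported_<name>`, which is the behaviour with `honor_labels: false`. The sketch expects an explicit host:port (IPv6 targets are not handled).

```python
from typing import Dict
from urllib.parse import urlsplit


def attach_target_labels(sample_labels: Dict[str, str], job: str, target_url: str) -> Dict[str, str]:
    """Attach `job` and `instance` labels the way a scrape does (honor_labels: false)."""
    parts = urlsplit(target_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("this sketch expects an explicit host:port in the URL")
    instance = f"{parts.hostname}:{parts.port}"
    labels: Dict[str, str] = {}
    for name, value in sample_labels.items():
        # Conflicting labels from the scraped data are kept as exported_<name>.
        if name in ("job", "instance"):
            labels[f"exported_{name}"] = value
        else:
            labels[name] = value
    labels["job"] = job
    labels["instance"] = instance
    return labels


print(attach_target_labels({"method": "GET"}, "node", "http://10.0.0.5:9100/metrics"))
```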

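Prometheus

  1. Configuration

    1. Command-line flags configure immutable system parameters (such as storage locations, the amount of data to keep on disk and in memory, etc.).
    2. The configuration file defines everything related to scraping jobs and their instances, as well as which rule files to load.
    3. Prometheus can reload its configuration at runtime, either by:
      1. sending SIGHUP to the Prometheus process, or
      2. sending an HTTP POST request to the /-/reload endpoint (only when the --web.enable-lifecycle flag is enabled).
    4. scrape_config: specifies a set of targets and the parameters describing how to scrape them.
      1. Targets are configured statically via static_configs or discovered dynamically via service discovery.
    5. Check a rule file's syntax without starting the server: promtool check rules /path/to/example.rules.yml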
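The two reload mechanisms above can be triggered from a script as sketched below. The PID and base URL are hypothetical, SIGHUP is POSIX-only, and the HTTP variant only works when Prometheus runs with `--web.enable-lifecycle`. The request is built but not sent, so the sketch runs without a live server.

```python
import os
import signal
import urllib.request


def reload_via_sighup(pid: int) -> None:
    """Ask the Prometheus process with this PID to re-read its configuration (POSIX only)."""
    os.kill(pid, signal.SIGHUP)


def build_reload_request(base_url: str) -> urllib.request.Request:
    """Build a POST to /-/reload; the server must run with --web.enable-lifecycle."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/-/reload", data=b"", method="POST")


req = build_reload_request("http://localhost:9090/")
# urllib.request.urlopen(req, timeout=5) would send it; the server rejects the request
# when the lifecycle endpoints are disabled.
```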
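  2. PromQL

    1. An expression or sub-expression can evaluate to one of four types: instant vector, range vector, scalar, string.
    2. metric_name{} can also be written as {__name__="metric_name"}, which makes it possible to select several metrics at once, e.g. {__name__=~"job:.*"}.
    3. Subquery: <instant_query> '[' <range> ':' [<resolution>] ']' [ @ <float_literal> ] [ offset <duration> ] (<resolution> is optional; it defaults to the global evaluation interval).
    4. Vector matching
      1. ignoring(<label list>): ignore the listed labels when matching
      2. on(<label list>): match only on the listed labels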
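The `on`/`ignoring` keywords decide which labels must agree for two samples to pair up in a binary operation. The sketch below imitates one-to-one matching for division on two small instant vectors whose data is modelled on the vector-matching example in the PromQL documentation; the left-hand vector is assumed to be already filtered by `{code="500"}`. It is a simplification: `group_left`/`group_right` are not supported, and the result keeps only the matching labels (the metric name is dropped).

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Labels = Dict[str, str]
Vector = List[Tuple[Labels, float]]  # instant vector: (labels incl. __name__, value)
Signature = FrozenSet[Tuple[str, str]]


def signature(labels: Labels, on: Optional[List[str]], ignoring: List[str]) -> Signature:
    """Label set used to pair samples; it also becomes the result's label set."""
    if on is not None:
        # on(<label list>): only the listed labels take part in matching.
        return frozenset((k, v) for k, v in labels.items() if k in on)
    # ignoring(<label list>): all labels except the listed ones and the metric name.
    return frozenset((k, v) for k, v in labels.items()
                     if k not in ignoring and k != "__name__")


def index(vector: Vector, on: Optional[List[str]], ignoring: List[str]) -> Dict[Signature, float]:
    indexed: Dict[Signature, float] = {}
    for labels, value in vector:
        sig = signature(labels, on, ignoring)
        if sig in indexed:
            # PromQL also errors when a match group is not unique in one-to-one matching.
            raise ValueError(f"duplicate series for match group {sorted(sig)}")
        indexed[sig] = value
    return indexed


def divide(lhs: Vector, rhs: Vector, on: Optional[List[str]] = None,
           ignoring: Optional[List[str]] = None) -> Dict[Signature, float]:
    """One-to-one `lhs / on(...) rhs` or `lhs / ignoring(...) rhs` (simplified sketch).

    Samples without a partner are dropped. Division by zero raises ZeroDivisionError
    here, whereas PromQL would return +Inf, -Inf or NaN.
    """
    ignoring = ignoring or []
    left, right = index(lhs, on, ignoring), index(rhs, on, ignoring)
    return {sig: left[sig] / right[sig] for sig in left if sig in right}


errors = [  # method_code:http_errors:rate5m{code="500"}
    ({"__name__": "method_code:http_errors:rate5m", "method": "get", "code": "500"}, 24.0),
    ({"__name__": "method_code:http_errors:rate5m", "method": "post", "code": "500"}, 6.0),
]
requests = [  # method:http_requests:rate5m
    ({"__name__": "method:http_requests:rate5m", "method": "get"}, 600.0),
    ({"__name__": "method:http_requests:rate5m", "method": "del"}, 34.0),
    ({"__name__": "method:http_requests:rate5m", "method": "post"}, 120.0),
]
ratios = divide(errors, requests, ignoring=["code"])
# {method="get"} -> 0.04, {method="post"} -> 0.05; {method="del"} has no partner
```

Here `ignoring(code)` and `on(method)` give the same pairs, because `method` is the only label left once `code` and the metric name are set aside.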
  3. Storage

    1. format: https://github.com/prometheus/prometheus/blob/release-2.36/tsdb/docs/format/README.md