Prometheus Study Notes
Introduction
Overview
- Prometheus is an open-source systems monitoring and alerting toolkit.
- Prometheus collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.
Features
- a multi-dimensional data model with time series data identified by metric name and key/value pairs
- PromQL, a flexible query language to leverage this dimensionality
- no reliance on distributed storage; single server nodes are autonomous
- time series collection happens via a pull model over HTTP
- pushing time series is supported via an intermediary gateway
- targets are discovered via service discovery or static configuration
- multiple modes of graphing and dashboarding support
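To make the pull model concrete, here is a minimal Python sketch of a target that exposes a /metrics page a Prometheus server could scrape over HTTP. It uses only the standard library. The metric name demo_requests_total, its label, and port 8000 are assumptions made up for this example, label values are not escaped, and a real application would normally use an official client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter; the name and labels are invented for illustration.
REQUESTS = {"name": "demo_requests_total", "labels": {"path": "/"}, "value": 0}


def render_sample(name, labels, value):
    """Render one sample as <metric_name>{<label_name>="<label_value>", ...} <value>."""
    if labels:
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        REQUESTS["value"] += 1  # count each scrape as one request
        body = (render_sample(**REQUESTS) + "\n").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving /metrics on the given (assumed) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint; the server side then only needs the target address in its scrape configuration, which is the point of the pull model: the target never has to know where Prometheus lives.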
Components
- the main Prometheus server which scrapes and stores time series data
- client libraries for instrumenting application code
- a push gateway for supporting short-lived jobs
- special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
- an alertmanager to handle alerts
- various support tools
Prometheus configuration file: prometheus.yml
- global.scrape_interval
- global.evaluation_interval
- rule_files: []
- scrape_configs: {job_name:"", static_configs:""}
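As a rough illustration of how these keys fit together, the Python sketch below renders a minimal prometheus.yml as text. The helper name render_config, the job name, the target address, and the 15s intervals are all assumptions chosen for the example, not values taken from these notes.

```python
def render_config(job_name, targets, scrape_interval="15s", evaluation_interval="15s"):
    """Return the text of a minimal prometheus.yml (hypothetical helper)."""
    target_list = ", ".join(f"'{t}'" for t in targets)
    return (
        "global:\n"
        f"  scrape_interval: {scrape_interval}\n"
        f"  evaluation_interval: {evaluation_interval}\n"
        "\n"
        "rule_files: []\n"
        "\n"
        "scrape_configs:\n"
        f"  - job_name: '{job_name}'\n"
        "    static_configs:\n"
        f"      - targets: [{target_list}]\n"
    )


if __name__ == "__main__":
    # Example values only: a server scraping its own metrics page.
    print(render_config("prometheus", ["localhost:9090"]))
```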
Prometheus server UI
- status page: http://<host>:9090/
- self metrics page: http://<host>:9090/metrics
- expression browser: http://<host>:9090/graph
Glossary
- The Alertmanager takes in alerts, aggregates them into groups, de-duplicates them, applies silences, throttles them, and then sends out notifications to email, PagerDuty, Slack, etc.
Concepts
Data models
- <metric_name>{<label_name>=<label_value>, ...}
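- metric names must match the regex [a-zA-Z_:][a-zA-Z0-9_:]* (letters, digits, underscores, and colons; colons are meant only for names defined by recording rules)
- dimensions are defined through labels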
- time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions.
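- changing any label value, including adding or removing a label, creates a new time series
- label names must match the regex [a-zA-Z_][a-zA-Z0-9_]* (letters, digits, and underscores; names beginning with two underscores (__) are reserved for internal use)
- label values may contain any Unicode characters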
- A label with an empty label value is considered equivalent to a label that does not exist
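The naming rules and the notion of series identity from the list above can be checked with a short Python sketch. The regular expressions follow the classic Prometheus data model; the function names, including series_identity, are hypothetical helpers written for this note and not part of any Prometheus library.

```python
import re

# Patterns for the classic Prometheus data model.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_metric_name(name):
    return METRIC_NAME_RE.fullmatch(name) is not None


def is_valid_label_name(name):
    """Valid for user-defined labels: matches the pattern and is not __-prefixed."""
    return LABEL_NAME_RE.fullmatch(name) is not None and not name.startswith("__")


def series_identity(metric_name, labels):
    """Identity of a time series: its name plus its non-empty labels.

    Dropping empty values models the rule that an empty label value is
    equivalent to the label not existing.
    """
    kept = {k: v for k, v in labels.items() if v != ""}
    return (metric_name, frozenset(kept.items()))
```

Two label sets that differ in any non-empty value produce different identities, which is why changing a label value starts a new time series.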
Metrics Types
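- Counter: a monotonically increasing counter; it resets to 0 on restart and then counts up again
- Gauge: a value that can go up and down
- Histogram: samples observations into buckets; a histogram with base name <basename> exposes:
  - <basename>_bucket{le="<upper inclusive bound>"}: cumulative counters for the observation buckets
  - <basename>_sum: the total sum of all observed values
  - <basename>_count: the count of observations, equal to <basename>_bucket{le="+Inf"}
- Summary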
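The relationship between a histogram's _bucket, _sum, and _count series is easier to see in code. The toy class below is an illustrative Python sketch, not a client library: the class name, the bucket bounds, and the base name used in the example are all assumptions.

```python
import math


class Histogram:
    """Toy cumulative histogram (illustrative only)."""

    def __init__(self, upper_bounds):
        # Always include the +Inf bucket so every observation lands somewhere.
        self.bounds = sorted(upper_bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.sum += value
        self.count += 1
        # Buckets are cumulative: increment every bucket whose bound is >= value.
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.bucket_counts[i] += 1

    def samples(self, basename):
        """Return (series, value) pairs in the _bucket / _sum / _count layout."""
        out = []
        for bound, n in zip(self.bounds, self.bucket_counts):
            le = "+Inf" if bound == math.inf else str(bound)
            out.append((f'{basename}_bucket{{le="{le}"}}', n))
        out.append((f"{basename}_sum", self.sum))
        out.append((f"{basename}_count", self.count))
        return out
```

Because every observation is at most +Inf, the le="+Inf" bucket always equals the _count series.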
Jobs & Instances
- When Prometheus scrapes a target, it automatically attaches labels to the scraped time series that identify the scraped target: job: <job_name> and instance: <host>:<port>
Prometheus
Configuration
- command-line flags configure immutable system parameters (such as storage locations, amount of data to keep on disk and in memory, etc.)
- configuration file defines everything related to scraping jobs and their instances, as well as which rule files to load.
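- Prometheus can reload its configuration at runtime, in either of two ways:
  - send SIGHUP to the Prometheus process
  - send an HTTP POST request to the /-/reload endpoint (available when the --web.enable-lifecycle flag is set)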
- scrape_config: targets are specified with static_configs or discovered dynamically through service discovery
- rule check: promtool check rules /path/to/example.rules.yml
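The HTTP route for reloading the configuration, mentioned in the list above, can be sketched in Python with the standard library. The base URL http://localhost:9090 and the helper names are assumptions for this example, and the server must have been started with the --web.enable-lifecycle flag for the endpoint to respond.

```python
import urllib.request


def build_reload_request(base_url):
    """Build (but do not send) the POST request for the /-/reload endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_config(base_url="http://localhost:9090", timeout=5):
    """Send the reload request and return the HTTP status code.

    Assumes a Prometheus server is reachable at base_url (an example address).
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status
```

Building the request separately from sending it keeps the sketch easy to inspect without a running server.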
PromQL
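- an expression evaluates to one of four types: instant vector, range vector, scalar, string
- metrics_name{} can also be written as {__name__="metrics_name"}; matching on __name__ makes it possible to select several metrics at once, e.g. {__name__=~"job:.*"}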
- subquery: <instant_query> '[' <range> ':' [<resolution>] ']' [ @ <float_literal> ] [ offset <duration> ] (<resolution> is optional; the default is the global evaluation interval)
- Vector matching: ignoring(<label list>) and on(<label list>)
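To tie the selector syntax to something runnable, the Python sketch below builds a selector on the special __name__ label and turns it into a URL for the HTTP API's instant-query endpoint, /api/v1/query. The helper names and the base URL are assumptions for the example, and the pattern is inserted without escaping, so it should not contain double quotes.

```python
from urllib.parse import urlencode


def name_selector(pattern, regex=False):
    """Build a selector on __name__: equality by default, =~ for a regex match."""
    op = "=~" if regex else "="
    return f'{{__name__{op}"{pattern}"}}'


def instant_query_url(base_url, expr):
    """URL for an instant query; the expression is URL-encoded as the query parameter."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": expr})


if __name__ == "__main__":
    # Example: select every series whose metric name starts with "job:".
    print(instant_query_url("http://localhost:9090", name_selector("job:.*", regex=True)))
```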
Storage