
Prometheus Study Notes

Introduction

Overview

  1. Prometheus is an open-source systems monitoring and alerting toolkit.
  2. Prometheus collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.
  3. Features

    1. a multi-dimensional data model with time series data identified by metric name and key/value pairs
    2. PromQL, a flexible query language to leverage this dimensionality
    3. no reliance on distributed storage; single server nodes are autonomous
    4. time series collection happens via a pull model over HTTP
    5. pushing time series is supported via an intermediary gateway
    6. targets are discovered via service discovery or static configuration
    7. multiple modes of graphing and dashboarding support
  4. Components

    1. the main Prometheus server which scrapes and stores time series data
    2. client libraries for instrumenting application code
    3. a push gateway for supporting short-lived jobs
    4. special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
    5. an alertmanager to handle alerts
    6. various support tools
  5. Prometheus configuration file: prometheus.yml

    1. global.scrape_interval
    2. global.evaluation_interval
    3. rule_files: []
    4. scrape_configs: {job_name:"", static_configs:""}
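Put together, those fields form a minimal prometheus.yml (a sketch only; the rule file name and target address here are assumptions):

```yaml
global:
  scrape_interval: 15s      # how often to scrape targets
  evaluation_interval: 15s  # how often to evaluate rules

rule_files:
  - "example.rules.yml"     # hypothetical rule file

scrape_configs:
  - job_name: "prometheus"  # scrape Prometheus itself
    static_configs:
      - targets: ["localhost:9090"]
```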
  6. Prometheus server UI

    1. status page: http://&lt;host&gt;:9090/
    2. self metrics page: http://&lt;host&gt;:9090/metrics
    3. expression browser: http://&lt;host&gt;:9090/graph
  7. glossary

    1. The Alertmanager takes in alerts, aggregates them into groups, de-duplicates, applies silences, throttles, and then sends out notifications to email, Pagerduty, Slack etc.

Concepts

  1. Data models

    1. <metric_name>{<label_name>="<label_value>", ...}
    2. metric names must match /[a-zA-Z_:][a-zA-Z0-9_:]*/ — letters, digits, underscores, and colons (colons are reserved for recording rules)
    3. dimensions are defined via labels
    4. time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions.
    5. adding, removing, or otherwise changing a label value creates a new time series
    6. label names must match /[a-zA-Z_][a-zA-Z0-9_]*/ — letters, digits, underscores (names beginning with two underscores (__) are reserved for internal use)
    7. label values may contain any Unicode characters
    8. A label with an empty label value is considered equivalent to a label that does not exist
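As a concrete instance of this notation (the metric and labels below follow the canonical example from the Prometheus docs):

```
api_http_requests_total{method="POST", handler="/messages"}
```

Changing any of these label values, e.g. method="GET", identifies a different time series.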
  2. Metrics Types

    1. Counter: a monotonically increasing counter; it resets to 0 on restart and then counts up again
    2. Gauge: a numeric value that can go up as well as down
    3. Histogram: exposes several series under a base name <basename>:

      1. <basename>_bucket{le="<upper bound>"}: cumulative counters for the observation buckets
      2. <basename>_sum: total sum of all observed values
      3. <basename>_count: count of observations (equal to <basename>_bucket{le="+Inf"})
    4. Summary: like a histogram, but also exposes streaming quantiles over a sliding time window
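For example, a histogram with the (hypothetical) base name http_request_duration_seconds exposes series like:

```
http_request_duration_seconds_bucket{le="0.5"}   # observations <= 0.5s (cumulative)
http_request_duration_seconds_bucket{le="+Inf"}  # all observations
http_request_duration_seconds_sum                # sum of all observed values
http_request_duration_seconds_count              # number of observations
```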
  3. Jobs & Instances

    1. When Prometheus scrapes a target, it attaches some labels automatically to the scraped time series which serve to identify the scraped target: job: <job_name> & instance: <host>:<port>

  4. Configuration

    1. command-line flags configure immutable system parameters (such as storage locations, amount of data to keep on disk and in memory, etc.)
    2. configuration file defines everything related to scraping jobs and their instances, as well as which rule files to load.
    3. Prometheus can reload its configuration at runtime:

      1. send SIGHUP;
      2. HTTP POST request to the /-/reload endpoint (requires starting Prometheus with the --web.enable-lifecycle flag)
    4. scrape_config: targets are defined with static_configs or via dynamic service discovery
    5. rule check -> promtool check rules /path/to/example.rules.yml
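The runtime-reload options above can be sketched as shell commands (the host localhost:9090 and the pgrep pattern are assumptions; the HTTP endpoint only works when Prometheus was started with --web.enable-lifecycle):

```shell
# Build the reload endpoint URL from the server address.
prometheus_url="http://localhost:9090"
reload_endpoint="$prometheus_url/-/reload"

# Option 1: ask Prometheus to reload over HTTP, e.g.:
#   curl -X POST "$reload_endpoint"
echo "POST $reload_endpoint"

# Option 2: send SIGHUP to the running prometheus process, e.g.:
#   kill -HUP "$(pgrep prometheus)"
```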
  5. PromQL

    1. can evaluate: instant vector, range vector, scalar, string
    2. metrics_name{} can also be written as {__name__="metrics_name"}; this form can query multiple metrics at once, e.g. {__name__=~"job:.*"}
    3. subquery: <instant_query> '[' <range> ':' [<resolution>] ']' [ @ <float_literal> ] [ offset <duration> ] (<resolution> is optional; the default is the global evaluation interval.)
    4. Vector matching

      1. ignoring(<label list>): ignore the listed labels when matching
      2. on(<label list>): match only on the listed labels
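In the style of the official docs' examples (the recording-rule names below are from that example), the two matching keywords are used like this:

```
method_code:http_errors:rate5m{code="500"} / ignoring(code) method:http_requests:rate5m
method_code:http_errors:rate5m{code="500"} / on(method) method:http_requests:rate5m
```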
  6. Storage

    1. format: https://github.com/prometheus/prometheus/blob/release-2.36/tsdb/docs/format/README.md

Installing common tools with Docker

There is an idle server; install the commonly used tools on it to make it easy to experiment with things:

Nginx

sudo docker run --restart always --name nginx -v /home/supra/work/data/nginx_html:/usr/share/nginx/html:ro -v /home/supra/work/data/nginx_config/mime.types:/etc/nginx/mime.types:ro  -p 80:80 -d nginx

mongoDB

sudo docker network create mongo-network
sudo docker run --network mongo-network --restart always -p 27017:27017 --volume /home/supra/work/data/mongo/grafana:/data/db --name mongodb -d mongo
sudo docker run --network mongo-network --restart always -e ME_CONFIG_MONGODB_SERVER=mongodb -p 8081:8081 --name mongoui mongo-express

elasticSearch & kibana

参考: https://www.elastic.co/guide/en/kibana/current/docker.html

sudo docker network create elastic
sudo docker run --restart always --name es01 --network elastic -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -d docker.elastic.co/elasticsearch/elasticsearch:7.16.1
sudo docker run --restart always --name kib01 --network elastic -p 5601:5601 -e "ELASTICSEARCH_HOSTS=http://es01:9200" -d docker.elastic.co/kibana/kibana:7.16.1

Splunk

sudo docker run --restart always -p 8000:8000 -e "SPLUNK_START_ARGS=--accept-license"  -e "SPLUNK_PASSWORD=Sre@2021" --name splunk -d splunk/splunk

Clickhouse

docker run -d --name clickhouse-server --ulimit nofile=5120:5120 --volume=/home/supra/work/data/clickhouse:/var/lib/clickhouse -p 8123:8123 -p 9000:9000 yandex/clickhouse-server

Redis

参考: https://hub.docker.com/_/redis

$ docker network create redis-network
$ sudo docker run --network redis-network --restart always --volume /home/supra/work/data/redis/data:/data --name redis -p 6379:6379 -d redis redis-server --save 60 1 --loglevel warning
$ docker run -it --network redis-network --rm redis redis-cli -h redis

Prometheus

参考: https://prometheus.io/docs/prometheus/latest/installation/

sudo docker run -d --restart always --name prometheus -p 9090:9090 -v /home/supra/work/data/prometheus:/etc/prometheus prom/prometheus

MySQL & phpmyadmin

参考: https://hub.docker.com/_/mysql

mkdir /home/supra/work/data/mysql/data
docker run --restart always --name mysqld -v /home/supra/work/data/mysql/data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=Sre@2022 -d -p 3306:3306 -e "MYSQL_USER=sre" -e "MYSQL_PASSWORD=Sre@2022" mysql
# create phpmyadmin UI
docker run --restart always --name phpmyadmin -d --link mysqld:db -p 8082:80 phpmyadmin

continuumio/anaconda3

docker run --restart always -d --name anaconda3  -p 8888:8888 continuumio/anaconda3 /bin/bash -c "\
    conda install jupyter -y --quiet && \
    mkdir -p /opt/notebooks && \
    jupyter notebook --NotebookApp.token='' --NotebookApp.password='' \
    --notebook-dir=/opt/notebooks --ip='*' --port=8888 \
    --no-browser --allow-root"

Also, put an index.html in nginx's html directory:

<li><a target="_blank" href=":5601/">elastic</a></li>
<li><a target="_blank" href=":8000/">Splunk (admin/Sre@2021; if expired, reinstall)</a></li>
<li><a target="_blank" href=":8081/">MongoUI</a></li>
<li><a target="_blank" href=":8123/">ClickHouseUI</a></li>
<li><a target="_blank" href="/">RedisUI</a></li>
<li><a target="_blank" href=":9090/">Prometheus</a></li>

<li><a target="_blank" href="https://www.tianxiaohui.com/display/~xiatian/supra">wiki</a></li>

<script>
    (function() {
        const links = document.querySelectorAll("a");
        links.forEach(function(ele){
                ele.href = ele.href.replace("/:", ":");
        });
    })();
</script>

Notes on iTerm customizations

Change the foreground and background colors, as shown:

colors.png
Result:
result.png

Change Key Mappings

Profiles -> Keys -> Key Mappings -> Natural Text Editing. The benefit is that Fn + left/right arrows and Option + arrows then behave as expected.
key.png

Cursor blinking

blind.png

The site's traffic has dropped

Today I had some time to check this site's recent traffic and noticed it has dropped. A week's view is also lower than before: there used to be at least 30~70 PV per day, but recently it has fallen by half.

Looking at a longer period and breaking down the sources, traffic from Baidu has clearly decreased since May, and I don't know why. In the last few days it is almost 0.
百度统计.png

The IP was changed once around May 13, but that doesn't appear to be related.

I checked Baidu's recent crawl status and found no crawl data at all in the last half month. I wonder whether that's because no new articles have been published recently.

The PATH environment variable on the Mac

Working on a Mac, I actually run into cases where the PATH environment variable needs changing. For example, some Java projects are based on JDK 8 and others on JDK 11, so this variable has to be changed often. So where does the PATH we see come from, and which configuration files change its value?

First, let's look at the current PATH variable:

~ xiatian$ echo $PATH
/usr/local//Cellar/curl/7.80.0_1/bin/:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Applications/Wireshark.app/Contents/MacOS:/Users/xiatian/work/tools:/Applications/Visual\ Studio\ Code.app/Contents/Resources/app/bin/

There are quite a few paths in there. So what is the initial PATH during system startup?

  1. On the Mac, the initial PATH is read from the /etc/paths file:

    ~ xiatian$ cat /etc/paths
    /usr/local/bin
    /usr/bin
    /bin
    /usr/sbin
    /sbin
  2. It then appends the paths listed under /etc/paths.d one by one; for example, you can see my Wireshark path in the PATH above:

    ~ xiatian$ ls -l /etc/paths.d/
    total 8
    -rw-r--r--  1 root  wheel  43 Oct 19  2017 Wireshark
    ~ xiatian$ cat /etc/paths.d/Wireshark
    /Applications/Wireshark.app/Contents/MacOS
  3. In the user's home directory, two more files affect the value of PATH: .bash_profile & .bashrc. The difference: when a shell is a login shell it first runs .bash_profile, and when it is a non-login interactive shell it runs .bashrc. On the Mac, however, terminals open login shells by default, so .bash_profile should always run (according to a Q&A answer). I tried the built-in Terminal and the installed iTerm, and found my Mac doesn't behave that way.
  4. On my machine it is ~/.profile that gets executed. Check the bash man page; the INVOCATION section spells out the exact order in which the configuration files are read:

    ~ xiatian$ man bash
    When bash is invoked as an interactive login shell, or as a non-interactive shell with the --login option, it first reads and executes commands from the file  /etc/profile,  if that  file exists.  After reading that file, it looks for ~/.bash_profile, ~/.bash_login, and ~/.profile, in that order, and reads and executes commands from the first one that exists and is readable.  The --noprofile option may be used when the shell is started to inhibit this behavior.
    When an interactive shell that is not a login shell is started, bash reads and executes commands from ~/.bashrc, if that file exists.  This may be inhibited by using the --norc option.  The --rcfile file option will force bash to read and execute commands from file instead of ~/.bashrc.
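Steps 1 and 2 above are performed at shell startup by /usr/libexec/path_helper, which joins the lines of /etc/paths and then of every file under /etc/paths.d into one colon-separated PATH. A portable simulation of that behavior (using a temp directory in place of /etc so the sketch runs anywhere):

```shell
# Simulate macOS path_helper: concatenate the lines of "paths",
# then every file in "paths.d", into one colon-separated string.
tmp=$(mktemp -d)
printf '/usr/local/bin\n/usr/bin\n/bin\n/usr/sbin\n/sbin\n' > "$tmp/paths"
mkdir "$tmp/paths.d"
printf '/Applications/Wireshark.app/Contents/MacOS\n' > "$tmp/paths.d/Wireshark"

built_path=$(cat "$tmp/paths" "$tmp/paths.d"/* | paste -sd ':' -)
echo "$built_path"
# → /usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Applications/Wireshark.app/Contents/MacOS
```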

So, read the INVOCATION section of man bash carefully, and you'll know exactly how PATH gets its value.