June 2022

Vi shortcuts

vi/vim has a huge number of shortcuts, but only a handful get used day to day. Here are the ones I think are worth knowing:

Normal (command) mode

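i – insert before the cursor (enters insert mode)
a – append after the cursor (enters insert mode)
A – append at the end of the line (enters insert mode)
o – open a new line below the current one (enters insert mode)
u – undo the last change
U – undo all changes on the current line
D – delete from the cursor to the end of the line
x – delete the character under the cursor
R – Replace mode: overwrite text starting at the cursor
r – replace only the character under the cursor, then stay in normal mode
s – delete the character under the cursor and enter insert mode
S – delete the whole line's text, go to the start of the line, and enter insert mode
~ – toggle the case of the character under the cursor
dd – delete the current line (stays in normal mode)
3dd – delete 3 lines
dw – delete a word (from the cursor to the start of the next word)
4dw – delete 4 words
ZZ (Shift+z twice) – save (if modified) and quit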

Insert mode

ESC – leave insert mode (back to normal mode)

Navigation (jumping around)

Cursor and line movement

l - right
h - left
j - down
k - up

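0 - (the digit zero) start of the line
^ - (the regex start-of-line anchor) first non-blank character of the line
$ - (the regex end-of-line anchor) end of the line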

Screen jumps

H – first line on the screen
M – middle line of the screen
L – last line on the screen

Word jumps

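WORD – a run of non-blank characters, separated by whitespace.
word – a run of letters, digits and underscores (or a run of other non-blank characters).
For example: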
192.168.1.1 – single WORD
192.168.1.1 – seven words.

e – go to the end of the current word.
E – go to the end of the current WORD.
b – go to the previous (before) word.
B – go to the previous (before) WORD.
w – go to the next word.
W – go to the next WORD.
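To make the word/WORD distinction concrete, here is a small Python sketch that splits text into the units the motions above move over. It assumes Vim's default 'iskeyword' for ASCII text (letters, digits, underscore); a different 'iskeyword' setting changes where word boundaries fall. A word is either a run of keyword characters or a run of other non-blank characters, which is why each dot in 192.168.1.1 counts as a word of its own.

```python
import re
from typing import List

# Assumes Vim's default 'iskeyword' for ASCII text: letters, digits, underscore.
# word: a run of keyword characters, or a run of other non-blank characters.
# WORD: a run of non-blank characters.
WORD_RE = re.compile(r"[A-Za-z0-9_]+|[^A-Za-z0-9_\s]+")
BIG_WORD_RE = re.compile(r"\S+")


def split_words(text: str) -> List[str]:
    """Units that w, b and e move over."""
    return WORD_RE.findall(text)


def split_big_words(text: str) -> List[str]:
    """Units that W, B and E move over."""
    return BIG_WORD_RE.findall(text)


print(split_words("192.168.1.1"))      # ['192', '.', '168', '.', '1', '.', '1']
print(split_big_words("192.168.1.1"))  # ['192.168.1.1']
```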

Paragraph jumps

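{ - back to the start of the paragraph (the previous blank line)
} - forward to the end of the paragraph (the next blank line)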

Installing eBPF bcc on Ubuntu 20.04.4

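Installing bcc hardly seems worth a write-up. The official docs have detailed instructions, so you just follow them step by step, right? That's what I thought at first too. Reality was less kind: it cost me at least 30 minutes.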

Environment

supra@suprabox:~$ cat /etc/issue
Ubuntu 20.04.4 LTS \n \l

supra@suprabox:~$ uname -a
Linux suprabox 5.4.0-117-generic #132-Ubuntu SMP Thu Jun 2 00:39:06 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Official installation guide

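Link: https://github.com/iovisor/bcc/blob/master/INSTALL.md
Kernel configuration: Ubuntu 20.04.4 already ships kernel 5.4.0, and its default config has the required options enabled.
The guide says both binary-package options are outdated, so I built from source instead.
Building needs LLVM, Clang, cmake and gcc. The package list differs between Ubuntu releases, so just copy and run the commands for your version.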

The trouble started at the actual BCC build and install step:

git clone https://github.com/iovisor/bcc.git
mkdir bcc/build; cd bcc/build
cmake ..
make
sudo make install
cmake -DPYTHON_CMD=python3 .. # build python3 binding
pushd src/python/
make
sudo make install
popd

First, git clone doesn't get through from inside this country, so I set up a proxy:

git config --global http.proxy http://proxy.mycompany:80
# If the proxy requires a username and password:
git config --global http.proxy http://mydomain\\myusername:mypassword@myproxyserver:8080/

With that, the clone succeeded. I then went through the build steps one by one, and compilation failed with an error roughly like this:

/tmp/bcc/src/cc/bpf_module.cc:108:46: error: no matching function for call to ‘llvm::object::SectionRef::getName() const’
       auto sec_name = section.get()->getName();

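A post I found describes the same problem and gives a fix. I went with building the v0.24.0 release, so it's just a matter of checking out that tag and rerunning the build steps above: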

git checkout v0.24.0

After the install finished, I ran the test script directly:

supra@suprabox:/usr/lib/python3/dist-packages/bcc$ sudo ~/bpf/bcc/examples/hello_world.py
Traceback (most recent call last):
  File "/home/supra/bpf/bcc/examples/hello_world.py", line 9, in <module>
    from bcc import BPF
ImportError: No module named bcc

The bindings had been built for Python 3, so I checked the system's default python. It turned out to be 2.7:

supra@suprabox:~$ $(which python) --version
Python 2.7.18
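A quick way to confirm which interpreter a command runs, and whether that interpreter can import bcc, is a tiny script like the one below. The file name, say check_bcc.py, and the helper name are made up for this post. It uses only syntax that works under both Python 2.7 and Python 3, so you can run it as `sudo python check_bcc.py` and as `sudo python3 check_bcc.py` and compare the results. It really imports the module rather than just searching for it, which is harmless for bcc.

```python
from __future__ import print_function

import sys


def module_status(name):
    """Return (interpreter path, version string, whether `name` can be imported)."""
    try:
        __import__(name)
        found = True
    except ImportError:  # ModuleNotFoundError is a subclass on Python 3
        found = False
    version = "%d.%d.%d" % tuple(sys.version_info[:3])
    return sys.executable, version, found


if __name__ == "__main__":
    exe, version, found = module_status("bcc")
    print("%s (Python %s): bcc importable = %s" % (exe, version, found))
```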

Running it with python3 instead works:

sudo python3  ~/bpf/bcc/examples/hello_world.py
[sudo] password for supra:
b'         splunkd-5799    [005] .... 47245.389935: 0: Hello, World!'
b'         splunkd-5799    [005] .... 47245.393749: 0: Hello, World!'

supra@suprabox:~$ sudo python3 ~/bpf/bcc/examples/tracing/tcpv4connect.py
PID    COMM         SADDR            DADDR            DPORT
158736 python3.7    127.0.0.1        127.0.0.1        8089
838    qualys-cloud 10.249.64.103    64.39.104.103    443
159247 curl         127.0.0.1        127.0.0.1        8089
158736 python3.7    127.0.0.1        127.0.0.1        8089
159329 curl         127.0.0.1        127.0.0.1        8089

With that, the installation is done.

ftrace

This post briefly introduces Ftrace and then walks through a few practical examples.

Key concepts

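  1. The official spelling is Ftrace, with a capital F;
  2. Kernel documentation for Ftrace: https://www.kernel.org/doc/html/latest/trace/ftrace.html
  3. Ftrace is not a single tool or command but a framework for observing the kernel. You can use it to debug the kernel, analyze latency, and profile;
  4. Its latency tracers measure how long interrupts or preemption stay disabled (e.g. the irqsoff tracer);
  5. The kernel exposes hundreds of events, and once tracefs is configured you can observe them through Ftrace;
  6. Listing the tracers the kernel supports, and selecting one:

    pi@raspberrypi:~/tmp $ sudo cat /sys/kernel/tracing/available_tracers
    blk function_graph wakeup_dl wakeup_rt wakeup irqsoff function nop
    # "sudo echo ... > file" fails because the redirection runs in the unprivileged shell, so use tee
    pi@raspberrypi:~/tmp $ echo function_graph | sudo tee /sys/kernel/tracing/current_tracer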
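The kernel rejects a write of an unknown name to current_tracer, so it can help to check available_tracers first. Below is a minimal Python sketch of that check. The helper names are made up for this post. The parsing is kept apart from the file I/O so that it can be tried without root.

```python
import os

TRACEFS = "/sys/kernel/tracing"


def parse_available_tracers(text):
    """available_tracers is a single line of space-separated tracer names."""
    return text.split()


def validate_tracer(name, available_text):
    """Return `name` if the kernel offers it, else raise ValueError."""
    tracers = parse_available_tracers(available_text)
    if name not in tracers:
        raise ValueError("unknown tracer %r (available: %s)" % (name, " ".join(tracers)))
    return name


def set_current_tracer(name, root=TRACEFS):
    """Validate, then select a tracer. Needs root on a real tracefs."""
    with open(os.path.join(root, "available_tracers")) as f:
        validate_tracer(name, f.read())
    with open(os.path.join(root, "current_tracer"), "w") as f:
        f.write(name + "\n")
```

On a real system, call set_current_tracer("function_graph") from a script run with sudo.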

tracefs

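Ftrace is configured, and produces its output, through tracefs. tracefs is a virtual filesystem for interacting with the kernel's tracing framework, and it lets you switch individual tracing features on and off at runtime. It is laid out as a directory hierarchy in which different subdirectories correspond to different tracing events. Its main path is /sys/kernel/tracing, and it is also commonly reachable as /sys/kernel/debug/tracing.
With tracefs we can:

  1. Control tracing events: enable or disable one event or a group of events, e.g. function calls, context switches, memory allocations.
  2. Control options such as the trace buffer size, the trace data format, and event handler behavior.
  3. Read and write trace data through dedicated files: read it from trace or trace_pipe, and inject your own markers through trace_marker.
  4. Filter: e.g. only specific threads, processes, or event types.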

A few examples

Make sure tracefs is mounted

$ ls /sys/kernel/tracing
# If it isn't mounted, mount tracefs with:
$ sudo mount -t tracefs nodev /sys/kernel/tracing
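You can also check programmatically instead of reading the `ls` output. /proc/mounts lists every mounted filesystem along with its type. Here is a small Python sketch that looks for a tracefs entry. On some systems tracefs appears at more than one path, and the function simply returns the first one it finds.

```python
def find_tracefs(mounts_text):
    """Return the first mount point whose filesystem type is tracefs, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "tracefs":
            return fields[1]
    return None
```

For example: `with open("/proc/mounts") as f: print(find_tracefs(f.read()))`.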

Example: tracing the function uptime_proc_show

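# The commands below need root (e.g. run them in a shell opened with sudo -s)
# List all functions that can be traced
$ cat /sys/kernel/tracing/available_filter_functions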

# Set the function to trace
$ echo uptime_proc_show > /sys/kernel/tracing/set_ftrace_filter

# Select the function tracer
$ echo function > /sys/kernel/tracing/current_tracer

# Turn tracing on
$ echo 1 > /sys/kernel/tracing/tracing_on

# Trigger the function: uptime reads /proc/uptime, which is served by uptime_proc_show
$ uptime

# Turn tracing off
$ echo 0 > /sys/kernel/tracing/tracing_on

# View the trace data
$ cat /sys/kernel/tracing/trace
# tracer: function
#
# entries-in-buffer/entries-written: 1/1   #P:8
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
          uptime-39274   [005] ..... 46635.054971: uptime_proc_show <-seq_read_iter

# Reset the tracer to nop
$ echo nop > /sys/kernel/tracing/current_tracer
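The same sequence can be scripted. Below is a Python version of the steps above, written under a few assumptions: it runs as root, tracefs is at /sys/kernel/tracing, and the helper names (write, read, trace_function, dry_run) are hypothetical. Tracing is turned off in a finally block so that a failing command doesn't leave it running. The cleanup also clears set_ftrace_filter, which the manual steps leave set. dry_run points everything at a throwaway directory, so the control flow can be checked without root.

```python
import os
import subprocess
import sys
import tempfile

TRACEFS = "/sys/kernel/tracing"


def write(root, name, value):
    """Overwrite one tracefs control file (mode "w" truncates, like `>` in the shell)."""
    with open(os.path.join(root, name), "w") as f:
        f.write(value + "\n")


def read(root, name):
    with open(os.path.join(root, name)) as f:
        return f.read()


def trace_function(func, command, root=TRACEFS):
    """Trace `func` with the function tracer while `command` runs; return the trace text."""
    write(root, "set_ftrace_filter", func)
    write(root, "current_tracer", "function")
    write(root, "tracing_on", "1")
    try:
        subprocess.run(command, check=True)
    finally:
        write(root, "tracing_on", "0")  # stop tracing even if the command failed
    text = read(root, "trace")
    write(root, "current_tracer", "nop")
    write(root, "set_ftrace_filter", "")  # clear the filter as well
    return text


def dry_run():
    """Run trace_function against a temporary directory instead of the real tracefs."""
    with tempfile.TemporaryDirectory() as fake:
        write(fake, "trace", "# tracer: function")
        text = trace_function("uptime_proc_show", [sys.executable, "-c", "pass"], root=fake)
        state = {n: read(fake, n) for n in ("current_tracer", "tracing_on", "set_ftrace_filter")}
    return text, state
```

On a real system, `print(trace_function("uptime_proc_show", ["uptime"]))` run under sudo should print output like the trace shown above.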

Example: tracing uptime_proc_show with stack traces

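This is the same as the previous example, except that the func_stack_trace option is turned on before tracing starts: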

$ echo uptime_proc_show > /sys/kernel/tracing/set_ftrace_filter
$ echo function > /sys/kernel/tracing/current_tracer
$ echo 1 > /sys/kernel/tracing/options/func_stack_trace
$ echo 1 > /sys/kernel/tracing/tracing_on
$ uptime
$ uptime
$ echo 0 > /sys/kernel/tracing/tracing_on
$ cat /sys/kernel/tracing/trace
# tracer: function
#
# entries-in-buffer/entries-written: 4/4   #P:8
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
          uptime-39283   [005] ..... 47476.243887: uptime_proc_show <-seq_read_iter
          uptime-39283   [005] ..... 47476.243892: <stack trace>
 => uptime_proc_show
 => seq_read_iter
 => proc_reg_read_iter
 => new_sync_read
 => vfs_read
 => ksys_read
 => __x64_sys_read
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe
          uptime-39284   [005] ..... 47480.208731: uptime_proc_show <-seq_read_iter
          uptime-39284   [005] ..... 47480.208736: <stack trace>
 => uptime_proc_show
 => seq_read_iter
 => proc_reg_read_iter
 => new_sync_read
 => vfs_read
 => ksys_read
 => __x64_sys_read
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe

# Clean up
$ echo 0 > /sys/kernel/tracing/options/func_stack_trace
$ echo 0 > /sys/kernel/tracing/tracing_on
$ echo nop > /sys/kernel/tracing/current_tracer
$ echo "" > /sys/kernel/tracing/set_ftrace_filter
$ echo "" > /sys/kernel/tracing/trace
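The trace file is plain text, so post-processing it is straightforward. Here is a Python sketch that turns function-tracer output, with or without func_stack_trace, into records. It assumes the column layout shown above (TASK-PID, [CPU], flags, timestamp). It attaches each `=>` stack to the entry just before it, which works when entries aren't interleaved across CPUs, as in this example.

```python
import re

LINE_RE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<msg>.*)$"
)


def parse_function_trace(text):
    """Parse function-tracer output into dicts; '=>' frames go to the previous entry."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and the header
        if stripped.startswith("=>"):
            if entries:
                entries[-1]["stack"].append(stripped[2:].strip())
            continue
        m = LINE_RE.match(line)
        if not m:
            continue
        msg = m.group("msg")
        if msg == "<stack trace>":
            continue  # marker record; its frames follow on '=>' lines
        func, _, parent = msg.partition(" <-")
        entries.append({
            "task": m.group("task"),
            "pid": int(m.group("pid")),
            "cpu": int(m.group("cpu")),
            "timestamp": float(m.group("ts")),
            "func": func,
            "parent": parent or None,
            "stack": [],
        })
    return entries
```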

Example: the function_graph tracer

$ cd /sys/kernel/tracing
$ echo uptime_proc_show > set_graph_function
$ echo function_graph > current_tracer
$ echo 1 > tracing_on
$ uptime
$ echo 0 > tracing_on
$ cat trace
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0)               |  uptime_proc_show() {
 0)               |    get_idle_time() {
 0)               |      get_cpu_idle_time_us() {
 0)   0.230 us    |        ktime_get();
 0)   0.788 us    |      }
 0)   1.231 us    |    }
 0)               |    get_idle_time() {
 0)               |      get_cpu_idle_time_us() {
 0)   0.125 us    |        ktime_get();
 0)   0.177 us    |        nr_iowait_cpu();
 0)   0.742 us    |      }
 0)   0.964 us    |    }
        ... (some repeated output omitted) ...
 0)   0.197 us    |    ktime_get_with_offset();
 0)   0.107 us    |    ns_to_timespec64();
 0)   0.118 us    |    set_normalized_timespec64();
 0)   1.007 us    |    seq_printf();
 0) + 30.206 us   |  }

# Clean up
$ echo nop > current_tracer
$ echo "" > set_graph_function
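The function_graph output is effectively a call tree. A line ending in `() {` opens a call, the matching `}` carries that call's total duration, and a leaf call is printed on a single line ending in `();`. A marker in front of a duration flags a slow call; according to the kernel documentation, `+` means more than 10 us and `!` more than 100 us. Below is a minimal Python sketch that rebuilds per-call durations from that text. It assumes a single CPU column and the layout shown above, and the function names are made up for this post.

```python
import re
from collections import defaultdict

GRAPH_RE = re.compile(
    r"^\s*(?P<cpu>\d+)\)\s*(?:(?P<mark>[+!#*@$])\s*)?"
    r"(?:(?P<dur>\d+\.\d+)\s*us)?\s*\|\s*(?P<body>.*)$"
)


def parse_function_graph(text):
    """Return (depth, function, duration_us) for each completed call, innermost first."""
    calls, stack = [], []
    for line in text.splitlines():
        m = GRAPH_RE.match(line)
        if not m:
            continue  # header lines and elided output
        body = m.group("body").strip()
        dur = float(m.group("dur")) if m.group("dur") else None
        if body.endswith("() {"):
            stack.append(body[:-len("() {")])  # function entry
        elif body.startswith("}"):
            if stack:
                name = stack.pop()  # function exit carries the total duration
                if dur is not None:
                    calls.append((len(stack), name, dur))
        elif body.endswith("();"):
            calls.append((len(stack), body[:-len("();")], dur))  # leaf call
    return calls


def total_by_function(calls):
    """Sum durations per function name."""
    totals = defaultdict(float)
    for _depth, name, dur in calls:
        if dur is not None:
            totals[name] += dur
    return dict(totals)
```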

Example: the function_graph tracer with some functions excluded (set_graph_notrace)

$ echo uptime_proc_show > set_graph_function
$ echo get_idle_time > set_graph_notrace
$ echo function_graph > current_tracer
$ echo 1 > tracing_on
$ uptime
$ echo 0 > tracing_on
$ cat trace
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0)               |  uptime_proc_show() {
 0)   0.371 us    |    ktime_get_with_offset();
 0)   0.152 us    |    ns_to_timespec64();
 0)   0.138 us    |    set_normalized_timespec64();
 0)   1.027 us    |    seq_printf();
 0)   8.900 us    |  }
# Clean up
$ echo nop > current_tracer
$ echo "" > set_graph_function
$ echo "" > set_graph_notrace

Trace event examples

List the available trace events:

$ cat /sys/kernel/tracing/available_events
$ ls /sys/kernel/tracing/events/

Using events/syscalls/sys_enter_write as an example:

$ ls /sys/kernel/tracing/events/syscalls/sys_enter_write
enable  filter  format  hist  id  inject  trigger
# Show the output format
$ cat /sys/kernel/tracing/events/syscalls/sys_enter_write/format

# enable event tracing for that event
$ echo 1 > /sys/kernel/tracing/events/syscalls/sys_enter_write/enable
$ echo 1 > /sys/kernel/tracing/tracing_on

# View the output
$ cat /sys/kernel/tracing/trace
cat-39448   [005] ..... 49432.357354: sys_write(fd: 1, buf: 7f8e06c19000, count: 62)
cat-39448   [005] ..... 49432.357304: sys_write(fd: 1, buf: 7f8e06c19000, count: 62)
# Clean up
$ echo 0 > /sys/kernel/tracing/tracing_on
$ echo 0 > /sys/kernel/tracing/events/syscalls/sys_enter_write/enable
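The syscall arguments in these lines appear to be printed in hexadecimal without a 0x prefix. In the filter example below, `count: fff` is 4095 bytes, and here `count: 62` is 98. Here is a Python sketch that parses such a line under that assumption. The field names follow the output above, and the function name is made up for this post.

```python
import re

EVENT_RE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+\S+\s+(?P<ts>\d+\.\d+):\s+"
    r"(?P<name>\w+)\((?P<args>.*)\)\s*$"
)


def parse_syscall_event(line):
    """Parse one syscall-entry line from the trace file into a dict, or return None."""
    m = EVENT_RE.match(line)
    if not m:
        return None
    args = {}
    for part in filter(None, m.group("args").split(", ")):
        key, _, value = part.partition(": ")
        # Assumption: syscall arguments are printed as bare hex (no 0x prefix).
        args[key] = int(value, 16)
    return {
        "task": m.group("task"),
        "pid": int(m.group("pid")),
        "cpu": int(m.group("cpu")),
        "timestamp": float(m.group("ts")),
        "syscall": m.group("name"),
        "args": args,
    }
```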

Setting an event filter

Keep only writes whose fd is not 1 (stdout):

$ cat  /sys/kernel/tracing/events/syscalls/sys_enter_write/filter
none

$ echo "fd!=1" > /sys/kernel/tracing/events/syscalls/sys_enter_write/filter
$ echo 1 > /sys/kernel/tracing/events/syscalls/sys_enter_write/enable
$ echo 1 > /sys/kernel/tracing/tracing_on
$ cat /sys/kernel/tracing/trace
sudo-3129    [003] ..... 51059.260638: sys_write(fd: 8, buf: 560f1fb345e0, count: fff)

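# Clean up (writing 0 to the filter file clears the filter)
$ echo 0 > /sys/kernel/tracing/tracing_on
$ echo 0 > /sys/kernel/tracing/events/syscalls/sys_enter_write/enable
$ echo 0 > /sys/kernel/tracing/events/syscalls/sys_enter_write/filter
$ echo "" > /sys/kernel/tracing/trace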
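The filter file takes C-like expressions over the fields listed in the event's format file. The kernel's filter language is richer than this (it supports &&, ||, string matching and more), so treat the following only as a rough mental model. It is a Python sketch that handles a single numeric comparison such as fd!=1 and applies it to already parsed events, and its names are hypothetical.

```python
import operator
import re

OPS = {"==": operator.eq, "!=": operator.ne, "<=": operator.le,
       ">=": operator.ge, "<": operator.lt, ">": operator.gt}
# Two-character operators are listed first so "<=" is not read as "<".
PRED_RE = re.compile(r"^\s*(\w+)\s*(==|!=|<=|>=|<|>)\s*(\S+)\s*$")


def compile_predicate(expr):
    """Turn one numeric comparison such as 'fd!=1' into a Python predicate."""
    m = PRED_RE.match(expr)
    if not m:
        raise ValueError("unsupported filter: %r" % expr)
    field, op, value = m.groups()
    target = int(value, 0)  # accepts 1, 0x10, ...
    return lambda event: OPS[op](event[field], target)
```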

Event histograms

Histograms are implemented as triggers:

# Show the available triggers
$ cat /sys/kernel/tracing/events/syscalls/sys_enter_write/trigger
# Available triggers:
# traceon traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist

$ echo "hist:key=common_pid.execname:val=count:sort=count.descending" > /sys/kernel/tracing/events/syscalls/sys_enter_write/trigger
$ echo "fd!=1" > /sys/kernel/tracing/events/syscalls/sys_enter_write/filter
$ echo 1 > /sys/kernel/tracing/events/syscalls/sys_enter_write/enable
$ echo 1 > /sys/kernel/tracing/tracing_on
$ cat /sys/kernel/tracing/trace

$ cat /sys/kernel/tracing/events/syscalls/sys_enter_write/hist
# event histogram
#
# trigger info: hist:keys=common_pid.execname:vals=hitcount,count:sort=count.descending:size=2048 [active]
#

{ common_pid: sshd            [      1036] } hitcount:        169  count:      13000
{ common_pid: sudo            [      3129] } hitcount:        169  count:       9678
{ common_pid: cat             [     39478] } hitcount:          2  count:       7893
{ common_pid: bash            [      3132] } hitcount:        108  count:       1371
{ common_pid: multipathd      [       497] } hitcount:        122  count:        976
{ common_pid: cat             [     39479] } hitcount:          4  count:         37
{ common_pid: upowerd         [      2616] } hitcount:          1  count:          8

Totals:
    Hits: 575
    Entries: 7
    Dropped: 0

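# Clean up: remove the trigger (same string prefixed with !), then clear the filter
$ echo '!hist:key=common_pid.execname:val=count:sort=count.descending' > /sys/kernel/tracing/events/syscalls/sys_enter_write/trigger
$ echo 0 > /sys/kernel/tracing/tracing_on
$ echo 0 > /sys/kernel/tracing/events/syscalls/sys_enter_write/enable
$ echo 0 > /sys/kernel/tracing/events/syscalls/sys_enter_write/filter
$ echo "" > /sys/kernel/tracing/trace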
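What the hist trigger computes is essentially a group-by. There is one bucket per key (here common_pid.execname, shown as command name plus PID). Each bucket gets a hitcount and a sum of the `val` field, which is count, the byte count passed to write. The buckets are then sorted by that sum. Here is a Python sketch of the same aggregation over parsed events; the function name and the event dicts in the test are made up. The optional `keep` predicate plays the role of a trigger condition. Hist triggers accept their own `if <filter>` clause, and it is worth verifying on your kernel whether the event's filter file also restricts the histogram.

```python
from collections import defaultdict


def build_histogram(events, keep=None):
    """Group events by (comm, pid), count hits, sum `count`, sort by the sum descending.

    Mirrors hist:keys=common_pid.execname:vals=count:sort=count.descending.
    `keep` is an optional predicate, like an `if` condition on the trigger.
    """
    buckets = defaultdict(lambda: {"hitcount": 0, "count": 0})
    for ev in events:
        if keep is not None and not keep(ev):
            continue
        bucket = buckets[(ev["comm"], ev["pid"])]
        bucket["hitcount"] += 1
        bucket["count"] += ev["count"]
    return sorted(buckets.items(), key=lambda kv: kv[1]["count"], reverse=True)
```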

Ftrace filter

  1. Multiple filters

    $ echo sys_nanosleep hrtimer_interrupt > set_ftrace_filter
    $ cat set_ftrace_filter
    hrtimer_interrupt
    sys_nanosleep
  2. Glob matching with *, at the start, end, or middle of a pattern

    $ echo 'hrtimer_*' > set_ftrace_filter
    $ cat set_ftrace_filter
  3. Append more filters (>> adds to the list instead of replacing it)

    $ echo sys_nanosleep >> set_ftrace_filter
  4. Use a line number from available_filter_functions (writing N selects the Nth function)

    $ head -1 available_filter_functions
    $ echo 1 > set_ftrace_filter 
    $ cat set_ftrace_filter

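    Writing a file to the filesystem and capturing the ksys_write call graph:

    Write a shell script test.sh:

    #!/bin/bash
    # Run as root, e.g.: sudo bash test.sh
    echo $$                                   # PID of this shell
    sleep 5
    echo "" > /sys/kernel/tracing/trace
    echo ksys_write > /sys/kernel/tracing/set_graph_function
    echo $$ > /sys/kernel/tracing/set_ftrace_pid     # only trace this shell
    echo function_graph > /sys/kernel/tracing/current_tracer
    echo 1 > /sys/kernel/tracing/tracing_on
    echo $$ > /tmp/test.txt                   # the traced write (echo is a builtin, so it runs in this PID)
    echo 0 > /sys/kernel/tracing/tracing_on
    cat /sys/kernel/tracing/trace > /tmp/result.txt
    # Clean up
    echo nop > /sys/kernel/tracing/current_tracer
    echo > /sys/kernel/tracing/set_ftrace_pid
    echo > /sys/kernel/tracing/set_graph_function
    echo "" > /sys/kernel/tracing/trace
    echo "done"
    cat /tmp/result.txt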
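The selection rules in items 1-4 can be imitated in a few lines of Python. This is handy for previewing what a pattern will match before you write it to set_ftrace_filter. The sketch rests on a few assumptions:

- fnmatch-style globbing stands in for the kernel's glob matching.
- A purely numeric entry is treated as a 1-based line number in available_filter_functions, as `echo 1` does above.
- The trailing `[module]` annotation that some lines carry is ignored.

It reproduces which functions get selected, not the order the kernel lists them in. The function name is made up for this post.

```python
import fnmatch


def select_functions(patterns, available):
    """Preview set_ftrace_filter selection against available_filter_functions lines."""
    names = [line.split()[0] for line in available if line.strip()]
    selected = []
    for pat in patterns:
        if pat.isdigit():
            idx = int(pat)  # 1-based line number
            candidates = [names[idx - 1]] if 1 <= idx <= len(names) else []
        else:
            candidates = [n for n in names if fnmatch.fnmatchcase(n, pat)]
        for n in candidates:
            if n not in selected:  # the kernel keeps each function once
                selected.append(n)
    return selected
```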

Other

The following are raw notes that I haven't organized yet:

  1. Probably one of the earliest emails about ftrace: https://lwn.net/Articles/264029/
  2. An explanation of an ftrace kernel config option: https://cateee.net/lkddb/web-lkddb/FTRACE.html
Enable the kernel to trace every kernel function. This is done by
using a compiler feature to insert a small, 5-byte No-Operation
instruction to the beginning of every kernel function, which NOP
sequence is then dynamically patched into a tracer call when tracing
is enabled by the administrator. If it's runtime disabled (the bootup
default), then the overhead of the instructions is very small and not
measurable even in micro-benchmarks.

  3. https://alex.dzyoba.com/blog/ftrace/

Some ftrace keywords:
framework, NOP, -pg, gcc, gprof, trace-cmd, KernelShark, tracefs, file interface, mcount, all kernel functions, function, function_graph.

how it works

Ftrace -> mcount -> gprof -> (-pg flag)
https://github.com/freelancer-leon/notes/blob/master/kernel/trace/ftrace-design.md
https://ftp.gnu.org/old-gnu/Manuals/gprof-2.9.1/html_node/gprof_25.html
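The config help text above describes the mechanism. The compiler puts a small call site at the start of every kernel function (historically a call to mcount via -pg). That call site stays a NOP until tracing is enabled, and only then is it patched into a call to the tracer. The following is a toy illustration of that idea in plain Python only. Nothing in it touches a real kernel, and the class and method names are hypothetical. It shows why only the functions selected by the filter cost anything while tracing is on.

```python
class ToyKernel:
    """Toy model of dynamic ftrace; nothing here touches a real kernel.

    Each function's first 'instruction' is a patchable slot that holds a NOP
    until tracing is enabled for that function.
    """

    def __init__(self, functions):
        self.functions = functions                        # name -> callable body
        self.slots = {name: "nop" for name in functions}  # per-function call site
        self.trace = []                                   # recorded "func <-caller" lines

    def enable(self, names):
        """Patch the selected call sites into tracer calls (like set_ftrace_filter + tracing_on)."""
        for name in names:
            self.slots[name] = "call tracer"

    def disable(self):
        """Patch every call site back to a NOP."""
        for name in self.slots:
            self.slots[name] = "nop"

    def call(self, name, caller="?"):
        if self.slots[name] == "call tracer":
            self.trace.append("%s <-%s" % (name, caller))
        return self.functions[name]()
```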

Reference: https://www.kernel.org/doc/html/latest/trace/ftrace.html