Posts in the category: Linux

VM CPU spikes caused by a memory leak in a hypervisor driver

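Today some developers reported that, within a single cluster running the same version of their service, some servers showed very high JVM CPU while the JVM CPU on the others stayed normal. They said this had never happened before, and that the affected servers had much higher CPU usage than the healthy ones, so an internal monitoring tool had restarted them automatically. Their guess was that these machines were being scanned by an internal vulnerability scanner, but they could not confirm it, and they asked SRE to help find the cause.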

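SRE first confirmed that the high CPU usage on these servers had essentially nothing to do with the internal vulnerability scan. The scan traffic is intercepted at the application framework layer and almost never reaches the application's own code, so it is very unlikely to drive CPU usage up. In addition, other servers that were being scanned showed no CPU spike.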

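SRE also saw clearly that the affected servers (all of them VMs provisioned through OpenStack) had an overall CPU usage of about 40%, compared with about 3% on the healthy servers. JVM CPU usage was about 8% on the affected servers and about 1% on the healthy ones. The rough conclusion is that most of the CPU was not being consumed by the JVM, although the JVM was affected to some degree.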

Looking further, all the affected servers turned out to be on the same hypervisor, and the other VMs on that hypervisor also showed elevated CPU usage.

After logging in to that hypervisor, the command below shows that it has a kernel memory leak:

admin@hv-8hhy:~$ smem -twk
Area                           Used      Cache   Noncache
firmware/hardware                 0          0          0
kernel image                      0          0          0
kernel dynamic memory        159.2G       6.5G     152.7G
userspace memory             139.3G     196.2M     139.1G
free memory                   15.6G      15.6G          0
----------------------------------------------------------
                             314.1G      22.3G     291.8G

In the Noncache column of the kernel dynamic memory row, the kernel is using 152.7G, which is clearly a problem. This is a known issue for the Cloud team, who provided a link to the kernel fix:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/drivers/net/ethernet/intel/i40e/i40e_txrx.c?id=2b9478ffc550f17c6cd8c69057234e91150f5972
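
As a rough illustration of the check above, the sketch below reads smem -twk output on stdin and prints non-cache kernel dynamic memory as a percentage of total used memory. The function name kernel_noncache_pct is made up for this example, and it assumes the column layout shown in the output above, with K/M/G/T unit suffixes.

```shell
#!/bin/sh
# kernel_noncache_pct: hypothetical helper, not part of smem.
# Reads `smem -twk` output on stdin and prints the share (in percent) of
# total used memory that is non-cache kernel dynamic memory.
kernel_noncache_pct() {
  awk '
    # Convert a value such as "152.7G" to KiB; a bare number is left as-is.
    function to_kib(s,   n, u) {
      u = substr(s, length(s), 1)
      n = s + 0
      if (u == "K") return n
      if (u == "M") return n * 1024
      if (u == "G") return n * 1024 * 1024
      if (u == "T") return n * 1024 * 1024 * 1024
      return n
    }
    # Last column of this row is Noncache.
    /^kernel dynamic memory/ { noncache = to_kib($NF) }
    # The totals row is the only row with exactly three numeric fields.
    NF == 3 && $1 ~ /^[0-9]/ { total = to_kib($1) }
    END { if (total > 0) printf "%.1f\n", 100 * noncache / total }
  '
}

# Usage (on the hypervisor): smem -twk | kernel_noncache_pct
```

For the output shown above this prints 48.6, meaning roughly half of all used memory is kernel memory that is not page cache.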

The Linux find command

  1. find ~/ -name a.txt
  2. find . -type f -name tech   # find regular files named tech
  3. find . -type d -name tech   # find directories named tech
  4. find . -iname tech   # case-insensitive: matches TECH, tech, Tech, etc.
  5. find /var -name "*.log"
  6. find . -type f -perm 0777 -print   # find all files whose permissions are exactly 777
  7. find / -type f ! -perm 777   # find all files whose permissions are not exactly 777
  8. find / -perm /u=r   # find files that are readable by their owner
  9. find / -perm /a=x   # find files that are executable by user, group, or others
  10. find / -name foo.bar -print 2>/dev/null   # send "Permission denied" errors to /dev/null
  11. find . -maxdepth 2 -name "*.bar" -print   # descend at most 2 levels; quote the pattern so the shell does not expand it
  12. find ./dir1 ./dir2 -name foo.bar -print   # search two directories
  13. find /some/directory -type l -print   # find symbolic links
    type:
    b    block (buffered) special
    c    character (unbuffered) special
    d    directory
    p    named pipe (FIFO)
    f    regular file
    l    symbolic link
    s    socket
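
To make the -type and -maxdepth options concrete, here is a small sketch that counts the entries of one type under a directory. The helper name count_type is made up for this example, and it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find do; they are not in POSIX).

```shell
#!/bin/sh
# count_type DIR DEPTH TYPE: hypothetical helper that counts entries of the
# given type (f, d, l, ...) under DIR, descending at most DEPTH levels.
count_type() {
  # -mindepth 1 skips DIR itself. Depth options go before tests such as -type.
  # Counting lines assumes no file name contains a newline.
  find "$1" -mindepth 1 -maxdepth "$2" -type "$3" -print | wc -l | tr -d ' '
}

# Usage: count_type /var/log 1 f
```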

find supports many other expressions as well, including:

-amin n - The file was last accessed n minutes ago
-anewer file - The file was last accessed more recently than the given file was modified
-atime n - The file was last accessed n days (n*24 hours) ago
-cmin n - The file's status was last changed n minutes ago
-cnewer file - The file's status was last changed more recently than the given file was modified
-ctime n - The file's status was last changed n days (n*24 hours) ago
-empty - The file is empty and is either a regular file or a directory
-executable - The file is executable
-false - Always false
-fstype type - The file is on the specified file system
-gid n - The file belongs to group with the ID n
-group groupname - The file belongs to the named group
-ilname pattern - Like -lname, but the match is case-insensitive
-iname pattern - Like -name, but the match is case-insensitive
-inum n - The file has inode number n
-ipath pattern - Like -path, but the match is case-insensitive
-iregex pattern - Like -regex, but the match is case-insensitive
-links n - The file has n hard links
-lname pattern - The file is a symbolic link whose target matches the pattern
-mmin n - File's data was last modified n minutes ago
-mtime n - File's data was last modified n days ago
-name name - Search for a file with the specified name
-newer file - The file was modified more recently than the given file
-nogroup - No group corresponds to the file's numeric group ID
-nouser - No user corresponds to the file's numeric user ID
-path path - Search for a path
-readable - Find files which are readable
-regex pattern - Search for files matching a regular expression
-type type - Search for a particular type
-uid n - The file's numeric user ID is n
-user name - File is owned by user specified
-writable - Search for files that can be written to
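
Several of these tests are usually combined. The sketch below lists log files last modified more than a given number of days ago; the helper name old_logs is made up for this example. Note that find counts age in whole 24-hour periods, so -mtime +7 only matches files that are at least 8 days old.

```shell
#!/bin/sh
# old_logs DIR DAYS: hypothetical helper that prints regular files named
# *.log under DIR whose data was last modified more than DAYS days ago.
old_logs() {
  # "+N" means "greater than N"; a bare N would mean "exactly N".
  find "$1" -type f -name "*.log" -mtime "+$2" -print
}

# Usage: old_logs /var/log 7
```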

Linux local port usage

While troubleshooting today I ran into the following exception:
java.net.ConnectException: Cannot assign requested address

Most explanations online say this means the local ports available for outbound connections have been used up.

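  1. First check whether the local ulimit settings are too low
    $ ulimit -a
  2. If they are not, check the current port (socket) usage
    $ ss -s
  3. Check the port range configured for outbound connections:
    $ cat /proc/sys/net/ipv4/ip_local_port_range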
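
To put a number on step 3, the sketch below turns the two values in ip_local_port_range into a port count. The helper name port_range_size is made up for this example. Roughly speaking, this count bounds how many simultaneous outbound connections one source address can hold to a single destination address and port.

```shell
#!/bin/sh
# port_range_size: hypothetical helper. Reads "LOW HIGH" on stdin (the
# format of /proc/sys/net/ipv4/ip_local_port_range) and prints how many
# ports the range contains, counting both ends.
port_range_size() {
  awk '{ print $2 - $1 + 1 }'
}

# Usage: port_range_size < /proc/sys/net/ipv4/ip_local_port_range
```

With a range of 32768 to 60999, for example, this prints 28232.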

More Linux network configuration parameters: https://www.tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.obscure.html#AEN1252

Reference:
https://ma.ttias.be/linux-increase-ip_local_port_range-tcp-port-range/

About the Linux ps command

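I use it all the time, but I had not realized how much information it can provide. ps is short for Process Status. The output of the top command is similar to that of ps, except that it refreshes in real time.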

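ps --help all   # show all command-line options
ps L   # list all format specifiers
ps H 16705   # show the threads of a specific process

ps -o ppid,pid,lwp,nlwp,%cpu,%mem,cputime,cmd,args k -%cpu H 16705   # list all threads of a process with a custom output format, sorted by %cpu in descending order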

About nlwp in the format above: Number of Lightweight Processes. This basically amounts to the number of threads a program has running.
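
The per-thread listing can be wrapped in a small function. This is only a sketch: the name top_threads is made up for this example, and it assumes the procps-ng version of ps (for -L and --sort).

```shell
#!/bin/sh
# top_threads PID [N]: hypothetical helper that prints the N threads
# (default 5) of process PID with the highest %cpu, plus a header line.
top_threads() {
  # -L shows one row per thread (LWP); --sort=-%cpu sorts by %cpu, descending.
  ps -L -o pid,lwp,nlwp,%cpu,comm --sort=-%cpu -p "$1" | head -n "$(( ${2:-5} + 1 ))"
}

# Usage: top_threads 16705 10
```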

Between https://www.pslinux.online/index.php and ps --help all, you can usually find the option you need.

tcpdump

tcpdump - dump traffic on a network

tcpdump [ -AbdDefhHIJKlLnNOpqStuUvxX# ] [ -B buffer_size ]

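  • -w file: write the raw packets to a file
  • -r file: read packets from a file
  • -F file: read the filter expression from a file
  • -V file: read a list of capture file names from a file
  • -c count: exit after processing count packets
  • -D or --list-interfaces: list the available interfaces
  • -i or --interface=: specify the interface to capture on
  • -K or --dont-verify-checksums: skip checksum verification
  • -n: do not convert addresses to names
  • -Q or --direction=in|out|inout: choose the capture direction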

Examples:
sudo tcpdump host 10.102.196.239 -w /tmp/tcpdump.log.cap

ping:
sudo tcpdump -e 'icmp[icmptype] == 8'   # ping echo request
sudo tcpdump -e 'icmp[icmptype] == 0'   # ping echo reply
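
Because these filters contain brackets and spaces, it helps to build them as a single quoted string. The sketch below does that; the helper name ping_filter is made up for this example. It uses the same ICMP type numbers as above (8 for echo request, 0 for echo reply) and can optionally restrict the capture to one host.

```shell
#!/bin/sh
# ping_filter request|reply [HOST]: hypothetical helper that prints a
# tcpdump filter expression for ping traffic, optionally limited to HOST.
ping_filter() {
  case "$1" in
    request) icmp_type=8 ;;   # ICMP echo request
    reply)   icmp_type=0 ;;   # ICMP echo reply
    *) echo "usage: ping_filter request|reply [host]" >&2; return 1 ;;
  esac
  if [ -n "$2" ]; then
    printf 'icmp[icmptype] == %s and host %s\n' "$icmp_type" "$2"
  else
    printf 'icmp[icmptype] == %s\n' "$icmp_type"
  fi
}

# Usage: sudo tcpdump -n -c 10 -e "$(ping_filter request 10.102.196.239)"
```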