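Common causes of hung Java threads
Many kinds of problems can cause Java threads to hang until the application can no longer serve requests. Adding a suitable timeout usually mitigates the problem. Here are some examples.
- Deadlock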
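Deadlocks are very common. If only synchronized (monitor) locks are involved, a thread dump is usually enough to find them. If the deadlock mixes synchronized locks with AQS-based locks, or involves an external resource such as a database, it is much less obvious.
- Critical resource leak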
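When a critical resource leaks, every thread that later needs it can only wait at the door. A timeout at the waiting point may help in the short term, but if every request has to pass through that point, the business logic eventually cannot proceed.
- Connection pool leak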
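We often see connections that are not returned to the pool after use, or errors that go uncaught so that the resource is never returned. Some frameworks can reclaim such resources automatically by tracking references. A thread stuck waiting for a pooled connection looks like this: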
"DefaultThreadPool-42" daemon prio=10 tid=0x00007fd3fc339000 nid=0x1776 waiting on condition [0x00007fd3c41c4000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000783800070> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:133)
at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:282)
at org.apache.http.pool.AbstractConnPool.access$000(AbstractConnPool.java:64)
at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:177)
at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:170)
at org.apache.http.pool.PoolEntryFuture.get(PoolEntryFuture.java:102)
at org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:208)
at org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:195)
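A minimal sketch of the two habits that avoid this kind of wait: lease with a timeout, and return the lease in a finally block. The BoundedPool class and its withLease method are hypothetical stand-ins written for illustration with a plain java.util.concurrent.Semaphore; they are not the HttpClient API shown in the trace above.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical stand-in for a connection pool: each permit models one pooled connection. */
class BoundedPool {
    private final Semaphore permits;

    BoundedPool(int size) {
        this.permits = new Semaphore(size);
    }

    /** Runs the task with a leased permit; gives up after the timeout instead of waiting forever. */
    <T> T withLease(long timeoutMillis, Supplier<T> task)
            throws InterruptedException, TimeoutException {
        if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("no connection available after " + timeoutMillis + " ms");
        }
        try {
            return task.get();
        } finally {
            permits.release(); // always hand the lease back, even if the task throws
        }
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws Exception {
        BoundedPool pool = new BoundedPool(1);
        try {
            pool.withLease(100, () -> { throw new IllegalStateException("request failed"); });
        } catch (IllegalStateException expected) {
            // the failure propagates, but the lease was already returned in finally
        }
        System.out.println("available after failure: " + pool.available());
    }
}
```

Real pools usually expose a comparable lease timeout. The timeout only turns a silent hang into a visible error; the finally block is what actually prevents the leak.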
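- AQS resource leak
Many of Java's concurrency utilities are built on AQS (AbstractQueuedSynchronizer). If an AQS-managed resource is not released in time, the threads waiting for it block indefinitely. The stack of such a thread looks like this: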
"DefaultThreadPool-2" daemon prio=10 tid=0x00007efd841c9800 nid=0x4b25 waiting on condition [0x00007efcfd6d3000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007affa8a38> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
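For the CountDownLatch case in the trace above, one possible defence is sketched below: count down in a finally block so that a failing task cannot strand the waiter, and use the timed await overload so that the waiter is bounded anyway. The class and method names (LatchWithTimeout, runAll) are made up for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class LatchWithTimeout {
    /** Runs each task on the pool; returns true if all of them finished within the timeout. */
    static boolean runAll(ExecutorService pool, long timeoutMillis, Runnable... tasks)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(tasks.length);
        for (Runnable task : tasks) {
            pool.execute(() -> {
                try {
                    task.run();
                } finally {
                    done.countDown(); // count down even if the task throws
                }
            });
        }
        // Timed wait: returns false on timeout instead of parking forever.
        return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            boolean finished = runAll(pool, 1000,
                    () -> { throw new IllegalStateException("task failed"); },
                    () -> { });
            System.out.println("all tasks finished: " + finished);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The failing task still reports its exception through the worker thread's uncaught exception handler. The difference is that the latch reaches zero and the waiter is released.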
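- Blocked TCP connection
A lot of Java code still uses BIO (blocking I/O). Without a read timeout, a read can sometimes block forever. Note that the thread is reported as RUNNABLE even though it is stuck in a native read. The thread stack looks like this: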
"DefaultThreadPool-52" daemon prio=10 tid=0x00007f18a0015000 nid=0x7739 runnable [0x00007f1822a7b000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:153)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at com.sun.mail.util.TraceInputStream.read(TraceInputStream.java:110)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
- locked <0x00000007ab10de98> (a java.io.BufferedInputStream)
at com.sun.mail.util.LineInputStream.readLine(LineInputStream.java:89)
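A small sketch of what a read timeout changes, using only java.net. The ReadTimeoutDemo class is hypothetical: it opens a local server socket that never sends anything and shows that, with setSoTimeout, the blocked read ends in a SocketTimeoutException instead of waiting indefinitely.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutDemo {
    /** Returns true if the read gave up with a timeout instead of blocking forever. */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        // The server socket completes the TCP handshake through its backlog but never writes data.
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            // Connect timeout: bounds the time spent establishing the connection.
            client.connect(new InetSocketAddress(silentServer.getInetAddress(),
                    silentServer.getLocalPort()), timeoutMillis);
            // Read timeout: bounds every blocking read on this socket.
            client.setSoTimeout(timeoutMillis);
            try {
                client.getInputStream().read();
                return false; // the peer sent data or closed the connection
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

Libraries that manage their own sockets generally need the timeout configured through their own settings. JavaMail, which appears in the trace above, has per-protocol timeout properties such as mail.smtp.timeout; which one applies depends on the protocol in use.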