By: Andy Zhang
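This problem occurred on SLES12 SP2 Linux. It is not common, but it suggests a new angle for troubleshooting.
The symptom: once the number of database processes reached about 300, no further connections to the database could be made, and the following error was reported: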
ERROR: ORA-12518: TNS:listener could not hand off client connection

15-AUG-2017 01:40:01 * (CONNECT_DATA=(CID=(PROGRAM=myapp)(HOST=__jdbc__)(USER=admin))(SERVER=DEDICATED)(SERVICE_NAME=oracle)) * (ADDRESS=(PROTOCOL=tcp)(HOST=11.22.33.44)(PORT=1521)) * establish * oracle * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12536: TNS:operation would block
TNS-12560: TNS:protocol adapter error
TNS-00506: Operation would block
Linux Error: 11: Resource temporarily unavailable
The problem could be reproduced every time, but the user could not find where the limit was. ulimit -a showed no obvious limit:
sa-server-0:grid:+ASM1 # ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 513378
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1000000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1000000
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Checking the limits of the process showed nothing abnormal either:
sa-server-0:~ # cat /proc/5497/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            33554432             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             513378               513378               processes
Max open files            65536                65536                files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       513378               513378               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
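The same per-process limits can be read programmatically. The sketch below is a hypothetical helper (parse_limits and read_limits are illustrative names, not part of any tool) that parses /proc/<pid>/limits, assuming the kernel's fixed-width layout: a 25-character name column, then 20-character soft and hard columns, each followed by one space. Note that this file only reports rlimits such as Max processes (RLIMIT_NPROC); a cgroup-level cap on tasks does not show up here.

```python
from typing import Dict, Optional, Tuple

# (soft, hard, units); None stands for "unlimited"
Limit = Tuple[Optional[int], Optional[int], str]


def _value(field: str) -> Optional[int]:
    """Convert one soft/hard column to an int, or None for 'unlimited'."""
    field = field.strip()
    return None if field == "unlimited" else int(field)


def parse_limits(text: str) -> Dict[str, Limit]:
    """Parse the text of /proc/<pid>/limits into {limit name: (soft, hard, units)}.

    Assumes the fixed-width layout: name in columns 0-24, soft limit in 26-45,
    hard limit in 47-66, units (possibly empty) from column 68 on.
    """
    limits: Dict[str, Limit] = {}
    for line in text.splitlines()[1:]:  # skip the "Limit  Soft Limit ..." header
        if not line.strip():
            continue
        name = line[:25].strip()
        limits[name] = (_value(line[26:46]), _value(line[47:67]), line[68:].strip())
    return limits


def read_limits(pid: int) -> Dict[str, Limit]:
    """Read and parse /proc/<pid>/limits (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_limits(f.read())
```

For the process checked above, read_limits(5497)["Max processes"] would give (513378, 513378, "processes"), consistent with the conclusion that no rlimit was in the way.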
We had the user capture an strace of the listener. It confirmed that the clone call was failing because of insufficient resources (EAGAIN, Resource temporarily unavailable):
STRACE
-------------------
filename=listener.strace
11404 0.000022 poll([{fd=8, events=POLLIN|POLLRDNORM}, {fd=11, {fd=13, {fd=14, {fd=15, {fd=16, {fd=17, {fd=3, events=POLLIN|POLLRDNORM}], 8, 60000) = 2 ([{fd=15, revents=POLLIN|POLLRDNORM}, revents=POLLIN|POLLRDNORM}]) <0.000012>
11404 0.000043 read(3, " 367 1 0161,fA 177377O230 1 275 : "..., 8208) = 247 <0.000010>
11404 0.000028 fcntl(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK) <0.000008>
11404 0.000021 fcntl(3, F_SETFL, O_RDWR) = 0 <0.000008>
11404 0.000023 times({tms_utime=5483, tms_stime=2588, tms_cutime=440, tms_cstime=60}) = 1720115043 <0.000009>
11404 0.000096 fcntl(3, F_SETFD, 0) = 0 <0.000010>
11404 0.000027 pipe([18, 19]) = 0 <0.000012>
11404 0.000026 pipe([20, 21]) = 0 <0.000011>
11404 0.000024 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f4e29b769d0) = -1 EAGAIN (Resource temporarily unavailable) <0.000197>   <============
11404 0.000219 close(18) = 0 <0.000011>
11404 0.000022 close(19) = 0 <0.000010>
11404 0.000023 close(20) = 0 <0.000009>
11404 0.000021 close(21) = 0 <0.000009>
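On Linux, errno 11 is EAGAIN, which is why the listener log reports "Linux Error: 11" while strace shows EAGAIN. When a long capture has to be searched for this failure, a small filter helps. The sketch below uses a hypothetical helper, failed_spawns, and assumes lines in the format of the capture above: a PID prefix, an optional timestamp (relative or wall-clock), then the system call. It reports clone/clone3/fork/vfork calls that returned -1.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class FailedSpawn(NamedTuple):
    pid: int
    syscall: str
    errno_name: str
    message: str


# Assumed line shape: "<pid> [<timestamp>] <call>(<args>) = -1 <ERRNO> (<text>) ..."
_FAILED = re.compile(
    r"^(?P<pid>\d+)\s+(?:[\d.:]+\s+)?"
    r"(?P<call>clone3?|fork|vfork)\(.*\)\s+=\s+-1\s+"
    r"(?P<err>E[A-Z0-9]+)\s+\((?P<msg>[^)]*)\)"
)


def failed_spawns(lines: Iterable[str]) -> Iterator[FailedSpawn]:
    """Yield every process-creation call in the strace output that failed."""
    for line in lines:
        m = _FAILED.match(line)
        if m:
            yield FailedSpawn(int(m["pid"]), m["call"], m["err"], m["msg"])
```

Fed the capture above, it would report a single failure: pid 11404, clone, EAGAIN, "Resource temporarily unavailable".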
The OS log provided a clue:
2017-08-16T02:36:55.560027+08:00 server-0 kernel: [  165.619978] cgroup: fork rejected by pids controller in /system.slice/ohasd.service
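To check whether the pids controller has been rejecting forks, and in which cgroup, the kernel log can be searched for this message. A minimal sketch follows; rejected_cgroups is a hypothetical helper name, and its input can be any kernel log text, for example the output of journalctl -k or dmesg split into lines.

```python
import re
from typing import Iterable, List

# Kernel message emitted when the cgroup pids controller refuses a fork/clone.
_PIDS_REJECT = re.compile(r"fork rejected by pids controller in (\S+)")


def rejected_cgroups(lines: Iterable[str]) -> List[str]:
    """Return the cgroup paths named in pids-controller rejections.

    Paths are returned in first-seen order, without duplicates.
    """
    seen: List[str] = []
    for line in lines:
        m = _PIDS_REJECT.search(line)
        if m and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen
```

On the log line above it returns ["/system.slice/ohasd.service"], which points directly at the systemd unit whose limit was reached.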
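"fork rejected by pids controller" shows that the number of processes is limited, not by an rlimit (which is why ulimit -a and /proc/<pid>/limits looked normal) but by the cgroup pids controller on the /system.slice/ohasd.service cgroup. The listener here is started by Oracle Grid Infrastructure (note the grid user and the +ASM1 prompt above), so the listener and every dedicated server process it forks belong to that cgroup. The pids controller counts tasks, meaning threads as well as processes, and the same cgroup also holds the Grid Infrastructure daemons; this likely explains why connections started failing at around 300 database processes rather than at the limit itself.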
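The current task count and the limit of that cgroup can be read from the pids controller's files. The sketch below uses a hypothetical helper, pids_usage, and assumes the cgroup v1 layout used by SLES 12, with the pids hierarchy mounted at /sys/fs/cgroup/pids (on a cgroup v2 system the root would be /sys/fs/cgroup instead). pids.max contains the literal string "max" when there is no limit.

```python
from pathlib import Path
from typing import Optional, Tuple


def pids_usage(cgroup: str, root: str = "/sys/fs/cgroup/pids") -> Tuple[int, Optional[int]]:
    """Return (current task count, limit) for a cgroup; limit is None when unlimited.

    `root` is an assumption: the mount point of the pids controller hierarchy.
    """
    base = Path(root) / cgroup.lstrip("/")
    current = int((base / "pids.current").read_text().strip())
    limit_text = (base / "pids.max").read_text().strip()
    limit = None if limit_text == "max" else int(limit_text)
    return current, limit
```

For example, pids_usage("/system.slice/ohasd.service") while the problem reproduces should show the current count at the limit; systemctl show -p TasksMax ohasd.service reports the same limit from systemd's side.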
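The root cause is the systemd resource control added in SUSE 12, whose default setting DefaultTasksMax is 512. systemd limits the maximum number of tasks that may be created in each unit to TasksMax, which falls back to DefaultTasksMax, and enforces that limit through the cgroup pids controller. In effect this value acts as a per-unit cap on PIDs; it does not change the system-wide kernel.pid_max. Setting it to unlimited resolved the problem.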
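The setting lives in the [Manager] section of /etc/systemd/system.conf. To apply the change repeatably, for example from a provisioning script, the sketch below uses a hypothetical helper, set_default_tasks_max, that returns the file's text with every existing or commented-out DefaultTasksMax= line replaced, or with the entry added under [Manager] if none exists. It assumes the usual layout of the shipped file, where defaults appear as commented lines such as #DefaultTasksMax=512.

```python
import re

# Matches active or commented-out ('#' or ';') DefaultTasksMax= lines.
_ENTRY = re.compile(r"^\s*[#;]?\s*DefaultTasksMax\s*=")


def set_default_tasks_max(conf_text: str, value: str = "infinity") -> str:
    """Return system.conf text with DefaultTasksMax=<value> set in [Manager]."""
    lines = conf_text.splitlines()
    entry = f"DefaultTasksMax={value}"
    found = False
    for i, line in enumerate(lines):
        if _ENTRY.match(line):
            lines[i] = entry  # replace every match so no stale value remains
            found = True
    if not found:
        managers = [i for i, line in enumerate(lines) if line.strip() == "[Manager]"]
        if managers:
            lines.insert(managers[0] + 1, entry)  # add right under the header
        else:
            lines += ["[Manager]", entry]  # no section yet: create it
    return "\n".join(lines) + "\n"
```

Writing the result back to /etc/systemd/system.conf requires root, and the new default only takes effect after the host is rebooted.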
Fix: edit /etc/systemd/system.conf, set DefaultTasksMax to 'infinity', and reboot the host.
(Editor: Li Datong)