Symptom
After running for a while, an SPDK program starts logging the following message repeatedly:
[root@qd0 dataserver1]# less /var/log/messages
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
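Analysis
Reading the relevant SPDK/DPDK code shows that the message above can appear in the following cases:
- The process has run out of file descriptors
- Opening the hugepage file fails
Not enough file descriptors
Check the system-wide open-file settings for the shell: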
ulimit -a
file size               (blocks, -f) unlimited
pending signals                 (-i) 513271
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 513271
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[root@qd01-ebs-xuanwu177003043 dataserver1]# EAL: Couldn't get fd on hugepage file
Check the open-file limit that is actually in effect for the running process:
[root@qd01-ebs-xuanwu177003041 11537]# pwd
/proc/11537
[root@qd01-ebs-xuanwu177003041 11537]# cat limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        5368709120           5368709120           bytes
Max resident set          unlimited            unlimited            bytes
Max processes             513271               513271               processes
Max open files            1024                 1024                 files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       513271               513271               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
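The output above shows that the process is actually limited to only 1024 file descriptors (both soft and hard), so the limit needs to be raised.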
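To confirm that a process really is running close to that limit, it can help to compare the number of descriptors it currently holds with its soft limit. The following is a minimal sketch, assuming a Linux system with /proc mounted; the function name fd_usage is hypothetical and not part of SPDK or DPDK.

```shell
#!/bin/sh
# Print how many file descriptors a process has open next to its soft limit.
# Usage: fd_usage PID   (fd_usage is a hypothetical helper name)
fd_usage() {
    pid=$1
    # Each entry under /proc/PID/fd corresponds to one open descriptor.
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # In the "Max open files" row of /proc/PID/limits, field 4 is the
    # soft limit and field 5 is the hard limit.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "pid=$pid open=$open soft_limit=$soft"
}
```

For the process inspected above this would be called as fd_usage 11537. If the open count is at or near the soft limit, descriptor exhaustion is the likely cause of the EAL message.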
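Solution
Raise the open-file limit. Note that ulimit -n only applies to the current shell and to processes started from it afterwards, so run it in the shell that launches the program and then restart the program: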
ulimit -n 1024000
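Because the limit is inherited at process start, one way to avoid forgetting the ulimit step is to wrap the launch in a small helper. This is only a sketch under that assumption; run_with_nofile is a hypothetical name, and the program path in the usage note is a placeholder, not the actual binary from this incident.

```shell
#!/bin/sh
# Run a command with a specific open-file limit without changing the
# limit of the calling shell.
# Usage: run_with_nofile LIMIT COMMAND [ARGS...]   (hypothetical helper)
run_with_nofile() {
    limit=$1
    shift
    (
        # The subshell keeps the change local to the launched command.
        # Raising the limit above the current hard limit requires root.
        ulimit -n "$limit" || exit 1
        exec "$@"
    )
}
```

A call such as run_with_nofile 1024000 ./your_spdk_program (placeholder path) starts the program with the raised limit. Afterwards, reading the "Max open files" row in /proc/PID/limits for the new process, as done earlier, should show the new value.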