Symptom

After running for a while, an SPDK program starts printing:

[root@qd0 dataserver1]# less /var/log/messages
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file
EAL: Couldn't get fd on hugepage file

Analysis

Reading the relevant SPDK/DPDK code shows that the message above can appear in the following cases:

  • The process has run out of file descriptors
  • Opening the hugepage file failed

Not enough file descriptors

Check the open-file limit in the shell:

ulimit -a
file size               (blocks, -f) unlimited
pending signals                 (-i) 513271
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 513271
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[root@qd01-ebs-xuanwu177003043 dataserver1]# EAL: Couldn't get fd on hugepage file

Check the open-file limit that actually applies to the running process:

[root@qd01-ebs-xuanwu177003041 11537]# pwd
/proc/11537
[root@qd01-ebs-xuanwu177003041 11537]# cat limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        5368709120           5368709120           bytes
Max resident set          unlimited            unlimited            bytes
Max processes             513271               513271               processes
Max open files            1024                 1024                 files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       513271               513271               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

The output above shows that the process is limited to only 1024 file descriptors, so the limit needs to be raised.
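To confirm that descriptor exhaustion is really the cause, it may help to compare how many descriptors the process currently holds against its soft limit. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names count_open_fds and soft_nofile_limit are illustrative and do not come from SPDK or DPDK.

```shell
#!/bin/sh
# Compare a process's open descriptor count with its soft open-file limit.
# Function names are hypothetical helpers for this note.

count_open_fds() {
    # Each entry in /proc/<pid>/fd is one open descriptor.
    ls "/proc/$1/fd" | wc -l
}

soft_nofile_limit() {
    # In /proc/<pid>/limits the soft limit is the 4th field of the
    # "Max open files" row (fields: Max, open, files, soft, hard, units).
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# Default to the current shell when no PID is given on the command line.
pid="${1:-$$}"
echo "pid=$pid open_fds=$(count_open_fds "$pid") soft_limit=$(soft_nofile_limit "$pid")"
```

Run it with the PID of the SPDK program as the only argument (11537 in the session above). If open_fds is close to soft_limit, the first cause in the list is the likely one.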

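Solution

Raise the open-file limit in the shell that will start the program (ulimit only affects the current shell and the processes started from it):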

ulimit -n 1024000

Restart the process from that same shell so that it inherits the new limit.
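The two steps can be combined in a small launcher script so the limit and the restart always happen in the same shell. This is a sketch under assumptions: raise_nofile is a hypothetical helper, the program to start is whatever command line is passed to the script, and the fallback to the hard limit is an addition for the case where the caller is not allowed to go as high as 1024000.

```shell
#!/bin/sh
# Raise the open-file limit, then start the given program in this shell.
# raise_nofile is a hypothetical helper, not part of SPDK or DPDK.

raise_nofile() {
    # Try the requested value first (a privileged user may also raise the
    # hard limit). If that is refused, fall back to raising only the soft
    # limit up to the current hard limit, which any user may do.
    ulimit -n "$1" 2>/dev/null || ulimit -Sn "$(ulimit -Hn)"
}

raise_nofile 1024000
echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"

# Replace this shell with the program named on the command line, if any,
# so the program inherits the limit set above.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

After the program is running again, the Max open files row in /proc/<pid>/limits should show the new value.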