Today the Ceph DB devices spilled over; this post records the fix.

[root@ceph-node-1 ceph-admin-node]#  ceph health detail
 HEALTH_WARN 6 OSD(s) experiencing BlueFS spillover
[WRN] BLUEFS_SPILLOVER: 6 OSD(s) experiencing BlueFS spillover
     osd.4 spilled over 17 MiB metadata from 'db' device (3.6 GiB used of 35 GiB) to slow device
     osd.5 spilled over 47 MiB metadata from 'db' device (3.2 GiB used of 35 GiB) to slow device
     osd.6 spilled over 125 MiB metadata from 'db' device (3.1 GiB used of 35 GiB) to slow device
     osd.12 spilled over 4.1 MiB metadata from 'db' device (3.9 GiB used of 35 GiB) to slow device
     osd.13 spilled over 21 MiB metadata from 'db' device (3.8 GiB used of 35 GiB) to slow device
     osd.14 spilled over 22 MiB metadata from 'db' device (3.8 GiB used of 35 GiB) to slow device
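
Before fixing anything, it helps to see how much metadata each affected OSD has actually pushed onto the slow device. A small sketch using the admin-socket perf counters (counter names such as db_used_bytes and slow_used_bytes have existed for a while, but they can differ between releases, so treat the exact field names as an assumption):

# run on the host where osd.4 lives; the bluefs counters show fast-device (db)
# vs. slow-device usage
ceph daemon osd.4 perf dump | python3 -m json.tool | grep -E 'db_used_bytes|db_total_bytes|slow_used_bytes|slow_total_bytes'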

Compaction as a temporary fix:

ceph daemon osd.{id} compact
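
When many OSDs are affected, running this by hand per OSD is tedious. Here is a small sketch, assuming the default admin-socket paths under /var/run/ceph and a cluster named "ceph", that compacts every OSD hosted on the current node:

# run on each OSD host: compact every OSD that has a local admin socket
for sock in /var/run/ceph/ceph-osd.*.asok; do
    id=${sock##*/ceph-osd.}    # strip directory and prefix
    id=${id%.asok}             # strip suffix, leaving the OSD id
    echo "compacting osd.${id}"
    ceph daemon "osd.${id}" compact
done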

Compaction shrinks the data inside the DB partition, and the warning may disappear, but if too much has spilled over or the remaining capacity is too small it will not necessarily help. If you do not want to keep running into this problem, there are two more fundamental solutions below.

Setting the DB capacity

Reference: https://yourcmc.ru/wiki/Ceph_performance#About_block.db_sizing
The original text reads:

Official documents say that you should allocate 4 % of the slow device
space for block.db (Bluestore’s metadata partition). This is a lot,
Bluestore rarely needs that amount of space.

But the main problem is that Bluestore uses RocksDB and RocksDB puts a
file on the fast device only if it thinks that the whole layer will
fit there (RocksDB is organized in files). Default RocksDB settings in
Ceph are:

1 GB WAL = 4x256 Mb
max_bytes_for_level_base and max_bytes_for_level_multiplier are default, thus 256 Mb and 10, respectively
so L1 = 256 Mb, L2 = 2560 Mb, L3 = 25600 Mb

The recommended size is 4% of the disk capacity. That is a lot and may not be very economical, although if budget is no issue, bigger is better. For the problem in this post, however, you should check whether max_bytes_for_level_base (default 256 MB) and max_bytes_for_level_multiplier (default 10) have been changed. With the default 256 MB, the fast device is used up to L3 at most, so 25600 + 2560 + 256 + 1024 MB means you need roughly 30 GB of DB capacity at a minimum; allowing for the decimal/binary unit difference, you should give each DB partition at least 35-40 GB. If you have enough spare capacity, consider migrating the DB to a larger partition (create a new partition or add another SSD); otherwise you need to lower max_bytes_for_level_base or max_bytes_for_level_multiplier.

Since the Octopus release, Ceph can make full use of any extra space. Previously, with the default settings, at most about 30 GB would ever be used because RocksDB never reaches L4, so allocating 64 GB was pointless: the surplus capacity was never touched and simply wasted. That is no longer the case; the extra capacity is now put to use as well.
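
If you choose to lower the level sizes instead of enlarging the partition, the change goes into bluestore_rocksdb_options in ceph.conf, a single comma-separated string handed to RocksDB. The snippet below is only a sketch: the options before the last two mirror a commonly seen default string, but the default differs between releases, so read your current value first (for example with ceph config show osd.0 bluestore_rocksdb_options) and only change the two level-sizing options at the end; the 128 MB base and multiplier of 8 are illustrative values, not a recommendation.

[osd]
# illustrative only: keep your release's default options and adjust the two
# level-sizing options so L1+L2+L3 (128 MB + 1 GB + 8 GB here) fit the DB partition
bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2,max_bytes_for_level_base=134217728,max_bytes_for_level_multiplier=8

The option is only read when the OSD starts, which is why the restart below is required.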

After modifying the configuration, push it to the other nodes:

ceph-deploy --overwrite-conf config push ceph-node-1 ceph-node-2 ceph-node-3 ceph-node-4

Then restart the OSD daemons, one node at a time:

systemctl restart ceph.target

Note: while a node is restarting, wait until ceph -s shows the cluster is healthy again before moving on to the next one.
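
A sketch of the per-node procedure; the noout flag is optional and simply stops the cluster from marking the restarting OSDs out and rebalancing while they are down:

# optional: prevent rebalancing while OSDs are restarting
ceph osd set noout

# on the node being restarted
systemctl restart ceph.target

# wait for the cluster to report a healthy status again before the next node
watch -n 5 ceph -s

# after the last node has been restarted
ceph osd unset noout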

Migrating the DB to a larger partition

Since the Octopus release, Ceph provides a very convenient way to migrate:

ceph-volume lvm migrate --osd-id OSD_ID --osd-fsid OSD_FSID --target TARGET_LV --from {data|db|wal} [{data|db|wal} ...]

More details: https://docs.ceph.com/en/octopus/man/8/ceph-volume/
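
A worked sketch for, say, osd.4. The volume group and LV names are placeholders you would create yourself beforehand (for example with lvcreate on a larger SSD), <OSD_FSID> is the "osd fsid" value shown by ceph-volume lvm list, and the OSD is stopped for the move; check the man page above for the exact requirements of your release:

# create a larger LV for the DB first, e.g.:
#   lvcreate -L 60G -n osd4-db-new ceph-db-vg

systemctl stop ceph-osd@4

# move BlueFS data from the main (slow) device and the old db device
# onto the new, larger db LV
ceph-volume lvm migrate --osd-id 4 --osd-fsid <OSD_FSID> --from data db --target ceph-db-vg/osd4-db-new

systemctl start ceph-osd@4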
