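Ceph DB Partition Spillover: Fixing It by Migrating the Partition and Sizing the DB Capacity
The Ceph DB device spilled over today, so here is a record of how I resolved it.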
[root@ceph-node-1 ceph-admin-node]# ceph health detail
HEALTH_WARN 6 OSD(s) experiencing BlueFS spillover
[WRN] BLUEFS_SPILLOVER: 6 OSD(s) experiencing BlueFS spillover
osd.4 spilled over 17 MiB metadata from 'db' device (3.6 GiB used of 35 GiB) to slow device
osd.5 spilled over 47 MiB metadata from 'db' device (3.2 GiB used of 35 GiB) to slow device
osd.6 spilled over 125 MiB metadata from 'db' device (3.1 GiB used of 35 GiB) to slow device
osd.12 spilled over 4.1 MiB metadata from 'db' device (3.9 GiB used of 35 GiB) to slow device
osd.13 spilled over 21 MiB metadata from 'db' device (3.8 GiB used of 35 GiB) to slow device
osd.14 spilled over 22 MiB metadata from 'db' device (3.8 GiB used of 35 GiB) to slow device
ceph daemon osd.{id} compact
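With several OSDs affected, the compact command has to be repeated once per OSD id. The following is a minimal sketch, not part of the original procedure, that pulls the ids out of saved `ceph health detail` output and prints one compact command per OSD. The function names `spilled_osds` and `compact_commands` are hypothetical, and the regular expression assumes the line format shown above. Note that `ceph daemon` talks to the local admin socket, so each command has to be run on the node hosting that OSD.

```python
import re
from typing import List

# Matches the start of lines such as:
#   osd.4 spilled over 17 MiB metadata from 'db' device (...) to slow device
SPILL_RE = re.compile(r"osd\.(\d+) spilled over")


def spilled_osds(health_detail: str) -> List[int]:
    """Return the ids of all OSDs reported as spilled over (hypothetical helper)."""
    return [int(m.group(1)) for m in SPILL_RE.finditer(health_detail)]


def compact_commands(health_detail: str) -> List[str]:
    """Build one 'ceph daemon osd.<id> compact' command per spilled OSD."""
    return [f"ceph daemon osd.{osd_id} compact" for osd_id in spilled_osds(health_detail)]


if __name__ == "__main__":
    sample = (
        "osd.4 spilled over 17 MiB metadata from 'db' device "
        "(3.6 GiB used of 35 GiB) to slow device\n"
        "osd.12 spilled over 4.1 MiB metadata from 'db' device "
        "(3.9 GiB used of 35 GiB) to slow device\n"
    )
    for cmd in compact_commands(sample):
        print(cmd)
```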
Official documents say that you should allocate 4% of the slow device
space for block.db (BlueStore's metadata partition). This is a lot;
BlueStore rarely needs that amount of space.
But the main problem is that BlueStore uses RocksDB, and RocksDB puts a
file on the fast device only if it thinks that the whole layer will
fit there (RocksDB is organized in files). Default RocksDB settings in
Ceph are:
1 GB WAL = 4 x 256 MB
max_bytes_for_level_base and max_bytes_for_level_multiplier are default, thus 256 MB and 10, respectively
so L1 = 256 MB, L2 = 2560 MB, L3 = 25600 MB
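The level sizes quoted above follow from multiplying the base size by the multiplier once per level. A minimal sketch of that arithmetic, assuming the defaults stated in the quote (the function name `level_sizes_mb` is hypothetical and this is not Ceph or RocksDB code):

```python
def level_sizes_mb(base_mb=256, multiplier=10, levels=3):
    """Target size in MB of RocksDB levels L1..Ln.

    base_mb corresponds to max_bytes_for_level_base and multiplier to
    max_bytes_for_level_multiplier, using the defaults quoted in the text.
    """
    return [base_mb * multiplier ** i for i in range(levels)]


if __name__ == "__main__":
    print(level_sizes_mb())  # [256, 2560, 25600]
```

Passing a smaller base or multiplier shows how the levels shrink if those settings are lowered, for example `level_sizes_mb(base_mb=128)`.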
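The recommended size is 4% of the disk capacity. That is a lot and may not be economical, although if budget allows, bigger is better. For the problem in this post, however, you should check whether max_bytes_for_level_base (default 256 MB) or max_bytes_for_level_multiplier (default 10) has been changed. With the default 256 MB base, levels up to L3 can be placed on the fast device, so the DB needs at least 25600 + 2560 + 256 + 1024 MB = 29440 MB, roughly 30 GB. Allowing for the difference between decimal and binary units, allocate at least 35-40 GB to the partition. If there is enough spare capacity, consider migrating the DB to a larger partition (create a new partition or add a new SSD); otherwise you need to lower max_bytes_for_level_base or max_bytes_for_level_multiplier.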
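The sizing rule above can be checked with a short calculation. This is a rough sketch under the model described in the text (a 1024 MB WAL plus whole levels, each placed on the fast device only if it fits entirely). The function names are hypothetical, and treating the MB figures and the GiB partition size as the same binary unit is an assumption.

```python
def cumulative_db_need_mb(wal_mb=1024, base_mb=256, multiplier=10, levels=4):
    """Space in MB needed to hold the WAL plus levels L1..Ln, for n = 1..levels."""
    totals = []
    running = wal_mb
    for i in range(levels):
        running += base_mb * multiplier ** i  # add the size of level L(i+1)
        totals.append(running)
    return totals


def levels_that_fit(db_mb, **kwargs):
    """Count how many whole levels fit on a DB partition of db_mb MB."""
    return sum(1 for need in cumulative_db_need_mb(**kwargs) if need <= db_mb)


if __name__ == "__main__":
    print(cumulative_db_need_mb())     # [1280, 3840, 29440, 285440]
    print(levels_that_fit(35 * 1024))  # 3
```

Under these assumptions a partition between about 29 GB and 279 GB holds the same three levels, which is why sizes in between add little for the DB itself.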
ceph-deploy --overwrite-conf config push ceph-node-1 ceph-node-2 ceph-node-3 ceph-node-4
systemctl restart ceph.target
Note: restart one node at a time, and wait until ceph -s reports a healthy state again before restarting the next node.
ceph-volume lvm migrate --osd-id OSD_ID --osd-fsid OSD_FSID --target TARGET_LV --from {data|db|wal} [{data|db|wal} ...]