CentOS Ceph RAID0 bcache: replacing a failed OSD

Author: admin

Date: 2021-10-12

Category: Ceph

First, mark the failed OSD out and stop it:

# ceph osd out osd.4
# systemctl stop ceph-osd@4.service

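Detach bcache (disable the front-end cache). Assume the failed OSD sits on bcache1 and the cache device's cset-uuid is 805bc5f1-36e9-4685-a86a-3ea8c03f1172. In my case the OSD disk only threw occasional errors and was about to fail; it was still detected by the system, so I detached it normally and let the dirty data flush to the backing disk: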

# echo 805bc5f1-36e9-4685-a86a-3ea8c03f1172  > /sys/block/bcache1/bcache/detach

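Check the amount of dirty data, and move on to the next step only after it has dropped to 0, meaning everything has been flushed to the backing disk: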

# cat /sys/block/bcache1/bcache/dirty_data
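Rather than re-running cat by hand, the check can be polled in a loop. The following is a minimal sketch, not part of the original procedure: the function names is_clean and wait_for_flush are made up for illustration, and the set of strings treated as "zero" (0, 0.0, 0k, 0.0k) is an assumption, since the exact formatting of dirty_data can differ between kernel versions.

```shell
# is_clean VALUE: succeed when a dirty_data reading represents zero.
# The accepted spellings are an assumption; adjust them to what your kernel prints.
is_clean() {
    case "$1" in
        0|0.0|0k|0.0k) return 0 ;;
        *) return 1 ;;
    esac
}

# wait_for_flush FILE [INTERVAL]
# Poll FILE (e.g. /sys/block/bcache1/bcache/dirty_data) every INTERVAL
# seconds (default 5) until it reads as zero.
wait_for_flush() {
    file=$1
    interval=${2:-5}
    while :; do
        value=$(cat "$file") || return 1
        if is_clean "$value"; then
            echo "flushed: $value"
            return 0
        fi
        echo "still dirty: $value"
        sleep "$interval"
    done
}
```

Usage would be wait_for_flush /sys/block/bcache1/bcache/dirty_data. Once the detach has completed, /sys/block/bcache1/bcache/state should also report "no cache", which is a useful second check.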

After the disk has been physically replaced on site, find the number of the missing Virtual Drive:

# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aAll | grep Virtual
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Virtual Drive: 1 (Target Id: 1)
Virtual Drive: 2 (Target Id: 2)
Virtual Drive: 4 (Target Id: 4)
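The gap in this listing (0, 1, 2, 4) can also be found mechanically. Here is a small sketch that reads the MegaCli output on stdin and prints every Virtual Drive number missing between 0 and the highest one seen. The function name find_missing_vd is illustrative, and the approach assumes the lost drive is not the highest-numbered one, since in that case there is no gap to detect.

```shell
# find_missing_vd: read "MegaCli64 -LDInfo -LALL -aAll" output on stdin and
# print the Virtual Drive numbers absent from the range 0..max.
find_missing_vd() {
    awk '
        # Only lines such as "Virtual Drive: 4 (Target Id: 4)" are counted;
        # the "Adapter 0 -- Virtual Drive Information:" header is ignored.
        /^Virtual Drive: [0-9]+/ {
            id = $3 + 0
            seen[id] = 1
            if (id > max) max = id
            found = 1
        }
        END {
            if (!found) exit
            for (i = 0; i <= max; i++)
                if (!(i in seen)) print i
        }
    '
}
```

It would be used as /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aAll | find_missing_vd.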

# /opt/MegaRAID/MegaCli/MegaCli64 -GetPreservedCacheList -a0
                                     
Adapter #0

Virtual Drive(Target ID 03): Missing.

This confirms that the missing Virtual Drive is Target ID 03. Discard its preserved cache:

# /opt/MegaRAID/MegaCli/MegaCli64 -DiscardPreservedCache -L3 -a0
                                     
Adapter #0

Virtual Drive(Target ID 03): Preserved Cache Data Cleared.

Exit Code: 0x00

Recreate the RAID0 virtual drive. The Slot Number is 4 (in [32:4], 32 is the Enclosure Device ID and 4 is the slot):

# /opt/MegaRAID/MegaCli/MegaCli64 -CfgLdAdd -r0 [32:4] WB Direct -a0

Set up bcache on the new data disk, assuming the rebuilt disk appears as /dev/sdd:

# wipefs -a /dev/sdd
# make-bcache -B /dev/sdd -C /dev/nvme0n1
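The make-bcache command above passes -C, which asks make-bcache to format the NVMe device as a cache as well. If that device already holds the cache set from the detach step and still serves other OSDs, a possible alternative is to format only the backing disk (make-bcache -B /dev/sdd) and attach it to the existing cache set through sysfs. The sketch below assumes that situation; the helper name attach_to_cset is made up, and the bcache device number passed to it is whatever the kernel assigns on your node.

```shell
# attach_to_cset SYSFS_DIR CSET_UUID
# Write CSET_UUID into SYSFS_DIR/attach after basic sanity checks.
# SYSFS_DIR is the bcache directory of the new backing device,
# e.g. /sys/block/bcache3/bcache (the device number is an assumption).
attach_to_cset() {
    dir=$1
    uuid=$2
    # Reject anything that is not a canonical 8-4-4-4-12 hex UUID.
    if ! printf '%s\n' "$uuid" |
        grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'; then
        echo "invalid cset-uuid: $uuid" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "no such bcache sysfs directory: $dir" >&2
        return 1
    fi
    printf '%s\n' "$uuid" > "$dir/attach"
}
```

With the values used in this post, the call would look like attach_to_cset /sys/block/bcache3/bcache 805bc5f1-36e9-4685-a86a-3ea8c03f1172.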

Unmount the OSD mount directory:

# umount /var/lib/ceph/osd/ceph-4

Pause data rebalancing in the Ceph cluster:

# for i in noout nobackfill norecover noscrub nodeep-scrub;do ceph osd set $i;done

Remove the OSD from the CRUSH map:

# ceph osd crush remove osd.4
removed item id 4 name 'osd.4' from crush map

Delete the failed OSD's authentication key:

# ceph auth del osd.4
updated

Delete the failed OSD:

# ceph osd rm 4
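The three removal commands (crush remove, auth del, osd rm) always run in the same order, so they can be wrapped in one helper. This is only a sketch: the function name remove_osd and the CEPH override variable are illustrative additions that allow a dry run before touching the cluster.

```shell
# CEPH can be overridden (e.g. CEPH=echo) to print the commands
# instead of running them.
CEPH=${CEPH:-ceph}

# remove_osd ID: run the three removal steps in order,
# stopping at the first one that fails.
remove_osd() {
    id=$1
    case "$id" in
        ''|*[!0-9]*)
            echo "OSD id must be a number: $id" >&2
            return 1
            ;;
    esac
    $CEPH osd crush remove "osd.$id" &&
    $CEPH auth del "osd.$id" &&
    $CEPH osd rm "$id"
}
```

Setting CEPH=echo before calling remove_osd 4 prints the three commands without executing them. On Luminous and later releases, ceph osd purge 4 --yes-i-really-mean-it performs the same three steps in a single command.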

Next, add the new OSD, using bcache3 on node ceph-node-3:

# ceph-deploy osd create ceph-node-3 --data /dev/bcache3

When that is done, re-enable data rebalancing in the Ceph cluster:

# for i in noout nobackfill norecover noscrub nodeep-scrub;do ceph osd unset $i;done
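The set and unset loops in this procedure differ only in the verb, so keeping the flag list in one place avoids the two drifting apart. A minimal sketch follows; the function name cluster_flags and the CEPH and FLAGS variables are illustrative and not part of Ceph.

```shell
# CEPH can be overridden (e.g. CEPH=echo) for a dry run.
CEPH=${CEPH:-ceph}
# The same five flags used in the loops in this post.
FLAGS="noout nobackfill norecover noscrub nodeep-scrub"

# cluster_flags set|unset: apply the action to every flag in FLAGS,
# stopping at the first failure.
cluster_flags() {
    action=$1
    case "$action" in
        set|unset) ;;
        *)
            echo "usage: cluster_flags set|unset" >&2
            return 1
            ;;
    esac
    for flag in $FLAGS; do
        $CEPH osd "$action" "$flag" || return 1
    done
}
```

Run cluster_flags set before removing the OSD and cluster_flags unset once the new OSD has been created. The flags currently in effect appear in the flags line of ceph osd dump.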

Partly excerpted and learned from:

https://blog.csdn.net/ct1150/article/details/87367518
https://blog.csdn.net/signmem/article/details/110927220
https://www.cnblogs.com/ajunyu/p/11165950.html
