Machine configuration: 1x E5-2680 v2 / 64GB RAM / Samsung 870 EVO system disk, all running Ceph in production, with some load but not busy. Both add-in-card NVMe drives already serve as Ceph WAL/DB/bcache for a number of HDDs: the Samsung backs 4-5 HDDs, the Optane 1-2. I took a spare partition on each for the tests. The data:

Samsung 983ZET 480GB:

[root@ceph-node-3 ~]# fio -name=zet983test -filename=/dev/nvme0n1p3 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=4k -size=1G -numjobs=30 -runtime=100 -group_reporting -name=randwrite-4k
zet983test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
randwrite-4k: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.7
Starting 30 threads and 1 process
randwrite-4k: you need to specify size=
fio: pid=0, err=22/file:filesetup.c:950, func=total_file_size, error=Invalid argument
Jobs: 30 (f=30): [w(30),X(1)][100.0%][r=0KiB/s,w=609MiB/s][r=0,w=156k IOPS][eta 00m:00s]
zet983test: (groupid=0, jobs=30): err= 0: pid=12895: Thu Feb 17 19:06:56 2022
  write: IOPS=132k, BW=517MiB/s (542MB/s)(30.0GiB/59393msec)
    clat (usec): min=15, max=3437, avg=224.75, stdev=99.36
     lat (usec): min=16, max=3438, avg=224.95, stdev=99.36
    clat percentiles (usec):
     |  1.00th=[   58],  5.00th=[   79], 10.00th=[  113], 20.00th=[  155],
     | 30.00th=[  176], 40.00th=[  190], 50.00th=[  206], 60.00th=[  227],
     | 70.00th=[  255], 80.00th=[  293], 90.00th=[  355], 95.00th=[  416],
     | 99.00th=[  545], 99.50th=[  586], 99.90th=[  668], 99.95th=[  709],
     | 99.99th=[  783]
   bw (  KiB/s): min= 7806, max=19323, per=2.61%, avg=13840.65, stdev=2986.98, samples=3540
   iops        : min= 1951, max= 4830, avg=3459.78, stdev=746.75, samples=3540
  lat (usec)   : 20=0.01%, 50=0.10%, 100=8.00%, 250=60.00%, 500=30.04%
  lat (usec)   : 750=1.83%, 1000=0.02%
  lat (msec)   : 2=0.01%, 4=0.01%
  cpu          : usr=1.45%, sys=2.68%, ctx=7866073, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,7864320,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=517MiB/s (542MB/s), 517MiB/s-517MiB/s (542MB/s-542MB/s), io=30.0GiB (32.2GB), run=59393-59393msec

Disk stats (read/write):
  nvme0n1: ios=80/7894300, merge=0/5135, ticks=5/1726778, in_queue=8, util=100.00%
[root@ceph-node-3 ~]# 

[root@ceph-node-4 ~]# fio -name=zet983test -filename=/dev/nvme0n1p15 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=4k -size=1G -numjobs=30 -runtime=100 -group_reporting -name=randwrite-4k
zet983test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
randwrite-4k: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.7
Starting 30 threads and 1 process
randwrite-4k: you need to specify size=
fio: pid=0, err=22/file:filesetup.c:950, func=total_file_size, error=Invalid argument
Jobs: 30 (f=30): [w(30),X(1)][100.0%][r=0KiB/s,w=518MiB/s][r=0,w=132k IOPS][eta 00m:00s]
zet983test: (groupid=0, jobs=30): err= 0: pid=680749: Thu Feb 17 19:20:37 2022
  write: IOPS=94.7k, BW=370MiB/s (388MB/s)(30.0GiB/83059msec)
    clat (usec): min=15, max=19836, avg=314.99, stdev=271.10
     lat (usec): min=15, max=19837, avg=315.22, stdev=271.11
    clat percentiles (usec):
     |  1.00th=[   65],  5.00th=[  178], 10.00th=[  188], 20.00th=[  204],
     | 30.00th=[  217], 40.00th=[  239], 50.00th=[  265], 60.00th=[  297],
     | 70.00th=[  334], 80.00th=[  388], 90.00th=[  474], 95.00th=[  562],
     | 99.00th=[ 1057], 99.50th=[ 1418], 99.90th=[ 3326], 99.95th=[ 5407],
     | 99.99th=[ 9765]
   bw (  KiB/s): min= 4929, max=15557, per=2.63%, avg=9951.47, stdev=2497.12, samples=4950
   iops        : min= 1232, max= 3889, avg=2487.50, stdev=624.29, samples=4950
  lat (usec)   : 20=0.01%, 50=0.25%, 100=1.49%, 250=43.12%, 500=47.22%
  lat (usec)   : 750=5.71%, 1000=1.10%
  lat (msec)   : 2=0.87%, 4=0.17%, 10=0.07%, 20=0.01%
  cpu          : usr=1.07%, sys=2.00%, ctx=7869217, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,7864320,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=370MiB/s (388MB/s), 370MiB/s-370MiB/s (388MB/s-388MB/s), io=30.0GiB (32.2GB), run=83059-83059msec

Disk stats (read/write):
  nvme0n1: ios=8830/8214198, merge=551/71100, ticks=1295/2627371, in_queue=202487, util=100.00%

In the 4K test, the Samsung 983ZET 480GB's IOPS climbed slowly from a few tens of thousands to fluctuating around 100k. I tested different 983ZET drives in two servers and got broadly similar numbers and a similar ramp-up speed.
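
Incidentally, the "randwrite-4k: you need to specify size=" error in each transcript is an artifact of the command line, not of the drives: fio treats every -name= as the start of a new job, and options only apply to the most recently named job, so the trailing -name=randwrite-4k defines a second, option-less job that fio rejects while the 30 zet983test threads run normally. A cleaned-up invocation simply drops the stray flag:

fio -name=randwrite-4k -filename=/dev/nvme0n1p3 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=4k -size=1G -numjobs=30 -runtime=100 -group_reporting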

Next, the Intel Optane AIC 280GB data:

[root@ceph-node-4 ~]# fio -name=inteloptanetest -filename=/dev/nvme1n1p3 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=4k -size=1G -numjobs=30 -runtime=100 -group_reporting -name=randwrite-4k
inteloptanetest: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
randwrite-4k: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.7
Starting 30 threads and 1 process
randwrite-4k: you need to specify size=
fio: pid=0, err=22/file:filesetup.c:950, func=total_file_size, error=Invalid argument
Jobs: 30 (f=30): [w(30),X(1)][100.0%][r=0KiB/s,w=2470MiB/s][r=0,w=632k IOPS][eta 00m:00s]
inteloptanetest: (groupid=0, jobs=30): err= 0: pid=680062: Thu Feb 17 19:13:14 2022
  write: IOPS=630k, BW=2461MiB/s (2581MB/s)(30.0GiB/12481msec)
    clat (usec): min=15, max=8567, avg=45.46, stdev=23.56
     lat (usec): min=15, max=8567, avg=45.72, stdev=23.58
    clat percentiles (usec):
     |  1.00th=[   29],  5.00th=[   35], 10.00th=[   38], 20.00th=[   41],
     | 30.00th=[   42], 40.00th=[   44], 50.00th=[   44], 60.00th=[   45],
     | 70.00th=[   47], 80.00th=[   48], 90.00th=[   51], 95.00th=[   58],
     | 99.00th=[   85], 99.50th=[  111], 99.90th=[  258], 99.95th=[  359],
     | 99.99th=[  750]
   bw (  KiB/s): min=65017, max=84745, per=2.73%, avg=68778.51, stdev=3258.46, samples=720
   iops        : min=16254, max=21186, avg=17194.25, stdev=814.61, samples=720
  lat (usec)   : 20=0.02%, 50=87.82%, 100=11.53%, 250=0.53%, 500=0.08%
  lat (usec)   : 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=7.57%, sys=15.78%, ctx=7881856, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,7864320,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=2461MiB/s (2581MB/s), 2461MiB/s-2461MiB/s (2581MB/s-2581MB/s), io=30.0GiB (32.2GB), run=12481-12481msec

Disk stats (read/write):
  nvme1n1: ios=127/7815557, merge=39/2455, ticks=9/282297, in_queue=14, util=100.00%

Granted, the Samsung is backing 3-4 more HDDs, and this was only a quick test, so the numbers may be somewhat skewed, but the gap is still far too large: roughly 95-132k IOPS at 225-315us average latency for the 983ZET versus 630k IOPS at about 45us for the Optane. For Ceph caching and bcache, just buy Optane; don't skimp to save a little money, and it will spare you a lot of trouble. Otherwise you'll end up with the same headache I have now.
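
For reference, a minimal bcache setup along those lines; the device names below are placeholders, not the actual partitions used in these tests:

# format one HDD as the backing device and an Optane partition as the cache
# device in one shot; make-bcache attaches the two automatically when combined
make-bcache -B /dev/sdb -C /dev/nvme1n1p4
# switch the resulting bcache device to writeback, which is what lets the
# fast cache absorb small random writes like the 4k pattern tested above
echo writeback > /sys/block/bcache0/bcache/cache_mode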

Tags: Ceph, Bcache

2 comments so far

  1. cepher

    Hi, a question: with an Optane 900P 280GB as the bcache caching device and a 4TB backing disk, how large should the caching partition be? Or is the 50GB split you mentioned in the article enough? Thanks.

    1. 50GB is fine; if conditions allow, bigger is of course better.
