Recommended hardware for a Pi5 setup?

In ZFS, everything goes into ARC (memory) first. As it fills, if you have L2ARC configured, the oldest/least-used entries slowly get moved to L2ARC and a placeholder pointer to them is left in ARC. So in a small sense those L2ARC pointers use up real RAM that could otherwise hold ‘real data’ in ARC, but in general you still come out ahead, because having the moved data in L2ARC is still faster than hitting the HDD.
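As a concrete sketch (pool and device names here are hypothetical), adding an L2ARC cache device and then watching how much RAM its header entries consume might look like:

```shell
# Attach an SSD partition as L2ARC ("cache" vdev); cache devices can be
# added or removed at any time without risking the pool.
zpool add tank cache /dev/disk/by-id/nvme-EXAMPLE-part1

# arcstats shows the ARC size, the L2ARC size, and the RAM used by
# L2ARC header entries (the "placeholder pointers" mentioned above).
grep -E '^(size|l2_size|l2_hdr_size)' /proc/spl/kstat/zfs/arcstats
```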

But I’m surprised you can’t have L2ARC at all? I don’t see why not.

I understand what you’re going for: a metadata-only L2ARC that would slowly populate over time, so it handles all filewalker IO. And having it persistent (which it sounds like isn’t working?) would save you from having to slowly refill L2ARC from scratch after every reboot. Sounds like a good idea!
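A minimal sketch of that setup, assuming OpenZFS 2.0+ and a hypothetical dataset name: restrict L2ARC to metadata and rely on persistent L2ARC to survive reboots.

```shell
# Only cache metadata on the L2ARC device for this dataset;
# data blocks will bypass the cache device entirely.
zfs set secondarycache=metadata tank/storj

# Persistent L2ARC: with OpenZFS >= 2.0 the L2ARC contents are rebuilt
# from the cache device after a reboot when this parameter is 1 (the default).
cat /sys/module/zfs/parameters/l2arc_rebuild_enabled
```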

A metadata special device is a bit different. It handles 100% of the metadata IO, all the time, with no cache to warm up, and it’s not filled by data evicted from ARC. It has no RAM limitations. It also handles all metadata writes (something L2ARC doesn’t do). And it can optionally handle small files too, so the HDD only deals with the larger stuff it’s better at. But unlike L2ARC (which can disappear or fail at any time, no problem), if you lose a metadata special device, you’ve lost the filesystem. That’s why, especially if it was handling 8 nodes for you, you’d want to at least mirror it. But just for testing you could use a single device like your Optane.
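For reference, a hedged sketch (hypothetical pool and device names) of adding a mirrored special vdev, plus the optional small-file routing mentioned above:

```shell
# Add a mirrored special vdev: all metadata writes land on the SSD pair
# from now on. Losing this vdev means losing the pool, hence the mirror.
zpool add tank special mirror /dev/disk/by-id/ssd-A /dev/disk/by-id/ssd-B

# Optionally store small files (blocks <= 16K here) on the special vdev too,
# so the HDDs only see the larger IO they handle well.
zfs set special_small_blocks=16K tank/storj
```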

ARC is still doing its own thing on top, and may decide to hold some metadata in RAM: that will always be a win. But if it’s not in ARC, all metadata the filewalker touches would be on that metadata special-device SSD. So it’s always speedy and never touches the HDD. But it is a potential point of failure.

It looks like IsThisOn got some of his fastest/most-consistent filewalker performance from his special-metadata config.


It is tempting. I will keep that in mind as a backup plan. But first I will try out this: storagenode/blobstore: blobstore with caching file stat information (… · storj/storj@2fceb6c · GitHub

At what version will it start working?
Found it: version 1.108, it looks like.
But what does “hot cache” mean? Will it be in RAM, or is it possible to put it on NVMe?

I would not bother with a Pi. Power consumption and price are high, while performance is low.
Compare that with an Intel Atom C3758 that idles at 10 W for the full system.

And I would not even get this one, since I don’t trust Chinese or Hong Kong companies to offer support or warranty.


And the winner is the new blobstore cache. On a small node it gives me the same gain but it isn’t limited by RAM.

In about 2 weeks I will have a 5 TB node migrated to ext4. Let’s see how that performs in comparison.
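For anyone wanting to try it: if the option from the linked commit is what I think it is, enabling the badger-based file-stat cache would look roughly like this (the flag name is an assumption; verify against `storagenode setup --help` on 1.108+):

```shell
# ASSUMED flag name, taken from the linked commit; verify before use.
# In config.yaml:
#   pieces.file-stat-cache: badger
# or as a command-line flag:
storagenode run --pieces.file-stat-cache=badger
```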

So you updated to 1.108? And it runs faster because of the cache?

5 Gb NIC on a Raspberry Pi: Full 5 Gigabit Ethernet on Raspberry Pi 5 with iocrest Realtek RTL8126 adapter - Jiri Brejcha

I’m sorry, but for what? Your bottleneck is a USB to HDD middleware…

Depends on what you mean by that, but it needs to be assembled first just like L2ARC.

I reverted to the mirrored special device. And it’s working marvellously, to be honest:

root@VM-HOST:~# lsblk -o PATH,SIZE,LABEL,PARTLABEL,UUID,VENDOR,FSTYPE,rota
PATH              SIZE LABEL       PARTLABEL            UUID                                 VENDOR   FSTYPE     ROTA
/dev/sda        931,5G                                                                       ATA                    0
/dev/sda1       111,8G VM-HOST     VM-HOST-1            9547310d-8093-4e66-accc-f9a7d608bee4          btrfs         0
/dev/sda15         15G storjdata4  STORJ4-META          12932962326981665985                          zfs_member    0
/dev/sda16          5G storjdata22 STORJ22-META         17156441812472959081                          zfs_member    0
/dev/sda17         15G storjdata10 STORJ10-META         9667372469621822347                           zfs_member    0
/dev/sda18          3G storjdata11 STORJ11-META         3131107259782492802                           zfs_member    0
/dev/sda19          3G storjdata16 STORJ16-META         10625861142284363320                          zfs_member    0
/dev/sda20         15G storjdata18 STORJ18-META         6868956751607392789                           zfs_member    0
/dev/sda21          8G storjdata6  STORJ6-META          5312916681478188263                           zfs_member    0
/dev/sda22          8G storjdata9  STORJ9-META          5253881928466273503                           zfs_member    0
# cut some irrelevant stuff
/dev/sdf          1,4T                                                                       SAMSUNG                1
/dev/sdf1         1,4T storjdata6  zfs-1391287a2e5fd9d2 5312916681478188263                           zfs_member    1
/dev/sdf9           8M                                                                                              1
/dev/sdg          1,4T                                                                       Hitachi                1
/dev/sdg1         1,4T storjdata9  zfs-ace96bd612c38442 5253881928466273503                           zfs_member    1
/dev/sdg9           8M                                                                                              1
# irrelevant
/dev/sdi          2,7T                                                                       WD                     1
/dev/sdi1         2,7T storjdata4  zfs-07f56700262ef24b 12932962326981665985                          zfs_member    1
/dev/sdi9           8M                                                                                              1
/dev/sdj          2,7T                                                                       WD                     1
/dev/sdj1         2,7T storjdata18 zfs-13156e834a813ec9 6868956751607392789                           zfs_member    1
/dev/sdj9           8M                                                                                              1
/dev/sdk        931,5G                                                                       Samsung                1
/dev/sdk1       931,5G storjdata22 zfs-7627b31b68853fd3 17156441812472959081                          zfs_member    1
/dev/sdk9           8M                                                                                              1
/dev/sdl        477,5G                                                                       Mass                   1
/dev/sdl1       477,5G storjdata11 zfs-46912d350c39ebb9 3131107259782492802                           zfs_member    1
/dev/sdl9           8M                                                                                              1
/dev/sdm          2,7T                                                                       BUFFALO                1
/dev/sdm1         2,7T storjdata10 zfs-0e9b458b17d1abae 9667372469621822347                           zfs_member    1
/dev/sdm9          64M                                                                                              1
/dev/sdn        476,9G                                                                       Mass                   1
/dev/sdn1       476,9G storjdata16 zfs-685857c77212e77b 10625861142284363320                          zfs_member    1
/dev/sdn9           8M                                                                                              1
/dev/sdo        476,9G                                                                       Realtek                0
/dev/sdo1       476,9G STORJ-DATA  STORJ17-DATA         57ce79f6-203e-442c-bfc8-22a9e9e75c1a          ext4          0
/dev/zram0       44,3G                                                                                              0
/dev/nvme0n1      1,8T                                                                                              0
/dev/nvme0n1p1  112,2G VM-HOST     VM-HOST              9547310d-8093-4e66-accc-f9a7d608bee4          btrfs         0
/dev/nvme0n1p2    512M             EFI                  1066-2F4F                                     vfat          0
/dev/nvme0n1p3    1,3T STORJ-DATA  STORJ23-DATA         72a134ac-e240-4477-8e0d-f5d2acb36ccf          xfs           0
/dev/nvme0n1p15    15G storjdata4  STORJ4-METAD         12932962326981665985                          zfs_member    0
/dev/nvme0n1p16     5G storjdata22 STORJ22-METAD        17156441812472959081                          zfs_member    0
/dev/nvme0n1p17    15G storjdata10 STORJ10-METAD        9667372469621822347                           zfs_member    0
/dev/nvme0n1p18     3G storjdata11 STORJ11-METAD        3131107259782492802                           zfs_member    0
/dev/nvme0n1p19     3G storjdata16 STORJ16-METAD        10625861142284363320                          zfs_member    0
/dev/nvme0n1p20    15G storjdata18 STORJ18-METAD        6868956751607392789                           zfs_member    0
/dev/nvme0n1p21     8G storjdata6  STORJ6-METAD         5312916681478188263                           zfs_member    0
/dev/nvme0n1p22     8G storjdata9  STORJ9-METAD         5253881928466273503                           zfs_member    0

root@VM-HOST:~# zpool list -v
NAME                     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
storjdata10             2.73T  2.35T   393G        -         -     6%    85%  1.00x    ONLINE  -
  zfs-0e9b458b17d1abae  2.73T  2.34T   385G        -         -     6%  86.2%      -    ONLINE
special                     -      -      -        -         -      -      -      -  -
  mirror-1              14.5G  6.77G  7.73G        -         -    62%  46.7%      -    ONLINE
    STORJ10-METAD         15G      -      -        -         -      -      -      -    ONLINE
    STORJ10-META          15G      -      -        -         -      -      -      -    ONLINE
storjdata11              479G   403G  76.2G        -         -    41%    84%  1.00x    ONLINE  -
  zfs-46912d350c39ebb9   477G   401G  74.7G        -         -    41%  84.3%      -    ONLINE
special                     -      -      -        -         -      -      -      -  -
  mirror-1              2.75G  1.25G  1.50G        -         -    75%  45.3%      -    ONLINE
    STORJ11-METAD          3G      -      -        -         -      -      -      -    ONLINE
    STORJ11-META           3G      -      -        -         -      -      -      -    ONLINE
storjdata16              479G   401G  77.6G        -         -    26%    83%  1.00x    ONLINE  -
  zfs-685857c77212e77b   477G   400G  76.0G        -         -    26%  84.0%      -    ONLINE
special                     -      -      -        -         -      -      -      -  -
  mirror-1              2.75G  1.17G  1.58G        -         -    63%  42.4%      -    ONLINE
    STORJ16-METAD          3G      -      -        -         -      -      -      -    ONLINE
    STORJ16-META           3G      -      -        -         -      -      -      -    ONLINE
storjdata18             2.73T  2.35T   397G        -         -     7%    85%  1.00x    ONLINE  -
  zfs-13156e834a813ec9  2.73T  2.34T   387G        -         -     7%  86.1%      -    ONLINE
special                     -      -      -        -         -      -      -      -  -
  mirror-1              14.5G  5.16G  9.34G        -         -    69%  35.6%      -    ONLINE
    STORJ18-METAD         15G      -      -        -         -      -      -      -    ONLINE
    STORJ18-META          15G      -      -        -         -      -      -      -    ONLINE
storjdata22              932G   805G   127G        -         -     5%    86%  1.00x    ONLINE  -
  zfs-7627b31b68853fd3   932G   804G   124G        -         -     5%  86.6%      -    ONLINE
special                     -      -      -        -         -      -      -      -  -
  mirror-1              4.50G  1.60G  2.90G        -         -    70%  35.5%      -    ONLINE
    STORJ22-METAD          5G      -      -        -         -      -      -      -    ONLINE
    STORJ22-META           5G      -      -        -         -      -      -      -    ONLINE
storjdata4              2.73T  2.35T   394G        -         -     3%    85%  1.00x    ONLINE  -
  zfs-07f56700262ef24b  2.73T  2.34T   386G        -         -     3%  86.2%      -    ONLINE
special                     -      -      -        -         -      -      -      -  -
  mirror-1              14.5G  6.34G  8.16G        -         -    64%  43.7%      -    ONLINE
    STORJ4-METAD          15G      -      -        -         -      -      -      -    ONLINE
    STORJ4-META           15G      -      -        -         -      -      -      -    ONLINE
storjdata6              1.37T  1.19T   179G        -         -     3%    87%  1.00x    ONLINE  -
  zfs-1391287a2e5fd9d2  1.36T  1.19T   175G        -         -     3%  87.4%      -    ONLINE
special                     -      -      -        -         -      -      -      -  -
  mirror-1              7.50G  3.56G  3.94G        -         -    64%  47.4%      -    ONLINE
    STORJ6-METAD           8G      -      -        -         -      -      -      -    ONLINE
    STORJ6-META            8G      -      -        -         -      -      -      -    ONLINE
storjdata9              1.37T  1.17T   206G        -         -    29%    85%  1.00x    ONLINE  -
  zfs-ace96bd612c38442  1.36T  1.16T   203G        -         -    29%  85.4%      -    ONLINE
special                     -      -      -        -         -      -      -      -  -
  mirror-1              7.50G  4.39G  3.11G        -         -    74%  58.6%      -    ONLINE
    STORJ9-METAD           8G      -      -        -         -      -      -      -    ONLINE
    STORJ9-META            8G      -      -        -         -      -      -      -    ONLINE

root@VM-HOST:~# iostat -x
Linux 6.1.0-22-amd64 (VM-HOST)  13-07-24        _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2,73    0,11    2,88    4,00    0,00   90,29

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme0n1        195,54   1132,49     0,22   0,11    0,20     5,79  104,35   1336,26     2,78   2,59    0,22    12,81    0,00      0,00     0,00   0,00    0,00     0,00    1,90    0,31    0,06   1,63
sda            168,33    930,11     0,26   0,16    0,24     5,53   93,55   1335,05     3,41   3,52    0,22    14,27    0,00      0,00     0,00   0,00    0,00     0,00    2,16    0,73    0,06   1,83
sdb              0,76     85,09     0,01   1,00    1,71   111,38    0,89    129,21     0,01   0,60    2,02   144,49    0,00      0,00     0,00   0,00    0,00     0,00    0,02    0,86    0,00   0,05
sdc              0,79     75,01     0,01   1,80    1,97    94,87    0,92    132,53     0,01   1,02    2,23   144,68    0,00      0,00     0,00   0,00    0,00     0,00    0,02    0,90    0,00   0,06
sdd              4,38    131,04     0,01   0,22   14,50    29,91    9,86    737,84     0,06   0,59   23,17    74,83    0,00      0,00     0,00   0,00    0,00     0,00    0,04   47,91    0,29   5,26
sde              2,07     77,29     0,01   0,41   18,63    37,25    9,86    737,84     0,06   0,60   22,12    74,84    0,00      0,00     0,00   0,00    0,00     0,00    0,04   26,56    0,26   3,60
sdf              2,44    952,01     0,00   0,00    4,26   390,72    0,75     16,70     0,00   0,01   10,21    22,28    0,00      0,00     0,00   0,00    0,00     0,00    0,21   31,74    0,02   1,69
sdg              2,32    476,72     0,00   0,00    4,03   205,39    0,71     41,62     0,00   0,01    9,67    59,03    0,00      0,00     0,00   0,00    0,00     0,00    0,17   26,50    0,02   1,40
sdh              0,48     55,18     0,01   2,78    2,62   115,67    0,28     86,69     0,00   1,76    3,29   313,58    0,00      0,00     0,00   0,00    0,00     0,00    0,02    1,08    0,00   0,04
sdi              3,00   1053,39     0,00   0,02    8,32   351,33    0,76     68,36     0,00   0,02    1,40    90,53    0,00      0,00     0,00   0,00    0,00     0,00    0,00    0,00    0,03   2,38
sdj              2,92    818,77     0,00   0,02   10,10   280,35    0,94     96,93     0,00   0,02    1,21   103,42    0,00      0,00     0,00   0,00    0,00     0,00    0,00    0,00    0,03   2,83
sdk              2,25    195,91     0,02   0,73   10,40    87,13    0,62     39,08     0,00   0,00    2,34    63,02    0,00      0,00     0,00   0,00    0,00     0,00    0,00    0,00    0,02   1,58
sdl              2,58    235,84     0,00   0,02   10,24    91,30    0,48      6,90     0,00   0,00   14,62    14,41    0,00      0,00     0,00   0,00    0,00     0,00    0,00    0,00    0,03   2,10
sdm              5,07   1862,27     0,00   0,02    4,50   367,24    0,84     23,57     0,00   0,01    0,96    28,04    0,00      0,00     0,00   0,00    0,00     0,00    0,00    0,00    0,02   2,05
sdn              1,56    145,52     0,00   0,03   10,02    93,12    0,24      5,49     0,00   0,01   12,06    23,02    0,00      0,00     0,00   0,00    0,00     0,00    0,00    0,00    0,02   1,14
sdo              1,51     21,19     1,87  55,34    2,77    14,06    8,32    566,73     6,70  44,61    2,96    68,11    0,00      0,00     0,00   0,00    0,00     0,00    0,00    0,00    0,03   2,43
zram0           30,72    122,90     0,00   0,00    0,01     4,00   50,50    224,88     0,00   0,00    0,02     4,45    0,00      0,00     0,00   0,00    0,00     0,00    0,00    0,00    0,00   0,18

Before I reverted, I had an IO wait of 60+% and idle time of 15%. Utilization of the hard drives was all 70+%. This is a huge improvement; I actually couldn’t have predicted it beforehand.

As you can see, I only use a 5 GB/TB ratio for the special devices, with recordsize=512k. And as you can see, they’re only filled to about 50% relative to the data disk usage.
This is how I created them:

zpool create -o ashift=12 -O compress=lz4 -O atime=off -O primarycache=metadata -O sync=disabled -m /storj/nd10 -O xattr=off -O redundant_metadata=some -O recordsize=512k storjdata10 /dev/sdn -f
zpool add storjdata10 -o ashift=12 special mirror /dev/disk/by-partlabel/STORJ10-METAD /dev/disk/by-partlabel/STORJ10-META -f
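The sizing above can be sanity-checked with trivial arithmetic (using the rounded numbers from this post):

```shell
# Back-of-envelope sizing for the special vdevs: ~5 GB of special device
# per TB of pool, which matches the 15G partitions on the ~3T pools above.
pool_tb=3            # pool size in TB (rounded)
ratio_gb_per_tb=5    # observed ratio with recordsize=512k
echo "special vdev size: $((pool_tb * ratio_gb_per_tb)) GB"
```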

Why? That actually sounds nonsensical to me, because 5 Gbps = 625 MB/s. Given the fact that a storagenode is almost all random IO, you could easily add 20 hard drives (probably even 60) before saturating the USB bandwidth.
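The arithmetic behind that claim, with the per-drive numbers as assumptions (roughly 250 random IOPS and a 128 KB average request per HDD):

```shell
link_mbs=$((5000 / 8))        # 5 Gbps = 625 MB/s
iops_per_hdd=250              # assumed random IOPS for one 7200 rpm HDD
avg_io_kb=128                 # assumed average request size
mbs_per_hdd=$((iops_per_hdd * avg_io_kb / 1024))  # ~31 MB/s of random IO
echo "$((link_mbs / mbs_per_hdd)) HDDs to saturate the link"
```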

What if you had a special device in front of them?
But you are probably right, if they now use different lanes for USB and other devices.

Even if not, there are not that many devices aside from SSDs, video (capture) cards, HDMI over USB, … that could saturate the whole bandwidth; none of which applied even on Pi <= 4.

So, you truly believe that your HDDs can handle 5 Gbps of random I/O? Even if we assume that all 5 Gbps of NIC traffic would go to the node(s), without being slowed down by other devices?

Did I say that somewhere?
I only said that you would probably need at least 20, but more likely over 60, HDDs to saturate USB 3.0 bandwidth with random IO.


Probably more than that. I’ve found a benchmark which shows you can do 50k IOPS over USB 3 with a single flash storage device. That would translate to 200 HDDs’ worth of IOPS. A more interesting question would be whether the RPi5 is capable of generating this kind of traffic.
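The 200-HDD figure follows directly from dividing the benchmark result by an assumed per-drive random IOPS number:

```shell
usb3_iops=50000      # flash-over-USB-3 benchmark figure cited above
hdd_iops=250         # assumed random IOPS per spinning drive
echo "$((usb3_iops / hdd_iops)) HDDs worth of IOPS"
```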