A short story about how nvme disk cache affects node startup

About a week ago I added disk cache to node storage.
Previously the initial filewalker took 30+ minutes.
Now…

Node start:
2023-02-10T12:42:15.823Z INFO Configuration loaded {"Process": "storagenode", "Location": "/app/config/config.yaml"}

CPU load:
[screenshot: CPU load graph, 2023-02-10 14:01]
Even during the filewalker, the iowait level is ~20%.

Storage utilization:
[screenshot: storage utilization graph, 2023-02-10 14:03]

(I started the netdata VM later, so the graph is empty until 13:43.)

Looks like the whole filewalker run took about 5 minutes after a complete restart of the host and the node VM.
So no RAM-cached buffers at all, only the NVMe disk cache.


that’s really interesting, thank you for sharing!
Can you share some more details about your server? How exactly did you add the NVMe cache? Is it part of a ZFS raid? Is it a read-cache only, or also a write-cache?

Why didn’t you just disable the filewalker?

Sure, here are the details:

It’s not a rack server. The hardware is a Ryzen 3700X CPU + 32GB DDR4 RAM in a stylish Fractal Design Define 7 XL case, where you can fit ten 3.5" HDDs without any problem,
and most importantly the HDDs sit in bays with rubber pads, so noise and vibration are minimal: you can sleep in the same room :slight_smile:
Storage for Storj is made of 4x Seagate IronWolf 8TB drives connected to an external PCIe SATA controller, since you never have enough SATA ports.
The NVMe is a Samsung 970 EVO Plus in a free M.2 slot, used for Chia plotting before, with a reasonable wear level.

An mdadm software RAID-5 is built on top of those HDDs, and on top of that an LVM VG with an LV dedicated to Storj, formatted with ext4.
The cache is a 400GB LVM cache pool with data and metadata LVs.
Since you can create and remove an LVM cache whenever you want without destroying the real data LV, it was IMO the best option.
The VM for Storj uses 2 CPU cores and 8GB RAM, with the LV connected via /dev/mapper.
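For anyone wanting to try the same layout, the cache attachment looks roughly like this. The device and VG/LV names here (/dev/nvme0n1, vg0, storj) are assumptions for illustration, not the actual names from my setup:

```shell
# Add the NVMe as a physical volume to the existing volume group
pvcreate /dev/nvme0n1
vgextend vg0 /dev/nvme0n1

# Create a 400G cache pool (data + metadata LVs) on the NVMe
lvcreate --type cache-pool -L 400G -n storj_cache vg0 /dev/nvme0n1

# Attach the cache pool to the data LV. writethrough only caches reads
# safely; writeback also caches writes but risks data loss if the NVMe dies.
lvconvert --type cache --cachepool vg0/storj_cache \
          --cachemode writethrough vg0/storj

# Later the cache can be detached without touching the data LV:
# lvconvert --splitcache vg0/storj
```

The last point is what makes this approach low-risk to experiment with: splitting the cache leaves the original ext4 LV intact.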


Well, it’s not a NAS with 2GB of RAM and no upgrade options, where the filewalker runs 12h+,
and it’s not about the filewalker itself, but about building the best-performing, most efficient solution I can.

Adding an SSD cache just means adding another point of failure for Storj nodes and another component to watch, and if you have enough RAM it doesn’t really make sense. If you need it for something else, OK, but for Storj the simpler the better. I managed to install 18 GB RAM on a Synology and it works great with 8TB of data stored; the FW takes about 20 min after a system restart. I also have one DS216+ with 1GB RAM and 2 IronWolf 8TB drives (7 TB of data in total), and I disabled the FW. Works great. Of course it loses more races than the 18GB one, but in the end I see very similar earnings for both.
If you have that much RAM and don’t use the system for anything else that needs it, adding the SSD cache just leaves the RAM wasted. And on speed, RAM always beats NVMe or any other storage.
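For reference, disabling the startup filewalker is done with a config flag. The flag name below is what recent storagenode versions use, but verify it against the release notes for your version:

```yaml
# in config.yaml (or pass as --storage2.piece-scan-on-startup=false)
storage2.piece-scan-on-startup: false
```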

My experience contradicts the above theoretical claims.

In case your claims about the pointlessness of SSD caches with Storj aren’t merely theoretical: please post some measurements to demonstrate the validity of your points. Thanks.


It seems that the FW would not be a problem anymore. See this post:

Potential GC speedup by reversing lstat and checking Bloom filter?

I saw SNOs using NVMe for databases, not only for cache. That makes sense indeed.

This type of fast disk cache is used for everything, including the database files.
The filewalker is just one example of a long, resource-consuming task.
You get faster saving of downloaded pieces and faster piece uploads served from the cache.
And we’re talking about 10-100x faster. IMO an NVMe rated for 1000 TBW will live far longer than an HDD whose heads seek across millions of files on the platters.
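One way to check that the cache is actually absorbing these reads is LVM’s own reporting. Field names are from lvm2’s `lvs -o help`; the vg0/storj names are assumptions matching a typical setup like the one above:

```shell
# Show cache occupancy and read hit/miss counters for the cached LV
lvs -o lv_name,cache_total_blocks,cache_used_blocks,cache_read_hits,cache_read_misses vg0/storj
```

A high and growing read-hit count during a filewalker run is a good sign the metadata is being served from the NVMe.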