TL;DR: The process is fairly straightforward if you’re comfortable with the Linux command line, and the end result is fantastic!
- Raspberry Pi 4 (4 GB);
- 8 TB HDD connected to one of the Pi’s USB 3.0 ports;
- an old OCZ Vertex 2 120 GB SSD connected to the second USB 3.0 port.
A few months ago I moved the large database files (pieceinfo.db and friends) onto the SSD and
--mount'ed them in the docker run command. That helped a bit with the HDD random reads, but the initial and periodic disk scans remained the most annoying issue, and that’s what I wanted to address.
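For context, the relevant part of the docker run command looked something like this. This is an illustrative sketch: the host paths are placeholders for my setup, and I’m assuming the standard storagenode layout where the databases live under storage/ inside the config directory.

```shell
# Sketch only: node data stays on the HDD; the databases (moved to
# the SSD) are bind-mounted over their usual spots in the container.
docker run -d --name storagenode \
  --mount type=bind,source=/mnt/hdd/storagenode,destination=/app/config \
  --mount type=bind,source=/mnt/ssd/db/pieceinfo.db,destination=/app/config/storage/pieceinfo.db \
  storjlabs/storagenode:latest   # identity/wallet/port flags omitted
```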
Here’s the last month of disk activity (note I hadn’t restarted the node until yesterday). Green lines are reads, red are writes; thin lines are IOPS (left axis), bold lines are throughput (right axis):
Although bcache was merged into the Linux kernel mainline many years ago, Raspbian specifically excluded it from its kernels (not even as a loadable module!), and all my previous attempts to compile
bcache separately and insert it into the existing kernel had failed. So I shelved the idea for a few months.
To enable
bcache I built the kernel myself. Fortunately this procedure is fairly straightforward and well documented. In the
.config file I set
CONFIG_DM_CRYPT=y: for some reason, without this change I couldn’t mount my encrypted partition, even though both the stock and my modified kernels had this flag set to
=m. I believe it’s safe to leave it as is if you don’t have any dm-crypt devices.
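For reference, the relevant .config lines end up as (CONFIG_BCACHE is the flag that actually builds bcache in; it’s the one the stock Raspbian config omits):

```
# bcache built into the kernel (absent from the stock Raspbian config)
CONFIG_BCACHE=y
# dm-crypt built in rather than as a module (see the note above)
CONFIG_DM_CRYPT=y
```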
On the Raspberry Pi itself the build took about an hour (there’s also an option to cross-compile on a faster machine). Make, install, reboot into the new kernel, yay.
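The native build roughly follows the official Raspberry Pi kernel build guide; for a 64-bit Pi 4 it boils down to something like this (exact /boot paths vary between OS releases, so check the guide for yours):

```shell
sudo apt install git bc bison flex libssl-dev
git clone --depth=1 https://github.com/raspberrypi/linux
cd linux
make bcm2711_defconfig              # default config for the Pi 4
# flip CONFIG_BCACHE / CONFIG_DM_CRYPT to =y in .config here
make -j4 Image.gz modules dtbs      # the ~1 hour step on the Pi itself
sudo make modules_install
sudo cp arch/arm64/boot/dts/broadcom/*.dtb /boot/
sudo cp arch/arm64/boot/Image.gz /boot/kernel8.img
sudo reboot
```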
Fortunately my disk was partitioned in a way that left room for
bcache's superblock right before the main filesystem, so all I needed was to delete the old partition, create a new one with
fdisk, and update
/etc/fstab. If yours isn’t, you’ll need to free up 8 KiB before the partition to perform this trick.
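The conversion itself looks roughly like this; the device names are placeholders for my setup, and make-bcache comes from the bcache-tools package. make-bcache’s default data offset is exactly 8 KiB, which is why an 8 KiB gap in front of the filesystem is enough:

```shell
# Format the HDD partition as a bcache backing device: the superblock
# goes into the 8 KiB gap, the existing filesystem 8 KiB in survives
sudo make-bcache -B /dev/sda1
# Format the SSD partition as a cache device
sudo make-bcache -C /dev/sdb2

# Register both (udev normally does this on its own at boot)
echo /dev/sda1 | sudo tee /sys/fs/bcache/register
echo /dev/sdb2 | sudo tee /sys/fs/bcache/register

# Attach the cache set to the backing device by its UUID
sudo bcache-super-show /dev/sdb2 | grep cset.uuid
echo <cset-uuid-from-above> | sudo tee /sys/block/bcache0/bcache/attach

# Finally, mount /dev/bcache0 in /etc/fstab instead of /dev/sda1
```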
I allocated 16GiB of the SSD for the cache and after the warmup only 25% of it is used.
One important tweak: set
/sys/block/bcache0/bcache/sequential_cutoff to something like 64 KiB. The default is 4 MiB, which is larger than the Storj piece size; we want the cutoff to be smaller than a piece, so the HDD still serves the pieces themselves, but large enough that filesystem directory reads land in the cache.
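Assuming the device came up as bcache0, this is a single sysfs write (bcache’s sysfs files accept human-readable sizes). It doesn’t persist across reboots, so you may want to reapply it from e.g. /etc/rc.local or a udev rule:

```shell
# Sequential I/O streams larger than 64 KiB bypass the SSD cache:
# piece reads stay on the HDD, directory/metadata reads get cached
echo 64k | sudo tee /sys/block/bcache0/bcache/sequential_cutoff
```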
This is what the last day of disk activity looks like:
During the piece enumeration done at node start, the HDD used to sustain 200-300 IOPS while reading 6-7 MB/s or so. Now the SSD sustains 2500-3000 IOPS while reading 60-70 MB/s. In complete silence.
After all that setup I moved the database files back to the HDD and let
bcache do its job.