Enable bcache on a Raspberry Pi 4 (4GB) node ✔️

xopok · July 20, 2020, 9:28pm

TL;DR: The process is fairly straightforward if you’re comfortable with the Linux command line, and the end result is fantastic!

My setup:

Raspberry Pi 4 4GB;
8TB HDD connected to one of the Pi’s USB3 ports;
an old OCZ Vertex2 120GB SSD connected to the second USB3 port.

A few months ago I placed the large database files orders.db and pieceinfo.db on the SSD and --mount'ed them in the docker run command. That helped a bit with random reads on the HDD, but the initial and periodic disk scans remained the most annoying issue, and that is what I wanted to address.
Here’s the last month of disk activity (note that I hadn’t restarted the node until yesterday). Green lines are reads, red lines are writes; thin lines are IOPS (left axis), bold lines are throughput (right axis):

bcache

Even though bcache was merged into the mainline Linux kernel many years ago, Raspbian specifically excluded it from their kernels (not even as a loadable module!), and all my earlier attempts to compile bcache separately and load it into the existing kernel failed. So I shelved the idea for a few months.

Until today.

To enable bcache I built my own kernel. Fortunately, this procedure is fairly straightforward and well documented. In the .config file I set:

CONFIG_BCACHE=y
CONFIG_DM_CRYPT=y. For some reason, without this change I couldn’t mount my encrypted partition, even though both the stock kernel and my modified one had this flag set to =m. I believe it’s safe to leave it as is if you don’t have any dm-crypt devices.
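If you prefer not to hunt for these lines by hand, the edit can be scripted. Below is a minimal sketch, assuming you edit the kernel's .config file directly rather than through menuconfig; set_kconfig_options and patch_config_file are hypothetical helpers, not part of the kernel tooling. It rewrites both the `CONFIG_X=m` and `# CONFIG_X is not set` forms and appends options that are missing entirely.

```python
import re
from pathlib import Path

# Matches "CONFIG_X=<value>" and "# CONFIG_X is not set" lines.
_OPTION_RE = re.compile(r"^(?:# )?(CONFIG_[A-Za-z0-9_]+)(?:=.*| is not set)$")


def set_kconfig_options(config_text: str, options: dict[str, str]) -> str:
    """Return config_text with each option in `options` set to its value."""
    remaining = dict(options)
    out = []
    for line in config_text.splitlines():
        m = _OPTION_RE.match(line)
        if m and m.group(1) in remaining:
            name = m.group(1)
            out.append(f"{name}={remaining.pop(name)}")
        else:
            out.append(line)
    # Options that never appeared in the file are appended at the end.
    out.extend(f"{name}={value}" for name, value in remaining.items())
    return "\n".join(out) + "\n"


def patch_config_file(path: str) -> None:
    """Hypothetical helper: apply the two settings from the post in place."""
    p = Path(path)
    p.write_text(
        set_kconfig_options(
            p.read_text(), {"CONFIG_BCACHE": "y", "CONFIG_DM_CRYPT": "y"}
        )
    )
```

After patching, it's worth letting the kernel build system re-resolve dependencies (for example with `make olddefconfig`) and checking that both options still read `=y` before starting the long build.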

On the Raspberry Pi itself the build took about an hour (cross-compiling is also an option). Make, install, reboot into the new kernel, yay.

Fortunately, my disk was already partitioned so that there was room for bcache's superblock in front of the main filesystem, so all I needed to do was delete the old partition, create a new one with fdisk that starts 8KiB earlier, and update /etc/fstab. If your disk isn't laid out like that, you’ll need to free up 8KiB before the partition to perform this trick.
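The arithmetic for the new partition's start sector is simple but easy to get wrong in fdisk. This is a minimal sketch, assuming bcache's default data offset of 8KiB (16 × 512-byte sectors) for a backing device and that the sectors in front of the filesystem are unused; new_partition_start is a hypothetical helper. The partition's end sector stays the same.

```python
SECTOR_BYTES = 512                # assumption: 512-byte logical sectors, as fdisk reports them
BCACHE_DATA_OFFSET = 8 * 1024     # default bcache data offset for a backing device


def new_partition_start(fs_start_sector: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Start sector for the recreated partition so that the existing
    filesystem lands exactly at bcache's data offset inside it."""
    if BCACHE_DATA_OFFSET % sector_bytes:
        raise ValueError("data offset is not a whole number of sectors")
    offset_sectors = BCACHE_DATA_OFFSET // sector_bytes
    if fs_start_sector < offset_sectors:
        raise ValueError("not enough room in front of the filesystem")
    return fs_start_sector - offset_sectors
```

For example, a filesystem that currently starts at sector 2048 (a common 1MiB-aligned start) would get a new partition starting at sector 2032; on a 4Kn disk the same 8KiB is only 2 sectors.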

I allocated 16GiB of the SSD to the cache, and after the warm-up only 25% of it is used.

One important tweak: set /sys/block/bcache0/bcache/sequential_cutoff to something like 64KiB. The default is 4MiB, which is larger than the Storj piece size. We want the cutoff smaller than that, so that piece reads bypass the cache and are served by the HDD, but large enough that filesystem directories still get cached.
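A minimal sketch of applying the tweak and of why it works, assuming the device is bcache0 as in the post and that the script runs as root. set_sequential_cutoff and would_bypass are hypothetical names, and would_bypass is a deliberately simplified model: real bcache tracks recent IOs to detect sequential streams and has other bypass rules (such as congestion) that are ignored here. The value is written as a plain byte count, the same as `echo 65536 >` the file would.

```python
from pathlib import Path

# sysfs path from the post; "bcache0" is the first bcache device.
CUTOFF_PATH = Path("/sys/block/bcache0/bcache/sequential_cutoff")


def set_sequential_cutoff(cutoff_bytes: int, path: Path = CUTOFF_PATH) -> None:
    """Write the cutoff as a plain byte count (root required on a real system)."""
    path.write_text(f"{cutoff_bytes}\n")


def would_bypass(sequential_run_bytes: int, cutoff_bytes: int) -> bool:
    """Simplified model: a sequential stream at least this long skips the SSD.

    A cutoff of 0 disables sequential bypass entirely.
    """
    return cutoff_bytes > 0 and sequential_run_bytes >= cutoff_bytes
```

With a 64KiB cutoff, a multi-megabyte piece read bypasses the SSD while a 16KiB directory block is still cached; with the 4MiB default, the same piece would be pulled into the cache. Note that sysfs settings don't survive a reboot, so the write has to be repeated at boot (for example from a systemd unit).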

This is what the last day of disk activity looks like:

During piece enumeration, which runs at node start, the HDD used to sustain 200-300 IOPS while reading 6-7 MB/s or so. Now the SSD sustains 2500-3000 IOPS while reading 60-70 MB/s. In complete silence.
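Dividing throughput by IOPS gives the average request size, which shows where the gain comes from. A small worked check using the mid-points of the ranges quoted above (decimal MB, as in the post); avg_request_kb is just an illustrative helper.

```python
def avg_request_kb(mb_per_s: float, iops: float) -> float:
    """Average request size in KB: throughput divided by operation rate."""
    return mb_per_s * 1000 / iops


# Mid-points of the quoted ranges: (MB/s, IOPS).
hdd_kb = avg_request_kb(6.5, 250)     # HDD alone
ssd_kb = avg_request_kb(65, 2750)     # SSD through bcache
iops_gain = 2750 / 250

print(f"HDD: {hdd_kb:.1f} KB/request, SSD: {ssd_kb:.1f} KB/request, {iops_gain:.0f}x IOPS")
```

Both work out to roughly 24-26 KB per request, so the request mix is essentially unchanged and the roughly tenfold throughput gain comes from the roughly tenfold increase in operations per second. That is what you'd expect when the HDD was bound by seeks on small random reads rather than by raw bandwidth.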

After all this setup I moved all the database files back to the HDD and let bcache do its job.

articulateape · July 25, 2020, 4:57pm

Sorry, my knowledge of this is limited, but what does this do? I have a RPi 4; is this a recommended course?

andrew2.hart · July 25, 2020, 5:38pm

It uses an extra solid-state disk to optimise writes and cache reads for a slower hard drive.

It’s not needed, but it is cool.

xopok · July 25, 2020, 6:24pm

As @andrew2.hart answered, it’s an SSD cache for the spinning disk.

I’m deliberately not using it to cache writes because:

Massive random reads were my main pain point.
If the SSD fails, I don’t lose any data.
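The mode is visible in sysfs: /sys/block/bcache0/bcache/cache_mode lists all modes with the active one in brackets, e.g. `[writethrough] writeback writearound none`, and writing a mode name to the same file switches it. Below is a minimal sketch for checking which mode is active and whether the SSD could hold data that isn't on the HDD yet; current_cache_mode and ssd_may_hold_unique_data are hypothetical helpers.

```python
CACHE_MODES = ("writethrough", "writeback", "writearound", "none")


def current_cache_mode(sysfs_text: str) -> str:
    """Return the bracketed (active) entry from a bcache cache_mode file."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            mode = token[1:-1]
            if mode in CACHE_MODES:
                return mode
    raise ValueError(f"no active mode found in {sysfs_text!r}")


def ssd_may_hold_unique_data(mode: str) -> bool:
    """Only writeback acknowledges writes before they reach the HDD, so only
    in that mode can an SSD failure lose data that exists nowhere else."""
    return mode == "writeback"
```

On a live system you would pass it `Path("/sys/block/bcache0/bcache/cache_mode").read_text()`; anything other than writeback keeps the HDD authoritative, which matches the reasoning above.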

Eioz · October 24, 2022, 2:17pm

Hi @xopok, I would like to set up bcache on amd64 Debian 11. Could you tell me what your cache usage ratio is today with the 8TB HDD? I would like to set up bcache in writeback mode with mdadm devices; you can see my post here: Setup Bcache on Debian 11 amd64 with mdadm devices.
Thanks