How can a handful of small nodes completely trash a large array?

The way storage nodes write files is just a bad match for btrfs; see some of my old measurements.

Parity schemes necessarily perform small random writes slower than the slowest drive in the array. If a storage node needs to update a single sector in a stripe, the array (assuming two parity drives) first has to read the rest of the stripe from the other N-3 disks, and only then can it write the new sector plus the updated parity to 3 drives. Again, I'm not sure how specifically this works on btrfs, but I've measured that on ext4 a single upload results in around 10-20 small random writes, and I suspect btrfs won't do much better than that. Given that a single HDD can do 250 IOPS best case, even a few concurrent uploads will make the array thrash.
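To put rough numbers on that, here's a quick back-of-envelope sketch in Go. The 250 IOPS and 10-20 writes-per-upload figures are the estimates above; the parity amplification factor of 2 is just an illustrative assumption for the extra reads and parity writes a read-modify-write costs, not a measurement:

```go
package main

import "fmt"

func main() {
	const (
		hddIOPS         = 250.0 // optimistic random-IO ceiling of a single HDD
		writesPerUpload = 15.0  // ~10-20 small random writes per upload, as measured on ext4
		parityFactor    = 2.0   // assumed extra read+write cost of read-modify-write on parity RAID
	)

	// Every small write on a parity array costs extra reads and parity
	// writes, so filesystem-level writes multiply at the array level.
	arrayOpsPerUpload := writesPerUpload * parityFactor

	// The array can only sustain roughly this many uploads per second
	// before the request queue grows and the drives start thrashing.
	maxUploads := hddIOPS / arrayOpsPerUpload

	fmt.Printf("array ops per upload: ~%.0f\n", arrayOpsPerUpload)
	fmt.Printf("sustainable uploads/sec: ~%.1f\n", maxUploads)
}
```

Even with these optimistic numbers, the array tops out at around 8 uploads per second before the queue starts to back up.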

I do not have any specific advice for your Synology unit, as I have no experience with them. But if you can, avoid parity schemes and avoid btrfs. Or, if you are willing to tinker, patch your storage node code to remove the synchronous writes and the use of a temp directory for uploads. This will make you violate the current node T&C, but at least your hard drives will thank you.
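For context on why the sync writes and the temp-directory dance hurt, here is a minimal Go sketch of the two write patterns. This is not the actual storagenode code, just an illustration of how many forced disk operations each upload pays for; file names and the helper functions are made up:

```go
package main

import (
	"os"
	"path/filepath"
)

// syncedTempWrite mimics the "safe" pattern: write into a temp directory,
// fsync, then rename into place. Each upload pays for the data write, a
// forced flush to the platters, and a rename that dirties directory
// metadata in two places -- all small random IOs.
func syncedTempWrite(tempDir, finalPath string, data []byte) error {
	tmp := filepath.Join(tempDir, "upload.partial")
	f, err := os.Create(tmp)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil { // synchronous flush before the upload can be acknowledged
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	return os.Rename(tmp, finalPath) // second metadata update
}

// relaxedWrite is the alternative: write directly to the final location
// and let the page cache batch the flush. Far fewer forced random IOs,
// at the cost of possibly losing the last few pieces on power loss.
func relaxedWrite(finalPath string, data []byte) error {
	return os.WriteFile(finalPath, data, 0o644)
}

func main() {
	data := make([]byte, 4096)
	_ = syncedTempWrite(os.TempDir(), filepath.Join(os.TempDir(), "piece1.bin"), data)
	_ = relaxedWrite(filepath.Join(os.TempDir(), "piece2.bin"), data)
}
```

The relaxed variant lets the filesystem coalesce many uploads into a few sequential flushes instead of forcing a seek-heavy fsync per piece, which is exactly what a parity array on HDDs cannot keep up with.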
