But the beauty of the technology is that opinions don’t matter, and facts are verifiable. I did not see any verifiable data in those posts beyond the placebo effect – “node feels faster”.
Logic, on the other hand, does not support any defragmentation benefits.
Defragmentation helps in two ways, both aimed at minimizing the need for seeks and eliminating seek latency:
- Consolidating the MFT – which should never get fragmented in the first place if you keep about 10% of the disk free
- Consolidating the chunks of a single file so that the disk does not need to seek mid-read, allowing you to reach the datasheet-promised maximum sequential throughput on large files located on the outer disk tracks.
The latter point is irrelevant for Storj: the vast majority of files are smaller than 4 KiB – i.e. they cannot get fragmented by definition – and most of the rest are smaller than 16 KiB. Since file allocation is quantized by the allocation unit (cluster) size, and provided you have sufficient free space, only a fraction of that fraction ever ends up fragmented.
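To make the allocation argument concrete, here is a minimal sketch of the cluster arithmetic, assuming the common 4 KiB NTFS allocation unit (yours may differ – check with `fsutil fsinfo ntfsinfo C:`); the sizes below are just illustrative:

```python
import math

CLUSTER = 4096  # assumed NTFS allocation unit; verify yours with `fsutil fsinfo ntfsinfo C:`

def clusters_needed(size_bytes: int, cluster: int = CLUSTER) -> int:
    """A file occupies whole clusters; anything that fits in one cluster cannot fragment."""
    return max(1, math.ceil(size_bytes / cluster))

for size in (512, 1024, 4096, 16 * 1024, 2 * 1024 * 1024):
    n = clusters_needed(size)
    note = "  (cannot fragment)" if n == 1 else ""
    print(f"{size:>9} bytes -> {n:>4} cluster(s){note}")
```

Anything that fits in a single cluster has exactly one extent, so there is simply nothing for a defragmenter to consolidate.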
Moreover, in node operation sequential read is never the bottleneck – the node does so little of it that it’s not even worth the bytes this message consists of. Random access driven by customer requests is the bottleneck here, along with the usual database file locking and updates.
Hence, defragmenting the disk may have a small positive effect (you still pay the metadata seek and only eliminate the mid-read seek) on a few thousand files, and no effect on the many millions of smaller ones. At the same time, the probability that those few files will ever get accessed is just as tiny, so whatever small benefit could be gained likely never will be.
These are verifiable facts: you can build a histogram of file sizes for your node yourself (a sketch is below), compare it with the cluster size, and run the same thought experiment. BTW, some defragmentation tools report how many files are fragmented; combine that with how often those files are actually accessed by the node and you have another way to confirm that running defragmentation is pointless for any reason except seeing nice blue cubes neatly aligned in the window (I recommend Tetris as a replacement app to satisfy that craving).
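If you want to run that check on your own node, something along these lines will do; the blobs path and the cluster size are assumptions – point it at your node’s storage directory and plug in your actual allocation unit:

```python
import os
from collections import Counter

BLOBS_DIR = r"D:\storagenode\storage\blobs"  # assumed path - adjust to your node's storage location
CLUSTER = 4096                               # assumed allocation unit size

buckets = Counter()
for root, _dirs, files in os.walk(BLOBS_DIR):
    for name in files:
        try:
            size = os.path.getsize(os.path.join(root, name))
        except OSError:
            continue  # a piece may be deleted mid-walk; skip it
        if size <= CLUSTER:
            buckets["<= 1 cluster (cannot fragment)"] += 1
        elif size <= 16 * 1024:
            buckets["<= 16 KiB"] += 1
        else:
            buckets["> 16 KiB"] += 1

total = sum(buckets.values()) or 1
for bucket, count in buckets.most_common():
    print(f"{bucket:35} {count:>12} ({100 * count / total:.1f}%)")
```

Walking millions of small files takes a while, but the resulting distribution makes the argument above verifiable rather than a matter of opinion.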
The drawbacks of defragmentation, on the other hand, are real: if your filesystem supports and contains snapshots, defragmenting wastes space (sometimes dramatically), because moved blocks get duplicated instead of shared with the snapshots. And while defragmentation runs, the disk is IO-bound, hindering everything else that needs it.
How to improve random IO performance, then? That has been discussed before: from tiered storage solutions, or ZFS with a special device, where metadata (the MFT equivalent) and small files end up on SSD, to filesystem tweaks that reduce random IO, such as disabling sync writes and increasing filesystem buffer sizes (including carefully tuning transaction group parameters for array performance on ZFS).
Various vendors will be happy to sell you defragmentation tools, but that is snake oil: even if it helped, the benefit is so minuscule that it isn’t worth doing. Moreover, modern filesystems don’t fragment the way the old ones used to; the key is to keep 10-15% of the space unoccupied. To be clear, even if you don’t do that for the storage node, it won’t matter much, as the vast majority of the node’s files won’t be fragmented in the first place – partly because the smallest ones are stored resident in the MFT, in the case of NTFS.