My notes from performance optimization efforts on ZFS.
Hardware setup for reference: the node holds around 9TB of data today, running on an old 2012-ish era FreeBSD server. There is a single ZFS pool consisting of two 4-disk RAIDZ1 vdevs. Out of 32GB of RAM, 9GB are used by services and the rest is available for ARC. There is a 2TB SSD used as L2ARC, but it is not really being utilized; I just had it lying around, and it is not necessary in any way. To speed up synchronous writes I have a cheap 16GB Optane device attached as SLOG. It helps a lot with Time Machine and, to some degree, storj, but see below: it is not necessary either.
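For reference, a pool with this layout could be assembled roughly as sketched below; the device names are placeholders, not my actual ones:
zpool create pool1 raidz1 da0 da1 da2 da3 raidz1 da4 da5 da6 da7
zpool add pool1 cache ada0   # optional: 2TB SSD as L2ARC
zpool add pool1 log nvd0     # optional: 16GB Optane as SLOG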
The server is used for other tasks as well: mainly hosting plex/homebridge, serving as a Time Machine and general backup target for about 7 computers, and serving two iSCSI drives to Windows machines on the LAN.
After making the changes described below (in no particular order), I don't notice storj impacting the performance of other tasks.
-
General recommendation, often overlooked: ensure your SSDs are configured with their native block size. Some SSDs, especially Samsung ones, like to pretend they have 512-byte sectors, but that is not the case. When adding such a device to a pool, override the sector size with the ashift parameter. For example, to add most SSDs as an L2ARC device, force a 4096-byte sector size like this (note: 1 << 12 == 4096):
zpool add -o ashift=12 pool1 cache ada0
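If you want to check what a drive reports versus what the pool actually uses, something like the following works on FreeBSD (ada0 and pool1 are the names from the example above):
diskinfo -v ada0 | grep -E 'sectorsize|stripesize'   # logical sector size vs. physical (stripe) size
zdb -C pool1 | grep ashift                           # ashift in effect for each vdev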
-
Disable access time updates on the datasets holding storage node data: this removes the associated IO:
zfs set atime=off pool1/storj
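To confirm the setting took effect (and is inherited by any child datasets):
zfs get -r atime pool1/storj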
-
You should have a UPS. If you do, a dramatic performance improvement can be achieved by disabling sync writes on the storj dataset:
zfs set sync=disabled pool1/storj
If you don't have a UPS, don't disable sync writes; instead, add a SLOG: it will offload some of the IO from the main array. I did both: I added a SLOG to handle sync writes for the rest of the data, and disabled sync for the storj data, because I have a UPS.
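A SLOG is added as a separate log vdev; the device name below is a placeholder for whatever your Optane (or other low-latency SSD) shows up as:
zpool add -o ashift=12 pool1 log nvd0
zfs get -r sync pool1   # verify which datasets actually have sync disabled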
-
Caching: the storage node stores two main classes of data: blobs and databases.
For blobs you want to cache only metadata: clients access data randomly, so caching the data itself mostly just churns the cache for little benefit.
Databases, on the other hand, benefit strongly from caching: with metadata-only caching on my node, the dashboard takes over a minute to refresh. With full caching it takes about a second.
As you can see, these are conflicting requirements. Naturally, the best solution is to keep the databases on a separate dataset with its own caching configuration.
I've created that dataset with a 64k record size, to better match sqlite usage. The dataset for the rest of the storj data is kept at the default 128k: even though the storj recommended chunk size is 64M, I found the vast majority of files are significantly smaller than that, and a 128k record size provides a good balance between overhead and space usage. Keep the default ZFS compression on; this helps conserve disk space for partially filled records. The databases dataset is mounted separately into the storj jail, and the path to it is specified in the config file via the storage2.database-dir: parameter:
# path to store data in
storage.path: /mnt/storj
...
# directory to store databases. if empty, uses data path
storage2.database-dir: "/mnt/storj-databases"
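For completeness, creating such a dataset could look roughly like this; the dataset name matches the caching examples below, and the mountpoint is illustrative (in my case the path is what the storj jail ultimately sees as /mnt/storj-databases):
zfs create -o recordsize=64k -o mountpoint=/mnt/storj-databases pool1/storj/databases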
With that, caching is configured as follows:
zfs set primarycache=metadata pool1/storj
zfs set primarycache=all pool1/storj/databases
You may go a step further and apply this to the secondary cache as well: as discussed above, caching blob data does not help performance with the current access pattern, but it may increase wear on the L2ARC SSD. If the node gets significant traffic and the nature of that traffic changes somewhat, i.e. a sufficient number of chunks start being accessed repeatedly, you may decide to switch the secondary cache back to all.
zfs set secondarycache=metadata pool1/storj
zfs set secondarycache=all pool1/storj/databases
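To double-check what ended up where, the cache settings for the whole tree can be listed in one go:
zfs get -r primarycache,secondarycache pool1/storj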
That’s pretty much all I had.