Running multiple nodes on the one pool (RAID)

You bring up a few interesting points.

SLOG is a “separate log device”; it’s used exclusively for sync writes. If it fails, it’s harmless. The file system will see that the device is not accepting new writes, stop using it, and mark the pool as degraded. You will suffer a period of worse performance while you are replacing the failed SLOG, but no data loss will occur. The only opportunity for data loss is if a power loss occurs and then the SLOG fails during boot. You might lose the last few transactions, but that’s extremely unlikely with redundant PSUs and a UPS. And even then, it’s harmless in the grand scheme of things.

This is far into the future, seeing how a 3 TB node receives about 300 kB/s of traffic on average today, but let’s say you do get 50 MB/s of random traffic from STORJ customers to your node. Then I’d argue the RAID array with caches will handle that workload much better than a single HDD: random access performance of a single drive is horrific, so you pretty much need some sort of caching to absorb it. Which brings us back to the question: do you want to buy five sets of cache drives or one? (And if you have multiple nodes on multiple HDDs, they share the traffic anyway, so the workload is no different from that of an array.)
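For a rough sense of scale, here is a back-of-the-envelope sketch in Python. The 8.5 ms average seek time, 7200 RPM, and 250 kB average request size are my own assumptions for illustration, not measured Storj numbers, and the function names are made up; plug in your own values.

```python
def required_iops(throughput_bytes_per_s: float, avg_request_bytes: float) -> float:
    """Random I/O operations per second needed to sustain a given throughput."""
    return throughput_bytes_per_s / avg_request_bytes


def hdd_random_iops(avg_seek_ms: float, rpm: float) -> float:
    """Rough random IOPS of one spinning disk: one seek plus half a rotation per request."""
    rotational_latency_ms = 0.5 * 60_000 / rpm  # half a revolution, in milliseconds
    return 1000 / (avg_seek_ms + rotational_latency_ms)


if __name__ == "__main__":
    AVG_REQUEST_BYTES = 250_000  # assumed average request size (hypothetical)

    today = required_iops(300_000, AVG_REQUEST_BYTES)       # ~300 kB/s today
    future = required_iops(50_000_000, AVG_REQUEST_BYTES)   # hypothetical 50 MB/s
    one_disk = hdd_random_iops(avg_seek_ms=8.5, rpm=7200)   # assumed drive specs

    print(f"needed today:   {today:.1f} IOPS")
    print(f"needed at 50MB/s: {future:.1f} IOPS")
    print(f"one HDD, uncached: {one_disk:.1f} IOPS")
```

With these assumptions, today's traffic is trivial for a bare disk, while the 50 MB/s scenario needs several times more random operations per second than one uncached HDD can deliver. That gap is what the cache has to absorb.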

Absolutely. But since newer data has a higher probability of being retrieved than old data, the cache will still help. Actually, it doesn’t even have to be huge, just big enough to fit the metadata (lookup tables, directory structures, file bitmaps, what have you). There is probably not much benefit in caching the actual data.

As for the life of a cache drive, I don’t care much about it; they are disposable. If I have to use a cache to win races, then I have to use a cache, regardless of whether the storage is an array or multiple nodes.

Tangentially, for a lot of users (if not all), hosting storage nodes makes sense if you have free space to share, to somewhat offset costs and feel good about contributing to the project; that was the original intent. This pretty much rules out single drives: I have no use for single-drive volumes other than Storj. So an array is not really a choice.