-Seagate Expansion 4TB (STEA4000400) works very well even when filled; I have 2 units.
-Toshiba Canvio Basic 4TB are just horrible disks. I had database locks, it took ages to stop the node, I/O at 100% constantly, you get the idea.
+ docker stop -t 36000 storagenode-chomik7
storagenode-chomik7
real 418m34.723s
user 0m3.745s
sys 0m0.616s
Had to kill it manually. The node is on a ST6000VX0023 drive together with some other nodes, a total of ~4.5TB in Storj data and another 0.5TB in other “data collecting” activities. The drive is now connected over SATA to an HP Microserver N36L.
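(For the record, “kill it manually” just means forcing the container down once the graceful stop gives up, something like:
$ docker kill storagenode-chomik7
which sends SIGKILL to the node process, so it gets no chance to finish flushing its databases.)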
I tried, but I admit I don’t know how to interpret the results of iostat/iotop. What I certainly noticed is that if I temporarily set max concurrent connections to a low number, I never had this problem. It only became a problem when I increased it to 500.
$ iostat -dmx 5
This will give you running average usage details for your disks, sampled every 5 seconds.
Look for:
r/s - read ops per second
w/s - write ops per second
%util - the rightmost column, a calculated indication in percent of how busy the drive is
The first two can be added together for total drive IOPS; a small sketch for watching a single drive follows below.
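As a minimal sketch, assuming the Storj drive is sdd (as mentioned later in the thread) and that gawk and coreutils' stdbuf are available:
$ # limit the report to the Storj drive, one sample every 5 seconds
$ iostat -dmx sdd 5
$ # or log just the %util column with a timestamp (stdbuf avoids pipe buffering delays)
$ stdbuf -oL iostat -dmx sdd 5 | awk '$1 == "sdd" { print strftime("%H:%M:%S"), "%util", $NF }'
Note that the very first report iostat prints is the average since boot, so ignore it and look at the subsequent 5-second samples.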
500 is a very large number… if you are to use this setting at all, use low, sensible numbers. It’s a limiting setting; it can’t force more traffic to your node if you set it high.
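For reference, a minimal sketch of where this limit lives on a docker-based node; the option name below (storage2.max-concurrent-requests) and the container name (storagenode) are the commonly used ones, so check them against your own setup:
# in the node's config.yaml; 0 usually means unlimited, so pick a low, deliberate value
storage2.max-concurrent-requests: 40
$ # restart the container so the node re-reads its config
$ docker restart storagenode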
This is the system during normal operation, and then I deliberately started stopping the same node again. The drives that host my nodes are /dev/sdc and /dev/sdd, the latter being the one with 4.5TB of Storj data.
It’s mostly because at some point I had thousands of connections… and they took more RAM than I had in this system. Besides, I believe that just as every queue in a distributed system should be bounded, every resource should be limited where reasonably possible, even if the limits are way higher than what a healthy system should request.
In that case you should use something besides btrfs. Not having redundant metadata, even on a single disk, is a recipe for losing the filesystem if a bit is flipped in the wrong place.
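If anyone wants to check what their filesystem actually does, here is a quick sketch; it assumes btrfs-progs, and /mnt/storj and /dev/sdX are just placeholders for your mount point and disk:
$ # the Metadata line shows the current profile, e.g. DUP or single
$ btrfs filesystem df /mnt/storj
$ # a fresh single-disk filesystem can be given redundant metadata explicitly
$ mkfs.btrfs -m dup /dev/sdX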
Finally finished the migration. The difference is clear: I’m barely getting 20 writes per second, where I used to see as many as 200 on btrfs when comparing periods with a similar number and size of upload requests. I didn’t expect the difference would be so huge.
One thing is that the btrfs partition was accidentally created with metadata in the DUP scheme, which probably explains half of the difference. Maybe the log-structured design explains the other half?
So while I don’t have hard evidence against btrfs, I’d lean towards advising against it for storage nodes.