When is the network sufficient? (crash-testing the router)

It seems to be a problem for a lot of people starting out with a large disk, like 8 TB:
things go really well in the beginning, but after a while they run into network problems, disk problems, etc.,
and then they realise that their network (or sometimes even the disk) may not be sufficient to handle the Storj load…

Is there any way to check whether the network and the device are capable of transferring the large amounts of data they will handle in the long term?

A speed test cannot really answer that, because it transfers very small amounts of data compared to what Storj nodes move over time,
but is there any other way to check it?

For instance, we could take an average from other nodes and see what bandwidth is needed for a specific amount of disk space, because an 8 TB disk will see far more uploads and downloads than a 1 TB disk.

We could simulate that by uploading and downloading large amounts of data, and also writing them to the disk, so we could check how much the setup can handle and whether our disk and network are sufficient for Storj's needs.
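The disk half of that simulation can be sketched in a few lines. This is my own rough sketch, not an existing tool: the file name and sizes here are arbitrary, and a realistic test would need to run far longer and spread writes over many files.

```python
import os
import time

def disk_write_throughput_mb_s(path: str, total_mb: int = 64, chunk_kb: int = 256) -> float:
    """Write total_mb of random data in chunk_kb pieces, fsync, and report MB/s."""
    chunk = os.urandom(chunk_kb * 1024)
    n_chunks = (total_mb * 1024) // chunk_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually hit the disk, not just the page cache
    elapsed = time.perf_counter() - start
    os.remove(path)  # clean up the test file
    return total_mb / elapsed
```

Pointing `path` at a file on the drive you plan to use for the node gives a first impression of its sustained write speed; comparing that number against the ingress figures discussed below tells you whether the disk is in the right ballpark.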

  • I tried to find something like this online without any luck. I am sorry if something like this already exists, but if not, I am just dropping it as a suggestion :wink:

Disk problems?
Network problems?

Can you be more specific?

I’m a node operator with 1 TB, 4 TB, and 8 TB nodes. The amount of data per second has never reached a point where network or disk I/O should be a bottleneck.

From what I observe, there are four types of network bottlenecks for Storj.

Latency to customers. If you’re “close” to many customers, then you’ll win more races, filling your storage faster and getting more downloads. You can’t control the actual geographical distance, but you may control what ISP you are using—maybe some ISPs are better connected than others. Your local network topology will impact latency, e.g. Ethernet is better than WiFi, and WiFi is (from my experiments) better than networking over powerlines. Your routers and switches may also impact latency if they’re of low grade. You can measure latency to your ISP.
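One simple way to measure this yourself is to time TCP handshakes to a host you control (or your ISP's gateway). The helper names below are mine, not an existing tool, and the target host/port are whatever you choose to test against:

```python
import socket
import statistics
import time

def tcp_connect_latency_ms(host: str, port: int, timeout: float = 5.0) -> float:
    """Time a single TCP handshake — a rough stand-in for network latency."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # we only care about connection setup time, so close immediately
    return (time.perf_counter() - start) * 1000.0

def median_latency_ms(host: str, port: int, samples: int = 5) -> float:
    """Take the median over several samples to smooth out jitter."""
    return statistics.median(tcp_connect_latency_ms(host, port) for _ in range(samples))
```

Running it over Ethernet, WiFi, and powerline adapters in turn makes the topology differences mentioned above directly visible.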

Ingress bandwidth. I don’t think I’ve seen ingress bigger than around 3 MB/s yet, and these were short peaks anyway. This is per /24 IP block, with a small increase if you have both a vetted and an unvetted node within your network. If your ISP gives you 100 Mbps or more, you shouldn’t worry. A speed test is probably good enough here.

Egress bandwidth. Again, short-term peaks of around 3 MB/s are the highest I have observed so far. Egress depends on what kind of data you store, so it’s more difficult to define how it scales. New data is usually downloaded more often than old data, though I have one node that has been full for many months now and still gives pretty good egress revenue. Again, a speed test will give you a good number here.

Number of concurrent TCP/UDP connections. Something I learned the hard way: my old router was restarting every few hours during traffic peaks because its connection-tracking tables filled up too quickly. Probably the easiest way to test this is to download a high-traffic torrent and check whether it kills your router :stuck_out_tongue:
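You can also watch the connection count on the node host itself (note this counts sockets on the host, not entries in the router's NAT table — those only correlate). A Linux-only sketch that parses procfs; the state table is the standard encoding used in `/proc/net/tcp`:

```python
from collections import Counter
from pathlib import Path

# TCP states as hex-encoded in column 4 of /proc/net/tcp
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

def tcp_connection_counts() -> Counter:
    """Count TCP sockets per state by parsing /proc/net/tcp{,6} (Linux only).

    Returns an empty Counter on systems without procfs.
    """
    counts = Counter()
    for name in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(name)
        if not path.exists():
            continue
        for line in path.read_text().splitlines()[1:]:  # first line is a header
            fields = line.split()
            if len(fields) > 3:
                counts[TCP_STATES.get(fields[3], "UNKNOWN")] += 1
    return counts
```

Comparing the total during a traffic peak against your router's advertised NAT/conntrack table size gives a hint of how close you are to the limit.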

Regarding the device, there are two interesting metrics, though they are related: one is sustained write IOPS measured at the file system level, the other is operations on directories with a large number of files. SMR drives are worse than CMR drives, btrfs is worse than ext4, checksums and parity schemes often degrade performance, and write caches help. Having more RAM for caching direntries and inodes helps. Moving databases to separate storage helps. Disabling synchronous writes should help. Probably a tool like filebench would be of use here, though I have no idea what the thresholds to achieve would be, and a realistic test would take many days.
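Alternatively, fio can approximate the sustained-write side of this. A hypothetical job file — the directory, sizes, and sync interval here are guesses for illustration, not validated thresholds:

```ini
; fio job: sustained small random writes, very roughly storagenode-like
[storagenode-write-sim]
directory=/mnt/storagenode   ; hypothetical mount point of the node's disk
rw=randwrite
bs=256k
size=4g
numjobs=4
iodepth=16
ioengine=libaio
fsync=32                     ; periodic syncs, like database commits
runtime=600
time_based
group_reporting
```

This only exercises the IOPS side; the many-files-per-directory behaviour would still need a separate test, e.g. with filebench as mentioned above.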

This is roughly a synthesis of common forum posts. Probably not comprehensive, but should be a good start.