That's true, and it's a good move from Storj to take the risk instead of putting it on the shoulders of the SNOs. But it would help a lot to communicate clearly and fast: if the customer(s) decide to sign, we should know that in order to be prepared.
I just ordered one more 18TB disk and will proceed like you and, I think, many other SNOs here.
I don't know exactly what your experience is, but most of my nodes are on 10G/10G or at least 1G/1G, and I have never seen more than 180-200Mbps per node so far…
Have these speed requirements of up to 1G already been tested? Any experiences?
I think I saw 300Mbps for a short time a while ago. But I think this is less about the speeds observed so far and more about the speed that will be needed. As I understand it, the problem is not the amount of available space but the speed: the concern is that if the fast nodes fill up (and they will fill up first, since they get more traffic than slow nodes), the remaining nodes, even if they have the space, won't be fast enough.
So getting a lot of space online, but with a slow connection, won't help.
That would make sense if a whole file was being downloaded from a slow SNO.
But you'll keep the first 28 chunks out of however many (I can't remember the exact numbers) it tries, so even if the SNOs are relatively slow, the parallelism should compensate for that.
Apparently it doesn't, at least not enough. According to the previous posts, Storj found that the network was too slow, so they made some changes (reduced the data expansion factor, changed node selection to favor faster nodes, etc.). So now faster nodes get more traffic than slower ones, and if they fill up, the remaining slower nodes may not be enough. Apparently the new customer is planning to upload data at 100Gbps or some other high speed.
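To illustrate the parallelism point, here is a toy sketch (not Storj's actual uploader; the piece counts, timings and names are made up): pieces are sent to more nodes than the segment needs, the transfer completes as soon as the fastest ones finish, and the slow tail is cancelled. Fast nodes therefore win the race more often and accumulate data first.

```go
// Toy long-tail cancellation sketch. All numbers are hypothetical.
package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

func uploadPiece(ctx context.Context, node int, results chan<- int) {
	// Simulate one piece upload; slow nodes simply lose the race.
	d := time.Duration(50+rand.Intn(500)) * time.Millisecond
	select {
	case <-time.After(d):
		results <- node
	case <-ctx.Done(): // cancelled: enough faster nodes already finished
	}
}

func main() {
	const needed, total = 28, 80 // hypothetical: pieces kept vs. pieces attempted
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	results := make(chan int, total)
	for n := 0; n < total; n++ {
		go uploadPiece(ctx, n, results)
	}

	finished := 0
	for finished < needed {
		<-results
		finished++
	}
	cancel() // long-tail cancellation: the slowest uploads are dropped
	fmt.Printf("kept the first %d of %d attempted pieces\n", finished, total)
}
```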
Well, in fact I think this could be a possible reason.
The next question is about possible drops in ingress caused by how the data is distributed.
If there are SNOs with sufficient bandwidth spinning up new nodes one by one in the same /24 subnet because disks fill up fast, then the incoming traffic will be divided among all available nodes… Will the traffic be divided among all online nodes in the subnet, or only among the online nodes in the subnet that still have space available?
If it is divided among online nodes without considering available space, a bottleneck could grow: the number of full nodes rises while the incoming traffic is divided among more and more nodes, including the full ones. That would lead to declining download rates on the nodes that still have space available.
That won't change the bandwidth: as nodes fill, they'll stop being available for ingress. So the upload won't be "divided among all available nodes". Maybe there's one-node-with-space… or one-node-with-space-plus-twenty-full-nodes: it will be the same bandwidth to that one node with space.
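A minimal sketch of that idea (not the real node selection code; the fields and logic here are assumptions based on the description above): full nodes are filtered out before one candidate per /24 subnet is picked, so they never dilute a subnet's share of ingress.

```go
// Toy subnet-based node selection. Structure and values are illustrative only.
package main

import (
	"fmt"
	"math/rand"
)

// Node is a simplified stand-in for a candidate storage node.
type Node struct {
	ID     string
	Subnet string // the /24 the node's IP falls into
	FreeGB int
}

// pickPerSubnet drops full nodes first, then returns one random
// candidate per /24 subnet.
func pickPerSubnet(nodes []Node) []Node {
	bySubnet := map[string][]Node{}
	for _, n := range nodes {
		if n.FreeGB > 0 { // full nodes drop out of ingress selection entirely
			bySubnet[n.Subnet] = append(bySubnet[n.Subnet], n)
		}
	}
	picked := []Node{}
	for _, group := range bySubnet {
		picked = append(picked, group[rand.Intn(len(group))])
	}
	return picked
}

func main() {
	nodes := []Node{
		{"node-a", "203.0.113.0/24", 0},    // full
		{"node-b", "203.0.113.0/24", 0},    // full
		{"node-c", "203.0.113.0/24", 4000}, // still gets the whole subnet's share
	}
	fmt.Println(pickPerSubnet(nodes))
}
```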
This all hinges on those SNOs 'with sufficient bandwidth'… never stopping their expansion when they fill up. We're not all @Th3Van - most of us have limits.
Also… we don’t know where all the data is going to be coming from.
We don’t know if this is one big centralised company or a lot of smaller users uploading data from all over the world.
So the current pool of “fast SNOs” may actually not be the fastest ones for the end-client if and when they do start uploading data.
Write-only data would be very simple to “store” if not for the pesky audits.
In 2017 I had a brief chance to work on an absolute monster with 12 TB of RAM and >500 CPU cores. I couldn't reasonably take advantage of all that power despite doing CPU-heavy stuff. The htop view was amazing though.
On the other hand, now I'm working at a place where spinning up a Spark cluster with hundreds of nodes in the cloud is a daily routine.
A new GC record on this test data: 6M pieces moved to trash over 104h, totalling around 700GB, again while receiving ingress. It also reported a piece count of 28M from the saltlake satellite.
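Rough arithmetic on those numbers, assuming they're accurate: 6M pieces over 104 hours is roughly 16 pieces per second, and 700GB across 6M pieces works out to an average of about 117KB per trashed piece.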