The amount of egress my nodes have recently been performing is getting uncomfortably close to what my Internet connection can support without negatively impacting my home usage. If my nodes suddenly start receiving high-egress pieces, it might get quite annoying. And so I started thinking about the best way to control egress.
I recall there used to be a parameter in the storage node software to declare the amount of egress bandwidth available, but I also recall that it never really worked and was eventually removed.
I could set up some basic traffic shaping, but I fear the effect of naïve shaping would be detrimental to the Storj network as a whole and to my earnings: it would split the available bandwidth equally over all concurrent downloads, making all of them slower and putting all of them at risk of being dropped, worsening the experience of every served customer. Besides, it would also pile up more and more requests, just like we observe with slow disk I/O requiring more and more RAM.
Instead, following the well-known rule that shaping is best performed directly at source, it would probably be better to shape traffic inside the storage node code itself. I imagine an implementation where the node prioritizes transfers in the following way:
- First, transfers that have already started. If we’ve decided in the past to put effort into these connections, commit to them fully.
- Then, audit/repairs in the incoming order (FIFO/oldest first), but allocating only a small amount of bandwidth to them and only if they’ve already been waiting for at least, let’s say, 30 seconds. These connections are not latency-sensitive, so deferring them should not impact customer experience.
- Then, pending non-audit/non-repair egress requests in the order opposite to incoming (LIFO/most recent first). If we have to pick a transfer to start and commit to, the freshest requests are the ones least likely to be tail-cancelled.
- Then, pending audit/repairs without the 30 second delay. If there’s still free bandwidth, let them happen quickly to free up bandwidth for future customer traffic spikes.
This way the started transfers would get finished potentially as quickly as without shaping (being picked up shortly after being initiated and without having to compete with many other connections), so it would be a win for at least some of the transfers. And the egress requests that keep being postponed by newer requests—well, they become more likely to be tail-cancelled anyway, so at least they will not incur local disk I/O and data transfer overhead.
Also, instead of keeping a traffic shaping counter within the storage node itself, the storage node would observe the operating system’s counters for the network interface, like /proc/net/dev. This way shaping would be shared across all storage nodes operating on the device, and as a nice side effect, if the operator performs other, non-Storj data transfers from that device, the storage node will naturally make room for them, not impacting the device’s main purpose. The storage node could assume that the minimum T&C-mandated amount of 5 Mbps is always safe, of course.
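Reading the transmit counter is cheap; sampling it twice some interval apart would give the current egress rate to compare against the configured budget. A minimal parsing sketch (the /proc/net/dev layout is standard: eight receive fields after the interface name, then the transmit fields, so transmitted bytes is the ninth number; the function name is mine):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// txBytes extracts the transmitted-bytes counter for one interface
// from the contents of /proc/net/dev.
func txBytes(procNetDev, iface string) (uint64, error) {
	for _, line := range strings.Split(procNetDev, "\n") {
		name, rest, found := strings.Cut(line, ":")
		if !found || strings.TrimSpace(name) != iface {
			continue
		}
		fields := strings.Fields(rest)
		if len(fields) < 9 {
			return 0, fmt.Errorf("malformed line for %s", iface)
		}
		// Fields 0–7 are receive counters; field 8 is transmitted bytes.
		return strconv.ParseUint(fields[8], 10, 64)
	}
	return 0, fmt.Errorf("interface %s not found", iface)
}

func main() {
	sample := `Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 1024000    2000    0    0    0     0          0         0  512000    1000    0    0    0     0       0          0`
	tx, err := txBytes(sample, "eth0")
	fmt.Println(tx, err)
}
```

In the node, this would be read from the real /proc/net/dev periodically, with the rate derived from the difference between two samples, never shaping below the 5 Mbps floor.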
What do you think?