Limit node transfers through node selection

Indeed. Even now it is quite easy for end users to get cloud storage prices close to 2 USD/TB/month of raw (non-redundant) storage with plenty of bandwidth even with current cloud operators, and I suspect that their actual costs are even lower than that. If Tardigrade becomes successful, these cloud operators will drive out RPis.

However, tiering is something I think would happen naturally if it becomes possible for satellite operators to charge different prices (SNOs can already decide which satellites to work for, and it could become a choice between the more competitive ones that pay better and the less competitive ones that don't pay as well). It is for these lower tiers that this functionality makes sense, and it would be a competitive advantage of this technology to allow more flexible tiering than whatever current cloud operators focus on. Without this kind of functionality, lower tiers might not be as economical for end users.

Thanks for the response @BlackDuck

No more than using the max-concurrent-requests setting would.
I'd even go as far as to argue that nodes that are unable to win races right now because of their constraints would possibly receive more successful uploads rather than fewer. So a good implementation would serve decentralization, not harm it.

I think this unfortunately impacts all proposed solutions here. I don't know how large that impact would be. I think some options could work if those numbers are asynchronously updated, perhaps in batch. If the number is already in the nodes table, adding a condition would not slow down node selection. Though it would come at the cost of the recency of that data. But a slowly updating score like @Pentium100 suggested could still make that a plausible option.
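For illustration only, here is a minimal sketch of what that could look like, assuming a hypothetical, batch-updated congestion_score column in the nodes table (none of the column names or the query are the real satellite schema):

```go
package nodeselection

// Hypothetical variant of a node selection query. If a slowly refreshed
// congestion_score column already lives in the nodes table (updated
// asynchronously, in batch), adding one condition should not slow the
// selection query down; the trade-off is that the score is slightly stale.
// All column names here are illustrative, not the real satellite schema.
const selectNodesQuery = `
	SELECT id, address
	FROM nodes
	WHERE disqualified IS NULL
	  AND congestion_score < $1
	ORDER BY RANDOM()
	LIMIT $2
`
```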

This could be collapsed into a single setting that limits all transfers as well, instead of separation between upload and download. See also my explanation below of how download limiting would work in this scenario.

Are you suggesting these nodes should be excluded from the network altogether? I can see that that may be an option, but I've not heard it suggested before. So I'm just wondering if this is being considered.

And that's exactly what we're trying to prevent here. I completely agree with you, and I should have been more complete. In the case of a download limit, it wouldn't be a hard limit. The satellite would select nodes that haven't reached their limit first, but if not enough of them remain, it would have to include nodes over their rate limit. Assuming x nodes need to be selected for a download, but only y nodes are below their rate limit (y < x), the satellite would select those y nodes below their rate limit and x-y nodes at random from the remaining over-limit nodes. I believe this can be done without impacting the performance of node selection.
This method would maximize the number of selected nodes that still have resources to deal with download transactions, which would actually improve download speeds over selecting purely at random, while still allowing for long tail cancellation and redundancy and preventing downloads from failing.
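A rough sketch of that selection step (all names are hypothetical, this is not the actual satellite code):

```go
package nodeselection

import "math/rand"

// candidate is a placeholder for whatever record the satellite keeps per
// node; the OverLimit flag is hypothetical.
type candidate struct {
	ID        string
	OverLimit bool // true if the node has reached its (reported) rate limit
}

// selectForDownload prefers nodes under their rate limit, but falls back to
// over-limit nodes when not enough remain, so a download never fails just
// because of the soft limit.
func selectForDownload(nodes []candidate, needed int) []candidate {
	var under, over []candidate
	for _, n := range nodes {
		if n.OverLimit {
			over = append(over, n)
		} else {
			under = append(under, n)
		}
	}

	// Keep the choice within each pool random.
	rand.Shuffle(len(under), func(i, j int) { under[i], under[j] = under[j], under[i] })
	rand.Shuffle(len(over), func(i, j int) { over[i], over[j] = over[j], over[i] })

	if len(under) >= needed {
		return under[:needed] // enough nodes below their limit
	}
	// y nodes under the limit plus x-y over-limit nodes chosen at random.
	missing := needed - len(under)
	if missing > len(over) {
		missing = len(over)
	}
	return append(under, over[:missing]...)
}
```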

Correct, the limit would have to be applied per satellite. SNOs would have to adjust for that. A system that would automatically adjust for this would be preferable. Which is where option 4 comes in. But that would require more coordination between satellite and node. Honestly I think SNOs would be happy with a setting for max transfers per minute per satellite that they could tweak should they run into bottlenecks with their setup.
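To make that concrete, here is a small sketch of what a per-satellite transfers-per-minute limit could look like on the node side; the type, names, and setting are hypothetical, not an existing storagenode option:

```go
package ratelimit

import (
	"sync"
	"time"
)

// PerSatelliteLimiter sketches a hypothetical "max transfers per minute per
// satellite" setting on the node side; it is not an existing storagenode
// option.
type PerSatelliteLimiter struct {
	mu     sync.Mutex
	limit  int
	counts map[string]int // satellite ID -> transfers started this window
	window time.Time
}

func New(limitPerMinute int) *PerSatelliteLimiter {
	return &PerSatelliteLimiter{limit: limitPerMinute, counts: map[string]int{}}
}

// Allow reports whether another transfer for this satellite may start now.
func (l *PerSatelliteLimiter) Allow(satelliteID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	now := time.Now()
	if now.Sub(l.window) >= time.Minute {
		l.window = now
		l.counts = map[string]int{} // start a new one-minute window
	}
	if l.counts[satelliteID] >= l.limit {
		return false // over the per-satellite limit for this minute
	}
	l.counts[satelliteID]++
	return true
}
```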

I don't see a perfect implementation right now. As a malicious actor, I could generate requests to push your node into its limits and exclude it from the selection list for some time.

Of course not! We need many more of them. If we had double or triple the number of nodes, most problems would be gone. My preference is more but smaller nodes.

Don't get me wrong, but we have a limited amount of dev resources and a bunch of features to implement. And we need to get things finished to onboard paying customers. I understand that from the SNO side everything feels like it's progressing very slowly, but we are on it. One reason we put that load on the network is to surface exactly these problems. But from our side we need a solution that works for SNOs and customers and fits into the SLA we give.

Can you change my node's IP address to some bogus value to get it offline, or the wallet address to yours? If not, then maybe the same protection should be used for the weight/connection limit/etc.?

DDoS is a definite risk for the Storj network… especially if it grows into something that noticeably cuts into more established providers' bottom line.

Worse than that: governments can request that cloud providers hand over the data. As there is no such possibility with Storj (assuming the encryption is sound), denial of service is the next best thing for them.

How? Since the satellite does node selection, I don't see how anyone on the outside could fake connections. Or does the node not verify that an uplink contacting it was actually selected by a satellite? If so, that sounds like it's already a problem.

The node would do its own reporting of limits and/or congestion, based on the solution chosen. These reports could be verified with simple signing of the communication.
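For example, a sketch of what such a signed report could look like; ed25519 is used here purely for illustration, not as the node's actual identity mechanism, and the message layout is made up:

```go
package report

import (
	"crypto/ed25519"
	"encoding/binary"
)

// LimitReport is a hypothetical self-reported message from a node: how many
// transfers it is currently handling and what limit it has configured.
type LimitReport struct {
	ActiveTransfers uint32
	ConfiguredLimit uint32
}

func (r LimitReport) bytes() []byte {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint32(buf[0:4], r.ActiveTransfers)
	binary.BigEndian.PutUint32(buf[4:8], r.ConfiguredLimit)
	return buf
}

// Sign is what the node would run: it signs the report so the satellite can
// check the numbers really came from that node and were not forged.
func Sign(priv ed25519.PrivateKey, r LimitReport) []byte {
	return ed25519.Sign(priv, r.bytes())
}

// Verify is what the satellite would run against the node's known public key.
func Verify(pub ed25519.PublicKey, r LimitReport, sig []byte) bool {
	return ed25519.Verify(pub, r.bytes(), sig)
}
```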

I didn't think so, but wanted to make sure. More nodes would help, but it is a balance. If more nodes share the same demand, each node by definition gets less traffic, which limits profitability. That's a balance between profitability and distribution of transfers; lean too much either way and it wouldn't be good. More nodes (in relation to demand) would help with this problem, but drop profitability. You also can't control what those nodes will look like. I'm kind of messing with your wish already, running a node with now 16TB available. And I'll keep upgrading that as long as it's profitable. There will always be nodes like that.

It really doesn't feel that way to me. I know you are working really hard on lots of features and customers have to be priority one. Yet there have been great upgrades to the dashboard to give SNOs insight into earnings etc. It's not my intention to complain about the rate of development. I'm merely pointing out a risk of SNOs using the max-concurrent-requests setting and customers failing uploads as a result. So I'm suggesting an alternative to fill a need for some SNOs (not even my own need in this case, my node isn't running into any bottlenecks) to work around bottlenecks without causing this problem. I know SNOs are still using max-concurrent-requests and I know some customers are still encountering these errors.

Failed to copy: uplink: segment error: ecclient error: successful puts (79) less than success threshold (80)
Failed to copy: uplink: segment error: ecclient error: successful puts (29) less than or equal to repair threshold (35)

My theory was that these could be related and solving this would prevent errors in the uplink. Honestly the SNO advantages are just icing at that point. Fixing this error is more important. But if it solves problems for SNOs as well and perhaps even opens the door to more SNOs, all the better.
I could be wrong about that relation between error and cause though. I don't have the inside stats to know for sure this is part of the problem.

I am one of these nodes. I have a low success rate. So let me share my observations here.

I enabled QoS. That solved this problem.

I have tested different settings in the past. It is correct that it will improve the success rate but will decrease the total number of successful uploads. Let's say I am getting 1000 uploads in whatever time frame. At the moment I have a success rate of 45%, so I will win the race on 450 uploads. I can increase the success rate to more than 90% by setting a max-concurrent-requests limit, but the limit also rejects so many requests outright that the number of successful uploads decreases as well. In the end, I will have only 400 successful uploads. So less data, but at least it looks nice in the logs.

For that reason, I removed the limits on my storage node. I take all the uploads I can get. My node is growing fast. Who cares about the success rate in the logs as long as the storage node keeps growing fast?

It will also be a more efficient use of the internet connection, since the incoming data is being saved on the node instead of being canceled and discarded, which seems like a waste of internet resources to me.

I agree that QoS is a decent solution for the SNO side if the connection is a bottleneck. Though not all routers have it. A friend of mine had to stop their node because the router had no QoS and the internet was no longer working. Admittedly they had a connection well below the recommended spec.

If the HDD is the bottleneck, a possible solution could also be to spin up a second node, which will spread the load over 2 HDDs. There are several options SNOs could consider.

But I'm afraid that as long as the max-concurrent-requests setting is there, people will use it.

As for success rates, that was not my primary concern when posting this. I don't doubt your results, but I've also seen SNOs that had rates below 10% and were able to significantly increase them by rejecting about half of the traffic. It could go either way. Though I've stopped suggesting it to people due to the issues it could cause for customers.

Even if people don't use that setting though, selecting nodes that aren't already slow could also increase performance for the customer. I see many SNOs touting gbit connections with plenty of room to spare. They can probably handle a little more load while the slower ones are struggling.

On one of my nodes, now retired, this worked only to a point. At some point, with a large number of concurrent uploads, the total number of races won decreased. That point was, for that specific node, at around 30, IIRC. I posted some statistics I computed in a different thread here.

It doesn't need to run on the router. You can set up QoS on the storage node itself. It wouldn't be as effective as a central QoS instance, but it would still solve the problem.
