Updates on Test Data

Unvetted nodes only receive a tiny portion of traffic (1%, if I’m not mistaken). Since the network treats all nodes behind a single IP as one, the old node essentially still gets the same traffic as before: that 1% is a share of all newly uploaded data, spread over all unvetted nodes.
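
To make that imbalance concrete, here is a minimal back-of-the-envelope sketch in Go. The ingress volume and the node/subnet counts are invented purely for illustration; only the roughly 1% share comes from the discussion above.

```go
package main

import "fmt"

func main() {
	// All numbers are made up for illustration; only the ~1% share is from the post above.
	totalIngressTB := 100.0 // hypothetical network-wide daily ingress
	unvettedShare := 0.01   // roughly 1% reportedly goes to unvetted nodes

	vettedSubnets := 10000.0 // hypothetical count of vetted subnets (one selection slot per IP group)
	unvettedNodes := 2000.0  // hypothetical count of unvetted nodes network-wide

	// The existing vetted node keeps competing in the 99% pool, so its share
	// barely changes when a new unvetted node appears behind the same IP.
	perVettedSubnet := totalIngressTB * (1 - unvettedShare) / vettedSubnets

	// The new unvetted node only draws from the 1% pool, which is spread
	// over every unvetted node on the network.
	perUnvettedNode := totalIngressTB * unvettedShare / unvettedNodes

	fmt.Printf("per vetted subnet: %.4f TB/day, per unvetted node: %.4f TB/day\n",
		perVettedSubnet, perUnvettedNode)
}
```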

I suggested an option to reconsider this paradigm.
That is the essence of my proposal to make changes to the 1% paradigm.

I’m up for 1 and 3, although I would never make traffic really $0. But $0 for storage is really the worst idea I’ve ever heard of and would be a deal breaker. You would get all the daily backups and so on, which most probably are never downloaded. This would only make sense if they paid for ingress.

That was exactly the point: as far as I interpret it, the proposal meant paying $0 for storage and paying only for egress traffic.

1 Like

As cheap as raw space has become, bandwidth costs have come down even faster. I bet Storj will have to remove all transfer fees at some point. Customers expect to pay you to keep something… but not to move it.

1 Like

You have to see the other side too. File sizes have exploded in the past years as well, and transfer is still quite expensive. So somehow it stays level with transfer costs, bandwidth and file sizes.

1 Like

Yesterday we tested 4 different RS settings. Let’s say the results have been surprising. The longer we make the long tail, the more throughput we get. The downside is wasted resources: a longer long tail also means nodes that lost the race still consumed a lot of bandwidth. So ideally we make the long tail short, to make it more resource efficient and less painful for the nodes.
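
For readers who haven’t followed the long-tail mechanics, below is a rough Go sketch of the trade-off being described, using a hypothetical uploadPiece stand-in rather than the real uplink code. The upload starts more piece transfers than it needs and cancels the slowest ones once enough have finished; the bigger the gap between started and needed transfers, the faster the upload tends to complete, but the more bandwidth the cancelled nodes burn for nothing.

```go
// Minimal sketch of long-tail upload cancellation; not Storj's actual uplink code.
package main

import (
	"context"
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// uploadPiece stands in for uploading one erasure-coded piece to a node.
func uploadPiece(ctx context.Context, node int) error {
	delay := time.Duration(rand.Intn(200)+20) * time.Millisecond // simulated node speed
	select {
	case <-time.After(delay):
		return nil
	case <-ctx.Done():
		return ctx.Err() // this node lost the race; its bandwidth was wasted
	}
}

// uploadWithLongTail starts `total` piece uploads and cancels the rest as soon
// as `success` of them finish.
func uploadWithLongTail(total, success int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	done := make(chan int, total)
	var wg sync.WaitGroup
	for n := 0; n < total; n++ {
		wg.Add(1)
		go func(node int) {
			defer wg.Done()
			if uploadPiece(ctx, node) == nil {
				done <- node
			}
		}(n)
	}

	for finished := 0; finished < success; finished++ {
		<-done
	}
	cancel() // long-tail cancellation: everything still in flight gets cut off
	wg.Wait()
	fmt.Printf("kept %d of %d started uploads\n", success, total)
}

func main() {
	uploadWithLongTail(110, 65) // roughly the shape of the current RS setting discussed below
}
```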

In a few minutes we will start new benchmark tests, this time with a new node selection. If it works out, I will also explain in a bit more detail how exactly the new node selection works. I don’t want to jinx it.

10 Likes

If you change nodes at the same time as you change the test parameters, how can you compare the results? Or did I misunderstand you?

Isn’t that kind of expected when the uploading machine isn’t the bottleneck? I think there are plenty of nodes that can handle much more than you threw at them, even with the additional waste. Though hopefully the new node selection will achieve the same without the waste. Looking forward to hearing the details on that.

2 Likes

First test round will be new node selection with standard RS settings. Later we will mix in the faster RS settings we found.

It is a bit more complicated. The RS numbers differ in piece size, IOPS, bandwidth consumption and expansion factor. The test could have gone in any direction depending on what exactly the limiting factor is.

I’d be curious to hear what options were tested and what the results were if you could share and it wouldn’t take too much time. I know you’re busy though.

2 Likes

You have trouble identifying in advance the nodes with the fastest transfers?

Because the fastest nodes only emerge as winners after a long-tail test?
But resources get wasted finding that out?

And you want to know beforehand which nodes to send the file to first?
But that information is kind of hidden?

Why not have every restart of storagenode.exe run a download test? For example, prepare a 100 MB file (let’s exclude it from payment) and upload it to every node; on every restart, the node downloads it from random nearby nodes and determines its throughput from how fast the transfer completes.
Save the results and add them as an additional variable used during node selection.

This way, customers who want to upload files as fast as possible would have an option to target the fastest nodes right off the bat.
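
Just to make the suggestion concrete, here is a rough Go sketch of the idea with entirely hypothetical names; nothing like this exists in the storagenode today, and a real design would obviously need a satellite side as well.

```go
package main

import (
	"fmt"
	"time"
)

// BenchmarkResult is what a node could report back after its restart-time test.
type BenchmarkResult struct {
	NodeID     string
	Throughput float64 // MB/s measured on the unpaid 100 MB test file
	MeasuredAt time.Time
}

// runStartupBenchmark times a download of the test file from nearby nodes and
// turns the elapsed time into a throughput figure usable by node selection.
func runStartupBenchmark(nodeID string, download func(sizeMB int) time.Duration) BenchmarkResult {
	const testSizeMB = 100
	elapsed := download(testSizeMB)
	return BenchmarkResult{
		NodeID:     nodeID,
		Throughput: float64(testSizeMB) / elapsed.Seconds(),
		MeasuredAt: time.Now(),
	}
}

func main() {
	// Simulated download that takes 4 seconds for 100 MB, i.e. 25 MB/s.
	fake := func(sizeMB int) time.Duration { return 4 * time.Second }
	res := runStartupBenchmark("node-a", fake)
	fmt.Printf("%s measured %.1f MB/s; this could feed into selection as an extra weight\n",
		res.NodeID, res.Throughput)
}
```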

Valid points but this one is just a bit of a lazy generalisation, isn’t it?

2 Likes

I believe the readability timeout is related to this problem:

1 Like

Our current RS setting is 29/35/65/110 without the repair override. 65/29 = 2.25 expansion factor, 110/29 = 3.8 for bandwidth. The bandwidth expansion is up to 3.8, depending on how much traffic the long tail can cancel out.

The first alternative we tested was 16/23/35/60. Fewer IOPS for the storage nodes, but expansion factor and bandwidth are about the same. It was still 20% faster. That was the first surprise. I expected the same speed.

The second RS setting we tested was 16/23/35/50. It was the original target, but I didn’t want to change IOPS and bandwidth consumption at the same time; that is why we did the additional step with the previous RS setting. The shorter long tail cancellation decreased the performance. Second surprise. I thought this RS setting consumes less IOPS and less bandwidth, so it should be faster.

And the final RS setting I wanted to test was 32/38/54/77. It has almost the same IOPS as our current 29 RS setting, a lower expansion factor of 1.7 (which would help us upload an even bigger amount of data), and it consumes less bandwidth. So overall it is the best RS number I can come up with. Except that the previous tests showed a longer long tail improves performance, so let’s scale it up to match the same bandwidth consumption as the other RS settings: 32/38/54/120 it is. Surprise, it was twice as fast.
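
For quick reference, here is the same arithmetic as above applied to all the settings mentioned in this post (a simple sketch: expansion is success/min, worst-case bandwidth is total/min, and the repair margin is success minus the repair threshold).

```go
package main

import "fmt"

type rs struct {
	name                        string
	min, repair, success, total int
}

func main() {
	settings := []rs{
		{"29/35/65/110 (current)", 29, 35, 65, 110},
		{"16/23/35/60", 16, 23, 35, 60},
		{"16/23/35/50", 16, 23, 35, 50},
		{"32/38/54/77", 32, 38, 54, 77},
		{"32/38/54/120", 32, 38, 54, 120},
	}
	for _, s := range settings {
		expansion := float64(s.success) / float64(s.min)
		bandwidth := float64(s.total) / float64(s.min)
		fmt.Printf("%-24s expansion %.2fx, worst-case bandwidth %.2fx, repair margin %d pieces\n",
			s.name, expansion, bandwidth, s.success-s.repair)
	}
}
```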

Now the problem is that the RS setting with the biggest performance gain isn’t something I really want to use. My plan is to revisit the RS numbers after we have tested the new node selection. My theory is that the new node selection will pick better nodes by default, to the point that I can pick the shortest possible long tail plus some safety margin, maybe even lower than 32/38/54/77. That would be the best RS setting, as long as we can hit our performance goal.

The test with the new node selection failed. I believe I have already spotted the mistake, and now I am waiting for my coworker to wake up for a new try.

6 Likes

I thought it was 29/35/80/110? Or did you choose 65 to simulate average availability?

I assume you kept the segment size constant for these tests?

That is surprising, since you involve fewer nodes and less distribution of bandwidth. Interesting. I guess the IO overhead has more impact than that. This could be good for individual node efficiency too. Also for the file walkers and stuff since there are fewer and larger files on the nodes.

This one surprises me less. There would be a bigger chance of slower nodes being required to finish the transfer. In my eyes this signifies there is a fairly significant performance differentiation between nodes, so that makes sense. I guess this is where success rate based node selection can help.

Wow, that is surprising. The biggest difference here is the lower success threshold as the other numbers are all just slightly higher than your baseline. I wasn’t expecting long tail to be that much of a factor.
I do wonder if this setting wouldn’t trigger a lot more repair due to the smaller margin between 32 and 54. Do you see that as an issue?

Yeah, that’s what I was thinking too. Hopefully you can get the best of both worlds. Same or even better performance as a long tail, but without the IO overhead. This would really help on the uplink side as well.

Perhaps a little selfishly, I was kind of hoping for lower RS numbers, not higher ones. Just because fewer and bigger files on nodes would help with lots of performance aspects on the node side. Perhaps something to keep in mind. Of course adjusting the segment size is also an option for that, although that only helps with larger files.

Thanks for posting the details! Really appreciate it. And the results are indeed interesting. Looking forward to the new node selection test.

Originally it was, but later we found out that there was an off-by-one bug in the calculations and that 65 would give us high enough durability as well. Plus, at some point the upload speed wasn’t consistent, so we reduced it to 65 and that gave us great upload speed again. Around that time we also started investing time into coming up with a better node selection, so that we don’t have to throw more and more long-tail resources at the problem. That doesn’t scale in the long run.

The idea is that the data gets deleted by the TTL before we ever have to repair it, so we have more freedom with the RS numbers here. It is a bit of a gamble right now because we don’t know how long it takes for a segment to end up in the repair queue; for some reason we never built a metric for that. The good thing is that we can basically pick any RS number and change it later on: within a few months the data that was uploaded with the previous RS numbers will expire and the new RS numbers will take effect. So we don’t have to waste too much time on this and can just try it out on the fly.
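
As a tiny illustration of that rollover, assuming a constant ingress rate and a hypothetical 30-day TTL (the real TTL mix and rates will differ), the share of stored data still using the old RS numbers shrinks linearly after the switch:

```go
package main

import "fmt"

func main() {
	const ttlDays = 30.0 // hypothetical TTL, purely for illustration
	for _, day := range []float64{0, 10, 20, 30, 45} {
		oldFraction := 1.0 - day/ttlDays // assumes constant ingress before and after the switch
		if oldFraction < 0 {
			oldFraction = 0
		}
		fmt.Printf("day %3.0f after the switch: %3.0f%% of stored TTL data still uses the old RS numbers\n",
			day, oldFraction*100)
	}
}
```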

3 Likes

Sure, that would be the case on saltlake, but not on production satellites, right? Where there is a mix of TTL and non-TTL data.

2 Likes

There is a PR prepared to allow a mix of RS numbers on the same satellite. It is missing some tests and would require a few days of work to finish.

2 Likes

Wait… so the healthy-pieces median should have been around 65 all this time, and not 80? The repair system hasn’t been slightly behind? Everyone thought it was slacking! :wink:

That… is pretty awesome. :slight_smile:

Well, it’s normal for it to be lower than the success threshold, as repair doesn’t immediately kick in after the first drop in availability. So I’m actually surprised to see the median at 65. I guess over time the base of nodes has become quite stable, so it makes sense that the success threshold was lowered. I just didn’t know it was; I don’t think that was communicated before.

3 Likes