I might have misinterpreted your initial question. I assumed you meant whether downloads are also selected using power of 2 node selection. But now I feel like you might have been asking whether download successes are also used to determine priority for upload selection. Perhaps they should be, because you make a good point.
Challenges with that:
You would have to track the download success rate in a different way, because the method used for uploads now won’t work for downloads. It should still be possible by using the orders sent by nodes, I think, but there would be a significant delay.
With uploads you select nodes and then upload right away, so you measure and immediately use the result. To optimize downloads by uploading to nodes with a good download success rate, you pretty much have to use an average over a long period of time, because you don’t know when the subsequent downloads will happen.
Not impossible to overcome, but perhaps not a priority right now. Interesting thought though.
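To make the idea a bit more concrete, here is a minimal sketch (mine, not the actual satellite code) of what it could look like: keep a slow-moving average of download success per node, fed from settled orders, and use it as the tie-breaker in a choice-of-two selection. All names and parameters here are made up for illustration.

```go
// Hypothetical sketch: track a long-term download success average per node and
// use it in a "power of 2" (choose-the-better-of-two-random-candidates) selection.
package main

import (
	"fmt"
	"math/rand"
)

type nodeStats struct {
	id         string
	successEMA float64 // long-term download success rate, 0..1
}

// Small alpha => slow-moving average, since downloads arrive long after the upload decision.
const alpha = 0.01

// RecordDownload folds one observed download result (e.g. from settled orders)
// into the node's moving average.
func (n *nodeStats) RecordDownload(success bool) {
	v := 0.0
	if success {
		v = 1.0
	}
	n.successEMA = (1-alpha)*n.successEMA + alpha*v
}

// ChooseOneOfTwo picks two random candidates and keeps the one with the
// better long-term download success rate.
func ChooseOneOfTwo(nodes []*nodeStats, r *rand.Rand) *nodeStats {
	a := nodes[r.Intn(len(nodes))]
	b := nodes[r.Intn(len(nodes))]
	if a.successEMA >= b.successEMA {
		return a
	}
	return b
}

func main() {
	r := rand.New(rand.NewSource(1))
	nodes := []*nodeStats{
		{id: "fast", successEMA: 0.5},
		{id: "slow", successEMA: 0.5},
	}
	// Simulate download results trickling in: the fast node succeeds 95% of the
	// time, the slow node only 60%.
	for i := 0; i < 10000; i++ {
		nodes[0].RecordDownload(r.Float64() < 0.95)
		nodes[1].RecordDownload(r.Float64() < 0.60)
	}
	winner := ChooseOneOfTwo(nodes, r)
	fmt.Printf("selected %s (EMA %.2f vs %.2f)\n", winner.id, nodes[0].successEMA, nodes[1].successEMA)
}
```

The small alpha is the important part: because the downloads happen long after the upload decision, the average has to be deliberately slow-moving, which is exactly the delay mentioned above.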
So “use what you have” and “don’t invest in new stuff for storagenodes” have been contradicted by every major update in the last 12 months.
All these updates say only one thing: slower nodes get left behind.
The way I see it is this: You use what you have to start off. If you see it working out in your favor, then you buy more. At some point, depending on your risk tolerance, you may start buying drives/adding connections with the anticipation of making your investment back.
I don’t think anyone is expecting SNOs to have an exabyte of standby capacity. That’s not happening. It is used elsewhere and moved to Storj as the need grows.
The way I started: I had some spare capacity in servers. I used that to get a feel for it. When I saw that it was getting used, I added a 4TB drive. When that started getting filled, I could have waited for it to fill up and pay for itself on its own, but I didn’t: I added a 2nd 4TB drive. That way both of them started working towards the ROI on the 2nd drive. At that point I scrapped the idea of ROI on the first drive and kept adding. A couple of years down the line, payouts reached a point where I could add bigger drives. That’s the point I’m currently at: I add 20TB drives and scrap the idea of waiting to get my investment back. Do the payouts cover the drive plus operating expenses? Add more.
I’m seriously starting to get a bit tired of acting as a SNO. You’re getting my bandwidth practically for free, and you get very good prices per TB from us operators. But when you run benchmark tests for over two weeks, maxing out both CPU and bandwidth, while claiming that we should just rent out excess storage, it’s really disrespectful to push our hardware so hard for such a long time without even checking with us operators first. My links are almost saturated at certain times of the day, and it’s not even customer data. If it were customer data, fine, but this is test data that doesn’t benefit us at all. In 3 days, I’ve received over 50TB of traffic. Those of us in data centers pay for our bandwidth, and when it becomes this extreme over time without any payment in return, one almost considers shutting down.
Bottom line: personally, as a SNO, I wish for more stability. There are too many bugs, errors, and things going wrong with updates, extreme ingress, extreme deletion. Not caused by customers, but by you as a company. It doesn’t feel comfortable. Could you please run your own tests on your own nodes? All this testing in production feels sketchy to me. Too much cowboy.
I cannot dedicate hours to reading the forums every day as a SNO.
Good news. We found a bottleneck on the satellite side and solved it. With 6KB files the satellite is able to hit the required ops/s. Now we just need to get the storage node network to take that load with bigger files. Time to try the 32 RS numbers in combination with the power of 2 node selection.
We will start with 32/38/54/120 first. That doubled the throughput last time, but with the power of 2 node selection I would expect the gain to be smaller this time, and that a shorter long tail like 32/38/54/65 will improve the throughput. We will see.
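For anyone following along, here is a quick back-of-the-envelope sketch of what those numbers imply, assuming they are the usual minimum/repair/success/total share thresholds (my reading, so treat it as an assumption):

```go
// Rough comparison of the two RS configurations mentioned above.
package main

import "fmt"

type rs struct {
	min, repair, success, total int
}

func main() {
	for _, cfg := range []rs{
		{32, 38, 54, 120},
		{32, 38, 54, 65},
	} {
		expansion := float64(cfg.success) / float64(cfg.min) // bytes stored per customer byte
		longTail := cfg.total - cfg.success                  // uploads started only to be cancelled
		started := float64(cfg.total) / float64(cfg.success) // uploads started per upload that counts
		fmt.Printf("%d/%d/%d/%d: expansion %.2fx, long tail %d pieces, %.2f uploads started per kept piece\n",
			cfg.min, cfg.repair, cfg.success, cfg.total, expansion, longTail, started)
	}
}
```

With the same minimum and success thresholds, the expansion factor stays the same; the shorter long tail just means far fewer uploads are started only to be cancelled, which is where the throughput gain would come from.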
I can definitely understand some SNOs leaving the project for a while: perhaps taking a year off and then coming back to try running a node again. Hopefully they still check these forums occasionally, just to hear what’s going on.
These tests are important ways of stressing the network, preparing it for future loads to (hopefully) come and, in so doing, tweaking more performance out of the system (which appears to be happening, judging by littleskunk’s posts).
I appreciate we’re not being paid for this (and you have the added injury of actually paying for your bandwidth!) but if our machines have to hurt a bit today in order to get more work over the coming weeks then I’m happy with that.
Perhaps you can bring your nodes offline for a few days until the testing is over? Or put some QoS in place in order to reduce bandwidth utilisation? (You’ll lose more races but I guess that’s the intent).
Not sure whether the latter is against the ToS, mind you, but as long as you’re above the minimum bandwidth stipulated then I suppose it should be OK…
If some of the use cases get signed, will they use the capacity/throughput you are testing for? Or are you way overshooting the tests just to be sure?
But this is my whole point: during my two years as a SNO there has been something new all the time, something new to configure and monitor. It’s just cowboy all the time. I wish for more stability and less work. And here I am on a Friday doing BGP filtering and QoS.
I see what you mean. But there are many improvements that need to happen. The product is a brand new(ish) concept and it was started from scratch.
When you join such a young system there has to be change and evolution otherwise it will fail.
As @Roxor alluded above, perhaps you might want to consider quitting the service for a couple of years until it has hopefully matured some more?
It would be a shame to see SNOs go, but in your circumstances, where you pay for bandwidth, I would not be able to criticise you…