Updates on Test Data

Holy crap that’s a lot of ones and zeroes! You’ll make up for losing like half of your stored data in no time! :wink:

(but seriously man: I check your report every once in a while: what happened to your earnings rate was brutal :cry: )

Thank you for the feedback. The team is now working on a fix. It is the collector that deletes the expired pieces, and it isn’t running with the lower I/O priority yet. Should be an easy fix.

5 Likes

Well, then I’ll be left wondering whether it was the customer who stopped uploads. Also, it won’t help with establishing the long-term baseline.

Metrics are either reliable or worthless.

That is correct. On my Grafana dashboard I am watching the upload rate, i.e. just uploads started per second. Even a small reduction in success rate will scale down the number of uploads my node gets in a very short time. Comparing 2 nodes by success rate doesn’t work anymore; comparing them by upload rate is the new metric.
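If you want that same number without a full Grafana setup, here is a minimal sketch. It assumes the default storagenode text logs, where started transfers are logged with the phrase "upload started" and each line begins with an ISO-style timestamp; adjust the match string and the timestamp slicing to your own log format.

```python
# Minimal sketch: count "upload started" lines per minute from a storagenode log.
# Assumes a text log with an ISO-ish timestamp at the start of each line and the
# literal phrase "upload started"; adjust both to your setup.
import sys
from collections import Counter

def upload_starts_per_minute(path):
    counts = Counter()
    with open(path, errors="replace") as f:
        for line in f:
            if "upload started" in line:
                # Truncate the timestamp to minute resolution, e.g. "2024-05-20T17:42".
                counts[line[:16]] += 1
    return counts

if __name__ == "__main__":
    for minute, n in sorted(upload_starts_per_minute(sys.argv[1]).items()):
        print(f"{minute}  {n} uploads started")
```

Feeding those per-minute counts into Grafana (or just plotting them) gives the upload-rate view described above.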

2 Likes

It’s Storj, aka cowboys!

So, if I understand correctly, the satellite now looks for multiple failed uploads in a row? Or does the satellite now calculate the success rate over a very short time window (and how short)?

We are going to run a few more tests now to see if we can reduce the storage expansion factor and bandwidth usage a bit while still hitting our target throughput.
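For context on what the expansion factor means in practice, here is a back-of-the-envelope sketch using illustrative Reed-Solomon numbers (29 required / 80 kept / 110 started, the classic ratios mentioned elsewhere in this thread); these are assumptions for illustration, not necessarily the exact parameters being tuned in these tests.

```python
# Back-of-the-envelope: how Reed-Solomon settings translate into expansion.
# The numbers below are illustrative (29/80/110), not the exact parameters
# currently used for the test data.
required = 29    # pieces needed to reconstruct a segment
success = 80     # uploads kept before the long tail is cancelled
started = 110    # uploads attempted per segment

storage_expansion = success / required    # bytes stored per customer byte
upload_expansion = started / required     # worst-case bytes sent per customer byte

print(f"storage expansion ~ {storage_expansion:.2f}x")           # ~2.76x
print(f"upload bandwidth expansion <= {upload_expansion:.2f}x")  # ~3.79x
```

Cutting either the success threshold or the number of started uploads lowers those ratios, which is presumably the trade-off against durability raised further down in the thread.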

7 Likes

And then? When will it scale up again?

Yeah, it has always been like that: if you keep your computer busy with a maxed-out download, you can’t really upload at the same time with good speed. I learned that in 2005 or so, downloading anime at full speed 24/7 from some mIRC channels.

Edit: maybe add a differentiation at node selection based on whether it’s a SAS or SATA drive. SAS drives could be prioritized for hotter data (meaning uploads from the node, egress in Storj’s terms). AI tells me SAS drives are about 2 times faster at simultaneous I/O reads and writes.

1 Like

We’ve always been able to count how many uploads/downloads have come in over time (and their size): that hasn’t changed.

And higher-performance nodes have always seen better results (= more stored data = more egress = more cash): that hasn’t changed.

And low success rates still indicate config/hardware issues: that hasn’t changed.

What it sounds like has changed is that the system is less likely to push nodes to try to win more races than they’re capable of (and then start racking up race losses). Many SNOs have asked for this, especially after experiencing issues with early performance testing (and you don’t even have to leave this thread to find examples).

I can imagine a stronger focus on raw upload request counts now though. Because even if nodes don’t get pushed to failure… you’ll still want to know how well (or poorly) they perform compared to others (or themselves over time).

But ultimately… a potato is still a potato. It’s just that the new adaptable-rate-upload features make it less likely to end up as a baked potato :+1: . Faster nodes used to make more $$$ and will continue to do so…

3 Likes

What changed is the ability to see that the node is overloaded. I have more graphs than most others so maybe it would be easier for me, but looking at this:

Which drop in traffic was because my node was overloaded, so the satellite backed off to let it cool down, and which was normal, because the test was stopped or a customer finished uploading data?

2 Likes

This made me chuckle. Have a pint :wink:

3 Likes

Be aware of the risks. In your chase of hitting some target throughput, you keep reducing the expansion factor, which, until now, has kept the data very safe, and we haven’t lost a bit.
I don’t know… maybe a little compromise won’t hurt anyone; lower your target and keep the expansion factor at a safer level.

1 Like

That’s kind of a tough question. Traffic graphs never told you if your node was capable of more… unless you were hitting the limits of the NIC or your Internet plan. And you never knew when tests were stopped. You never knew when a customer finished uploading data. So none of that has changed.

BUT one indication that maybe a node could do more was seeing its race-win % dip: a hint that it could be overloaded. That’s something you could see in a graph. Now the graph to look at will be one that shows upload-request rates (not win %), and if a node consistently caps out at around the same rate day-to-day or week-to-week, that will be your hint that it could be overloaded, meaning a faster node could probably service more requests (and store more data and make more money).

If anything that upload-request-rate is a better number: you can directly compare it between two different nodes to see which is generally faster (whereas comparing two nodes that both win 99% of races tells you nothing about their relative speeds).

I can’t wait to see where this new system is going! Show me the money! :money_mouth_face:

You can try correlating the load (especially IO wait) to the drops in traffic. Buffers and cache (assuming Linux; I don’t know your setup off the top of my head) shooting up and not stabilizing is a sign of overloading.

What I mean is that if the network load drops, it should follow a spike in the node’s load; otherwise it can safely be assumed that the uploader was simply done.
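If you want to line those two up automatically, a rough sketch along these lines (Linux, assuming psutil is installed; the interface name and sample interval are placeholders) samples CPU iowait next to NIC throughput so drops in traffic can be checked against disk pressure:

```python
# Rough sketch: sample CPU iowait and NIC throughput side by side so drops in
# traffic can be lined up against disk pressure. Linux only; assumes psutil is
# installed (pip install psutil). Interface name and interval are assumptions.
import time
import psutil

IFACE = "eth0"      # change to your node's interface
INTERVAL = 10       # seconds between samples

prev = psutil.net_io_counters(pernic=True)[IFACE]
while True:
    time.sleep(INTERVAL)
    cpu = psutil.cpu_times_percent(interval=None)
    cur = psutil.net_io_counters(pernic=True)[IFACE]
    rx_mbps = (cur.bytes_recv - prev.bytes_recv) * 8 / INTERVAL / 1e6
    tx_mbps = (cur.bytes_sent - prev.bytes_sent) * 8 / INTERVAL / 1e6
    print(f"{time.strftime('%H:%M:%S')}  iowait={cpu.iowait:5.1f}%  "
          f"rx={rx_mbps:7.1f} Mbps  tx={tx_mbps:7.1f} Mbps")
    prev = cur
```

A traffic drop that follows a sustained iowait spike points at the node; one without it points at the test or the customer being done.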

1 Like

a lot of ingress today!


2 Likes

That’s because SQM wasn’t that prevalent back then. Today, fully saturating the uplink shouldn’t noticeably affect the downlink and vice versa.

FWIW I had 12 Mbps down / 2 Mbps up Internet for a while, and my backup tasks were saturating the upstream 24/7 for months. The graph was a straight horizontal line. And yet it had no impact on other activities. Of course, if I disabled SQM, everything would just collapse. I had a Ubiquiti USG3 router back then; it could do SQM at up to 50 Mbps. Magnificent little device.

1 Like

Success rate had a known fixed benchmark of 80/110 ≈ 72% to compare against, so you didn’t have to have a second node.

Now you will get a decent success rate even for relatively bad configurations.

Indeed. But knowing whether a change you make to your potato actually improves operations is the whole point, and now potato operators have even fewer indicators to guide optimization of their nodes.

But there is a simple thing Storj Inc. can do to remedy this: publish the success ratios used in the algorithm. This way node operators can look up how their nodes perform compared to others’. It’s not like these ratios are secrets: any customer can run a large number of uploads and publish their success ratios along with node IDs. Storj Inc. could do that as part of regular satellite operations, though, by just publishing the numbers they already have.

Yep, I fully expect some node operators to drop out over ISP limits.

SQM is magic. But it takes a good router with enough CPU to do it at throughputs >200 Mbps. My observation is that typical ISP-provided potato routers are too cheap for that.

1 Like

Bro. This node is like 45 days old. I’ve taken on 3 TB in the last day and a half, and my router is showing 200 Mbps download consistently.

3 Likes

I believe the vetting constraint has been removed for the test data in Salt Lake.