Behind the scenes: we have been focused on the overall performance improvements represented by v1.104.x. Now that v1.104.x is rolling out, we should be able to get back to pause/resume for the filewalker. You could say we paused it and are now resuming it.
Does your node really store that number of files for a single satellite?
I have 50 mil for US1 in a 12TB node, so I can imagine 70 mil is very possible.
This is the total number of files for all satellites on the node. There is already a node with 75 million
jtolio, could you look at this problem, please?
No one has paid any attention to this error for 2 weeks now.
Or at least pass the information on to the people in charge.
I didn’t expect that so much time would pass with no one looking into this problem.
Thanks
This change is already impacting nodes not updated to 104. They are losing races like crazy…
These are 2 nodes on the same machine/IP:
Yes, I am seeing something similar.
One of my nodes has been upgraded, and it’s getting significantly more ingress traffic than the others…
Are you sure it’s not just the display of the graph? I ran the successrate script against one node without the update and one node with it; the first column is the whole month, the second column is just today (14h so far). Upload is ~99% instead of ~96%. I wouldn’t call that “losing races like crazy”.
v1.102.3
========== AUDIT ==============
Critically failed: 0 0
Critical Fail Rate: 0.000% 0.000%
Recoverable failed: 0 0
Recoverable Fail Rate: 0.000% 0.000%
Successful: 39587 932
Success Rate: 100.000% 100.000%
========== DOWNLOAD ===========
Failed: 187 4
Fail Rate: 0.040% 0.025%
Canceled: 8392 240
Cancel Rate: 1.787% 1.512%
Successful: 460992 15624
Success Rate: 98.173% 98.172%
========== UPLOAD =============
Rejected: 0 0
Acceptance Rate: 100.000% 100.000%
---------- accepted -----------
Failed: 1063 31
Fail Rate: 0.133% 0.112%
Canceled: 38763 1055
Cancel Rate: 4.859% 3.810%
Successful: 757902 26604
Success Rate: 95.008% 96.078%
========== REPAIR DOWNLOAD ====
Failed: 0 0
Fail Rate: 0.000% 0.000%
Canceled: 0 0
Cancel Rate: 0.000% 0.000%
Successful: 78931 3320
Success Rate: 100.000% 100.000%
========== REPAIR UPLOAD ======
Failed: 26 0
Fail Rate: 0.097% 0.000%
Canceled: 287 17
Cancel Rate: 1.069% 1.721%
Successful: 26547 971
Success Rate: 98.835% 98.279%
========== DELETE =============
Failed: 0 0
Fail Rate: 0.000% 0.000%
Successful: 46278 669
Success Rate: 100.000% 100.000%
v1.104.5
========== AUDIT ==============
Critically failed: 0 0
Critical Fail Rate: 0.000% 0.000%
Recoverable failed: 0 0
Recoverable Fail Rate: 0.000% 0.000%
Successful: 14999 358
Success Rate: 100.000% 100.000%
========== DOWNLOAD ===========
Failed: 116 6
Fail Rate: 0.030% 0.052%
Canceled: 9691 344
Cancel Rate: 2.500% 2.961%
Successful: 377800 11266
Success Rate: 97.470% 96.987%
========== UPLOAD =============
Rejected: 0 0
Acceptance Rate: 100.000% 100.000%
---------- accepted -----------
Failed: 979 29
Fail Rate: 0.123% 0.106%
Canceled: 24156 204
Cancel Rate: 3.028% 0.744%
Successful: 772558 27199
Success Rate: 96.849% 99.151%
========== REPAIR DOWNLOAD ====
Failed: 0 0
Fail Rate: 0.000% 0.000%
Canceled: 0 0
Cancel Rate: 0.000% 0.000%
Successful: 60440 2501
Success Rate: 100.000% 100.000%
========== REPAIR UPLOAD ======
Failed: 27 2
Fail Rate: 0.100% 0.196%
Canceled: 197 1
Cancel Rate: 0.728% 0.098%
Successful: 26823 1020
Success Rate: 99.172% 99.707%
========== DELETE =============
Failed: 0 0
Fail Rate: 0.000% 0.000%
Successful: 41773 0
Success Rate: 100.000% 0.000%
On 16 nodes the graph is down; on the updated node the graph is up, over the same time period… I don’t know what those percentages mean or how accurate they are, but this is what I see.
I don’t know how they calculate those.
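My guess, just from reverse-engineering the numbers above, is that the script computes each success rate as successful / (successful + failed + canceled). For example, the v1.102.3 whole-month upload numbers give 757902 / (1063 + 38763 + 757902) ≈ 95.008%, which matches the output. A minimal sketch of that calculation (my own code, not the actual script):

```go
package main

import "fmt"

// successRate computes successful operations as a share of all attempted
// operations, which is what the successrate output above appears to report.
func successRate(failed, canceled, successful float64) float64 {
	total := failed + canceled + successful
	if total == 0 {
		return 0
	}
	return successful / total * 100
}

func main() {
	// Whole-month upload numbers from the v1.102.3 output above.
	fmt.Printf("upload success rate: %.3f%%\n", successRate(1063, 38763, 757902))
	// Prints 95.008%, matching the script's output.
}
```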
Maybe the updated, buffed-up nodes are selected more frequently, and it doesn’t mean many more races won, but more data ingested.
Hi.
Two nodes: the first is v1.102.3 with 37 GB ingress, the second is v1.104.5 with 89 GB ingress.
Both are under traffic monitoring, and the monitored ingress traffic of the two nodes is the same. This is odd.
Oh, the storage node’s tracking of ingress data has changed.
This thread is long (Upcoming storage node improvements including benchmark tool), but in it, we discussed one tradeoff: instead of keeping track of the actual bandwidth used, it might be sufficient to keep track of the ordered bandwidth. This requires significantly less bookkeeping at the point of upload. See this change: https://review.dev.storj.io/c/storj/storj/+/13086
So, in v1.104.x+, the storage node graphs are the amount of bandwidth ordered, not necessarily used, whereas prior to v1.104.x, the graphs were the amount of bandwidth actually used.
What does it mean for bandwidth to be ordered? Maybe it only gets counted when you actually win an upload/download race (whereas before, bandwidth would be counted even for lost races)? I don’t know; I’m just guessing.
Thank you for this note! Though what matters is a per-satellite number. Bloom filters work per-satellite.
I’m pretty sure we’ve seen in the past that those differences can be quite big. The topic you linked also mentioned that that’s what it does “for now”. My earnings calculator is going to show this difference after payouts and make it look like Storj doesn’t pay out what it should for egress. I know this isn’t actually the case, but I would no longer have the data to show and verify that.
Page 53 (section 4.17) of our whitepaper (https://www.storj.io/storjv3.pdf) has a little diagram of this. Essentially, we send little claims that we call “bandwidth allocations,” or more recently “orders,” which are something like bank checks signed by the client software, made out to the storage node for bandwidth usage. What can happen is that the client fills out this “bandwidth allocation” for 5 MB, but if the node is slow, the client might cancel the upload before the 5 MB is actually used. If the client canceled after only 3 MB, the old code would have shown 3 MB. The new code will now show 5 MB.
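To make that 5 MB vs 3 MB example concrete, here is a rough sketch of the difference in what gets recorded (my own simplified types, not the actual storj/storj order structures):

```go
package main

import "fmt"

// order is a simplified stand-in for a signed bandwidth allocation ("order"):
// the client commits up front to paying for up to Allocated bytes of transfer.
type order struct {
	Allocated int64 // bytes the client signed the allocation for
	Used      int64 // bytes actually transferred before the upload finished or was canceled
}

func main() {
	// The client signs an order for 5 MB, but cancels the upload after only 3 MB.
	o := order{Allocated: 5 << 20, Used: 3 << 20}

	// Pre-v1.104.x graphs: bandwidth actually used.
	fmt.Printf("old graph counts: %d bytes\n", o.Used)

	// v1.104.x+ graphs: bandwidth ordered, which needs far less bookkeeping
	// at upload time because only the signed allocation has to be recorded.
	fmt.Printf("new graph counts: %d bytes\n", o.Allocated)
}
```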
I believe this change is only a change for ingress, not egress. I think that egress already worked like this (and thus, has matching payouts).
Thanks for the explanation. So, a question: what is the point of such an ingress dashboard for an SNO? It has no connection to the real data and only creates misunderstanding of the real ingress traffic. You see 100 GB of ingress yesterday and 150 GB today, yet your node shows only +35 GB in Total disk space.
I think SNOs are indifferent to “ordered traffic” or any other developer-oriented parameter.
Is graphing this parameter on the SNO’s dashboard for the developers’ benefit really a good idea?
So it shows only what the node should have been serving according to customer requests, instead of what it has actually served?
This is not meaningful or useful. It just tells us what a node could have been serving if it had fulfilled all orders in full. I don’t feel this gives me any real value.
Why not display it as the difference from used space once a week?
It may be more useful and is easy to calculate.
The current implementation may be OK for a transition period.
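Roughly something like this, as a sketch with made-up sample values (hypothetical names, not actual storagenode code):

```go
package main

import "fmt"

func main() {
	// Hypothetical weekly samples of used space, in bytes.
	usedSpaceWeekAgo := int64(10_000_000_000_000)
	usedSpaceNow := int64(10_150_000_000_000)

	// The suggested figure: net change in used space over the week.
	weeklyGrowth := usedSpaceNow - usedSpaceWeekAgo
	fmt.Printf("weekly growth: %d bytes (~%.0f GB)\n", weeklyGrowth, float64(weeklyGrowth)/1e9)
}
```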
Data also gets deleted, so this will be even less accurate.
We would have to rename it to “weekly growth”.
Why show imprecise ingress anyway?