Announcement: major storage node release (potential config changes needed!)

It shows pending code review and :point_down:

Yes, but no additional information since then. So I don’t know what’s going on with this behind the scenes.

There are differences in Ingress between 2 nodes running in different locations depending on the version. I think this is because fsync is disabled.


There’s no extra ingress yet for 1.104 to take advantage of, is there? SLC hasn’t been pushing data since the start of the month… so this must be normal customer traffic… which could coincidentally be closer to your 104 node?

I’m patiently waiting for the torrent of ones and zeros to start falling from the heavens! :cloud_with_rain:

I thought it was because of fsync :slight_smile:
I guess there is another reason

Behind the scenes: we have been focused on the overall performance improvements represented by v1.104.x. Now that v1.104.x is rolling out we should be able to get back to pause/resume for the filewalker. You could say we paused and are now resuming it :slight_smile:

Does your node really store that number of files for a single satellite?

I have 50 mil for US1 in a 12TB node, so I can imagine 70 mil is very possible.

This is the total number of files for all satellites on the node. There is already a node with 75 million

jtolio, could you look at this problem? Please :slight_smile:

No one has cared about this error for 2 weeks now :sleepy:
Or at least pass the information on to the people in charge.
I didn’t expect so much time to pass without anyone caring about this problem.

Thanks

This change is already impacting nodes not updated to 104. They are losing races like crazy…
These are 2 nodes on the same machine/IP:

Yes, I am seeing something similar.
One of my nodes has been upgraded and it’s having significantly more ingress traffic than the others…

Are you sure it’s not just the display of the graph? I ran the successrate script against one node without and one node with the update. The first column is the whole month, the second column is just today (14h so far). Upload is ~99% instead of ~96%. I wouldn’t call that “losing races like crazy”.

v1.102.3

========== AUDIT ============== 
Critically failed:     0            0
Critical Fail Rate:    0.000%       0.000%
Recoverable failed:    0            0
Recoverable Fail Rate: 0.000%       0.000%
Successful:            39587        932
Success Rate:          100.000%     100.000%
========== DOWNLOAD =========== 
Failed:                187          4
Fail Rate:             0.040%       0.025%
Canceled:              8392         240
Cancel Rate:           1.787%       1.512%
Successful:            460992       15624
Success Rate:          98.173%      98.172%
========== UPLOAD ============= 
Rejected:              0            0
Acceptance Rate:       100.000%     100.000%
---------- accepted ----------- 
Failed:                1063         31
Fail Rate:             0.133%       0.112%
Canceled:              38763        1055
Cancel Rate:           4.859%       3.810%
Successful:            757902       26604
Success Rate:          95.008%      96.078%
========== REPAIR DOWNLOAD ==== 
Failed:                0            0
Fail Rate:             0.000%       0.000%
Canceled:              0            0
Cancel Rate:           0.000%       0.000%
Successful:            78931        3320
Success Rate:          100.000%     100.000%
========== REPAIR UPLOAD ====== 
Failed:                26           0
Fail Rate:             0.097%       0.000%
Canceled:              287          17
Cancel Rate:           1.069%       1.721%
Successful:            26547        971
Success Rate:          98.835%      98.279%
========== DELETE ============= 
Failed:                0            0
Fail Rate:             0.000%       0.000%
Successful:            46278        669
Success Rate:          100.000%     100.000%

v1.104.5

========== AUDIT ============== 
Critically failed:     0            0
Critical Fail Rate:    0.000%       0.000%
Recoverable failed:    0            0
Recoverable Fail Rate: 0.000%       0.000%
Successful:            14999        358
Success Rate:          100.000%     100.000%
========== DOWNLOAD =========== 
Failed:                116          6
Fail Rate:             0.030%       0.052%
Canceled:              9691         344
Cancel Rate:           2.500%       2.961%
Successful:            377800       11266
Success Rate:          97.470%      96.987%
========== UPLOAD ============= 
Rejected:              0            0
Acceptance Rate:       100.000%     100.000%
---------- accepted ----------- 
Failed:                979          29
Fail Rate:             0.123%       0.106%
Canceled:              24156        204
Cancel Rate:           3.028%       0.744%
Successful:            772558       27199
Success Rate:          96.849%      99.151%
========== REPAIR DOWNLOAD ==== 
Failed:                0            0
Fail Rate:             0.000%       0.000%
Canceled:              0            0
Cancel Rate:           0.000%       0.000%
Successful:            60440        2501
Success Rate:          100.000%     100.000%
========== REPAIR UPLOAD ====== 
Failed:                27           2
Fail Rate:             0.100%       0.196%
Canceled:              197          1
Cancel Rate:           0.728%       0.098%
Successful:            26823        1020
Success Rate:          99.172%      99.707%
========== DELETE ============= 
Failed:                0            0
Fail Rate:             0.000%       0.000%
Successful:            41773        0
Success Rate:          100.000%     0.000%
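
For anyone wondering how those percentages come about: the numbers in both reports are consistent with each rate being that category's count divided by the sum of failed, canceled, and successful attempts. Here is a minimal Python sketch of that arithmetic. The function name `rates` and the zero-attempts handling are my own assumptions for illustration, not the successrate script's actual code.

```python
def rates(failed: int, canceled: int, successful: int) -> dict[str, float]:
    """Return fail, cancel, and success rates as percentages of all attempts."""
    total = failed + canceled + successful
    if total == 0:
        # Assumption: with no attempts in the period, report 0% for everything
        # instead of dividing by zero.
        return {"fail": 0.0, "cancel": 0.0, "success": 0.0}
    return {
        "fail": 100.0 * failed / total,
        "cancel": 100.0 * canceled / total,
        "success": 100.0 * successful / total,
    }


# Accepted-upload counts for the whole month, copied from the two reports above.
old = rates(failed=1063, canceled=38763, successful=757902)  # v1.102.3
new = rates(failed=979, canceled=24156, successful=772558)   # v1.104.5

print(f"v1.102.3 upload success: {old['success']:.3f}%")
print(f"v1.104.5 upload success: {new['success']:.3f}%")
```

Rounded to three decimals this reproduces the monthly upload success rates shown above, so the percentages describe the share of accepted transfers that finished, not how much data the node received.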

On 16 nodes the graph is down, on the updated node the graph is up. In the same time period… I don’t know what those percentages mean and how accurate they are, but this is what I see.
I don’t know how they calculate those.
Maybe the updated, buffed-up nodes are selected more frequently, so it doesn’t mean many more races won, just more data ingested.

Hi.
2 nodes, first is v1.102.3 - ingress 37GB, second is v1.104.5 - ingress 89GB.
Both are under traffic monitoring, and the measured ingress traffic of the two nodes is the same. This is odd.

Oh, the storage node’s tracking of ingress data has changed.

This thread is long (Upcoming storage node improvements including benchmark tool), but in it, we discussed one tradeoff, which is that instead of keeping track of actual bandwidth used, it might be sufficient to keep track of ordered bandwidth used instead. This requires significantly less bookkeeping at the point of upload. See this change: https://review.dev.storj.io/c/storj/storj/+/13086

So, in v1.104.x+, the storage node graphs are the amount of bandwidth ordered, not necessarily used, whereas prior to v1.104.x, the graphs were the amount of bandwidth actually used.

What does it mean for bandwidth to be ordered? Maybe that’s when you actually win an upload/download race it gets counted (and before bandwidth would be counted even for lost races?). I don’t know: I’m guessing :slight_smile:

Thank you for this note! Though what matters is a per-satellite number. Bloom filters work per-satellite.

I’m pretty sure we’ve seen in the past that those differences can be quite big. The topic you linked also mentioned that that’s what it does “for now”. My earnings calculator is going to show this difference after payouts and make it look like Storj doesn’t pay out what it should for egress. I know this isn’t actually the case, but I would no longer have the data to show and verify that.

Page 53 (section 4.17) of our whitepaper (https://www.storj.io/storjv3.pdf) has a little diagram of this. Essentially, we send little claims that we call “bandwidth allocations” or, more recently, “orders”. They are something like bank checks, signed by the client software and made out to the storage node for bandwidth usage. What can happen is that the client fills out this “bandwidth allocation” for 5 MB, but if the node is slow, the client might cancel the upload before the 5 MB is actually used. If the client canceled after only 3 MB, the old code would have shown 3 MB. The new code will now show 5 MB.
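
To make the difference concrete, here is a small Python sketch of the two ways of counting. It is only an illustration of the 5 MB / 3 MB example: the `Transfer` type and both function names are hypothetical and do not come from the storage node code.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    ordered_bytes: int      # amount the client's signed order allows
    transferred_bytes: int  # amount actually moved before finish or cancel


def ingress_used(transfers: list[Transfer]) -> int:
    """Old-style accounting: sum the bytes actually transferred."""
    return sum(t.transferred_bytes for t in transfers)


def ingress_ordered(transfers: list[Transfer]) -> int:
    """New-style accounting: sum the bytes ordered, even if later canceled."""
    return sum(t.ordered_bytes for t in transfers)


MB = 1_000_000
transfers = [
    Transfer(ordered_bytes=5 * MB, transferred_bytes=5 * MB),  # completed upload
    Transfer(ordered_bytes=5 * MB, transferred_bytes=3 * MB),  # canceled after 3 MB
]

print(ingress_used(transfers) // MB, "MB used")
print(ingress_ordered(transfers) // MB, "MB ordered")
```

With one completed and one canceled upload, the two totals differ by exactly the part of the order that was never transferred, which is why a graph based on ordered bandwidth can read higher than one based on used bandwidth.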

I believe this change is only a change for ingress, not egress. I think that egress already worked like this (and thus, has matching payouts).
