Behind the scenes: we have been focused on the overall performance improvements represented by v1.104.x. Now that v1.104.x is rolling out, we should be able to get back to pause/resume for the filewalker. You could say we paused it and are now resuming it.
Does your node really store that number of files for a single satellite?
I have 50 mil for US1 in a 12TB node, so I can imagine 70 mil is very possible.
This is the total number of files for all satellites on the node. There is already a node with 75 million
jtolio, could you look at this problem, please?
No one has paid any attention to this error for 2 weeks now.
Or at least pass the information on to the people in charge.
I didn’t expect that so much time would pass with no one looking into this problem.
Thanks
This change is already impacting nodes not updated to 104. They are losing races like crazy…
These are 2 nodes on the same machine/IP:
Yes, I am seeing something similar.
One of my nodes has been upgraded, and it’s getting significantly more ingress traffic than the others…
Are you sure it’s not just the display of the graph? I ran the successrate script against one node without the update and one node with it; the first column is the whole month, the second column is just today (14h so far). Upload is ~99% instead of ~96%. I wouldn’t call that “losing races like crazy”.
v1.102.3
========== AUDIT ==============
Critically failed: 0 0
Critical Fail Rate: 0.000% 0.000%
Recoverable failed: 0 0
Recoverable Fail Rate: 0.000% 0.000%
Successful: 39587 932
Success Rate: 100.000% 100.000%
========== DOWNLOAD ===========
Failed: 187 4
Fail Rate: 0.040% 0.025%
Canceled: 8392 240
Cancel Rate: 1.787% 1.512%
Successful: 460992 15624
Success Rate: 98.173% 98.172%
========== UPLOAD =============
Rejected: 0 0
Acceptance Rate: 100.000% 100.000%
---------- accepted -----------
Failed: 1063 31
Fail Rate: 0.133% 0.112%
Canceled: 38763 1055
Cancel Rate: 4.859% 3.810%
Successful: 757902 26604
Success Rate: 95.008% 96.078%
========== REPAIR DOWNLOAD ====
Failed: 0 0
Fail Rate: 0.000% 0.000%
Canceled: 0 0
Cancel Rate: 0.000% 0.000%
Successful: 78931 3320
Success Rate: 100.000% 100.000%
========== REPAIR UPLOAD ======
Failed: 26 0
Fail Rate: 0.097% 0.000%
Canceled: 287 17
Cancel Rate: 1.069% 1.721%
Successful: 26547 971
Success Rate: 98.835% 98.279%
========== DELETE =============
Failed: 0 0
Fail Rate: 0.000% 0.000%
Successful: 46278 669
Success Rate: 100.000% 100.000%
v1.104.5
========== AUDIT ==============
Critically failed: 0 0
Critical Fail Rate: 0.000% 0.000%
Recoverable failed: 0 0
Recoverable Fail Rate: 0.000% 0.000%
Successful: 14999 358
Success Rate: 100.000% 100.000%
========== DOWNLOAD ===========
Failed: 116 6
Fail Rate: 0.030% 0.052%
Canceled: 9691 344
Cancel Rate: 2.500% 2.961%
Successful: 377800 11266
Success Rate: 97.470% 96.987%
========== UPLOAD =============
Rejected: 0 0
Acceptance Rate: 100.000% 100.000%
---------- accepted -----------
Failed: 979 29
Fail Rate: 0.123% 0.106%
Canceled: 24156 204
Cancel Rate: 3.028% 0.744%
Successful: 772558 27199
Success Rate: 96.849% 99.151%
========== REPAIR DOWNLOAD ====
Failed: 0 0
Fail Rate: 0.000% 0.000%
Canceled: 0 0
Cancel Rate: 0.000% 0.000%
Successful: 60440 2501
Success Rate: 100.000% 100.000%
========== REPAIR UPLOAD ======
Failed: 27 2
Fail Rate: 0.100% 0.196%
Canceled: 197 1
Cancel Rate: 0.728% 0.098%
Successful: 26823 1020
Success Rate: 99.172% 99.707%
========== DELETE =============
Failed: 0 0
Fail Rate: 0.000% 0.000%
Successful: 41773 0
Success Rate: 100.000% 0.000%
On 16 nodes the graph is down; on the updated node the graph is up, over the same time period… I don’t know what those percentages mean or how accurate they are, but this is what I see.
I don’t know how they calculate those.
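My guess, just from reverse-engineering the numbers above, is that the script computes each success rate as successful / (successful + failed + canceled). For example, the v1.102.3 whole-month upload numbers give 757902 / (1063 + 38763 + 757902) ≈ 95.008%, which matches the output. A minimal sketch of that calculation (my own code, not the actual script):

```go
package main

import "fmt"

// successRate computes successful operations as a share of all attempted
// operations, which is what the successrate output above appears to report.
func successRate(failed, canceled, successful float64) float64 {
	total := failed + canceled + successful
	if total == 0 {
		return 0
	}
	return successful / total * 100
}

func main() {
	// Whole-month upload numbers from the v1.102.3 output above.
	fmt.Printf("upload success rate: %.3f%%\n", successRate(1063, 38763, 757902))
	// Prints 95.008%, matching the script's output.
}
```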
Maybe the updated, buffed-up nodes are selected more frequently, and it doesn’t mean many more races won, but more data ingested.
Hi.
Two nodes: the first is v1.102.3 with 37 GB ingress, the second is v1.104.5 with 89 GB ingress.
Both are under traffic monitoring, and the monitored ingress traffic of the two nodes is the same. This is odd.
Oh, the storage node’s tracking of ingress data has changed.
This thread is long (Upcoming storage node improvements including benchmark tool), but in it, we discussed one tradeoff: instead of keeping track of the actual bandwidth used, it might be sufficient to keep track of the ordered bandwidth. This requires significantly less bookkeeping at the point of upload. See this change: https://review.dev.storj.io/c/storj/storj/+/13086
So, in v1.104.x+, the storage node graphs are the amount of bandwidth ordered, not necessarily used, whereas prior to v1.104.x, the graphs were the amount of bandwidth actually used.
What does it mean for bandwidth to be ordered? Maybe it only gets counted when you actually win an upload/download race (whereas before, bandwidth would be counted even for lost races)? I don’t know; I’m just guessing.
Thank you for this note! Though what matters is a per-satellite number. Bloom filters work per-satellite.
I’m pretty sure we’ve seen in the past that those differences can be quite big. The topic you linked also mentioned that that’s what it does “for now”. My earnings calculator is going to show this difference after payouts and make it look like Storj doesn’t pay out what it should for egress. I know this isn’t actually the case, but I would no longer have the data to show and verify that.
Page 53 (section 4.17) of our whitepaper (https://www.storj.io/storjv3.pdf) has a little diagram of this. Essentially, we send little claims that we call “bandwidth allocations,” or more recently “orders,” which are something like bank checks signed by the client software, made out to the storage node for bandwidth usage. What can happen is that the client fills out this “bandwidth allocation” for 5 MB, but if the node is slow, the client might cancel the upload before the 5 MB is actually used. If the client canceled after only 3 MB, the old code would have shown 3 MB. The new code will now show 5 MB.
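To make that 5 MB vs 3 MB example concrete, here is a rough sketch of the difference in what gets recorded (my own simplified types, not the actual storj/storj order structures):

```go
package main

import "fmt"

// order is a simplified stand-in for a signed bandwidth allocation ("order"):
// the client commits up front to paying for up to Allocated bytes of transfer.
type order struct {
	Allocated int64 // bytes the client signed the allocation for
	Used      int64 // bytes actually transferred before the upload finished or was canceled
}

func main() {
	// The client signs an order for 5 MB, but cancels the upload after only 3 MB.
	o := order{Allocated: 5 << 20, Used: 3 << 20}

	// Pre-v1.104.x graphs: bandwidth actually used.
	fmt.Printf("old graph counts: %d bytes\n", o.Used)

	// v1.104.x+ graphs: bandwidth ordered, which needs far less bookkeeping
	// at upload time because only the signed allocation has to be recorded.
	fmt.Printf("new graph counts: %d bytes\n", o.Allocated)
}
```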
I believe this change is only a change for ingress, not egress. I think that egress already worked like this (and thus, has matching payouts).
Thanks for the explanation. So, a question: what is the point of such an ingress dashboard for an SNO? It has no connection to the real data and only creates misunderstanding of the real ingress traffic. You see 100 GB of ingress yesterday and 150 GB today, yet your node shows only +35 GB in Total disk space.
I think SNOs are indifferent to “ordered traffic” or any other developer-oriented parameter.
Is graphing this parameter on the SNO’s dashboard for the developers’ benefit really a good idea?
So it shows only what the node should have been serving according to customer requests, instead of what it has actually served?
This is not meaningful or useful. It just tells us what a node could have been serving if it had fulfilled all orders in full. I don’t feel this gives me any real value.
Why not display it as the difference from used space once a week?
It may be more useful and is easy to calculate.
The current implementation may be OK for a transition period.
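Roughly something like this, as a sketch with made-up sample values (hypothetical names, not actual storagenode code):

```go
package main

import "fmt"

func main() {
	// Hypothetical weekly samples of used space, in bytes.
	usedSpaceWeekAgo := int64(10_000_000_000_000)
	usedSpaceNow := int64(10_150_000_000_000)

	// The suggested figure: net change in used space over the week.
	weeklyGrowth := usedSpaceNow - usedSpaceWeekAgo
	fmt.Printf("weekly growth: %d bytes (~%.0f GB)\n", weeklyGrowth, float64(weeklyGrowth)/1e9)
}
```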
Data also gets deleted, so this will be even less accurate.
We would have to rename it to “weekly growth”.
Why show imprecise ingress anyway?