Outrageous upload from "some" nodes

Alexey · October 6, 2023, 8:49am

…then they must perform differently, that’s the point.

The problem will be if they start to work the same. This would mean that certain segments of our customers are at risk.

humbfig · October 6, 2023, 8:59am

Now I think you just don’t want to answer my question…
They cannot “work the same” (meaning having the same amount of egress?) because they have no common data. They do not compete. But on average, they should have the same egress if they had been started on the same day. Since they weren’t, older nodes should have more egress (on average). BTW, most days they do have the same average egress (considering relative size). It’s just for some periods (so far this month, days 3, 4 and 5) that the egress from the newest node explodes…

Mircoxi · October 6, 2023, 1:30pm

I think this is where your assumption is falling flat - egress is determined by what files the customer wants, and how often. Looking at my own nodes, older data gets accessed less often than newer data, and my newer node was getting more data sent to it while both were accepting new data. I don’t know if that’s intended on Storj’s part where newer nodes get more to “equalise” things, but that doesn’t really matter, it’s the same IP/24, and they’re all vetted.

New data is more frequently accessed on average, so it makes sense that your newer node is seeing the most egress - S3-esque services are perfect for dumping backups in that rotate out over time, and generally you only access those if something went wrong. As far as I can tell, this is just normal usage patterns.

I’m also not really sure what the issue is here? You’re getting paid the same regardless, it shouldn’t really matter from an SNO perspective which node it comes from.

humbfig · October 6, 2023, 2:12pm

Mircoxi,

Please read carefully what I wrote. More than once. For example:

They are holding the “same amount of new data”.

If your new node was getting more data (while already vetted!) than your first, for sure you have an asymmetry between your nodes. Maybe the nodes don’t reside on the same machine and have different success rates… I don’t have such asymmetry.
On average, you should get the same amount of data to your nodes if all is equal. The variance of ingress data to my nodes has been negligible since forever.

Come on man, do I have to do the math?
You could say it’s negligible, I’ll give you that. The issue is obviously not the money.

arrogantrabbit · October 6, 2023, 2:39pm

Assuming the process of node selection is indeed random, including across the nodes in the same IP group, I don’t have reasons to doubt what Alexey is saying. (Even if it is designed this way, it may not behave this way due to bugs, or maybe the node selection involves node ID values in some way, in which case it can be very skewed. I did not review the code before writing this, so I am just hypothesizing.) That leaves another possibility, an anomaly: your one random node got a single file that ended up being downloaded millions of times (e.g. some assets on an active web site). You can easily check this from the logs.
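
A rough sketch of that log check, in Python. The log line shape is an assumption on my part (a `downloaded` message followed by a JSON blob containing `"Piece ID"` and `"Action": "GET"`), and `count_downloads` is a hypothetical helper name, so adjust the patterns to whatever your node actually writes:

```python
import re
from collections import Counter

# ASSUMPTION: each completed download is logged on one line that contains the
# word "downloaded" plus JSON-like fields "Piece ID" and "Action".
PIECE_ID = re.compile(r'"Piece ID":\s*"([^"]+)"')
DOWNLOADED = re.compile(r'\bdownloaded\b')
CUSTOMER_GET = re.compile(r'"Action":\s*"GET"')  # excludes e.g. GET_AUDIT, GET_REPAIR


def count_downloads(lines):
    """Return a Counter mapping piece ID -> number of completed customer downloads."""
    counts = Counter()
    for line in lines:
        # Skip anything that is not a completed customer download.
        if not DOWNLOADED.search(line) or not CUSTOMER_GET.search(line):
            continue
        match = PIECE_ID.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def top_pieces(path, n=10):
    """Read a log file and return the n most frequently downloaded piece IDs."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return count_downloads(log).most_common(n)
```

If one piece ID sits far above the rest in the `top_pieces` output for the days in question, that points to a single hot file rather than skewed node selection.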

humbfig · October 6, 2023, 2:42pm

ok. I can accept that. But even then the golden file is downloaded millions of times just on certain consecutive days of the month, which is odd.
I’ll check the logs.

BTW, thanks for recognizing the anomaly. I thought I was alone in the Universe…

Toyoo · October 6, 2023, 11:02pm

I wouldn’t be surprised if you managed to get some pieces of an extra-popular file while other nodes were restarting due to an upgrade. I’ve seen specific single pieces being requested tens of thousands of times. I like these pieces, they stay in memory cache ^^
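
To illustrate how much a single hot piece can distort the picture, here is a toy simulation. Everything in it is made up for illustration (the node count, pieces per node, the 0 to 5 downloads per ordinary piece, the 50,000 downloads for the hot piece), and it counts downloads as a stand-in for egress; it says nothing about how node selection really works:

```python
import random


def simulate_egress(num_nodes=3, pieces_per_node=1000, hot_downloads=50_000, seed=0):
    """Toy model: equal data on every node, plus one extra-popular piece on one node.

    Returns (download totals per node, index of the node holding the hot piece).
    All numbers are illustrative assumptions, not measurements.
    """
    rng = random.Random(seed)
    # Every node holds the same number of ordinary pieces,
    # each downloaded between 0 and 5 times.
    egress = [
        sum(rng.randint(0, 5) for _ in range(pieces_per_node))
        for _ in range(num_nodes)
    ]
    # One randomly chosen node also happens to hold the hot piece.
    hot_node = rng.randrange(num_nodes)
    egress[hot_node] += hot_downloads
    return egress, hot_node


if __name__ == "__main__":
    totals, hot = simulate_egress()
    for node, total in enumerate(totals):
        marker = "  <- holds the hot piece" if node == hot else ""
        print(f"node {node}: {total} downloads{marker}")
```

In this model the ordinary traffic is nearly identical across nodes, yet the node that drew the hot piece ends up with many times the total of the others.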