Outrageous upload from "some" nodes

Things you notice…

3 nodes on the same machine (a Synology) using the same disks.
The three nodes have been coexisting since each was created (same input traffic!!!) and none has ever been full.

node1 (oldest, fully vetted)
size 5.13TB; upload 3.04GB

node2 (2nd oldest, fully vetted)
size 3.96TB; upload 2.68GB

node3 (3rd oldest, not vetted)
size 1.36TB; upload 49.58GB

That is today so far.
Last month node3 had a day with more than 80GB of upload, while the other two did the usual 2 or 3 GB.

just saying…

Completely normal phenomenon…

new data got requested by clients.

I have this in the egress on the dashboard of my new node, so it’s downloads from the customer.

Please clarify - do you really mean uploads to the network (ingress to your nodes) or downloads from the network (egress from your nodes)?

Please note - we are always using a customer’s point of view on traffic.

I believe he is referring to this:


Downloads from the network (egress from my nodes).
I really should change the title of the post…

Notice that I wrote “The three nodes have been coexisting since each was created (same input traffic!!!) and none has ever been full.”

I really shouldn’t drink before posting…
All 3 nodes are fully vetted. The third node was vetted in less than a month, a long time ago (6 months ago).
Instead of “vetted” I meant “still subject to held amount”.

Don’t drink and write? :rofl:


Happens to me too sometimes.

But it depends on the data on the nodes. When the newest node has stored the pieces, it gets the egress when they are requested by the client.

I think it’s pure coincidence.

Looking at the numbers and my nodes, I guess it’s egress.

All three nodes have been “ingressing” during all their common existence. Maybe during the vetting month of the third node it got a bit more ingress. So, should the present difference in egress be related to the extra ingress during the vetting month?

yep. I should go for a drive instead… :crazy_face:


There is no strong relation between ingress and subsequent egress: some customers upload static data like pictures or videos and then stream them from their site, generating egress from your nodes, but others use it as a backup and never download the data back, only delete it, so there is no stable usage pattern.

Yep. Nothing new, really…
Can you explain why one of my nodes gets streamable pictures and videos while the other two get static data?
All three nodes have been ingressing in parallel since creation. If the older nodes did more egress, that would be alright, because they’ve got data that the newest node has never seen.
But the data that the newest node holds “is also present” in the other 2 nodes.
No stable usage pattern doesn’t mean that statistics should make no sense.

It’s pure randomness. This particular node just happened to get some actively used pieces. I guess all your nodes are behind the same /24 subnet of public IPs, thus they cannot get pieces of the same segment of the same file, so all three store pieces of different segments of different files and likely of different customers.
There are 20K+ active nodes in the network. Why should these customers select your three nodes for similar usage, even if they happened to upload to your nodes at nearly the same time? How could statistics prove otherwise?
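
To illustrate how “pure random” can still produce a 40x gap, here is a toy Monte Carlo sketch (not Storj code; every number in it is a made-up assumption, not a measurement): three nodes hold the same number of pieces, but daily demand is heavy-tailed, so whichever node happens to hold a piece of a segment that some customer streams heavily that day serves far more egress than its siblings.

```python
import random

random.seed(1)

PIECES_PER_NODE = 10_000   # every node holds the same number of pieces (equal ingress)
PIECE_SIZE_MB = 2          # assumed average piece size
NODES = ["node1", "node2", "node3"]

def daily_egress_gb():
    """One simulated day of egress for three nodes storing equal amounts of data."""
    egress = {}
    for node in NODES:
        downloads = 0
        for _ in range(PIECES_PER_NODE):
            r = random.random()
            if r < 0.00003:
                # A rare piece belongs to a segment some customer streams heavily today.
                downloads += random.randint(5_000, 30_000)
            elif r < 0.05:
                # Most served pieces only see an occasional read.
                downloads += random.randint(1, 5)
        egress[node] = downloads * PIECE_SIZE_MB / 1024  # MB -> GB
    return egress

for day in range(1, 6):
    print(f"day {day}:", {n: f"{gb:.1f} GB" for n, gb in daily_egress_gb().items()})
```

With these made-up parameters, most node-days land near 2-3 GB while an occasional node-day spikes to tens of GB, simply because that node happens to hold one “hot” piece.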


You are in denial…
The nodes are not only in the same subnet, they also have the same IP (same machine, a Synology).
Forget the previous ingress of the 2 oldest nodes. Let’s take into account only the ingress that the three nodes have received since there have been three nodes running in parallel. Let’s pretend I created all 3 nodes on the day I actually started the third.
Each node gets roughly the same amount of ingress every day. Therefore, let’s say each node now holds ~1.37TB.
Here is this month’s egress for the three equal nodes started on the same day (GB per day):

node1: 3; 2.8; 4.4; 3.1; 2.6
node2: 2.6; 2.4; 4.5; 2.9; 2.9
node3: 1.7; 1.7; 16.7; 50.4; 12.5

I can’t recover last month’s egress; the dashboard is not working (it shows zero). But I remember seeing a day when the 3rd node did more than 80GB while the other 2 nodes did 2 or 3 GB. This is statistically impossible!!!

It’s obviously not possible for the 3 nodes to hold the same segments of a file. Whatever egress I have, it cannot be chosen to come from a certain node instead of the others. If this “thing” is only happening to me and not to other node operators (excess egress from new nodes), then the statistically “impossible” is happening. But I don’t think it is happening…

I will tell you what the “thing” could be that would explain the absurd egress I showed you. You don’t have to tell me that it’s true or false. Just tell me if you agree or not that it would explain the absurd egress.

You have programmed into the network an egress preference for new nodes (nodes that are not yet being paid 100%).

Honestly, I do not understand why you expect the same behaviour.
They’re even updated at different times, I’m sure.
So, why do you expect the same behaviour from independent nodes?
Yes, they are in the same subnet of public IPs (local IPs don’t matter), but why do you think that they are dependent?
Of course, since they use the same disk/pool, they affect each other because of concurrent access to the same resources, but that doesn’t make them dependent in terms of usage; moreover, they must be accessed by different customers and, at the very least, they store different segments of different files.

By the way, there is no “egress preference”; in this regard they are equal to the other 79 nodes keeping pieces of the same segment.
For a download, the satellite selects 39 different nodes storing pieces of this segment; as soon as the customer has downloaded the first 29 pieces, all the remaining transfers get canceled.
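
A minimal sketch of that selection, assuming the finishing order is uniformly random (in reality the client cancels based on actual transfer speed): over many downloads of one segment, each of the 80 holders ends up serving a piece roughly 29/80 of the time, so no node is preferred.

```python
import random
from collections import Counter

random.seed(7)

HOLDERS_PER_SEGMENT = 80    # nodes keeping a piece of the same segment (from the post above)
SELECTED_FOR_DOWNLOAD = 39  # nodes the satellite hands to the client for one download
NEEDED_PIECES = 29          # download completes here; the remaining transfers get canceled

def one_download(holders):
    """Return the holders that actually served a piece for a single download."""
    asked = random.sample(holders, SELECTED_FOR_DOWNLOAD)
    # Assumption: finishing order is uniformly random; in reality it depends on
    # each node's latency and throughput, which is what the cancellation races on.
    random.shuffle(asked)
    return asked[:NEEDED_PIECES]

holders = [f"node{i:02d}" for i in range(HOLDERS_PER_SEGMENT)]
served = Counter()
for _ in range(1_000):      # the same segment downloaded 1,000 times
    served.update(one_download(holders))

# Each holder serves roughly 29/80 of the downloads -- no node is preferred.
counts = served.values()
print(f"min={min(counts)}  max={max(counts)}  mean={sum(counts) / len(counts):.0f}")
```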

Read what I wrote more carefully. I would expect the same average behaviour if the nodes had been created on the same day. Since the older nodes are larger, the expected average behaviour in terms of egress should favor the older nodes, not the new one.
The “old concept” of “new data having more egress” does not apply here, since the three nodes have always been working in parallel. They hold the same amount of “new data”.
I don’t know what you mean by the nodes being dependent. The only relation between them is not having common data.
The nodes reside on different disks. Same success rates.
You didn’t answer… would it explain the absurd egress or not?

Why do you expect the same behaviour? This part I do not understand. They are completely different nodes, they have different NodeIds, and only one of them is selected from a /24 subnet of public IPs; it is expected that they will act differently, this is by design.

A problem would more likely arise if they started to perform the same.
Just remember our goal - we want to be as decentralised as possible (we expect nodes to act differently).

If three nodes are started at the same time, with the same IP, on the same machine, with the same brand/model of disks, I expect the same average behaviour. For nodes 1 and 2, I do get roughly the same egress (on average), considering the different sizes of the two nodes. I have never had 40 times more egress from one of the nodes.
Again, would such a preference, if it existed, explain the abnormal egress?