Nodes on the same machine with different ingress

I have 3 nodes running on a single Synology. 2 of them share the same disk; the other is on a different disk. Don’t ask why, it’s a legacy setup.
Ever since all the nodes were vetted, they have been getting the same ingress.
Since about a week ago, one of the nodes (the oldest one) has been getting almost 3 times the ingress of each of the other 2.
Also, the ingress of a single node in another location used to correlate with these 3 nodes: it used to be the same as the 3 nodes combined. It used to make sense. Now it doesn’t correlate in any way. And neither site has any Storj neighbours.
Can someone explain this?

hey @humbfig,
could you provide some details about your nodes, like:

  • IDs of the affected nodes
  • whether the ingress is different for all satellites or only for one specific satellite
  • any additional details you think are important

Node IDs and other sensitive data can be sent by DM or by creating a support ticket. Thanks in advance.

1 Like

Here is a fact table. This is about ingress only. Apparently, things stopped making sense on the 15th of July, and only for the US1 satellite.

Expected behaviour:
i) Node 4 = Node 1 + Node 2 + Node 3
ii) Node 1 = Node 2 = Node 3

The expected behaviour held ever since all 4 nodes were vetted.

Apparently, what doesn’t make sense is nodes 2 and 3 getting so little download, because if nodes 2 and 3 were getting the same download as node 1 (as they should), then expected behaviour (i) would hold.
PS- One could also notice that on EU1, the ingress on node 1 has been somewhat higher than on the other two “parallel” nodes (2 and 3) since the 15th. Node 1 being the only one running v1.81.3, it makes you wonder…

Other info: Node 1 is the only one running v1.81.3. All other nodes are running v1.82.1. I can’t say when each node was upgraded.

Node IDs will be DMed. I didn’t know that was sensitive information…

I also have this issue. Out of 22 nodes, only the ones on v1.81.3 have identical ingress; the others, on v1.82.1, show lower ingress.

1 Like

1.82.1 fixed an ingress reporting issue. storagenode/piecestore: fix ingress graph skewed by larger signed orders · storj/storj@b6026b9 · GitHub
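For intuition on that fix (this is just a hedged reading of the commit title, not the actual Storj code): clients sign orders whose limit can be larger than the data they actually send, and if the ingress graph sums the signed order amounts instead of the bytes actually transferred, it over-reports. A toy Python sketch with made-up numbers:

```python
# Hypothetical illustration of "ingress graph skewed by larger signed orders".
# All numbers are made up; this is not the storagenode code.
uploads = [
    {"order_limit": 2_319_872, "transferred": 181_504},
    {"order_limit": 2_319_872, "transferred": 2_319_872},
    {"order_limit": 2_319_872, "transferred": 362_752},
]

skewed_ingress = sum(u["order_limit"] for u in uploads)   # what a buggy graph would sum
actual_ingress = sum(u["transferred"] for u in uploads)   # what the node really received

print(f"skewed: {skewed_ingress:,} bytes")
print(f"actual: {actual_ingress:,} bytes")
```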

1 Like

That would settle it…
I just have to wait until node1 upgrades to v1.82.1 for the world to start making sense again…

PS- Actually, no. Node 4 is on v1.82.1, so that wouldn’t explain expected behaviour (i)… at least if node 4 was upgraded to v1.82.1 before the 18th, which I think it was…
If the problem is that v1.81.3 reports higher ingress than it should, then the expected behaviour should be:
i) node 2 = node 3 (check!)
ii) node 4 = 3 × node 2 (or node 3) (no check!)

I imagine nodes 2 and 3 are on the same disk. Is the disk OK? No bad sectors, no errors, etc.? Are the databases OK?

I also wonder if there might be any performance differences between your nodes.

You can try comparing just the number of upload attempts: satellites consider each piece equal for node selection, regardless of piece size. Looking only at the number of times your node was selected as an upload target reduces the variance of the comparison, but most importantly here it removes the impact of failed uploads.

You can either parse your node logs for the number of upload attempts, or, if you have debugging enabled, look at the upload_started_count counter (side note: it would be nice if this counter were per satellite, and even better if it were a rate meter!).
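If parsing logs, here is a minimal Python sketch of the idea, assuming the default log format where each attempt produces a line containing "upload started" with a "Satellite ID" field (adjust the patterns if your log lines differ):

```python
import re
import sys
from collections import Counter

# Count upload attempts per satellite in a storagenode log file.
# Assumes each attempt is logged as a line containing "upload started"
# with a payload that includes "Satellite ID": "<id>".
SAT_ID = re.compile(r'"Satellite ID":\s*"([^"]+)"')

counts = Counter()
with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
    for line in log:
        if "upload started" in line:
            match = SAT_ID.search(line)
            counts[match.group(1) if match else "unknown"] += 1

for satellite, attempts in counts.most_common():
    print(f"{satellite}\t{attempts}")
```

Export each node’s log for the same time window to a file and run the script on each one; comparable attempt counts would point the difference at failed/cancelled transfers or piece sizes rather than node selection.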

If these numbers turn out to be equal, then we’ll know the difference comes from failed/cancelled uploads.

That would explain a lot. But no. Nodes 1 and 2 are on the same disk. Node 3 is alone on a disk that serves no other purpose. Not a single bad sector on any of the disks, ever.

Well, you tell me. I see no performance difference. And anyway, before July 15th, the world made sense…

[screenshots for node1, node2, node3 and node4]

Also, I don’t know how to get some of the numbers you point out. Debug is not enabled.

1 Like

Indeed, this looks a bit weird. I can’t help more than that, sorry, but this should probably be useful to Storj engineers.

Do you mean uploads to the nodes (ingress) or downloads from the nodes (egress)?
We always use the customer’s point of view, so this needs clarification.

From the first post I gather that he refers to ingress.

I hope so. Because downloads = egress and egress can vary.

I’ll ask the obvious question: are any of them full?
Guessing that the answer is no, I’d stop all of them, rm the containers, restart the machines, update all the packages and DSM, manually update the storagenode image on all of them, then start the nodes, just to get them on the same version and the software up to date. The latest DSM and Docker are working perfectly with Storj, so no worries.

I mean ingress!
This POV thing is very confusing…

1 Like

None of the nodes are full. DSM is on the latest version. Packages, including Docker, are all updated. Node 1 updated to v1.82.1 last night, so all nodes are now on the same version.

PS- I’ll be offline doing IRL stuff until Monday or Tuesday. I’ll post my new ingress statistics when I get back.

Hi!
Just to say that the world makes sense again.
In the meantime all the nodes were on the same version (not anymore, now) and the numbers started to add up.
Thank you all for your help.

3 Likes