4 nodes on one server, but one node is getting only 50% of the data the other nodes are getting

I have spun up a 4-node server (Docker, Debian, Supermicro 12-bay, Xeon E5-2650v2, 32GB, 2x 8TB 7200rpm HDD, 2x 1.6TB NVMe) to experiment with a few things. Each node is mapped to a dedicated disk.

Oddly, the 2x NVMe and one of the 8TB disks consistently get 29-30GB per day each, but the other 8TB disk is getting exactly half of that (14-15GB).

The disk is perfectly healthy and the logs are clean. Vetting status is the same across the nodes: ap1, us1, and eu1 are fully vetted, and Saltlake is at ~35-45%. There is 2% downtime on the eu1 node, but that’s the same on all nodes.

The node has been running for about 2 weeks and it’s been like this since day one. Is that expected?

I’ve spun up some other nodes (on different /24) and they are all even.

Any ideas?

Since they’re all behind the same IP, the Satellite is trying to balance ingress over them… but after only 2 weeks it may just be luck: that HDD won fewer races for unknown reasons.
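
To visualize what "behind the same IP" means to the satellite, here's a minimal sketch of the idea, assuming selection is keyed on the /24 network (the function name and grouping details are my assumptions, not the actual satellite code):

```go
package main

import (
	"fmt"
	"net/netip"
)

// subnetKey reduces a node's address to its /24 network. Roughly
// speaking, all nodes that share a key compete for one selection
// slot, so they split the ingress between them.
func subnetKey(ip string) string {
	addr := netip.MustParseAddr(ip)
	prefix, _ := addr.Prefix(24) // safe for a valid IPv4 in a sketch
	return prefix.String()
}

func main() {
	for _, ip := range []string{"203.0.113.10", "203.0.113.77", "198.51.100.5"} {
		fmt.Println(ip, "->", subnetKey(ip))
	}
	// The first two print 203.0.113.0/24: to the satellite they are
	// one "location" sharing the same ingress budget.
}
```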

Long-term I see closer to 10GB/day/IP, but others are seeing higher, so your setup sounds normal. I wouldn’t sweat it.


That doesn’t sound or feel right.

I’m running on 4 physical servers across 4 separate /24 ranges. Each range is doing ~100GB per day and the load is distributed perfectly evenly across the nodes running on each physical server. This particular node gets half of the data consistently - every day.

[screenshots: node-1 and node-2]

I have almost the same situation: 2 nodes on the same physical server, started at the same time, sharing the same IP address; they are the same size and each works on a dedicated disk of the same model (and probably even from the same batch).
One node has systematically received 2x the total ingress of the other, at least since the SLC tests finished. The difference is mainly from the EU1 (7.5x) and AP1 (4.5x) satellites; ingress from US1 is almost 1:1.

Maybe there is some bug in the node choice algorithm?

P.S. Node location: EU.


I’m in the EU too (well, UK but…)

Yep, that one node has 48GB from EU1. The other 3 have 120GB from EU1.

All 4 nodes are showing 62-65GB from the US1 satellite.

Something screwy with EU1?

Hello @ShareIT,
Welcome to the forum!

We use a choice of n on the satellites to prefer nodes with a higher success rate, so as not to overload weak nodes.
It seems that to this satellite the node looks worse than the others. It may also still be unvetted: unvetted nodes can receive only 1-3% of customer uploads until they are vetted, and to be vetted on a satellite, a node must pass 100 audits from it.
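
For intuition, here's a minimal sketch of what "choice of n" means (illustrative only: the struct, field names, and n=2 are my assumptions, not the satellite's actual implementation):

```go
package main

import (
	"fmt"
	"math/rand"
)

// Node is a hypothetical candidate record; the real satellite tracks
// much more state than a single success rate.
type Node struct {
	ID          string
	SuccessRate float64 // fraction of recent transfers that succeeded
}

// chooseOfN samples n random candidates and keeps the one with the
// best success rate, so weaker nodes are picked less often without
// being excluded entirely.
func chooseOfN(nodes []Node, n int) Node {
	best := nodes[rand.Intn(len(nodes))]
	for i := 1; i < n; i++ {
		c := nodes[rand.Intn(len(nodes))]
		if c.SuccessRate > best.SuccessRate {
			best = c
		}
	}
	return best
}

func main() {
	nodes := []Node{
		{"node-1", 0.99}, {"node-2", 0.99},
		{"node-3", 0.99}, {"node-4", 0.96}, // slightly weaker
	}
	wins := map[string]int{}
	for i := 0; i < 100000; i++ {
		wins[chooseOfN(nodes, 2).ID]++
	}
	fmt.Println(wins) // node-4 wins noticeably fewer selections
}
```

The point of the toy run: under pairwise comparison, even a small but persistent success-rate gap compounds, so one node can end up with a visibly lower share of ingress than its siblings.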

Hi Alexey,

Thanks for the welcome.

All the nodes on this server are vetted (except for SLC, which is at around 40%). Granted, there is 2% downtime on all of the nodes, and that was from day 1 of the server going live, as we moved it to the DC.

Is there any way of seeing why the EU1 satellite has taken a dislike to this particular node?

Sorry for all the questions, I’m just trying to make sure I understand exactly what’s going on.

Ultimately the satellites may be doing a perfect job of offering each of your nodes the same chance to receive data. But all they do is suggest nodes as options for clients to send files to. Why the client software ends up uploading different amounts of data to different nodes… we’ll never know.

So the satellites handle opportunities, and the paying clients with the files determine outcomes. There will be questions Storj can’t answer.
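
To make "opportunities vs. outcomes" concrete: uploads work as a race, where the client starts more transfers than it needs and cancels the slow remainder. A toy sketch of that pattern (the counts and timings are made up, not Storj's real parameters):

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

func main() {
	const candidates = 8 // nodes the client starts uploading to
	const needed = 5     // pieces required before cancelling the rest

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	done := make(chan int, candidates)
	for i := 0; i < candidates; i++ {
		go func(node int) {
			// Simulated transfer time; a slower node loses the race.
			delay := time.Duration(10+rand.Intn(90)) * time.Millisecond
			select {
			case <-time.After(delay):
				done <- node
			case <-ctx.Done(): // long-tail cancellation: piece discarded
			}
		}(i)
	}

	for i := 0; i < needed; i++ {
		fmt.Println("piece stored on node", <-done)
	}
	cancel() // the slowest uploads never complete and earn nothing
}
```

A node that is consistently a little slower to finish transfers loses a disproportionate share of these races, even if the satellite offered it exactly as many chances.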

But it’s definitely worth watching: you’re only 2 weeks in now, so will things level out more by the 2-month mark?


In Russia, compared with you, there’s almost no traffic at all )))


I have over a Petabyte of storage, so even at 100GB per day, it’s going to take a while! 🙂


Nope. But you may use the success rate scripts:
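
(For reference, the heart of those scripts is just counting matching log lines. A rough equivalent in Go, assuming the default storagenode log messages where successes contain "uploaded" and failures contain "upload failed" or "upload canceled"; verify the substrings against your own logs:)

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// Usage: go run successrate.go /path/to/storagenode.log
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: successrate <logfile>")
		os.Exit(1)
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var ok, failed, canceled int
	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long lines
	for sc.Scan() {
		line := sc.Text()
		switch {
		// Check failure/cancel patterns first so a plain "uploaded"
		// match only counts genuine successes.
		case strings.Contains(line, "upload failed"):
			failed++
		case strings.Contains(line, "upload canceled"):
			canceled++
		case strings.Contains(line, "uploaded"):
			ok++
		}
	}
	if total := ok + failed + canceled; total > 0 {
		fmt.Printf("uploads: %d ok, %d failed, %d canceled (%.2f%% success)\n",
			ok, failed, canceled, 100*float64(ok)/float64(total))
	}
}
```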

Oh nice,

How did I miss that?!

The pictures below are of the good node and the not-so-good node. Does anything leap out, and are those stats OK?


They have a good rate, so it seems there’s nothing to worry about.


That’s good to know! It still doesn’t explain why I have one node of the four being shunned by the EU satellite. 🙁

@DisaSoft, have you run the success scripts, and if so, how do they look?

I use log.level=error, so I have no success records in the log to calculate the rate correctly, but I don’t think the success rate can be the reason in my case, because these nodes have absolutely identical hardware and settings. The near-identical traffic from the US satellite indirectly confirms this: if one node performed worse than the other due to a hardware issue, it should perform worse for US1 as well.

Luckily, I was about to drop my log level down to error too, but I didn’t bother.

I agree with you, though. We’re in exactly the same boat. Common servers/hardware/network/etc. I hate not knowing.


I checked today, and the situation is similar: some nodes also receive half as much traffic from eu1, and it’s not clear why. The addresses are direct, each in its own /24 network, with no VPS/VDS.


Check the number of nodes in that /24 segment:

http://storjnet.info/neighbors
