Interesting, which datacenter is down?

It is -500 nodes and -3 PB in one shot. Looks like we have a big, very big whale here that has gone offline for some time.

1 Like

Ouch!
Let the repair data flow… :wink:

It was 3 PB of free space; I don't even know how much actual data.

Ah, that’s a very good point.

Interesting find, nonetheless. I wonder if anyone knows what’s going on…

Could be Cloudflare maintenance rerouting.

Well, I might have an idea. I suspect those nodes were actually graceful-exiting, because I just looked at the network traffic, and all my nodes today suddenly dropped in ingress by ~50% all at once. Soooo maybe all that higher traffic of the last few days, which I was glad to welcome, wasn't customers but those nodes exiting… just my theory.

500 nodes at once? A little bit too many to exit at the same time. My stats show that repair ingress is only 1/12 of total ingress, so no, it is not exit traffic.

There’s no repair involved in graceful exit. Just piece transfers between nodes. So it wouldn’t be counted under repair. That said, I’m not buying the GE theory. Seems unlikely. Traffic patterns change, don’t worry too much about it.
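
For anyone who wants to check that ratio on their own node, here is a minimal sketch that pulls the daily bandwidth numbers from the node's local dashboard API (default port 14002) and prints the repair share of ingress. The endpoint and field names below are assumptions based on the current web dashboard backend; adjust them if your node version exposes something different.

```python
# Rough sketch: compare repair vs. usage ingress via the node's local
# dashboard API (default port 14002). The /api/sno/satellites endpoint
# and its bandwidthDaily field are assumptions; adjust to your version.
import json
import urllib.request

URL = "http://localhost:14002/api/sno/satellites"  # local dashboard address

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

daily = data.get("bandwidthDaily") or []
repair_in = sum(day["ingress"]["repair"] for day in daily)
usage_in = sum(day["ingress"]["usage"] for day in daily)
total_in = repair_in + usage_in

if total_in:
    print(f"repair ingress: {repair_in / total_in:.1%} of total ingress")
else:
    print("no ingress recorded yet")
```

A result in the ballpark of 1/12 would match the observation above.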

I mean, it's strange, because the last days, even weeks, were intense for my nodes: 50-100 Mbps total ingress non-stop, aggregated across all nodes. And suddenly, a few hours ago or less, it's a constant 20-40 Mbps. So I guess whatever party that was, it's over. Waiting for the next one.

We did some router maintenance in our DC between 2023-08-11T10:00:00Z and 2023-08-11T11:00:00Z, and it affected our sandbox environment, where the server holding 501 backup storage nodes is placed. They are meant to be used if any of the primary nodes crashes or gets disqualified (DQ).

16 emails later, I found out that all backup nodes were down, but I was unable to get the server up and running again until later that same evening, since I had to attend a nice dinner at an Italian restaurant.

So I guess that this could be the cause of the “wave” :slight_smile:

Th3Van.dk :pizza:

7 Likes

I don’t even want to know the answer to why you need 501 backup nodes… or whether they all meet the minimum requirements. :astonished:

There are huge Operators too

There are a few whales with 1000+ nodes making over 17,000 STORJ a month. The whales are the future of Storj. It's a great example of how bad Storj has become if you have over 500 nodes in a single location. You can bet they're on their own subnets as well, getting their own data.

3 Likes

And I wish I had the hardware for that.

All 501 backup nodes share one single public IP address, and are therefore treated as one single node by the satellites.
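
As a side note on why one IP counts as one node: the satellites' node selection picks at most one node per /24 subnet, so everything behind a single public IP effectively shares the ingress of one node. A small illustrative sketch of that grouping, with made-up example addresses:

```python
# Illustrative only: group node IPs by their /24 network, the way the
# satellite's node selection treats them as one location. Addresses below
# are made-up examples, not real node IPs.
from collections import defaultdict
from ipaddress import ip_address, ip_network

node_ips = ["203.0.113.10", "203.0.113.10", "198.51.100.7"]  # example addresses

groups = defaultdict(list)
for ip in node_ips:
    subnet = ip_network(f"{ip_address(ip)}/24", strict=False)
    groups[subnet].append(ip)

for subnet, members in groups.items():
    print(f"{subnet}: {len(members)} node(s) share this /24")
```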

Hardware running the 501 nodes:

  • Intel Xeon W-3323 CPU @ 3.50GHz
  • 64 GB Memory
  • 2 TB Samsung 980 PRO for boot drive
  • MegaRAID 9560-16i
  • 8 x 4 TB Samsung SSD 860 QVO (RAID6) for Storage nodes
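
For a rough sense of scale, a back-of-the-envelope estimate of the usable space behind those 501 nodes, assuming RAID6 reserves two drives' worth of capacity for parity and ignoring filesystem overhead:

```python
# Back-of-the-envelope usable capacity for the array above.
# RAID6 keeps two drives' worth of parity; filesystem overhead ignored.
drives, drive_tb, parity = 8, 4, 2
usable_tb = (drives - parity) * drive_tb      # 24 TB usable
per_node_gb = usable_tb * 1000 / 501          # roughly 48 GB per backup node
print(f"usable: {usable_tb} TB, ~{per_node_gb:.0f} GB per node across 501 nodes")
```

That works out to roughly 48 GB per backup node, which puts the earlier question about minimum requirements in perspective.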

Th3Van.dk

5 Likes

Do you have any source for the few whales? How can one check that they exist?

I don't really know how you can use those bad QLC SSDs. How exactly do you set up the nodes as backups? Wouldn't 500 nodes push the CPU to 100% load and crash the whole thing? Are you using Ubuntu Server or Debian?

Thanks and kind regards,

Yeah, you can check when the payout happens… I've posted it a few times.

That processor is more than enough for 500 nodes on the same subnet. I’d figure it could handle a good majority of them on separate subnets even during the more demanding times for Storj. Memory on the other hand definitely not as the nodes grow. But for backup nodes on a single subnet, totally doable.

Hi, why is the 860 QVO not good? I have 4 x 870 QVO in RAID 10 for my VM machines and nodes; it's very fast, has been running for 2 years, and there have been no errors or corruption.