Node overloaded - traffic flooding

You can read more about it here: Updates on Test Data

Windows: Node Version: v1.102.3

Meanwhile my Raspberry Pi on the same v1.102.3 was running so smoothly I couldn't take my eyes off it. Reminded me of the early beta days when new data was pushed.


OK, that sounds like the new version should fix it. There was a free-space syscall that was very expensive on Windows.


My nodes are on version 1.102 on Docker.

I wonder how much of the network is on 1.104… I think around 20% now? This would be a great test of its new write caching features.

Oh, and I was wondering why my nodes crashed. But mine are at v1.102 at the moment? And the lazy filewalker was running at that moment. Can I disable it somehow?
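Answering my own question after some digging: the usual way is via the node's config.yaml. Treat the key names below as something to verify against your own config for your storagenode version rather than as official advice:

# run the filewalkers with normal instead of low I/O priority
pieces.enable-lazy-filewalker: false
# or skip the used-space scan on startup entirely
storage2.piece-scan-on-startup: false

For Docker nodes the same options can be appended as run flags, e.g. --pieces.enable-lazy-filewalker=false. Either way the node needs a restart to pick up the change.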

Just want to say all my nodes are operational.
Currently taking about 25% of the network's capacity.
Two minutes ago I noticed a spike to 50% of the network's capacity, which is ~500 Mbps of ingress (downloads to my nodes), and had no problem. Reporting: I'm ready for more!

Edit: it's 13 nodes (13 HDDs for a total of 150 TB)
@Roxor

@jammerdan
Not much; yesterday I checked them all and found only two.
The majority is on some 1.102.x release.


How many nodes is that across? I can't imagine holding that rate across just two or three! (although I'm jealous if that's what you're getting :star_struck: )

How many on v1.104 yet?

What I would suggest is that we go ahead with the rollout for 1.104.5 before running any more benchmarks.

The reason is simple: a lot of SNOs would be running the used-space filewalker when upgrading, since that fixes the trash problem. On top of that, we've had a lot of deletes, so they would still be running the trash filewalker as well (trash cleanup is still running for 2024-05-03 on some of my nodes, for example). Pile GC runs on top of that.

Right >now< is not the best time to benchmark the network, IMHO. Let the network go through the update (we have already established that it helps a lot), let the SNOs finish their used-space scans, and let the GC/trash cleanup finish for the heavy deletes. Then benchmark to your heart's desire.

So far I haven't had any node crash. All running 1.104.x (still waiting on used-space to finish on some of them since 1.104.1).


We are not running these tests for fun. You can read the full story here: Updates on Test Data

Aren't the results of those benchmarks skewed because nodes on fsync are not as fast as they should be?

If the target were to measure storage node performance, then yes. But as you know, that isn't the only component that needs to keep up with the load…


I know, which is why I suggested that a node that crashed because it choked on fsync writes isn't going to provide any meaningful data. I'm not saying pause the benchmarks for the next 5 years, I'm saying pause them for the next week.


Sounds like you didn't understand me. The target is to test the performance of the satellite. For that test, storage node performance is irrelevant, and even crashing a few nodes still gives us more than enough data to keep improving satellite performance.


I would like to add that node crashes could provide valuable information on how the satellite manages situations where many nodes suddenly disappear, whether due to an inability to handle the load or network issues.


Three nodes here. One restarted around the time of the test, but it appears to have updated, not crashed. The other two are on 1.102, and all of them appear to have survived the test.

I saw a spike of around 500 Mbps too, toward what appears to be the end of that test run. A lot of nodes around me must have dropped out.

My nodes are running on pretty stout hardware, definitely not the economy stuff that's recommended, but not enterprise grade either.

Same here, my nodes crashed. Server load went up to 1000 and the whole server almost froze (SSH couldn't work). What the hell happened?

My nodes are at 1.102.3.

Same here: two nodes on a QNAP ARM box, 8 TB, 1.5 GB of RAM, went offline.
Version 1.102.3.
The other node on Windows, an N100 with 16 GB of RAM, 2 TB, also on 1.102.3, runs smoothly.

Exactly why I keep "storage2.max-concurrent-requests: 10" uncommented.
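For anyone who wants to try the same thing, it is a single line in the node's config.yaml. This is just how mine is set; 0 means unlimited, and the right value depends on your hardware, so treat 10 as an example rather than a recommendation:

# refuse new uploads once 10 are already in flight
storage2.max-concurrent-requests: 10

The node has to be restarted for the change to take effect. Rejected uploads simply go to other nodes, so the cost is some lost ingress rather than failed audits.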