Announcement: major storage node release (potential config changes needed!)

jammerdan · May 19, 2024, 4:49am

But I have a node, for example, that is full (< 5GB left), and it shows me 70 bytes of ingress.
It is my understanding that this node should not be contacted. But it shows this little ingress.
This was not there before the change. I have never seen something like this.
I had just set up an alert for when a node remains at 0 ingress for a day. Now this no longer works.
Currently I do not have a node showing that no-ingress-without-a-cause issue, so I can’t compare. If nodes with that issue still displayed 0, then everything would be fine.

So the estimated value is no longer correct?
That is bad too.
As I said, as an SNO I need correct information.

littleskunk · May 19, 2024, 8:27am

My Grafana dashboard (using the metrics endpoint) still shows correct information.

jammerdan · May 19, 2024, 8:39am

This is what gets shown for today:

The node is full with 3.87GB free, so it is below the new limit of 5GB.

When nodes hit the limit before, they received 0 ingress. Now, after the change, it is the first time I see such tiny ingress numbers on several nodes that are below the limit. But currently my alert is set to exactly 0, so it would not go off.
I would not mind these numbers if the alert can still detect the other issue, where there is suddenly no ingress for no apparent reason. If the ingress is still 0 in such cases, then it would be fine.

littleskunk · May 19, 2024, 8:44am

You should check out my Grafana dashboard. I am using similar alerts and they trigger within minutes after a problem shows up. Depending on how sensitive you make it ofc. Sometimes an alert might also misfire and I have to adjust it in the other direction.

jammerdan · May 19, 2024, 8:51am

That is the question. I am not looking for a different solution. I want to know if the change in recording bandwidth (could-be bandwidth instead of actual bandwidth) could mean that the issue can no longer be detected, or is harder to detect.

As I said, when the issue is happening, I do not see any “uploaded” or “upload started” in the logs. So I believe nothing should get recorded as bandwidth, even with the new way of recording bandwidth. But looking at the full nodes, which should not receive any ingress either, I have doubts, because they have some (tiny) (wannabe?) ingress recorded.
If such nodes have ingress recorded for whatever reason, then the question is what value to expect. The numbers seem tiny, so maybe an alert on < 1MB instead of 0 would do.
But it is hard to determine as I don’t know what causes those tiny ingress numbers.
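
To make the "< 1MB instead of 0" idea concrete, here is a minimal sketch of such a check. Everything in it is an assumption for illustration: the function names, the 1 MB threshold, and the idea of skipping nodes below the 5GB free-space limit are not taken from the storagenode or any dashboard. Whether a stalled node really stays below this threshold is exactly the open question above.

```python
# Hypothetical alert check; names and thresholds are assumptions, not Storj APIs.
FREE_SPACE_LIMIT_BYTES = 5_000_000_000   # the 5GB limit mentioned in this thread
INGRESS_ALERT_THRESHOLD_BYTES = 1_000_000  # assumed "< 1MB" threshold


def node_is_full(free_bytes: int, limit_bytes: int = FREE_SPACE_LIMIT_BYTES) -> bool:
    """A node below the free-space limit is expected to get (almost) no ingress."""
    return free_bytes < limit_bytes


def should_alert(daily_ingress_bytes: int, free_bytes: int,
                 threshold_bytes: int = INGRESS_ALERT_THRESHOLD_BYTES) -> bool:
    """Alert only when a node that still has space shows near-zero daily ingress."""
    if node_is_full(free_bytes):
        return False  # tiny or zero ingress is expected here, so stay quiet
    return daily_ingress_bytes < threshold_bytes
```

With these assumed values, a full node with 3.87GB free and 70 bytes of ingress would not alert, while a node with plenty of space and 70 bytes of ingress in a day would.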

daki82 · May 21, 2024, 5:28am

Here are just 2 more clues:
Task Manager (Windows)
The rattling of the heads all the time.

jammerdan · May 21, 2024, 5:45am

It is getting harder to tell if uploads are working:

Alexey · May 22, 2024, 1:46am

https://review.dev.storj.io/c/storj/storj/+/13227

jammerdan · May 22, 2024, 5:14am

Revert "storagenode/piecestore: add bandwidth only when settling orders"

snorkel · May 22, 2024, 6:20am

Soo… where is that cache kept? In RAM? What if there is very little RAM? Or if it is full of data from other processes like the walkers, etc.?

Alexey · May 22, 2024, 6:57am

Then the node could be killed and the cache would be gone.
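
As a generic illustration of why that happens with any RAM-only cache (this is a hypothetical sketch, not the storagenode's actual code, and the class and method names are made up): amounts added since the last flush exist only in process memory, so they vanish if the process is killed before the next flush.

```python
# Hypothetical in-memory cache; not taken from the Storj code base.
class InMemoryBandwidthCache:
    def __init__(self) -> None:
        self._pending: dict[str, int] = {}  # satellite id -> bytes not yet persisted

    def add(self, satellite: str, amount: int) -> None:
        """Accumulate usage in RAM only."""
        self._pending[satellite] = self._pending.get(satellite, 0) + amount

    def flush(self, store: dict[str, int]) -> None:
        """Move pending amounts into a persistent store (a dict stands in for a DB)."""
        for satellite, amount in self._pending.items():
            store[satellite] = store.get(satellite, 0) + amount
        self._pending.clear()

    def pending_total(self) -> int:
        """Bytes that would be lost if the process died right now."""
        return sum(self._pending.values())


persisted: dict[str, int] = {}
cache = InMemoryBandwidthCache()
cache.add("sat-a", 70)
cache.flush(persisted)   # 70 bytes are now safe in the store
cache.add("sat-a", 30)   # these 30 bytes live only in RAM until the next flush
```

If the process were killed after the last line, the store would still hold 70 bytes and the pending 30 would be gone.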