Negative space rising up really fast

Doddophonique · December 19, 2019, 12:48am

It’s rising at almost 1GB every 4 minutes. Nothing strange shows up in the logs, and Netdata shows that the container is reading and writing to disk. The node was already full (only egress was happening before the update), and the number doesn’t seem to stop rising.

Is it gonna crash or something? Is someone else having this behaviour?

Edit 1: I just stopped it and restarted it, and it’s now at -10GB. It’s not rising for the moment.

deathlessdd · December 19, 2019, 12:53am

Did you allow for more space and then take it away afterwards? It shouldn’t go past what you allowed, especially not 33 gigs over.

Doddophonique · December 19, 2019, 12:54am

Nope, I did the inverse (I had 2.4TB and made it 2.5TB). It was at -0.26KB (the usual when the node gets full) today, but after the update this is happening.

deathlessdd · December 19, 2019, 12:55am

Mine is also full but was full before today… Never mind, I pressed Ctrl+F5 and it fixed it.

nerdatwork · December 19, 2019, 2:31am

Have you kept 10% free space as overhead?

kevink · December 19, 2019, 6:40am

I have the same problem: the node had been full for weeks at only about -7MB, and around 6 hours ago, when the massive uploads started, it suddenly went to -29GB without any uploads in the logs.

However my 2nd full node stayed at its capacity. So only one of my 2 full nodes had this change.

Odmin · December 19, 2019, 9:31am

I think we have a problem with the calculation when the garbage collector moves data to the trash folder.
Look into your trash, for me:

Or we have a wrong calculation before the garbage collector identifies this garbage…

michaln · December 19, 2019, 9:45am

Thank you for letting us know about the issue. The problem is reported.

@Odmin thanks for the hint about the trash folder, it might be a good starting point for the investigation.

littleskunk · December 19, 2019, 11:18am

The number you see on the frontend is a live tracker. Every stored piece will increase the counter. Delete messages will decrease the counter. Once per hour we sync it with the used space on disk.
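To make the description above concrete, here is a rough sketch of how such a live counter could behave. It is only an illustration, not the actual storage node code: the class name `LiveSpaceTracker`, its methods and the byte values are all made up for this example.

```python
class LiveSpaceTracker:
    """Hypothetical model of the live used-space counter (not real node code)."""

    def __init__(self, allocated_bytes, used_bytes=0):
        self.allocated = allocated_bytes
        self.used = used_bytes  # live estimate, not measured from disk

    def piece_stored(self, size):
        # Every stored piece increases the counter.
        self.used += size

    def piece_deleted(self, size):
        # Every delete message decreases the counter.
        self.used -= size

    def sync(self, used_on_disk):
        # Periodic sync: replace the estimate with the measured disk usage.
        self.used = used_on_disk

    @property
    def free(self):
        # Can go negative if the estimate exceeds the allocation.
        return self.allocated - self.used


tracker = LiveSpaceTracker(allocated_bytes=1000)
tracker.piece_stored(600)
tracker.piece_stored(500)       # estimate drifts past the allocation
tracker.piece_deleted(50)
print(tracker.free)             # 1000 - 1050 = -50
tracker.sync(used_on_disk=900)  # the measured value replaces the estimate
print(tracker.free)             # 1000 - 900 = 100
```

The point of the sketch is that the displayed free space is only as good as the increments and decrements it receives between two syncs.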

I have 2 theories.

1. Garbage collection is moving the piece but telling the live counter that it was deleted. --> Storage node reports to the satellite that it has free space now, fills up that space and the next sync will find out that this decision was a mistake.
2. Garbage collection is moving the piece and telling the live counter that we have new pieces in the trash folder. --> Storage node will report a big negative free space value, not accept any new pieces and on the next sync find out that this was a visual bug only.
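The two theories can be compared with a toy simulation. This is a sketch under stated assumptions, not the real implementation: the function `simulate` and its numbers are hypothetical, and it assumes the sync measures everything on disk, including the trash folder.

```python
def simulate(theory, allocated=1000, stored=1000, trashed=300):
    """Return (free_before_sync, uploads_accepted, free_after_sync).

    Hypothetical model: a full node where garbage collection moves
    `trashed` bytes into the trash folder. The bytes stay on disk.
    """
    counter = stored   # live counter starts in agreement with the disk
    on_disk = stored   # bytes physically on disk, trash included (assumption)

    if theory == 1:
        counter -= trashed   # move is reported to the counter as a delete
    elif theory == 2:
        counter += trashed   # move is reported to the counter as new pieces
    else:
        raise ValueError("theory must be 1 or 2")

    free_before_sync = allocated - counter

    # The node only accepts uploads while the counter shows free space.
    uploads = max(free_before_sync, 0)
    counter += uploads
    on_disk += uploads

    # Sync: the counter is replaced by the measured disk usage.
    counter = on_disk
    return free_before_sync, uploads, allocated - counter


for theory in (1, 2):
    print(theory, simulate(theory))
# 1 (300, 300, -300)  -> space looks free, fills up, sync reveals the overshoot
# 2 (-300, 0, 0)      -> big negative value, no uploads, sync clears it
```

In this model the distinguishing signal is the upload count before the sync: many uploads point to theory 1, none to theory 2.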

So the questions are: do you see a big number of upload messages in your logfile (1)? What happens if you restart your storage node and wait for the sync process to finish (2)? You can watch the process with curl localhost:7777/mon/ps

kevink · December 19, 2019, 11:34am

Option 2 seems to be the correct one since my node has no new uploads.
But the node has now 42G in the trash folder and the live counter is at -29GB.

Doddophonique · December 19, 2019, 6:37pm

I soft restarted the node yesterday (no rm, just stop -t 300 and then restart). At first it went to -10.20GB, then, after a couple of minutes, it showed ~77GB free on the node. It is now receiving uploads again, with 55GB currently free.

I had no upload messages in the logfile before restarting. Right now the behaviour is exactly as it was before the update, so I have no way to analyze it again.

littleskunk · December 20, 2019, 2:21pm

Warning: After a restart the storage node will correct the used space value. In my case it first showed -200GB and after one restart I have 200GB free. My storage node is now filling that space and will crash at some point. We are working on a fix. Meanwhile please reduce your allocated space to avoid a crash.