Bug: SN gross space limit overstep

The SN overstepped its allocated space by quite a bit today. Quite a bug.

Total allocated space is 5 TiB, but the node is currently using 5.15 TiB.

I have just noticed the same on two of my nodes:

               Available         Used      Egress     Ingress
 Bandwidth           N/A      55.4 GB     51.7 GB      3.7 GB (since Jun 1)
      Disk     -161.1 GB     661.1 GB



               Available       Used       Egress      Ingress
 Bandwidth           N/A     0.7 TB     269.7 GB     451.5 GB (since Jun 1)
      Disk     -345.9 GB     6.3 TB

This is an insanely large issue, as it might lead to database/system implosion if the free space runs out. I don’t understand why the SN doesn’t check the remaining space before every operation. I had enough free space available in this case, but the disk could just as easily have been completely filled, causing further issues.

—edit— OP’s problem seems to be unrelated to what I wrote below

I have had this happen on several occasions, and it always seems to come hand in hand with 1) file system inconsistencies, which lead to 2) database issues. In my case it seems that problems with (what is currently called) the storage_usage database caused the node to think more space is available than there actually is, so it over-allocates.

I can’t say this is the problem you are having, but it won’t hurt to run consistency checks on your file system and on your database as described here, just in case you are having a similar issue to mine.
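For the database part specifically, here is a minimal sketch of what such a check can look like, assuming the node’s databases are ordinary SQLite files and the node is stopped first. The path below is a placeholder, not a real default; point it at the database you want to verify (for example the storage_usage one mentioned above).

```python
import sqlite3

# Placeholder path; adjust to your node's storage directory and stop the node
# before running checks against its databases.
DB_PATH = "/path/to/storage/storage_usage.db"

conn = sqlite3.connect(DB_PATH)
try:
    # PRAGMA integrity_check returns a single row containing "ok" when the
    # database is healthy, otherwise a list of the problems it found.
    for (line,) in conn.execute("PRAGMA integrity_check;"):
        print(line)
finally:
    conn.close()
```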


Funny enough, just yesterday I deliberately assigned a bit more space than the required 10% overhead on one of my nodes :)

There was no negative free space yesterday; it must have filled the trash and gone overboard during the last 24 hours. Luckily I didn’t run out of space completely, but it was getting close, and it happened so suddenly that I didn’t have time to react. So the databases should be fine, but using this much extra space isn’t.

I’m almost certain this is a reporting issue rather than the node actually going over the limit. Currently stefan-benten is doing heavy garbage collection on my node (and probably on everyone’s nodes), and I’m seeing my used space increase by large amounts while there are no more uploads than in the last few days.

It looks to me like when garbage collection copies files to trash, it adds that size to the used space, but the freed up space in blobs is never subtracted. The stats may be fixed when you restart your node. Haven’t tried that yet myself, but if you want you can give that a try.
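To make explicit what I mean, here is a toy sketch of that kind of bookkeeping. This is only my guess at the behaviour, not the actual storagenode code: if moving a piece to trash bumps the trash counter without reducing the blobs counter, the reported used space is inflated until something recalculates it.

```python
# Toy illustration only (not the real storagenode code) of the suspected
# double-counting during garbage collection.

class SpaceAccounting:
    def __init__(self, blobs_bytes: int, trash_bytes: int = 0):
        self.blobs_bytes = blobs_bytes
        self.trash_bytes = trash_bytes

    def move_to_trash_suspected(self, piece_size: int) -> None:
        # Suspected buggy behaviour: only the trash counter is updated.
        self.trash_bytes += piece_size

    def move_to_trash_expected(self, piece_size: int) -> None:
        # Expected behaviour: the piece stops being counted under blobs.
        self.blobs_bytes -= piece_size
        self.trash_bytes += piece_size

    @property
    def used_bytes(self) -> int:
        return self.blobs_bytes + self.trash_bytes


acct = SpaceAccounting(blobs_bytes=4_500)
acct.move_to_trash_suspected(500)
print(acct.used_bytes)  # 5000 instead of 4500: the moved piece is counted twice
```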


Fair assertion, but my filesystem reports an extra 150 GiB in use on the data path. I’m running an enumeration to confirm the blobs folder size.
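For anyone who wants to do the same comparison, the enumeration is nothing fancy; roughly the Python sketch below, with a placeholder path pointed at the blobs (or trash) folder.

```python
import os

def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all files under root, as reported by the filesystem."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Pieces can disappear while the walk is running; skip them.
                pass
    return total

# Placeholder path; point it at your node's blobs or trash folder.
print(dir_size_bytes("/path/to/storage/blobs") / 1024**4, "TiB")
```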

My file system shows maybe 10-20 GB more used than yesterday, while my node dashboard shows 500 GB of extra usage, which just happens to be almost exactly what’s stored in the trash folder for the stefan-benten satellite. I don’t really want to reboot the node while it’s still processing these deletes, and I have more than enough free space to just let this play out. It’ll take a loooong time though. After 16 hours of processing, it’s now working on pieces starting with an H. It seems to be going in alphabetical order, except that it does the letters first and then the numbers. So it’s probably at around 20%…
It probably doesn’t help that an extended SMART scan was already running on the HDDs in my array when this started. That’s almost done now; hopefully it’ll speed up after that.

Okay, my command finished: 4.5 TiB in the blobs folder and roughly 680 GiB in the trash folder. 5 TiB allocated, 5.15 TiB consumed; the SN overstepped its allocated space by 0.15 TiB, a huge difference. Something’s wrong with the free-space honoring calculations. Again.

And back to normal now:

               Available         Used      Egress     Ingress
 Bandwidth           N/A      55.9 GB     52.2 GB      3.7 GB (since Jun 1)
      Disk       44.8 MB     500.0 GB


               Available       Used       Egress      Ingress
 Bandwidth           N/A     0.7 TB     273.9 GB     453.1 GB (since Jun 1)
      Disk       16.0 GB     6.0 TB

Okay, the web GUI now shows positive free space, but used disk space is still 150 GiB over the limit and the trash is the same size.

Definitely some strangeness. I seem to have manufactured 1 TB of data out of thin air.

Same node, 11 days apart:

2020-06-19 23:44
               Available         Used       Egress     Ingress
 Bandwidth           N/A     412.3 GB     395.1 GB     17.3 GB (since Jun 1)
      Disk       -1.0 TB       9.0 TB

2020-06-08 19:56
               Available         Used       Egress     Ingress
 Bandwidth           N/A     182.6 GB     171.0 GB     11.6 GB (since Jun 1)
      Disk     -150.4 MB       8.0 TB

v1.5.2 also started reporting the wrong remaining allocated space a week ago, without undergoing any version updates (or service/node restarts/reboots). It appears to be a reporting-only issue in the web and CLI dashboards, as the node is not ingesting any new data.

Yep, that seems to be a reporting error.

Or out of helium in case you’re using one of the newer drives! Have you checked manually how much space your node is actually using?

Helium indeed! As others have mentioned, it seems to be a reporting issue. But I thought it was a bit humorous. It was certainly a surprise.

1 Like

Hello,
I have a node whose log says: 2020-06-20T10:48:26.553Z WARN piecestore:monitor Used more space than allocated. Allocating space {"bytes": 7000000000000}
but there is still free space left on the disk.


It’s just bad reporting, stemming from how available disk space is calculated relative to the one-week file retention in the trash folder. I had a bunch of trash that was auto-deleted at the predetermined time, so all is good now.

It is not the trash that is miscalculated; I found that to be OK. It is the non-trash data whose size is misreported.