Bug: SN gross space limit overstep

The SN overstepped its allocated space by quite a bit today. Quite a bug.

Total allocated space is 5 TiB, but the node is currently using 5.15 TiB.

I have just noticed the same on two of my nodes:

               Available         Used      Egress     Ingress
 Bandwidth           N/A      55.4 GB     51.7 GB      3.7 GB (since Jun 1)
      Disk     -161.1 GB     661.1 GB



               Available       Used       Egress      Ingress
 Bandwidth           N/A     0.7 TB     269.7 GB     451.5 GB (since Jun 1)
      Disk     -345.9 GB     6.3 TB

This is an insanely large issue, as it might lead to database/system implosion if the free space runs out. I don’t understand why the SN doesn’t check the remaining space before every operation. I had enough free space available in this case, but the disk could just as easily have been completely filled, causing further issues.

—edit— OP’s problem seems to be unrelated to what I wrote below

I have had this happen on several occasions, and it always seems to come hand in hand with 1) file system inconsistencies, which lead to 2) database issues. In my case it seems that problems with (what is currently called) the storage_usage database caused the node to think more space is available than there actually is, so it over-allocates.

I can’t say this is the problem you are having, but it won’t hurt to run consistency checks on your file system and on your database as described here, just in case you are having a similar issue to mine.
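For the database part specifically, here is a minimal sketch of what such a check can look like, assuming the node’s databases are ordinary SQLite files and the node is stopped first. The path below is a placeholder, not a real default; point it at the database you want to verify (for example the storage_usage one mentioned above).

```python
import sqlite3

# Placeholder path; adjust to your node's storage directory and stop the node
# before running checks against its databases.
DB_PATH = "/path/to/storage/storage_usage.db"

conn = sqlite3.connect(DB_PATH)
try:
    # PRAGMA integrity_check returns a single row containing "ok" when the
    # database is healthy, otherwise a list of the problems it found.
    for (line,) in conn.execute("PRAGMA integrity_check;"):
        print(line)
finally:
    conn.close()
```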


Funny enough, just yesterday I deliberately assigned a bit more space than the required 10% overhead on one of my nodes :)

There was no negative free space yesterday; it must have filled the trash and gone overboard during the last 24 hours. Luckily I didn’t run out of space completely, but it was getting close, and it happened so suddenly that I didn’t have time to react. So the databases should be fine, but using this much extra space isn’t.

I’m almost certain this is a reporting issue rather than the node actually going over the limit. Currently stefan-benten is doing heavy garbage collection on my node (and probably on everyone’s nodes), and I’m seeing my used space increase by large amounts while there are no more uploads than in the last few days.

It looks to me like when garbage collection copies files to trash, it adds that size to the used space, but the freed up space in blobs is never subtracted. The stats may be fixed when you restart your node. Haven’t tried that yet myself, but if you want you can give that a try.
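To make explicit what I mean, here is a toy sketch of that kind of bookkeeping. This is only my guess at the behaviour, not the actual storagenode code: if moving a piece to trash bumps the trash counter without reducing the blobs counter, the reported used space is inflated until something recalculates it.

```python
# Toy illustration only (not the real storagenode code) of the suspected
# double-counting during garbage collection.

class SpaceAccounting:
    def __init__(self, blobs_bytes: int, trash_bytes: int = 0):
        self.blobs_bytes = blobs_bytes
        self.trash_bytes = trash_bytes

    def move_to_trash_suspected(self, piece_size: int) -> None:
        # Suspected buggy behaviour: only the trash counter is updated.
        self.trash_bytes += piece_size

    def move_to_trash_expected(self, piece_size: int) -> None:
        # Expected behaviour: the piece stops being counted under blobs.
        self.blobs_bytes -= piece_size
        self.trash_bytes += piece_size

    @property
    def used_bytes(self) -> int:
        return self.blobs_bytes + self.trash_bytes


acct = SpaceAccounting(blobs_bytes=4_500)
acct.move_to_trash_suspected(500)
print(acct.used_bytes)  # 5000 instead of 4500: the moved piece is counted twice
```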


Fair assertion, but my filesystem reports an extra 150 GiB in use on the data path. I’m running an enumeration to confirm the blobs folder size.
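For anyone who wants to do the same comparison, the enumeration is nothing fancy; roughly the Python sketch below, with a placeholder path pointed at the blobs (or trash) folder.

```python
import os

def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all files under root, as reported by the filesystem."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Pieces can disappear while the walk is running; skip them.
                pass
    return total

# Placeholder path; point it at your node's blobs or trash folder.
print(dir_size_bytes("/path/to/storage/blobs") / 1024**4, "TiB")
```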

My file system shows maybe 10-20 GB more used than yesterday, while my node dashboard shows 500 GB of extra usage, which just happens to be almost exactly what’s stored in the trash folder for the stefan-benten satellite. I don’t really want to reboot the node while it’s still processing these deletes, and I have more than enough free space to just let this play out. It’ll take a loooong time though. After 16 hours of processing, it’s now working on pieces starting with an H. It seems to be going in alphabetical order, except that it does the letters first and then the numbers. So it’s probably at around 20%…
It probably doesn’t help that an extended SMART scan was already running on the HDDs in my array when this started. That’s almost done now; hopefully it’ll speed up after that.

Okay, my command finished: 4.5 TiB in the blobs folder and roughly 680 GiB in the trash folder. 5 TiB allocated, 5.15 TiB consumed; the SN overstepped its allocated space by 0.15 TiB, a huge difference. Something’s wrong with the free-space honoring calculations. Again.

And back to normal now:

               Available         Used      Egress     Ingress
 Bandwidth           N/A      55.9 GB     52.2 GB      3.7 GB (since Jun 1)
      Disk       44.8 MB     500.0 GB


               Available       Used       Egress      Ingress
 Bandwidth           N/A     0.7 TB     273.9 GB     453.1 GB (since Jun 1)
      Disk       16.0 GB     6.0 TB

Okay, the web GUI now shows positive free space, but used disk space is still 150 GiB over the limit and the trash is the same size.

Definitely some strangeness. I seem to have manufactured 1 TB of data out of thin air.

Same node, 11 days apart:

2020-06-19 23:44
               Available         Used       Egress     Ingress
 Bandwidth           N/A     412.3 GB     395.1 GB     17.3 GB (since Jun 1)
      Disk       -1.0 TB       9.0 TB

2020-06-08 19:56
               Available         Used       Egress     Ingress
 Bandwidth           N/A     182.6 GB     171.0 GB     11.6 GB (since Jun 1)
      Disk     -150.4 MB       8.0 TB

v1.5.2 also started reporting the wrong remaining allocated space a week ago, without undergoing any version updates (or service/node restarts/reboots). It appears to be a reporting-only issue in the web and CLI dashboards, as the node is not ingesting any new data.

Yep, that seems to be a reporting error.

Or out of helium in case you’re using one of the newer drives! Have you checked manually how much space your node is actually using?

Helium indeed! As others have mentioned, it seems to be a reporting issue. But I thought it was a bit humorous. It was certainly a surprise.

1 Like

Hello,
I have a node whose log says: 2020-06-20T10:48:26.553Z WARN piecestore:monitor Used more space than allocated. Allocating space {"bytes": 7000000000000}
but there is still free space left on the disk.


It’s just bad reporting, stemming from how available disk space is calculated relative to the one-week file retention in the trash folder. I had a bunch of trash that was auto-deleted at the predetermined time, so all is good now.

It is not the trash that is miscalculated; I found that to be OK. It is the non-trash data whose size is misreported.