Storage space usage going far beyond limit

Hello,

I have allocated 6.1 TB on a disk with a net capacity of 7.2 TB, yet there is currently less than 170 GB of free space left. Setting aside the incorrectly reported space usage in the web UI (I am aware of this apparent issue from other posts), which still shows almost 2 TB “free” and only 4 TB used, the physical space on the disk will soon run out. Instead of allocating even less space to the node, is there anything I can do to make sure the drive does not fill up and potentially corrupt data? Can all the data be forcefully re-checked?

The free space on the drive sat at around 500 GB for the past few months, so I thought that would still be within the overhead limit and that only the web UI reporting issue existed. Recently, however, the node started to download more data. My node runs in Docker on Windows and has been running fine for a long time now, apart from the serious space overrun described above. There are no apparent errors in the log, and I have not seen any space being recovered by garbage collection or anything else over the past few weeks.

Edit: Clarification, Windows docker.

For now you can lower the allocated storage to a value low enough that you won’t receive new data.

Are you using the disk for anything else? Verify the size of the storage folder and see whether it differs from your web dashboard or CLI dashboard. You might also check whether the disk has System Restore enabled; perhaps something is using the disk without you knowing.
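If checking the folder size through Explorer is too slow with that many files, a small walker can total it up. This is only a rough sketch (the path is a placeholder for your own data directory, and it sums logical file sizes, not “size on disk”):

```go
// Sketch: total the logical size of the storage folder by summing file sizes.
// The path below is a placeholder -- point it at your own data directory.
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

func main() {
	root := `D:\storj\storage` // placeholder data directory

	var total int64
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			info, infoErr := d.Info()
			if infoErr != nil {
				return infoErr
			}
			total += info.Size()
		}
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk failed:", err)
		os.Exit(1)
	}
	fmt.Printf("logical size: %.2f GB\n", float64(total)/1e9)
}
```

Note that this counts what the files contain, so any allocation overhead at the filesystem level would not show up here.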

The disk is not being used for anything else; there is only the Storj folder on it. (I am a bit reluctant to check the storage folder size manually, simply due to the sheer number of files stored. However, as the drive is used solely for the storage node, the Windows-reported space should be “as is”.)
I double-checked that the page file, System Restore and file versioning are all off.

Restarting the node will make it walk all the pieces to see how much space is used. But I’m sure you did that at some point already.
It might be worth checking that there is nothing else on the HDD anyway: it could be a trash folder, hibernation data, a page file, or even Windows Update temporary files. Most of that would be in hidden folders, so make sure you display hidden and system files and folders when checking.

Most importantly though, make sure your disk doesn’t fill up completely, as that could prevent the node from writing to the database files, which could lead to fatal issues. So until the issue is fixed, please lower the shared amount first.

Isn’t there a reliable way to prevent such a condition? Relying on the config setting alone leaves a risk of over-allocation, which I believe the node itself should prevent if it can lead to such a fatal outcome. Maybe the node should query the OS at some interval for how much space is left on the data partition and adjust itself accordingly. I would like to see more fault tolerance and self-healing like this so that nodes can live up to the “run and forget” approach.
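To illustrate what I mean, here is a minimal sketch (Windows-only; the drive letter and the 500 GB floor are made-up values, not anything the node actually does):

```go
// Sketch of a periodic free-space check. Windows-only; the drive letter and
// the threshold are illustrative values, not real node behaviour or config.
package main

import (
	"fmt"
	"syscall"
	"time"
	"unsafe"
)

const minFreeBytes uint64 = 500 * 1024 * 1024 * 1024 // hypothetical 500 GB floor

// freeBytes asks Windows how many bytes are free on the volume holding path.
func freeBytes(path string) (uint64, error) {
	proc := syscall.NewLazyDLL("kernel32.dll").NewProc("GetDiskFreeSpaceExW")
	p, err := syscall.UTF16PtrFromString(path)
	if err != nil {
		return 0, err
	}
	var freeToCaller, total, totalFree uint64
	r, _, callErr := proc.Call(
		uintptr(unsafe.Pointer(p)),
		uintptr(unsafe.Pointer(&freeToCaller)),
		uintptr(unsafe.Pointer(&total)),
		uintptr(unsafe.Pointer(&totalFree)),
	)
	if r == 0 {
		return 0, callErr
	}
	return freeToCaller, nil
}

func main() {
	for {
		free, err := freeBytes(`D:\`) // placeholder data drive
		if err != nil {
			fmt.Println("free-space check failed:", err)
		} else if free < minFreeBytes {
			fmt.Printf("WARNING: only %d GiB free, stop accepting new data\n", free/(1<<30))
		}
		time.Sleep(10 * time.Minute) // arbitrary polling interval
	}
}
```

If the node ran a check like this internally, it could throttle ingress before the disk actually fills up, regardless of what the allocation setting says.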


just keep feeding it more drives so it never runs out of space… xD

problem solved heh, you’re welcome

There really isn’t anything else on the disk; I already purged everything a while back to avoid any space issues. Hidden files and folders are always shown, and anything left over was removed.

I did restart the node a few times, but I doubt that all data is re-checked against a database on startup; I would expect that to take several hours with multiple TB of data.

For now I did what I originally did not want to do and dropped the space allocation to 4.25 TB (on an “8 TB” drive). This is the value that was shown in the web UI as “used space”, so I am hoping this will stop the flow of new data.

I am quite sure that I have some leftover data from missed garbage collection or something, so it would be useful to have an option to force a re-check of the stored data and scrub everything that is no longer in the database.

I suggested the same thing in another topic where this was being discussed. Littleskunk seemed to agree, so I think this is now being considered.

It seems like it’s time to actually verify that. We’re kind of shooting in the dark right now.

Hello @leHans,
Welcome to the forum!

Please, check your disk with scandisk too.

Hello and thanks for the tip.

I shut down the node and ran a check, and Windows showed a “We found errors with this disk” message. I ran a disk check; it completed in a few minutes and reported that all errors were fixed.
I then restarted the node with the same 4.25 TB limit and am giving it some time now.

On another note, I have read multiple times that “don’t worry about the displayed space usage reported in the UI, the payout is calculated based on the actual data stored”. However, surely if Storj thinks (I suppose based on the database) that it is only storing ~4 TB of data, despite an 8 TB drive being full, then that figure is used in the payout calculation as well. I do not suppose I get paid anything for data stored above the ~4 TB, as it seems Storj is unaware that more space is being used.

(Not trying to get rich quick here or anything, just trying to understand and provide feedback at the same time.)

To my knowledge, you still haven’t shown that it is actually Storj using all this space. How large is your blobs folder?

Apologies, it took quite a while to get the space usage calculated.
[screenshot: size of the blobs folder]
The space is used by the blobs folder.

However, since the “size on disk” value is almost double, it seems to confirm my fear that this has to do with the block size of the drive. I do not know how this slipped past me, but the drive is formatted as exFAT with a (presumably default) allocation unit size of 1,048,576 bytes, i.e. 1 MiB. With file pieces commonly around ~2.26 MiB, this leaves at least some waste in the last block of every file.
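Rough numbers as a sketch (2.26 MiB is just the example size from above; real piece sizes vary a lot, and small pieces waste almost a full block each, which is how the total can approach double):

```go
// Sketch: how much disk a file really occupies when the filesystem allocates
// whole clusters. All numbers are illustrative only.
package main

import "fmt"

// onDisk rounds a file size up to a whole number of clusters.
func onDisk(fileSize, clusterSize int64) int64 {
	clusters := (fileSize + clusterSize - 1) / clusterSize
	return clusters * clusterSize
}

func main() {
	const (
		KiB = int64(1024)
		MiB = 1024 * KiB
	)
	pieceMiB := 2.26                        // example piece size
	piece := int64(pieceMiB * float64(MiB)) // in bytes

	clusterSizes := []struct {
		name string
		size int64
	}{
		{"exFAT, 1 MiB clusters", 1 * MiB},
		{"NTFS, 4 KiB clusters", 4 * KiB},
	}
	for _, c := range clusterSizes {
		used := onDisk(piece, c.size)
		waste := used - piece
		fmt.Printf("%-22s: on disk %7d KiB, slack %4d KiB (%.0f%%)\n",
			c.name, used/KiB, waste/KiB, 100*float64(waste)/float64(piece))
	}
}
```

For a 2.26 MiB piece that is roughly a third wasted with 1 MiB clusters versus well under 1% with 4 KiB clusters, and the ratio only gets worse for smaller pieces.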

I do not suppose that a defrag will do much in this case. Unless someone has another idea, I will look into whether I can get another drive, format it with a smaller block size and move the data over.

Seems like an external disk. I believe you want to change the filesystem to NTFS; it will not waste such an amount of space.

Is the overhead of exFAT that much higher than NTFS’s, or is the block size rather the issue here?

There are several issues:

  • exFAT is designed to overcome the limitations of the legacy FAT filesystem, but this comes at a high cost: the block size is big.
  • it does not have a journal, so in the long run there is a higher chance of losing data.

You can read more here:

And yes, the block size is an issue in this exact case, but I would take durability into consideration too. That matters more to the Storage Node Operator than to the network, though, since the network has an additional 79 pieces (besides yours) when only 29 are needed to reconstruct the file.
