Overused Space is excessive

Chris21788 · August 22, 2024, 12:19am

Hey everyone,

I’ve been tracking my disk usage for some time, and have just recently noticed that one of my nodes is currently at 122.8% disk usage, which equates to being overused by 2.28TB!

Now I can understand some MB, or even a few 10s of GBs, but 2.28TB is quite excessive, and has filled up the buffer I have on the node so it never gets 100% full.

I haven’t changed my configuration for quite some time, and it’s always been 10TB max allocation, but this is a first for me. Some quick browsing reveals that small amounts are fine, but 2.28TB is quite the “over-use”. Not only is it taking up more disk than I’d like, I’m not even sure if I’m getting paid for that disk usage.

Thoughts on where I can find this and prevent this from ever happening again?

Running 1.110.3, on a RPI4 via Portainer.

Alexey · August 22, 2024, 4:24am

Does it match the usage reported by the OS (you need to use SI measure units to compare)?

Chris21788 · August 22, 2024, 2:25pm

My disks are 14TB disks, which after formatting results in 12.7TB free.

user@raspberrypi:~ $ df
Filesystem       1K-blocks        Used Available Use% Mounted on
...
/dev/sdb1      13563485152 12462425408 417424340  97% /mnt/csrpistorj1

and

user@raspberrypi:~ $ df -h
Filesystem      Size  Used Avail Use% Mounted on
...
/dev/sdb1        13T   12T  399G  97% /mnt/csrpistorj1

12462425408/1024/1024/1024 = 11.6TB, so it’s not 1:1, but I’m not sure what that has to do with the overused disk. Regardless, whether the OS is seeing 10TB, or the storj application is seeing 10TB, it should halt any future ingress.

Let me know if you need anything else.

Alexey · August 23, 2024, 3:08am

You need to use --si instead of -h if you want to compare with the dashboard. The Storj software uses SI measurement units (as disk manufacturers do), not binary ones.
12462425408 1K-blocks × 1024 bytes ≈ 12.76TB
So it looks like the dashboard shows used space pretty close to what’s reported by the OS. If you look at the average used space for the last fully reported day, it likely shows about 7TB of confirmed used space. So the difference, about 5.76TB, is uncollected garbage (see When will "Uncollected Garbage" be deleted?).
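
To make the unit comparison concrete, here is a minimal sketch of the conversion. It assumes GNU df's default 1K-block is 1024 bytes; the function names are only illustrative, and the dashboard figure is derived from the 122.8% of a 10TB allocation quoted in the first post.

```python
KIB = 1024  # assumption: GNU df's default "1K-blocks" are 1024-byte blocks


def blocks_to_bytes(blocks_1k: int) -> int:
    """Convert a df 1K-blocks count to bytes."""
    return blocks_1k * KIB


def to_si_tb(n_bytes: int) -> float:
    """Bytes to SI terabytes (10^12 bytes), the unit the dashboard uses."""
    return n_bytes / 1000**4


def to_tib(n_bytes: int) -> float:
    """Bytes to binary tebibytes (2^40 bytes), what dividing by 1024 yields."""
    return n_bytes / 1024**4


used_blocks = 12462425408  # "Used" column from the plain df output
used_bytes = blocks_to_bytes(used_blocks)

allocated = 10 * 1000**4                  # 10TB allocation
dashboard_used = allocated * 1228 // 1000  # 122.8% of the allocation

print(f"OS used:        {to_si_tb(used_bytes):.2f} TB (SI)")    # 12.76 TB
print(f"OS used:        {to_tib(used_bytes):.2f} TiB (binary)")  # 11.61 TiB
print(f"Dashboard used: {to_si_tb(dashboard_used):.2f} TB (SI)")  # 12.28 TB
```

The 11.6 figure from the earlier post is the binary value, which is why it looks so different from the dashboard.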

The overuse itself likely happened when you reduced the allocation to 10TB from a previous value, probably 12TB. Am I correct?
If not, and you have always had 10TB allocated, then I guess the used space reported by the node was way off until the node was restarted and the used-space-filewalker recalculated the used space and updated the databases, so now the node is aware that it has overused its disk space.

When the database does not match the actual usage on the disk, the node doesn’t report to the satellites that it’s full. Thus it can accept traffic and store more than was allocated.
We have a safeguard though: when the node detects that it has less than 5GB free on the disk or in the allocation, it notifies the satellites that it’s full and all ingress stops, independently of the databases’ state.
The only way to correct the databases’ state is to let the used-space-filewalker finish. It starts on every restart.
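
As a rough illustration of the behaviour described above, the "full" decision can be sketched like this. This is not the storagenode's actual code; `SpaceView`, `reports_full`, and the example figures are hypothetical, and only the 5GB threshold comes from the explanation above.

```python
from dataclasses import dataclass

GB = 1000**3
TB = 1000**4
MIN_FREE_BYTES = 5 * GB  # the 5GB safeguard threshold described above


@dataclass
class SpaceView:
    """Hypothetical snapshot of what the node believes about its space."""
    allocated: int    # configured allocation, in bytes
    used_per_db: int  # usage recorded in the node's databases, in bytes
    disk_free: int    # free bytes on the disk, as reported by the OS


def reports_full(view: SpaceView) -> bool:
    """Return True if the node would tell the satellites that it is full."""
    free_in_allocation = view.allocated - view.used_per_db
    return free_in_allocation < MIN_FREE_BYTES or view.disk_free < MIN_FREE_BYTES


# Stale databases: the node believes 8TB is used (illustrative figure),
# so it still sees free space in the allocation and keeps accepting ingress.
stale = SpaceView(allocated=10 * TB, used_per_db=8 * TB, disk_free=400 * GB)

# After the used-space-filewalker has updated the databases to 12.28TB.
fresh = SpaceView(allocated=10 * TB, used_per_db=12_280 * GB, disk_free=400 * GB)

print(reports_full(stale))  # False
print(reports_full(fresh))  # True
```

In the stale case neither condition triggers, because the disk itself still has hundreds of GB free, which is how the node can grow well past its allocation before the databases catch up.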

Chris21788 · August 25, 2024, 12:49am

Hi @Alexey, the command doesn’t really show much difference (since it’s still a “human-readable” format):

user@raspberrypi:~ $ df --si
Filesystem      Size  Used Avail Use% Mounted on
..
/dev/sdb1        14T   13T  563G  96% /mnt/csrpistorj1
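
Since the human-readable output rounds to a couple of digits, the exact figures can also be read programmatically. A minimal sketch using Python's standard `shutil.disk_usage`; the `usage_si` helper is hypothetical, and `"/"` is a placeholder path to be replaced with the node's mount point.

```python
import shutil

TB = 1000**4  # SI terabyte, matching the units used by the dashboard


def usage_si(path: str) -> dict:
    """Return total, used, and free space for the filesystem at path, in SI TB."""
    total, used, free = shutil.disk_usage(path)
    return {
        "total_tb": total / TB,
        "used_tb": used / TB,
        "free_tb": free / TB,
    }


# Placeholder path; on the node this would be the storage mount point.
result = usage_si("/")
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The used and free values may not add up to the total, because space reserved by the filesystem is not available to regular users.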

As for the allocation of my node, I never reduced it. It’s always been 10TB, similar to my other node in a similar setup. (They literally share the same config, except for ports and ddns entry.)

Regarding the used-space-filewalker, I’m guessing it’ll take a while, but I’m surprised that the dashboard is reporting the over-used space… isn’t that pulling from the database? If that’s the case, then the database is already up-to-date.

Alexey · August 25, 2024, 3:31am

Yes, it is. What I was trying to say: some time ago the databases were not updated with the actual usage, and the node reported to the satellites that it still had free space in the allocation when it did not. Then, when the node was restarted some time ago, the used-space-filewalker successfully calculated the actual used space and updated the databases. Now it shows the overusage, which was not visible to the node before that.