Yes, as explained before, you need to enable a scan on startup if you disabled it (it’s enabled by default) and restart the node. Then you need to check that all used-space-filewalkers are finished without issues for all trusted satellites. You also need to remove the data of all untrusted satellites.
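For example, with a docker node the untrusted satellites’ data can be cleaned up with the forget-satellite subcommand (a sketch assuming the default container name storagenode and the default in-container paths; please check the flags against the documentation for your version before running it):
docker exec -it storagenode ./storagenode forget-satellite --all-untrusted --config-dir config --identity-dir identity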
Hey @Alexey,
This is not the same. I found that thread as well, but it was about a difference between the Disk usage shown on the dashboard, between the left (Average) and the right (Total) window. At least at first glance.
My problem is completely different and we are talking about a huge difference. According to the dashboard a total of 8.27 TB (Used) + 3.86 TB (Trash) = 12.13 TB should be consumed. As you can see from the next screenshot, the amount of storage consumed in blobs is 24.1 TB. We are talking about double the amount of storage (~12 TB more).
And nevertheless, this thread has 1100 posts. It’s extremely difficult to figure out what exactly I should do. Closing the thread I opened was really unexpected and a bit frustrating, as I thought you offered a nice level of support here.
Finally, I would really appreciate it if you could specify what exactly I should do in order to fix this huge problem of 12 TB of useless storage occupation.
With regards,
Angelos Pitsos
No. This current thread is about a discrepancy between the piechart and the usage reported by the OS.
The difference between Average and Used is a different topic, covered in this thread: Difference between average disk space and used space (and it cannot be fixed on your side, by the way).
The current one is the exact result of outdated databases in your setup. To fix that you need to enable the scan on startup and restart the node, then wait until all used-space-filewalkers successfully finish their scans for all trusted satellites; they should then update the databases, again without any issues. After an hour the stat on the piechart will be very close to what’s reported by the OS (you need to use SI measurement units, i.e. the --si option, when you calculate the space, either used or free).
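To compare it yourself, something like this on the host should be close enough (the data path here is only an example, use your actual mount point):
du -s --si /mnt/storj/storagenode/storage/blobs
df --si /mnt/storj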
A RAID5 array of that size on Windows will likely take forever to finish. This is not an appropriate setup in my opinion.
Well, it already exists and I guess it is used not only by Storj. So, they could only help their system by adding RAM, if it’s a classic RAID5 not made with Windows Storage Spaces.
Otherwise they may have the option to add an SSD and configure a tiered setup.
Yesterday I performed a reboot; I have the docker container on Ubuntu. But somehow there are some GBs missing.
The scan on startup should fix the difference. However, it’s likely a result of an unflushed cache (the timeout for the restart was too short). The used space is flushed to the databases once an hour by default.
Make sure that you have configured the grace timeout to 300s for docker nodes. You likely need to add --stop-timeout 300 to your docker run command before the image name.
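A minimal sketch of where it goes (ports, environment variables and other options are omitted here, and the paths are placeholders; keep the rest of your usual run command):
docker run -d --restart unless-stopped --stop-timeout 300 \
    --mount type=bind,source=/mnt/storj/identity,destination=/app/identity \
    --mount type=bind,source=/mnt/storj/storagenode,destination=/app/config \
    --name storagenode storj/storagenode:latest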
Yes, I have that from the standard config. Now I have 300 GB more ingress than used space / deleted, is that normal?
Are your databases updated?
You may check for filewalker/database errors:
docker logs storagenode 2>&1 | grep error | grep -E "filewalker|database" | tail
If you have one - you need to fix it.
By the way - ingress is not directly related to the usage. An upload can be canceled along the way, you also have deletions of TTL-expired data and/or deletions from the trash, and the garbage collector moves deleted data to the trash. Thus, bandwidth usage almost never matches the growth of used space.
Just make sure that the usage reported by the OS matches the usage on the piechart. If not, you need to have a successfully finished used-space-filewalker for all trusted satellites.
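You can watch its progress in the logs, for example (assuming a docker node named storagenode; the exact wording of the log lines may differ between versions):
docker logs storagenode 2>&1 | grep used-space-filewalker | grep -E "started|finished|completed|failed" | tail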
Hi @Alexey,
could you please enlighten me on how I can activate the scan on startup, as you said. I went through the previous posts above, but got lost. I don’t know how to do it.
With regards,
Angelos Pitsos
The scan on startup is enabled by default. So, if you disabled it, you already know how to enable it again.
This is a command line option:
--storage2.piece-scan-on-startup   if set to true, all pieces disk usage is recalculated on startup (default true)
or a parameter in your config.yaml:
storage2.piece-scan-on-startup: true
If you changed it, you need to save the config (or stop and remove the container in the case of docker) and restart the node (or run a new container in the case of docker).
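For a docker node you can also append the option after the image name, since everything after the image name is passed to the storagenode binary (a sketch; keep the rest of your usual run command):
docker run <your usual options> storj/storagenode:latest --storage2.piece-scan-on-startup=true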
But if you didn’t disable it, then no changes are required; just restart the node, track the progress of the used-space-filewalker and make sure that it hasn’t failed.
See
Hi Alexey, I didn’t find anything in the logs. But next time after a restart I will double-check. Somehow there is some discrepancy between the file system and the dashboard. Hope this filewalker will help.
Yes, it should, if it’s not failed.
We all have this in docker run, but the db cache is not always flushed to the db-es. Sometimes some caches remain unflushed for some db-es. I couldn’t find a pattern.
I believe Docker doesn’t check whether the cache was flushed, and kills the node anyway.
No, it’s not Docker’s fault. At least, not entirely.
There is no problem with the use of SQLite located on a mounted volume in Linux. It works correctly, because locking is handled by the filesystem in the same kernel, and it does not matter if the process dies prematurely; the filesystem “does the right thing”.
This is not the case on Windows and macOS, where there is a VM, and as a result a separate kernel, between the host filesystem and the pretend one that SQLite is seeing: you are effectively using SQLite with data files over a mounted remote filesystem, and this is one of the recipes for corrupting it.
I believe you are right. When I was keeping the db-es on the storage drive, I never saw cache file leftovers after stopping the nodes.
This started after moving them to USB SSDs or system M.2 drives, depending on the machine.
But this can be easily proven in practice. I’ll come back with results.
BTW I use Linux+Docker, with ext4 fs.
Usually it happens if this timeout is not enough and the node has been forcibly terminated by docker. However, I may be wrong and we may have a bug in the db flush code (it flushes changes to the disk every hour by default; as far as I understand, that’s true only for the used-space and bandwidth usage databases).
But it could also be the case described by @arrogantrabbit, when the mount is using a network protocol for the storage (Windows and macOS).
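If you want it to flush more often, I believe the relevant parameter is storage2.cache-sync-interval (please verify the exact name in your config.yaml, I’m writing this from memory), for example:
storage2.cache-sync-interval: 15m0s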
This is interesting, I wouldn’t expect this to happen, because it’s mounted without using a network protocol.
Hi.
I noticed that Trash on my Dashboard has started to grow and it is now 2.38 TB. That’s more than the useful data.
At the same time the actual size of the Trash directory on the disk is 490 GB.
It looks like the program can’t count the size of the Trash correctly. Because of this I can’t use my disk to its full potential.
Please tell me how to fix it.
I’m using a small computer.
CPU: Intel(R) Core™ i7-5500U @ 2.40 GHz
RAM: 8 GB
System disk: SSD
Storage disk: 5 TB, external, USB 3, SATA
OS: Windows 10
I stopped all my nodes and didn’t see the cache files anymore. Maybe it’s related to slow network activity or other code improvements; or maybe it’s because I reduced the sync time to 15 min instead of 1 h. I don’t really know, but I moved all db-es back to the storage drive and removed the USB sticks, just to be sure I won’t see undeleted cache files the next time I stop a node. And with all the code changes in the last months, the db activity has been reduced significantly. I don’t see any benefit in moving db-es to a separate drive now.
When the disk size mismatches, the usual fix is to restart the node and let it rerun the used-space filewalker; then the reporting is accurate.
With that amount of data it could take more than a day unless the drives are fast.