Disk usage discrepancy?

No. They will affect each other and you will have an even higher discrepancy, but now because of contention between multiple nodes for a single resource - your disk.
The fix will likely be implemented before you fill 22TB.

With the lazy filewalker disabled? How many weeks?

No, forget-satellite with the --force flag will delete the data right away. If you skipped the --force flag, the data may remain and will not be removed; you would likely need to remove it manually, see Satellite info (Address, ID, Blobs folder, Hex) for the folder names of the satellites in the blobs folder.
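For reference, a rough sketch of how that command is usually invoked on a Docker node - the container name, the binary path inside the container, the directories and the satellite ID are placeholders here, so adjust them to your own setup:

    docker exec -it storagenode /app/storagenode forget-satellite --force <satellite-ID> \
        --config-dir config --identity-dir identity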

If you used the --force flag but still have a discrepancy, then I suggest following this recommendation:

and making sure that the filewalker on start is not disabled (it’s enabled by default).
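A quick, hedged way to check that (the config path below is an example, and the storage2.piece-scan-on-startup key name is taken from recent node versions - it may be absent or named differently in yours):

    # no output, or a line ending in "true", means the scan on startup is enabled (the default)
    grep -n "piece-scan-on-startup" /mnt/storj/config.yaml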

Hiya friend. I’ve never made it on location - so without the lazy file walker disabled, but for 9 days. It happened the night after v1.94.2, so it must have run during the reset.

Update on the same node:
It now has an additional 200GB worth of stuff to delete. Truly exciting.

I did run with --force:

Yet I still have the same issue:
image

It needs some time to delete. Leave the window open until there are only 4 blob folders left?
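A hedged way to check how many satellite folders remain (the mount path is only an example - use your own storage location):

    # each subfolder of blobs corresponds to one satellite
    ls /mnt/storj/storage/blobs | wc -l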

Judging by the >> sign, your command either has a missing quote somewhere, or you used curly quotes (“ and ”) instead of straight ones ("), or you pressed Shift-Enter. That sign means the command is waiting for input.

Refer to this Windows sample; move to the storagenode directory first with the cd command.

My node has recently started going offline without warning. I’ve started digging into it and see my ZFS array has been running a scrub for several days which might be the problem.

While observing that, though, I see that ZFS reports my 40TB raidz1 (25.6TiB usable) is completely out of space. But in the Storj Docker container I only allotted it 24TB, and on top of that Storj reports it is only using 21.69TB in the WebUI.

I remember reading we want to keep something like 10% free space, so I’m confused. The scrub could be causing Storj to hang, or maybe the array being full is causing it to hang. But then why is the array full? I don’t keep anything else on it; it’s dedicated storage.

As a temporary fix I could expand the array another 40TB and observe the behavior but I’m just confused.

You may reduce the allocation below the current usage and disable the lazy filewalker to give it a normal priority. Perhaps it would then be able to finish its work before a crash/restart.
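A minimal sketch of what that can look like for a Docker node, assuming the usual docker run setup - substitute your own wallet, e-mail, address, paths and container name, and keep any other options you normally pass:

    # re-create the container with a lower allocation and the lazy filewalker disabled
    docker stop -t 300 storagenode && docker rm storagenode
    docker run -d --name storagenode --restart unless-stopped --stop-timeout 300 \
        -p 28967:28967/tcp -p 28967:28967/udp -p 127.0.0.1:14002:14002 \
        -e WALLET="0xYOURWALLET" -e EMAIL="you@example.com" -e ADDRESS="your.host.example.com:28967" \
        -e STORAGE="20TB" \
        --mount type=bind,source=/mnt/storj/identity/storagenode,destination=/app/identity \
        --mount type=bind,source=/mnt/storj,destination=/app/config \
        storjlabs/storagenode:latest \
        --pieces.enable-lazy-filewalker=false
    # STORAGE sets the allocation below the current usage; the flag after the
    # image name gives the filewalker a normal (non-lazy) priority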

I don’t understand most of what you said. I told Storj not to use more than 24TB, it reports it’s using 21.69TB, but the file system reports it’s using 25.6TiB, so I have no way to know how low I need to set Storj to get it back under control. 20TB? 21TB? 22TB?

Based on what I’ve read, even doing this will just prevent it from taking in more data. The only way data will shrink on the node is if the network removes data that is no longer needed, which will take an indeterminable amount of time. Meanwhile, will that stop the crashes? I have to do manual restarts every time I get an e-mail saying my node is offline.

And what is a filewalker?

I just picked an arbitrary value of 20TB, down from 24TB, and I’ll monitor the free space on the array. It currently shows 6.6GB of free space, where I’d expect to see maybe a few hundred GB.

After the update the WebUI now reports overused space.

Screenshot from 2024-01-20 14-38-47

I’ll monitor whether this slowly starts to shrink over the coming weeks, and hope that this, plus my ZFS scrub finishing, stops the crashes.

It doesn’t matter exactly, but it should be less than or equal to the used space (according to the Average Disk Space Used graph on the left). This will prevent the node from receiving more ingress, freeing up some IOPS for the filewalkers.

We call “filewalker” the process of scanning pieces for different purposes:

  • scan to calculate the used space;
  • scan to move garbage to the trash;
  • scan to delete expired pieces;
  • scan to delete pieces from the trash.
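If you want to see these scans in the logs of a Docker node, a hedged example (the message names are assumptions and vary between versions, so treat the patterns only as a starting point):

    docker logs storagenode 2>&1 | grep -E "used-space-filewalker|gc-filewalker|collector|retain" | tail -n 20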

I gave it another shot and it looks better - but I still have the same issue with a lot of used space I can’t get rid of.

Now your node should be able to finish the several filewalkers for each remaining satellite without interruption.

Thanks - I will keep an eye on it :slight_smile:

Hello,
I have 5 nodes that have been running for almost 2 years on 5 Linux servers, and I noticed that what is reported by “df” is very different from the value I have in the dashboard / Grafana tools. The difference was more than 4 TB in some cases.
I saw a fix and tried to apply it:
  1. Stop the storagenode
  2. Rename the piece_spaced_used.db
  3. Execute with sqlite3:

         sqlite3 F:\StorjShareV3\piece_spaced_used.db

     When you see a sqlite> prompt, execute the script:

         CREATE TABLE versions (version int, commited_at text);
         CREATE TABLE piece_space_used (
             total INTEGER NOT NULL DEFAULT 0,
             content_size INTEGER NOT NULL,
             satellite_id BLOB
         );
         CREATE UNIQUE INDEX idx_piece_space_used_satellite_id ON piece_space_used(satellite_id);
         .exit

  4. Start the storagenode

The problem is that now, after it has been running for many hours, I have very low values for disk space, for instance:

ubuntu@hpool2:~$ df -h | grep STORJ
mcanto.ddns.net:/STORJ3/STORJ_NFS2 14T 12T 1.8T 87% /STORJ

Before the fix, the dashboard reported 9 TB.

Now:
Uptime 8h25m35s

               Available         Used        Egress     Ingress
 Bandwidth           N/A      2.47 TB     531.55 GB     1.94 TB (since Jan 1)
      Disk       9.98 TB     16.63 GB

Any idea?
Could I put the old piece_spaced_used.db file back, even though it is outdated (by less than 1 day)?

Best regards

  1. NFS is not supported
  2. The discrepancy is usually related to a failed filewalker

You need to search your logs for errors related to walk, retain and FATAL.
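For a Docker node, a hedged example of that search (adjust the container name, or grep your redirected log file instead):

    docker logs storagenode 2>&1 | grep -iE "error.*(walk|retain)|FATAL" | tail -n 50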
A possible workaround could be to run the filewalker with a normal priority by disabling the lazy one in your config.yaml:
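The line to add (the same parameter quoted further down in this thread) is:

    pieces.enable-lazy-filewalker: false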

Save the config and restart the node.

Hi @Alexey, I just added “pieces.enable-lazy-filewalker: false” as I’m having the same issue as reported:


Any more advice?
How long does it take to be reflected on the Dashboard?