Disk usage discrepancy?

Got it, but it only matches at the “end of the day” (midnight?), since that’s the actual value still to be calculated via the filewalk; even on my fast node it’s sometimes not done until 0:00 in the morning.

Hi all,

Can someone explain why my node resets used space every time it restarts? I’ve read a lot in here related to Linux machines, but I’m on Windows and can’t relate.
Basically, every time my node restarts it ends up like this:


However, before resetting, it was like this:

Windows disk usage tells a very different story:
Any help?

I haven’t stopped the lazy FW, because I’m waiting for 2 nodes to update; then I will stop them, modify one, and start them, just to see the difference in FW running time, since they have the same space occupied.


Maybe it’s the filewalker starting automatically

After some digging, here is what I found:

  • Every time the node restarts, the lazy filewalker process is run for every satellite.
    In my case, satellite 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE was the first.
  • 2024-01-11T19:12:23Z INFO lazyfilewalker.used-space-filewalker starting subprocess
  • 2024-01-11T19:12:23Z INFO lazyfilewalker.used-space-filewalker subprocess started
  • 2024-01-11T19:12:24Z INFO lazyfilewalker.used-space-filewalker.subprocess Database started
  • 2024-01-11T19:12:24Z INFO lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker started
  • 2024-01-11T19:37:41Z INFO lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker completed
  • 2024-01-11T19:37:41Z INFO lazyfilewalker.used-space-filewalker subprocess finished successfully

Then, it was time for satellite 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6

  • 2024-01-11T19:37:41Z INFO lazyfilewalker.used-space-filewalker starting subprocess
  • 2024-01-11T19:37:41Z INFO lazyfilewalker.used-space-filewalker subprocess started
  • 2024-01-11T19:38:25Z INFO lazyfilewalker.used-space-filewalker.subprocess Database started
  • 2024-01-11T19:38:25Z INFO lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker started

…and that is it… (it has been over 18 hours).

Does this mean that, for the dashboard info to be correct, all 4 satellites must run this process? And if the node gets restarted, for whatever reason, does it all start from square one?
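If it helps anyone see how far their own node got, here is a minimal Python sketch that scans the log for those lines and tallies, per satellite, how often the used-space filewalker was started versus finished (the log path and the exact structured field name are assumptions; check them against your own log):

```python
# Rough sketch: tally used-space-filewalker starts/finishes per satellite
# from a storagenode log. The field name "satelliteID" and the log path are
# assumptions and may differ between versions.
import re
from collections import defaultdict

LOG_PATH = "storagenode.log"  # adjust to wherever your node writes its log

sat_re = re.compile(r'"satelliteID":\s*"([^"]+)"')
started, finished = defaultdict(int), defaultdict(int)

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        if "lazyfilewalker.used-space-filewalker" not in line:
            continue
        m = sat_re.search(line)
        sat = m.group(1) if m else "unknown"
        if "starting subprocess" in line:
            started[sat] += 1
        elif "subprocess finished successfully" in line:
            finished[sat] += 1

for sat, runs in started.items():
    done = finished[sat]
    state = "finished" if done >= runs else "interrupted / still running"
    print(f"{sat}: started {runs}x, finished {done}x -> {state}")
```

If a satellite shows more starts than finishes, that walk presumably got interrupted and has to start over on the next restart.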

OK, after some time, satellite 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 gave me this:

  • 2024-01-12T16:14:10Z INFO retain Prepared to run a Retain request. {“Created Before”: “2024-01-08T17:59:59Z”, “Filter Size”: 117305, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”}

Then:

  • 2024-01-12T16:14:10Z INFO lazyfilewalker.gc-filewalker starting subprocess
  • 2024-01-12T16:14:10Z INFO lazyfilewalker.gc-filewalker subprocess started
  • 2024-01-12T16:15:26Z INFO lazyfilewalker.gc-filewalker.subprocess Database started
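For what it’s worth, that Retain line already carries the bloom filter’s parameters: pieces created before “Created Before” that are not matched by the filter are what the gc-filewalker will move to trash. A small sketch for pulling the fields out of such a line (just reusing the line quoted above):

```python
# Sketch: extract the bloom-filter details from a "retain" log line like the
# one quoted above. The JSON payload starts at the first "{".
import json

line = ('2024-01-12T16:14:10Z INFO retain Prepared to run a Retain request. '
        '{"Created Before": "2024-01-08T17:59:59Z", "Filter Size": 117305, '
        '"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}')

payload = json.loads(line[line.index("{"):])
print("satellite:     ", payload["Satellite ID"])
print("created before:", payload["Created Before"])
print("filter size:   ", payload["Filter Size"], "bytes")
```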

The node that I’m investigating is indeed a VM, but the storage system should be fast enough, with <1 ms latency.

I hear there is possibly a bug with bloom filter cleanup on large nodes (>14M pieces). This node has 31M pieces for the US satellite and may be affected. A bloom filter filling up on large nodes (and thus becoming ineffective by letting all pieces remain) would make sense.
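Whether or not that bug applies here, the underlying worry is easy to put numbers on: once the piece count outgrows a size-capped bloom filter, its false-positive rate climbs, and more and more garbage pieces are wrongly treated as still wanted. A rough sketch using the standard approximation; the 4 MiB cap below is an illustrative assumption, not a confirmed Storj limit:

```python
# Back-of-the-envelope: false-positive rate of a bloom filter with m bits,
# n inserted items and k hash functions is roughly (1 - e^(-k*n/m)) ** k.
# A false positive here means a garbage piece that the gc-filewalker keeps.
from math import exp, log

def false_positive_rate(size_bytes, n_pieces, k=None):
    m = size_bytes * 8                               # filter size in bits
    if k is None:
        k = max(1, round(m / n_pieces * log(2)))     # near-optimal hash count
    return (1 - exp(-k * n_pieces / m)) ** k

CAP = 4 * 1024 * 1024  # assumed fixed 4 MiB filter, purely for illustration
for n in (1_000_000, 14_000_000, 31_000_000):
    print(f"{n:>11,} pieces -> ~{false_positive_rate(CAP, n):.1%} of garbage kept by mistake")
```

With those assumed numbers, a 1M-piece node keeps essentially no garbage, while a 31M-piece node would keep more than half of it, which is exactly the “filter fills up and stops deleting” behaviour described above.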

Let’s just hang on; more info will come.


There are a few threads on the same topic. Maybe they should be merged?

Maybe stop the lazy FW? It will run faster and sync the space used sooner.
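If I remember right, that’s the pieces.enable-lazy-filewalker option in the node’s config.yaml (treat the exact key as an assumption and double-check it for your version); the node needs a restart to pick it up:

```
# config.yaml -- run the filewalker in-process at normal priority instead of
# as a low-priority lazy subprocess (assumed key name; verify for your version)
pieces.enable-lazy-filewalker: false
```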

I’m already merging each new topic into this one.


See: Debugging space usage discrepancies - #46 by elek

If you are willing to debug, you can also send me the list of 31M blob files with sizes (huge), and I can double-check whether the problem is the limit on the bloom filter (because I can compare it with the database).
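For anyone wanting to produce the same kind of list, a rough sketch of one way to do it: walk a satellite’s blobs folder and write one “<relative path> <size in bytes>” line per piece into a gzip file. The storage path below is an assumption; the folder name matches the file linked a couple of posts down.

```python
# Sketch: list every blob file of one satellite with its size, gzip-compressed.
# BLOBS_DIR is an assumed example path -- point it at your own storage folder.
import gzip
import os

BLOBS_DIR = r"D:\storagenode\blobs\ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa"

with gzip.open("blob_sizes.txt.gz", "wt", encoding="utf-8") as out:
    for root, _dirs, files in os.walk(BLOBS_DIR):
        for name in files:
            path = os.path.join(root, name)
            try:
                out.write(f"{os.path.relpath(path, BLOBS_DIR)} {os.path.getsize(path)}\n")
            except OSError:
                continue  # piece deleted or moved to trash mid-walk
```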


Absolutely, I will prepare the file. Do you also need the node ID?

You may fetch the file from here:

http://94.127.38.118/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa.txt.gz

I recently got an automated email that one of my 2 storagenodes was offline. When I checked, I noticed that the disk (10 TB; 9 TiB) was completely filled up. I got it up and running again (by deleting some log files), but the dashboard says it uses 6.5 TB. Somehow, there are multiple TB more data in the storage directory than Storj says there should be.

The only thing installed on the machine is Docker and Storj’s software, nothing else. I also confirmed that the storagenode data folder is taking up around 8.8 TiB, even though the dashboard shows around 6.5 TB used.

I have already tried deleting stuff inside the garbage folder and the trash folder, with no luck.

Is there any way to try to fix this?
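One way to see where the extra terabytes actually live is to total up each top-level folder of the storage directory (blobs, trash, garbage, temp) and compare those numbers with the dashboard. A minimal sketch; the mount path is an assumption:

```python
# Sketch: sum up each top-level folder of the storage directory so the
# on-disk totals can be compared with what the dashboard reports.
import os

STORAGE_DIR = "/mnt/storagenode/storage"  # assumed path to the mounted storage

def folder_size(path):
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file removed while walking
    return total

for entry in sorted(os.listdir(STORAGE_DIR)):
    full = os.path.join(STORAGE_DIR, entry)
    if os.path.isdir(full):
        print(f"{entry:<12} {folder_size(full) / 1e12:.3f} TB")
```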


I have the same problem, no solution so far.
My guess is that we missed some bloom filters and/or have data left over from the decommissioned satellites.


Do you know if there is anybody working on this? Or if people are generally aware of this issue?

If you possibly have data left from decommissioned satellites, you can try this: How To Forget Untrusted Satellites

Unfortunately, that only freed up about 20 GB; the problem still persists…

Is there anything about bloom filters that we could try? (I’m not exactly sure what they are)

Assuming you have a small cluster size and the right filesystem, logs redirected, defragmentation checked at least, and the temp folder empty (or with only a few new files in it).

Nothing to do except…

Right now it’s in the analysis phase.

Nobody knows what they are, but they sound really cool… :nerd_face::man_dancing:t2:
