Disk usage discrepancy?

Got it, but it only matches at the “end of the day” (midnight?), since that’s the actual value still to be calculated via the filewalk; even on my fast node it’s sometimes not done until 0:00 in the morning.

Hi all,

Can someone explain why my node resets used space every time it restarts? I’ve read a lot in here related to Linux machines, but I’m on Windows and can’t relate.
Basically, every time my node restarts it ends up like this:


However, before resetting, it was like this:

Windows disk usage tells a very different story:
Any help?

I haven’t stopped the lazy FW, because I’m waiting for 2 nodes to update; then I will stop them, modify one, and start them, just to see the difference in FW running time, since they have the same space occupied.


Maybe it’s the filewalker starting automatically

After some digging, here is what I found:

  • Every time the node restarts, the lazy filewalker process is run for every satellite.
    In my case, satellite 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE was the first.
  • 2024-01-11T19:12:23Z INFO lazyfilewalker.used-space-filewalker starting subprocess
  • 2024-01-11T19:12:23Z INFO lazyfilewalker.used-space-filewalker subprocess started
  • 2024-01-11T19:12:24Z INFO lazyfilewalker.used-space-filewalker.subprocess Database started
  • 2024-01-11T19:12:24Z INFO lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker started
  • 2024-01-11T19:37:41Z INFO lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker completed
  • 2024-01-11T19:37:41Z INFO lazyfilewalker.used-space-filewalker subprocess finished successfully

Then, it was time for satellite 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6

  • 2024-01-11T19:37:41Z INFO lazyfilewalker.used-space-filewalker starting subprocess
  • 2024-01-11T19:37:41Z INFO lazyfilewalker.used-space-filewalker subprocess started
  • 2024-01-11T19:38:25Z INFO lazyfilewalker.used-space-filewalker.subprocess Database started
  • 2024-01-11T19:38:25Z INFO lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker started

…and that is it… (it has been over 18 hours).

Does this mean that, for the dashboard info to be correct, all 4 satellites must run this process? And if the node gets restarted, for whatever reason, does it all start from square one?
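If it helps anyone see how far their own node got, here is a minimal Python sketch that scans the log for those lines and tallies, per satellite, how often the used-space filewalker was started versus finished (the log path and the exact structured field name are assumptions; check them against your own log):

```python
# Rough sketch: tally used-space-filewalker starts/finishes per satellite
# from a storagenode log. The field name "satelliteID" and the log path are
# assumptions and may differ between versions.
import re
from collections import defaultdict

LOG_PATH = "storagenode.log"  # adjust to wherever your node writes its log

sat_re = re.compile(r'"satelliteID":\s*"([^"]+)"')
started, finished = defaultdict(int), defaultdict(int)

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        if "lazyfilewalker.used-space-filewalker" not in line:
            continue
        m = sat_re.search(line)
        sat = m.group(1) if m else "unknown"
        if "starting subprocess" in line:
            started[sat] += 1
        elif "subprocess finished successfully" in line:
            finished[sat] += 1

for sat, runs in started.items():
    done = finished[sat]
    state = "finished" if done >= runs else "interrupted / still running"
    print(f"{sat}: started {runs}x, finished {done}x -> {state}")
```

If a satellite shows more starts than finishes, that walk presumably got interrupted and has to start over on the next restart.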

OK, after some time, satellite 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 gave me this:

  • 2024-01-12T16:14:10Z INFO retain Prepared to run a Retain request. {“Created Before”: “2024-01-08T17:59:59Z”, “Filter Size”: 117305, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”}

Then:

  • 2024-01-12T16:14:10Z INFO lazyfilewalker.gc-filewalker starting subprocess
  • 2024-01-12T16:14:10Z INFO lazyfilewalker.gc-filewalker subprocess started
  • 2024-01-12T16:15:26Z INFO lazyfilewalker.gc-filewalker.subprocess Database started
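For what it’s worth, that Retain line already carries the bloom filter’s parameters: pieces created before “Created Before” that are not matched by the filter are what the gc-filewalker will move to trash. A small sketch for pulling the fields out of such a line (just reusing the line quoted above):

```python
# Sketch: extract the bloom-filter details from a "retain" log line like the
# one quoted above. The JSON payload starts at the first "{".
import json

line = ('2024-01-12T16:14:10Z INFO retain Prepared to run a Retain request. '
        '{"Created Before": "2024-01-08T17:59:59Z", "Filter Size": 117305, '
        '"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}')

payload = json.loads(line[line.index("{"):])
print("satellite:     ", payload["Satellite ID"])
print("created before:", payload["Created Before"])
print("filter size:   ", payload["Filter Size"], "bytes")
```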

The node that I’m investigating is indeed a VM, but the storage system should be fast enough, with <1 ms latency.

I hear there is possibly a bug with bloom filter cleanup on large nodes (>14M pieces). This node has 31M pieces for the US satellite and may be affected. A bloom filter filling up on large nodes (and thus becoming ineffective by letting all pieces remain) would make sense.
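Whether or not that bug applies here, the underlying worry is easy to put numbers on: once the piece count outgrows a size-capped bloom filter, its false-positive rate climbs, and more and more garbage pieces are wrongly treated as still wanted. A rough sketch using the standard approximation; the 4 MiB cap below is an illustrative assumption, not a confirmed Storj limit:

```python
# Back-of-the-envelope: false-positive rate of a bloom filter with m bits,
# n inserted items and k hash functions is roughly (1 - e^(-k*n/m)) ** k.
# A false positive here means a garbage piece that the gc-filewalker keeps.
from math import exp, log

def false_positive_rate(size_bytes, n_pieces, k=None):
    m = size_bytes * 8                               # filter size in bits
    if k is None:
        k = max(1, round(m / n_pieces * log(2)))     # near-optimal hash count
    return (1 - exp(-k * n_pieces / m)) ** k

CAP = 4 * 1024 * 1024  # assumed fixed 4 MiB filter, purely for illustration
for n in (1_000_000, 14_000_000, 31_000_000):
    print(f"{n:>11,} pieces -> ~{false_positive_rate(CAP, n):.1%} of garbage kept by mistake")
```

With those assumed numbers, a 1M-piece node keeps essentially no garbage, while a 31M-piece node would keep more than half of it, which is exactly the “filter fills up and stops deleting” behaviour described above.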

Let’s just hang on; more info will come.


There are a few threads on the same topic. Maybe they should be merged?

Maybe stop the lazy FW? It will run faster and sync the space used sooner.
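If I remember right, that’s the pieces.enable-lazy-filewalker option in the node’s config.yaml (treat the exact key as an assumption and double-check it for your version); the node needs a restart to pick it up:

```
# config.yaml -- run the filewalker in-process at normal priority instead of
# as a low-priority lazy subprocess (assumed key name; verify for your version)
pieces.enable-lazy-filewalker: false
```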

I’m already merging each new topic into this one.


See: Debugging space usage discrepancies - #46 by elek

If you are willing to debug, you can also send me the list of 31M blob files with sizes (huge), and I can double-check whether the problem is the limit on the bloom filter (because I can compare it with the database).
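For anyone wanting to produce the same kind of list, a rough sketch of one way to do it: walk a satellite’s blobs folder and write one “<relative path> <size in bytes>” line per piece into a gzip file. The storage path below is an assumption; the folder name matches the file linked a couple of posts down.

```python
# Sketch: list every blob file of one satellite with its size, gzip-compressed.
# BLOBS_DIR is an assumed example path -- point it at your own storage folder.
import gzip
import os

BLOBS_DIR = r"D:\storagenode\blobs\ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa"

with gzip.open("blob_sizes.txt.gz", "wt", encoding="utf-8") as out:
    for root, _dirs, files in os.walk(BLOBS_DIR):
        for name in files:
            path = os.path.join(root, name)
            try:
                out.write(f"{os.path.relpath(path, BLOBS_DIR)} {os.path.getsize(path)}\n")
            except OSError:
                continue  # piece deleted or moved to trash mid-walk
```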


Absolutely, I will prepare the file. Do you also need the node ID?

You may fetch the file from here:

http://94.127.38.118/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa.txt.gz

I recently got an automated email that one of my 2 storagenodes was offline. When I checked, I noticed that the disk (10 TB; 9 TiB) was completely filled up. I got it up and running again (by deleting some log files), but the dashboard says it uses 6.5 TB. Somehow, there are multiple TB more data in the storage directory than Storj says there should be.

The only thing installed on the machine is Docker and Storj’s software, nothing else. I also confirmed that the storagenode data folder is taking up around 8.8 TiB, even though the dashboard shows around 6.5 TB used.

I have already tried deleting stuff inside the garbage folder and the trash folder, with no luck.

Is there any way to try to fix this?
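One way to see where the extra terabytes actually live is to total up each top-level folder of the storage directory (blobs, trash, garbage, temp) and compare those numbers with the dashboard. A minimal sketch; the mount path is an assumption:

```python
# Sketch: sum up each top-level folder of the storage directory so the
# on-disk totals can be compared with what the dashboard reports.
import os

STORAGE_DIR = "/mnt/storagenode/storage"  # assumed path to the mounted storage

def folder_size(path):
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file removed while walking
    return total

for entry in sorted(os.listdir(STORAGE_DIR)):
    full = os.path.join(STORAGE_DIR, entry)
    if os.path.isdir(full):
        print(f"{entry:<12} {folder_size(full) / 1e12:.3f} TB")
```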


I have the same problem, no solution so far.
My guess is that we missed some bloom filters and/or have data left over from the decommissioned satellites.


Do you know if there is anybody working on this? Or if people are generally aware of this issue?

If you possibly have data left from decommissioned satellites, you can try this: How To Forget Untrusted Satellites

Unfortunately, that only freed up about 20 GB; the problem still persists…

Is there anything about bloom filters that we could try? (I’m not exactly sure what they are)

Assuming you have a small cluster size and the right filesystem, logs redirected, defragmentation checked at least, and the temp folder empty (or with only a few new files in it).

Nothing to do except…

Right now it’s in the analysis phase.

Nobody knows what they are, but they sound really cool… :nerd_face::man_dancing:t2:
