Debugging space usage discrepancies

Toyoo · February 12, 2024, 11:45pm

I do not have this problem. It really sounds to me like a bunch of people who can’t debug their own problems complain loudly, whereas a silent majority just doesn’t have any problems. Maybe you shouldn’t be operating a node?

Alexey · February 13, 2024, 2:56am

So, your node is failed to finish any filewalker, and the reason is

this is mean that your disk is too slow to respond. How this disk is connected to this PC? Is it SMR? What’s filesystem on this drive?

The recommendations how to reduce the response time:

Stop the service
Check the disk for errors and fix them
Perform a defragmentation
Enable automatic defragmentation if it were disabled (it’s enabled by default)
Start the service
Monitor for errors related to the filewalker.
If you would still see errors with filewalkers, then disable the lazy mode and enable scan on the start if you disabled it, in your config.yaml:

save the config and restart the service. It will consume more IOPS than lazy, but should successful finish the scan. You need to keep it like this at least 2 weeks to allow your node to process two bloom filters for each satellite to move most of the garbage to the trash.
The trash will be cleaned automatically after a week.

Alexey · February 13, 2024, 3:18am

not many, only who did it wrong, I’m sorry. My nodes working normally.
At the moment these nodes are usually affected:

VM
FS: exFAT, zfs/BTRFS without caching device/enough RAM, network filesystems, NTFS under Linux
some RAID configurations with parity without proper tuning
running multiple nodes on the one disk/pool
Windows: disabled/never performed defragmentation for NTFS
FATAL errors in the log
failing disk (cable, power supply, bad blocks)

In all these cases the disabling a lazy mode usually helps (except FATAL errors or disk errors - they need to be fixed first).

github.com/storj/storj

[storagenode] Lazyfilewalker optimization

opened 12:39AM - 21 Jan 24 UTC

profclems

Needs Estimation

This is an ongoing discussion on the forum with great suggestions coming from SN…Os. For larger nodes, the lazyfilewalker might never complete for some of the satellites. It might keep running for weeks and then once the node is restarted (especially for docker nodes that are auto-updated and restarted) the Lazyfilewalker would have to start again. ## Possible Solutions ### Used Space Calculation - The lazyfilewalker should never start from the beginning after an interruption. It should periodically save its state and resume from the last one after restart (Ref: https://forum.storj.io/t/debugging-space-usage-discrepancies/24874/90?u=clement) - https://forum.storj.io/t/debugging-space-usage-discrepancies/24874/108?u=clement - https://forum.storj.io/t/debugging-space-usage-discrepancies/24874/106?u=clement TLDR; - the filewalker should be able to resume after being interrupted - we do this by taking advantage of the two-level directory structure - read the two-letter directory names from the satellite blobs folder and sort them alphabetically - Save the state after walking through each 2-letter directory - to reduce the mismatch for uploads/deletes during the file walker, for each upload/delete, check whether its two-letter directory has been scanned or not. If it was, update the used space cache PS: We don’t need the filewalker (used space calculation) to run on each restart. Maybe we can get it to only run once if it has never run or the used space db is inaccessible/malformed or empty. If we can guarantee that every other read/write operation updates the usage cache, we wouldn't need to rely on the filewalker ### Garbage Collection - GC filewalker should periodically save the state after walking through each two level directory

github.com/storj/storj

GC bloom filter should be stored on disk in case of node restart

opened 08:11AM - 25 Jan 24 UTC

mniewrzal

Needs Estimation

Currently lazy file walker will always start from the beginning on node start (s…ee #6708). Fixing used space calculation is one thing but we should also improve GC process. At the moment node is receiving bloom filter and its processing all satellite pieces against this bloom filter. The issue is that GC takes a lot of time and with node restart (updates, issues) we are loosing bloom filter and even if we start file walker from last position we won't have data to finish GC process. Acceptance Criteria: * store bloom filter on disk to be able to restart GC process from last position * bloom filter should be removed after successful GC

JWvdV · February 13, 2024, 7:36am

Indeed, I think the problem is that gc-filewalker is being interrupted by other processes (like updating the node).

Add SMR drives to the list, and you’re probably quite complete.

So, indeed:

saving the bloom filter to the disk
iterating the subfolders of every satellite in an ordered fashion and save last filtered folder so you don’t have to start from scratch every time (so, the node is able to finish the bloom filter before the next one arrives).

Besides, I’m wondering whether files are being deleted right away during the gc-filewalk run.

Alexey · February 13, 2024, 7:52am

SMR actually not an issue - they are slow on write, but not read, so they should be ok with a lazy filewalker (at least I cannot find an evidence of the issue with a SMR drive).

They can, but I cannot confirm that on my nodes, these filewalkers didn’t intersect (at least on my nodes), except maybe a used-space-filewalker.

JWvdV · February 13, 2024, 7:20pm

Well, I can tell you I only have this problem on nodes using a SMR drive.

Besides, deleting a file or moving out to trash also takes writing (and often also reordering) of meta data.

Alexey · February 14, 2024, 3:15am

I see. Then I would add a SMR disks to the list of suspicious to a space usage discrepancy.
And the filesystem is zfs, single drive and no special device?
I would like to know, is SMR affected too if it’s ext4 formatted.

Ruskiem · February 14, 2024, 4:08am

Yea bros, get going on this!
Story time!

A node of mine, 8TB disk, set for 7TB in config.yaml, and stuffed to the brim, got like 1,2TB free space (guess some natural deletes occurred), but the ingress didn’t start, i guess it was waiting for filewalker to finish walking, he finished after ~96h and discovered that free space finally and started to hoover some more data, hurray!
Some other node of mine, got 174h of uninterrupted online time, BUT it still was into the walking process, its 10TB disk, fully dedicated to STORJ, only ~320GB of free space left,
174h ago, i set it to 4TB to stop ingress, because dashboard thinked he has some 1TB or more of free space! but he did not!
So im waiting for him to finish the walk, it was just around the corner, and BAM, at 3:00 am an ISP restarted a router, and VPN needed to reconnect, and changed the open ports, so need to update the config.yaml, the storagenode process is still ongoing, still walking, but the port is new now, the one it was on, its closed now. Don’t wanna restart the storagenode.exe to get online, the IP is same, just wonder if it need the port opened to report data to satellites if he finish the walk or it will not report … lol. Im determined to be offline just to let it finish that walking, God bless it! Lol

Alexey · February 14, 2024, 4:33am

It will update databases and report on the next check-in.

snorkel · February 14, 2024, 5:16am

Can’t you fix those ports to be remembered on each reconnect? Maybe you can talk to the VPN provider to reserve the ports for you… I don’t know how this works, I don’t use VPNs,… if I sound dumb.

JWvdV · February 14, 2024, 7:03pm

For sure, and essentially I’m quite sure the walk just takes too long and is being interrupted by a update/restart of the system or another walk. Systems having less than 70% filled don’t have this particular problem in my case.

zip · February 16, 2024, 1:44pm

Beautiful

2024-02-16T10:25:04Z    INFO    lazyfilewalker... {... "bloomFilterSize": 4100003}

elek · August 22, 2024, 8:32am

It depends on the context, where are these fields. Sorry, I don’t remember exactly. But usually the software differentiate between the raw size of the original file, and the size of the encrypted data which is stored.

Usually there are only a few byte differences (The encryption includes a few additional bytes for message authentication. Think about it as a very lightweight checksum).

Ah, maybe it’s on the storagenode, where pieces are stored in separated files. The first few bytes contain a header which is included in total size, but not included in content size.

elek · August 22, 2024, 2:55pm

Payment is based on content_size (includes encryption overhead, doesn’t include size of piece header / other technical entries)