Disk usage discrepancy?

Yes, but it seems to me that the garbage collector filewalker, which I think scans the entirety of a satellite’s blobs to check for trash based on the bloom filter, could also simultaneously be used to compute the used space for that satellite.

I believe GC reads only the piece name, but the Filewalker reads more than just the piece name. But maybe I’m wrong.

They have different purposes: the gc-filewalker moves pieces to the trash, whereas the used-space-filewalker just calculates used space (similar to the du command). They can also run in parallel, but the used-space-filewalker is usually executed only on start, and it scans all pieces, not only the pieces selected by the Bloom filter.
Later, the node updates the used-space cache in the local databases every hour by default and after each successful upload.
So they are different processes anyway, and I don’t think it’s a good idea to combine them.

@Alexey Hello, I have rolled back to working databases.
I deleted the untrusted satellites with:
ubuntu@hpool2:~$ docker exec -it storagenode3 ./storagenode forget-satellite --force 12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB 12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo --config-dir config --identity-dir identity
2024-02-11T18:21:29Z INFO Configuration loaded {"process": "storagenode", "Location": "/app/config/config.yaml"}
2024-02-11T18:21:29Z INFO Anonymized tracing enabled {"process": "storagenode"}
2024-02-11T18:21:29Z INFO Identity loaded. {"process": "storagenode", "Node ID": "12KvCfrk8bo2FeFhoYfZKA7XxSB1LhVTGdGNwzLhLz6T3AdZo5x"}
2024-02-11T18:21:29Z WARN Satellite not found in satelliteDB cache. Forcing removal of satellite data. {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2024-02-11T18:21:29Z INFO Removing satellite from trust cache. {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2024-02-11T18:21:29Z INFO Cleaning up satellite data. {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2024-02-11T18:27:55Z INFO Cleaning up the trash. {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2024-02-11T18:30:04Z INFO Removing satellite info from reputation DB. {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2024-02-11T18:30:04Z INFO Removing satellite v0 pieces if any. {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2024-02-11T18:30:04Z INFO Removing satellite from satellites DB. {"process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2024-02-11T18:30:04Z WARN Satellite not found in satelliteDB cache. Forcing removal of satellite data. {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2024-02-11T18:30:04Z INFO Removing satellite from trust cache. {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2024-02-11T18:30:04Z INFO Cleaning up satellite data. {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2024-02-11T18:34:31Z INFO Cleaning up the trash. {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2024-02-11T18:35:55Z INFO Removing satellite info from reputation DB. {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2024-02-11T18:35:55Z INFO Removing satellite v0 pieces if any. {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2024-02-11T18:35:55Z INFO Removing satellite from satellites DB. {"process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}

but the space has still not been freed.

On some nodes I see that I still have 6 directories inside blobs:
pi@mcanto:~ $ sudo ls -lhrt /STORJ/STORJ/storage/blobs
total 124K
drwx------ 1026 root root 20K Jan 18 2022 ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
drwx------ 1026 root root 20K Feb 5 2022 v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa
drwx------ 1026 root root 20K Feb 18 2022 qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa
drwx------ 1026 root root 20K Apr 23 2022 6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa
drwx------ 1026 root root 20K May 24 2022 pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa
drwx------ 1026 root root 20K Nov 2 2022 arej6usf33ki2kukzd5v6xgry2tdr56g45pp3aao6llsaaaaaaaa

Can I delete them manually?
How can I force the space to be reclaimed?

Here I lost 4 TB!!!
            Available  Used     Egress     Ingress
Bandwidth   N/A        0.87 TB  185.14 GB  0.68 TB (since Feb 1)
Disk        1.03 TB    8.97 TB
Internal 127.0.0.1:7778
External x-proxy2.ddns.net:28967
^C2024-02-11T18:44:14Z INFO Got a signal from the OS: "interrupt" {"process": "storagenode"}
Error: context canceled
ubuntu@hpool2:~$ df -h /STORJ
Filesystem                          Size  Used Avail Use% Mounted on
mcanto.ddns.net:/STORJ3/STORJ_NFS2   14T   13T  1.3T  91% /STORJ

The filesystem is ext4.

Any suggestions?
Thanks

It takes some time to run the deletes; give it a day or two.

Yes, you can. The folders likely remained because you didn’t provide the --force flag. You can calculate the size of the blobs subfolders of the decommissioned satellites:
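
The original command is not shown here, so the following is only a minimal sketch of one way to do it, in Python rather than the poster's own command. It sums apparent file sizes (not allocated blocks) per subfolder of the blobs directory; the function names `dir_size_bytes` and `blobs_report` are made up for illustration.

```python
import os
import tempfile


def dir_size_bytes(path):
    """Sum the apparent sizes (st_size) of all files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # the file disappeared while we were scanning
    return total


def blobs_report(blobs_dir):
    """Return {subfolder name: size in bytes} for each subfolder of blobs_dir."""
    return {
        entry.name: dir_size_bytes(entry.path)
        for entry in os.scandir(blobs_dir)
        if entry.is_dir(follow_symlinks=False)
    }


if __name__ == "__main__":
    # Demo on a throwaway directory; point blobs_report at your own
    # storage/blobs path instead (reading it may require root).
    with tempfile.TemporaryDirectory() as blobs:
        os.makedirs(os.path.join(blobs, "sat-a", "aa"))
        os.makedirs(os.path.join(blobs, "sat-b"))
        with open(os.path.join(blobs, "sat-a", "aa", "piece"), "wb") as f:
            f.write(b"x" * 1024)
        for name, size in sorted(blobs_report(blobs).items()):
            print(f"{name}\t{size} bytes")
```

On a large node this walk touches every file, so expect it to take a long time, especially over a network filesystem.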

Network filesystems are not supported and will not work correctly; the only network storage protocol that works is iSCSI. It’s better to run the node directly on your file server/NAS instead.

Still, my scheduled task is set to run every Sunday.
Also, the filewalker runs when the node starts, right?
I should be able to see something there, no?

There are several filewalkers, each with its own schedule.

The used space filewalker (used-space-filewalker) is executed only after start; the garbage collector filewalker (gc-filewalker) runs 1-2 times per week and is initiated by the satellites (they send a Bloom filter); the expiration filewalker (collector) runs every hour by default; retain runs weekly; and the trash scan (pieces:trash) runs every 24h.

Can you detail what every walker scans and reads?
Like GC scans the entire blobs and reads file names, etc.

The used space filewalker scans all used space in the data location (blobs and likely trash); GC scans only blobs; retain scans only blobs; pieces:trash scans only trash.

Alexey’s information is a bit imprecise.

  1. Used space file walker scans trash and blobs directories of all satellites, reads all inodes to estimate disk usage. No writes. Happens only at node startup.
  2. Retain or garbage collector or bloom filter (multiple names for the same thing): scans the blobs subdirectory of a single satellite. Reads file names. For those files that match the bloom filter used in a given scan, reads inodes to check their modification timestamp (the filter is supposed to act only on files created before a specific date), then potentially moves the file to the trash subdirectory. On average, unless there’s a lot of deletes happening, only a small number of inodes are read. Happens when a satellite sends a bloom filter, roughly once per week per satellite.
  3. Trash file walker scans trash directories of all satellites. Reads all inodes to read their modification timestamp (which signifies the time a piece was moved to trash), and potentially removes files. Happens once a day.
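
To make the difference in I/O between items 2 and 3 concrete, here is a rough Python sketch. It is not the node's actual code. `selected_by_filter` is a hypothetical callback standing in for the bloom filter test, and `created_before` and `max_age_seconds` are illustrative parameters. The retain pass looks at names first and stats only the selected files; the trash pass stats every file.

```python
import os
import shutil
import time


def retain_pass(blobs_dir, trash_dir, selected_by_filter, created_before, now=None):
    """Move filter-selected files older than created_before into trash_dir."""
    now = time.time() if now is None else now
    moved = []
    for root, _dirs, files in os.walk(blobs_dir):
        for name in files:
            if not selected_by_filter(name):
                continue  # decided from the name alone, no inode read
            src = os.path.join(root, name)
            if os.lstat(src).st_mtime >= created_before:
                continue  # too new for this filter to act on
            rel = os.path.relpath(src, blobs_dir)
            dst = os.path.join(trash_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)
            os.utime(dst, (now, now))  # mtime now records the time of trashing
            moved.append(rel)
    return moved


def trash_pass(trash_dir, max_age_seconds, now=None):
    """Delete trashed files whose mtime is older than max_age_seconds."""
    now = time.time() if now is None else now
    removed = []
    for root, _dirs, files in os.walk(trash_dir):
        for name in files:
            path = os.path.join(root, name)
            if now - os.lstat(path).st_mtime > max_age_seconds:  # stat on every file
                os.remove(path)
                removed.append(os.path.relpath(path, trash_dir))
    return removed
```

The retention period passed as `max_age_seconds` is left to the caller here; the sketch makes no claim about the value the node uses.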

What Alexey cites as expired pieces is not actually a file walker. Expiration dates are stored in a SQLite database, and to learn which files to remove, it’s enough to just perform a SQL query.
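
As an illustration of why no directory walk is needed for this, a single query is enough to list the expired pieces. The table and column names below (`piece_expirations`, `expires_at`) are assumptions made for this sketch, not the node's real schema.

```python
import sqlite3


def expired_pieces(conn, now_unix):
    """Return (satellite_id, piece_id) rows that expired at or before now_unix."""
    cur = conn.execute(
        "SELECT satellite_id, piece_id FROM piece_expirations "
        "WHERE expires_at <= ? ORDER BY expires_at",
        (now_unix,),
    )
    return cur.fetchall()


# Hypothetical in-memory database with three pieces.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE piece_expirations "
    "(satellite_id TEXT, piece_id TEXT, expires_at INTEGER)"
)
conn.executemany(
    "INSERT INTO piece_expirations VALUES (?, ?, ?)",
    [("sat-a", "p1", 100), ("sat-a", "p2", 300), ("sat-b", "p3", 200)],
)

print(expired_pieces(conn, 250))  # [('sat-a', 'p1'), ('sat-b', 'p3')]
```

Only the returned pieces then need to be touched on disk, which is far cheaper than scanning every file.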

There used to be a fourth file walker, graceful exit. But it is no longer in use, and obviously it was not used during regular node operation.

The expiration date is stored in the piece’s header, and cached in the database.
If the database is lost, this information should be recovered. I believe it’s performed by a retain process.

Hello @Alexey,
I waited a few days before replying because I was trying to untrust the old satellites.
All nodes now have only 4 directories.
The space issue is critical on only one node:
14T total, 13T used, but the dashboard shows only 9T, so 4T is lost.
I understand your point about NFS, but I have 4 nodes on NFS and only this one has issues.

Is there a way to force a check of the pieces and the deletion of those that are no longer needed, or something similar?

Thank you

Yes, @mcanto73. You can set "storage2.piece-scan-on-startup: true" in config.yaml (if it’s the Windows GUI), restart the node, and wait a few days for the process to finish so that the node updates the used space. You may also want to temporarily set the allocated space to, for example, 4 TB ("storage.allocated-disk-space: 4 TB") so that you don’t get any new ingress until the filewalker scan completes and updates the dashboard with the correct used disk space.

Hello,
I’m running docker containers. Do you know if I just need to change the parameter in the config.yaml file, or do I need to pass something on the docker command line, maybe when creating the container?

That said, two questions:

  1. I thought that the scan happens on each restart …
  2. You suggested that it’s better to stop incoming traffic. Is that mandatory?

Thanks
:blush:

Check my tests in the "Tuning the filewalker" thread. Also, if you want to set the allocated space in the config, leave that parameter in the run command with no value between the quotes.

It isn’t, and it would be very costly. Opening each file just to check if it has an expiration timestamp would double the I/O needed to run GC.

Then how is this information recovered if the database gets deleted?

If you didn’t disable the scan on startup, it should be enabled by default.
However, since you use an unsupported, slow network filesystem, the scan could take weeks to finish in lazy mode, so I would recommend disabling lazy mode:

You may also add it as a command line flag in your docker run command after the image name instead: --pieces.enable-lazy-filewalker=false