High Trash usage - 27 TB - no more uploads

hello
There should be no risk

  1. the working path is the Trash folder
  2. the output of ls is not the customer-created files to be deleted, but the satellite sub-directories

Thanks

I’m not sure what the objection here is.

It does not matter if this happens to work in this narrow scenario.

There is no reason to use and publish more verbose, wrong, known-broken, dangerous code instead of a clean alternative that is also shorter to type. There is literally no advantage.

  • $(ls) — 5 characters, wrong, dangerous, slow
  • * — one character, correct, safe, fast.
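A quick demonstration of the difference, using a throwaway directory (the file names here are invented for illustration):

```shell
# Why $(ls) is unsafe: the shell word-splits the ls output on whitespace,
# so any file name containing spaces falls apart. A glob does not.
demo="$(mktemp -d)"
cd "$demo"
touch "file with spaces.txt" plain.txt

for f in $(ls); do echo "[$f]"; done
# → [file], [with], [spaces.txt], [plain.txt]  (four broken words)

for f in *; do echo "[$f]"; done
# → [file with spaces.txt], [plain.txt]       (two intact names)
```

The glob is expanded by the shell itself, so each name reaches the loop body as a single word regardless of its content.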

It almost feels malicious to publish code that can screw over future readers who might copy-paste it without understanding the implications.


The script uses "*" for the deletion (after filtering with find to select files older than 8 days), but uses ls in a for loop to chdir into each satellite folder; in my case:

pi@mcanto:/STORJ5/storage/trash $ ls
pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa
qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa
ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa

I just thought it was better to divide the work … the performance is the same: it runs 4 find commands instead of only one, but in any case over the same millions of files, so it doesn’t matter :wink:

Best regards

Thanks

Yeah, in terms of performance there is not much, if anything, you can do: unlink is expensive and not as optimized as other filesystem operations, at the design level.

Relevant thread: Unlink performance on FreeBSD

Perhaps even doing everything sequentially can be faster by reducing contention.

This means either that it’s still in progress, or that you have had an error related either to a filewalker or to the databases.

zcat node.log.1.gz | grep error | grep -E "filewalker|database"
cat node.log | grep error | grep -E "filewalker|database"

What @arrogantrabbit means:

for n in *; do echo "$n"; /usr/bin/find "$n" -type f -ctime +8 -print -delete; done

I’m not sure that running the deletion commands through xargs would be any faster.
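For reference, such a variant could look like the sketch below (an illustration only, not the script above; `-P` adds parallel rm processes, but since unlink is metadata-bound the disk usually remains the bottleneck anyway):

```shell
# Batch the unlinks through xargs instead of find's -delete.
# -print0 / -0 keep odd file names intact, -r skips the run if there is
# no input at all, -P 4 launches up to four rm processes in parallel.
find . -type f -ctime +8 -print0 | xargs -0 -r -P 4 rm -f
```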

Hello,
I just noticed some errors when I don’t grep only for start/stop:

ubuntu@hpool:/STORJ_LOCAL-5/LOG$ cat node.log | grep "\sused-space"
2024-11-10T04:39:27Z INFO pieces used-space-filewalker completed {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Lazy File Walker”: false, “Total Pieces Size”: 4924677327714, “Total Pieces Content Size”: 4910494774626, “Total Pieces Count”: 27700299, “Duration”: “12h39m18.037612122s”}
2024-11-10T04:39:27Z INFO pieces used-space-filewalker started {“Process”: “storagenode”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”}
2024-11-10T04:45:21Z INFO pieces used-space-filewalker completed {“Process”: “storagenode”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”, “Lazy File Walker”: false, “Total Pieces Size”: 10708280576, “Total Pieces Content Size”: 10701173504, “Total Pieces Count”: 13881, “Duration”: “5m54.47502696s”}
2024-11-10T04:45:21Z INFO pieces used-space-filewalker started {“Process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”}
2024-11-10T13:45:17Z ERROR pieces used-space-filewalker failed {“Process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Lazy File Walker”: false, “error”: “filewalker: filewalker: context canceled; used_space_per_prefix_db: context canceled”, “errorVerbose”: “filewalker: filewalker: context canceled; used_space_per_prefix_db: context canceled\n[tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatelliteWithWalkFunc:181\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:83\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkAndComputeSpaceUsedBySatellite:749\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:81\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78](http://tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatelliteWithWalkFunc:181\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:83\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkAndComputeSpaceUsedBySatellite:749\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:81\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78)”}
2024-11-10T13:45:17Z INFO pieces used-space-filewalker started {“Process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”}
2024-11-10T13:45:17Z ERROR pieces used-space-filewalker failed {“Process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Lazy File Walker”: false, “error”: “filewalker: used_space_per_prefix_db: context canceled”, “errorVerbose”: “filewalker: used_space_per_prefix_db: context canceled\n[tstorj.io/storj/storagenode/storagenodedb.(*usedSpacePerPrefixDB).Get:81\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatelliteWithWalkFunc:96\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:83\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkAndComputeSpaceUsedBySatellite:749\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:81\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78](http://tstorj.io/storj/storagenode/storagenodedb.(*usedSpacePerPrefixDB).Get:81\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatelliteWithWalkFunc:96\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:83\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkAndComputeSpaceUsedBySatellite:749\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:81\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78)”}
2024-11-10T13:45:33Z INFO pieces used-space-filewalker started {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
2024-11-10T13:45:41Z ERROR pieces used-space-filewalker failed {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Lazy File Walker”: false, “error”: “filewalker: context canceled”, “errorVerbose”: “filewalker: context canceled\n[tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:78\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatelliteWithWalkFunc:129\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:83\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkAndComputeSpaceUsedBySatellite:752\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:83\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78](http://tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:78\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatelliteWithWalkFunc:129\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:83\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkAndComputeSpaceUsedBySatellite:752\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:83\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78)”}
2024-11-10T13:45:41Z INFO pieces used-space-filewalker started {“Process”: “storagenode”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”}
2024-11-10T13:45:41Z ERROR pieces used-space-filewalker failed {“Process”: “storagenode”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”, “Lazy File Walker”: false, “error”: “filewalker: used_space_per_prefix_db: context canceled”, “errorVerbose”: “filewalker: used_space_per_prefix_db: context canceled\n[tstorj.io/storj/storagenode/storagenodedb.(*usedSpacePerPrefixDB).Get:81\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatelliteWithWalkFunc:96\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:83\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkAndComputeSpaceUsedBySatellite:752\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:83\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78](http://tstorj.io/storj/storagenode/storagenodedb.(*usedSpacePerPrefixDB).Get:81\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatelliteWithWalkFunc:96\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:83\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkAndComputeSpaceUsedBySatellite:752\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:83\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78)”}
2024-11-10T13:45:41Z INFO pieces used-space-filewalker started {“Process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”}
2024-11-10T13:45:41Z ERROR pieces used-space-filewalker failed {“Process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Lazy File Walker”: false, “error”: “filewalker: used_space_per_prefix_db: context canceled”, “errorVerbose”: “filewalker: used_space_per_prefix_db: context canceled\n[tstorj.io/storj/storagenode/storagenodedb.(*usedSpacePerPrefixDB).Get:81\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatelliteWithWalkFunc:96\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:83\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkAndComputeSpaceUsedBySatellite:752\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:83\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78](http://tstorj.io/storj/storagenode/storagenodedb.(*usedSpacePerPrefixDB).Get:81\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatelliteWithWalkFunc:96\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:83\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkAndComputeSpaceUsedBySatellite:752\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:83\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78)”}
2024-11-10T13:45:41Z INFO pieces used-space-filewalker started {“Process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”}
2024-11-10T13:45:41Z ERROR pieces used-space-filewalker failed {“Process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Lazy File Walker”: false, “error”: “filewalker: used_space_per_prefix_db: context canceled”, “errorVerbose”: “filewalker: used_space_per_prefix_db: context canceled\n[tstorj.io/storj/storagenode/storagenodedb.(*usedSpacePerPrefixDB).Get:81\n\tstorj.io/storj/storagenode/pieces](http://tstorj.io/storj/storagenode/storagenodedb.(*usedSpacePerPrefixDB).Get:81\n\tstorj.io/storj/storagenode/pieces).(*FileWalker).WalkAndComputeSpaceUsedBySatelliteWithWalkFunc:96\n[tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:83\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkAndComputeSpaceUsedBySatellite:752\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:83\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78](http://tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:83\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkAndComputeSpaceUsedBySatellite:752\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:83\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78)”}
2024-11-10T13:45:47Z INFO pieces used-space-filewalker started {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
2024-11-11T08:55:57Z INFO pieces used-space-filewalker completed {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Lazy File Walker”: false, “Total Pieces Size”: 4864694178178, “Total Pieces Content Size”: 4850745341314, “Total Pieces Count”: 27243822, “Duration”: “19h10m9.907232123s”}
2024-11-11T08:55:57Z INFO pieces used-space-filewalker started {“Process”: “storagenode”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”}
2024-11-11T08:55:58Z INFO pieces used-space-filewalker completed {“Process”: “storagenode”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”, “Lazy File Walker”: false, “Total Pieces Size”: 10708280576, “Total Pieces Content Size”: 10701173504, “Total Pieces Count”: 13881, “Duration”: “138.876445ms”}
2024-11-11T08:55:58Z INFO pieces used-space-filewalker started {“Process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”}

I’m not sure whether this is relevant, or how to fix it.

Regarding xargs with the -r0 parameters: I read somewhere that this is useful for handling strange characters, by adding a NUL at the end of each name and using the NUL as the separator… or something like that.
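That matches what the man pages describe: find -print0 ends each name with a NUL byte and xargs -0 splits on NUL, so even a newline inside a file name survives. A tiny demonstration (invented file names in a throwaway directory):

```shell
# A file name containing an actual newline would break a plain
# "find | xargs" pipeline; the NUL-delimited form handles it cleanly.
d="$(mktemp -d)"
touch "$d/normal.txt" "$d/name
with newline.txt"

find "$d" -type f -print0 | xargs -0 -r rm -f
ls -A "$d"   # prints nothing: both files were removed
```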

Regarding this:
for n in *, instead of, for n in $(ls)
consider that n is not a file name to be checked but a satellite folder, so in any case there are 4-10 of them, not millions like the files.

In any case the performance issue is the rm command on ext4. I have seen a lot of documentation about it.

Best regards

If that happens on restart, then it’s kind of normal: all processes are going to be terminated.
Do you have these errors while the node is running?
Also, if all 4 of them were successful, the databases should be updated.

Hello,
I have not restarted it, but it shows 28 hours of running time, so I suppose there has been an automatic restart.
In any case, even if some of the filewalkers completed, it still reports 0 bytes free, while there are 3.7 TB free in the filesystem!

What could I do ?

Thanks

Only wait until all used-space-filewalkers finish the scan since the last restart.
Right now your logs show several starts of the used-space-filewalkers, but only a few completions.

I have a suspicion that we might have introduced a bug in the used-space-filewalker when we tried to implement a continuation feature for it.
You can test that:

  1. Stop and remove the container (or stop the service, if you use a service based setup)
  2. Rename the used_space_per_prefix.db database to used_space_per_prefix.db.bak
  3. Move all databases (*.db) to the temp folder
  4. Start the node and wait until it re-creates all databases
  5. Stop the node
  6. Move databases from the temp folder back to the storage location with replace
  7. Start the node
  8. Wait until all used-space-filewalkers finish the scan without issues (no database or filewalker errors)
  9. Check the dashboard
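The steps above could be sketched roughly like this for a docker setup (the paths, container name, and stop timeout are assumptions; adjust them to your node):

```shell
# Steps 1-7 as shell commands. STORAGE and TEMP are example paths.
STORAGE=/mnt/storj/storage
TEMP=/mnt/storj/db-temp

docker stop -t 300 storagenode && docker rm storagenode          # step 1
mv "$STORAGE/used_space_per_prefix.db" \
   "$STORAGE/used_space_per_prefix.db.bak"                       # step 2
mkdir -p "$TEMP"
mv "$STORAGE"/*.db "$TEMP"/                                      # step 3
# step 4: start the node with your usual docker run command and wait
#         until it has re-created empty databases, then stop it (step 5)
mv -f "$TEMP"/*.db "$STORAGE"/                                   # step 6
# step 7: start the node again and watch the logs for filewalker errors
```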

Hello,
luckily that was not necessary.

these two parameters fixed the issue (docker-compose):
- STORJ_PIECES_ENABLE_LAZY_FILEWALKER=false
- STORJ_STORAGE2_PIECE_SCAN_ON_STARTUP=true

here are the logs:
2024-11-10T13:45:47Z INFO pieces used-space-filewalker started {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
2024-11-11T08:55:57Z INFO pieces used-space-filewalker completed {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Lazy File Walker”: false, “Total Pieces Size”: 4864694178178, “Total Pieces Content Size”: 4850745341314, “Total Pieces Count”: 27243822, “Duration”: “19h10m9.907232123s”}
2024-11-11T08:55:57Z INFO pieces used-space-filewalker started {“Process”: “storagenode”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”}
2024-11-11T08:55:58Z INFO pieces used-space-filewalker completed {“Process”: “storagenode”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”, “Lazy File Walker”: false, “Total Pieces Size”: 10708280576, “Total Pieces Content Size”: 10701173504, “Total Pieces Count”: 13881, “Duration”: “138.876445ms”}
2024-11-11T08:55:58Z INFO pieces used-space-filewalker started {“Process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”}
2024-11-11T19:33:54Z INFO pieces used-space-filewalker completed {“Process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Lazy File Walker”: false, “Total Pieces Size”: 2360803709184, “Total Pieces Content Size”: 2358760988928, “Total Pieces Count”: 3989688, “Duration”: “10h37m56.779645077s”}
2024-11-11T19:33:54Z INFO pieces used-space-filewalker started {“Process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”}
2024-11-11T23:27:50Z INFO pieces used-space-filewalker completed {“Process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Lazy File Walker”: false, “Total Pieces Size”: 183745270784, “Total Pieces Content Size”: 183424263168, “Total Pieces Count”: 626968, “Duration”: “3h53m56.193484779s”}
ubuntu@hpool:/STORJ_LOCAL-5/LOG$ cat /home/ubuntu/storj-docker-5/docker-compose.yml

now I have 60 GB in the Trash and almost 4 TB of free space reported.
The main issue is that the scan takes a lot of time, and sometimes the node restarts by itself and has to start again from the beginning.
I will do this on all my nodes.

Best regards


So, it seems the only problem was the interruption of the used-space-filewalker?

Hello,
the issue has been fixed by the piece scan, which took a lot of time and needs to run without any interruption (auto restart).
It would be interesting to know how all 5 nodes got into this bad state (a lot of Trash and totally wrong used-space information).

Let’s see how it works after the fix; I will monitor all the nodes.

Best regards


Easily: if you disable the scan on startup, the difference can become significant. We have had several bugs related to the size calculation for the trash…
And only the scan on startup was able to fix it…