Trash does not go away in 7 days

Hello. Let’s hope that is the problem and that it gets solved quickly.

[quote=“Mitsos, post:41, topic:25936”]
Update on the trash saga: turns out that prefix-named folders aren’t getting deleted.
[/quote]

The node is running version 1.102.3

I have exactly the same problem on 40 nodes! When will the trash bin be emptied? The trash has only been increasing for 3 weeks now. Version 1.102.4.

2 Likes

Can anyone with this problem let us know which of the following log messages they can find? I’d like to see the full log lines.

  • emptying trash started
  • emptying trash failed
  • emptying trash finished
  • trash-filewalker started
  • trash-filewalker completed

Edited to add: If the undeleted trash directories are causing problems, you can rm -rf them yourself. If the timestamp from the directory name is more than 7 days ago, this will be 100% safe to do. Obviously the node software should be doing it for you, but this is a workaround you can use until we figure out what’s wrong and fix it.
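
For example, here is a rough shell sketch of that workaround, assuming bash, GNU date, and the usual trash/<satellite>/<YYYY-MM-DD> layout. Replace /path/to/storage with your node’s storage location; it only prints the commands until you remove the echo.

shopt -s nullglob
cutoff=$(date -d '7 days ago' +%Y-%m-%d)       # GNU date
for dir in /path/to/storage/trash/*/20??-??-??; do
    day=$(basename "$dir")
    if [[ "$day" < "$cutoff" ]]; then          # the date in the folder name is older than the cutoff
        echo rm -rf "$dir"                     # drop the echo once the printed list looks right
    fi
done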

1 Like
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker        starting subprocess        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker        subprocess started        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker.subprocess        trash-filewalker started        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Process": "storagenode", "dateBefore": "2024-04-22T00:48:56+03:00"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker.subprocess        Database started        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Process": "storagenode"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker.subprocess        trash-filewalker completed        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Process": "storagenode", "bytesDeleted": 0, "numKeysDeleted": 0}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker        subprocess finished successfully        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker        starting subprocess        {"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker        subprocess started        {"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker.subprocess        trash-filewalker started        {"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Process": "storagenode", "dateBefore": "2024-04-22T00:48:56+03:00"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker.subprocess        Database started        {"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Process": "storagenode"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker.subprocess        trash-filewalker completed        {"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Process": "storagenode", "bytesDeleted": 0, "numKeysDeleted": 0}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker        subprocess finished successfully        {"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker        starting subprocess        {"Process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker        subprocess started        {"Process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker.subprocess        trash-filewalker started        {"Process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Process": "storagenode", "dateBefore": "2024-04-22T00:48:56+03:00"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker.subprocess        Database started        {"Process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Process": "storagenode"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker.subprocess        trash-filewalker completed        {"Process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Process": "storagenode", "bytesDeleted": 0, "numKeysDeleted": 0}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker        subprocess finished successfully        {"Process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker        starting subprocess        {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker        subprocess started        {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker.subprocess        trash-filewalker started        {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Process": "storagenode", "dateBefore": "2024-04-22T00:48:56+03:00"}
Apr 29 00:48:56 server1 node8[935537]: 2024-04-29T00:48:56+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker.subprocess        Database started        {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Process": "storagenode"}
Apr 29 00:48:57 server1 node8[935537]: 2024-04-29T00:48:57+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker.subprocess        trash-filewalker completed        {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Process": "storagenode", "bytesDeleted": 0, "numKeysDeleted": 0}
Apr 29 00:48:57 server1 node8[935537]: 2024-04-29T00:48:57+03:00        INFO        lazyfilewalker.trash-cleanup-filewalker        subprocess finished successfully        {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}

Thank you. Does it perhaps work when you disable the lazy filewalkers?

I think that would look like

pieces.enable-lazy-filewalker: false

in the config.yaml or

STORJ_PIECES_ENABLE_LAZY_FILEWALKER=false

as an environment variable.

1 Like

No difference, the prefix-named folders aren’t being deleted on restart.

Apr 29 01:09:20 server1 node8[938354]: 2024-04-29T01:09:20+03:00        INFO        pieces:trash        emptying trash started        {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
Apr 29 01:09:20 server1 node8[938354]: 2024-04-29T01:09:20+03:00        INFO        pieces:trash        emptying trash finished        {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "elapsed": "13.276831ms"}
Apr 29 01:09:20 server1 node8[938354]: 2024-04-29T01:09:20+03:00        INFO        pieces:trash        emptying trash started        {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
Apr 29 01:09:20 server1 node8[938354]: 2024-04-29T01:09:20+03:00        INFO        pieces:trash        emptying trash finished        {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "elapsed": "31.778091ms"}
Apr 29 01:09:20 server1 node8[938354]: 2024-04-29T01:09:20+03:00        INFO        pieces:trash        emptying trash started        {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
Apr 29 01:09:20 server1 node8[938354]: 2024-04-29T01:09:20+03:00        INFO        pieces:trash        emptying trash finished        {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "elapsed": "459.802µs"}
Apr 29 01:09:20 server1 node8[938354]: 2024-04-29T01:09:20+03:00        INFO        pieces:trash        emptying trash started        {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
Apr 29 01:09:20 server1 node8[938354]: 2024-04-29T01:09:20+03:00        INFO        pieces:trash        emptying trash finished        {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "elapsed": "9.318982ms"}
1 Like

Thank you again. That narrows down the problem.

My next ask: can someone run with debug logging enabled? There aren’t a lot of debug log lines in this code, but there are a few that might explain what’s going on. In particular, I’m looking for a line reading “directory not found”.

1 Like

Let me add these:

2024-04-28T19:22:28Z    INFO    pieces:trash    emptying trash started  {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-04-28T19:22:28Z    ERROR   pieces:trash    emptying trash failed   {"Process": "storagenode", "error": "pieces error: filestore error: remove storage/trash/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/2024-04-13: directory not empty", "errorVerbose": "pieces error: filestore error: remove storage/trash/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/2024-04-13: directory not empty\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).EmptyTrash:189\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:319\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:431\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1.1:84\n\tstorj.io/common/sync2.(*Workplace).Start.func1:89"}
2024-04-28T19:22:28Z    INFO    pieces:trash    emptying trash started  {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-04-28T19:22:28Z    ERROR   pieces:trash    emptying trash failed   {"Process": "storagenode", "error": "pieces error: filestore error: remove storage/trash/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/2024-04-13: directory not empty", "errorVerbose": "pieces error: filestore error: remove storage/trash/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/2024-04-13: directory not empty\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).EmptyTrash:189\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:319\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:431\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1.1:84\n\tstorj.io/common/sync2.(*Workplace).Start.func1:89"}
2024-04-28T19:22:28Z    INFO    pieces:trash    emptying trash started  {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-04-28T19:22:28Z    ERROR   pieces:trash    emptying trash failed   {"Process": "storagenode", "error": "pieces error: filestore error: remove storage/trash/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/2024-04-13: directory not empty", "errorVerbose": "pieces error: filestore error: remove storage/trash/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/2024-04-13: directory not empty\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).EmptyTrash:189\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:319\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:431\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1.1:84\n\tstorj.io/common/sync2.(*Workplace).Start.func1:89"}
2024-04-28T19:22:28Z    INFO    pieces:trash    emptying trash started  {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-04-28T19:22:28Z    ERROR   pieces:trash    emptying trash failed   {"Process": "storagenode", "error": "pieces error: filestore error: remove storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-04-13: directory not empty", "errorVerbose": "pieces error: filestore error: remove storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-04-13: directory not empty\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).EmptyTrash:189\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:319\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:431\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1.1:84\n\tstorj.io/common/sync2.(*Workplace).Start.func1:89"}

That is from one node that has the nested directories in trash, like: /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-04-13/2024-04-13/2024-04-13/2a/

Obviously they do not get deleted because they are not empty: the nested directories are still inside them. If it is 100% safe to delete those folders after 7 days, there should be no need to check whether they are empty or not.
The version was 1.99.3, and it has been 1.102.3 since today or yesterday. If that version already contains a fix for this, maybe the nested folders will get removed on the next run of the trash chore; if it has not been fixed yet, then this is the current state.

2 Likes

Are you sure about this? Using the command ‘curl -sL http://ip:port/api/sno/satellites’, I get the averageUsageBytes value as 4043689674570.715, and on the left graph, it’s represented as 4.04TB. So, I guess this bug was already fixed.

Debug doesn’t show any “directory not found”. Other than the previous log lines (start, finish), these are there:

Apr 29 10:47:58 server1 node8[1012532]: 2024-04-29T10:47:58+03:00        DEBUG        blobscache        deleted piece        {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "disk space freed in bytes": 768}
Apr 29 10:47:58 server1 node8[1012532]: 2024-04-29T10:47:58+03:00        DEBUG        pieces        deleted piece        {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "M4ZLCYCHI4A6IHNOAFHVSHYZO2X7IZNNHTV3K3Y5CCVYYOH3SQWA"}
Apr 29 10:47:58 server1 node8[1012532]: 2024-04-29T10:47:58+03:00        INFO        collector        deleted expired piece        {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "M4ZLCYCHI4A6IHNOAFHVSHYZO2X7IZNNHTV3K3Y5CCVYYOH3SQWA"}
Apr 29 10:47:58 server1 node8[1012532]: 2024-04-29T10:47:58+03:00        DEBUG        blobscache        deleted piece        {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "disk space freed in bytes": 145408}
Apr 29 10:47:58 server1 node8[1012532]: 2024-04-29T10:47:58+03:00        DEBUG        pieces        deleted piece        {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "ARSUG3EUGDWTUTEFBNSKBI45TU7TIINBIBQ7UXQ5F5ULLEJIBRSQ"}
Apr 29 10:47:58 server1 node8[1012532]: 2024-04-29T10:47:58+03:00        INFO        collector        deleted expired piece        {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "ARSUG3EUGDWTUTEFBNSKBI45TU7TIINBIBQ7UXQ5F5ULLEJIBRSQ"}

Huh, I wasn’t aware of those. That would definitely break trash collection. That means the per-day dir transition ran more than once, but it’s not clear how that could happen. If you haven’t deleted them yet, could you show the timestamps on:

  • /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-04-13
  • /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-04-13/2024-04-13
  • /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-04-13/2024-04-13/2024-04-13
  • /storage/trash/.trash-uses-day-dirs-indicator

(preferably with stat(1) or something like it to show subsecond precision)?
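
For example, something like this (GNU stat; adjust the paths to wherever your storage directory is mounted) would print the full-precision timestamps, including the birth time where the filesystem records one:

stat /storage/trash/.trash-uses-day-dirs-indicator \
     /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-04-13 \
     /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-04-13/2024-04-13 \
     /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-04-13/2024-04-13/2024-04-13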

If the higher layers and the indicator file all have birth times within a few microseconds of each other, then perhaps there are multiple things “opening” the blobstore layer at the same time for some reason.

It’s 100% safe for you, the operator, as long as you know things like: you haven’t put anything in there yourself, there aren’t any bind mounts under that point, etc. The storagenode software, for its part, has to avoid deleting things that aren’t expected, in case a configuration mistake or a bug causes confusion about where important files are.

2 Likes

Then you are probably also not aware of my other trash issue here:

Great.

I have it on more than one node and others have it as well.

Unfortunately the birth time is not recorded; there are only access, modify, and change times.

/trash/.trash-uses-day-dirs-indicator
Access: 2024-04-13 06:23:09.153704449 +0300
Modify: 2024-04-13 06:23:09.153704449 +0300
Change: 2024-04-13 06:23:09.153704449 +0300

/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-04-13
Access: 2024-04-13 05:12:47.112192010 +0300
Modify: 2024-04-13 05:20:39.930751195 +0300
Change: 2024-04-13 06:23:05.453527813 +0300

trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-04-13/2024-04-13/
Access: 2024-04-13 03:18:03.539695399 +0300
Modify: 2024-04-13 03:18:03.539695399 +0300
Change: 2024-04-13 05:20:39.930751195 +0300

/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-04-13/2024-04-13/2024-04-13/
Access: 2022-11-18 16:12:12.827925254 +0200
Modify: 2022-07-17 12:28:09.453219325 +0300
Change: 2024-04-13 03:18:03.539695399 +0300

I don’t know if it is helpful without birth date.

Fine then. But I don’t understand why you would expect empty trash date folders. According to @elek the idea is to simply delete the folders that are old enough:

And there was a bug that did not even delete empty date folders:

But well, then you need to find a way to tell the node software that it is fine to delete these nested folders, and make trash collection work again on the affected nodes. Since I have these on more than one node, and I only checked a few of them when the update that performed the migration was applied, I cannot check every node and manually clean up the erroneous folder structures.

2 Likes

Why are the pieces deleted one by one inside the dated trash folders, instead of just removing the entire folder?
That way you won’t have undeleted empty folders. Just check the name of the folder and delete it if it’s too old; there’s no need for another piece scan in trash either.

5 Likes

Here is another one:

2024-05-01T06:40:43Z    DEBUG   pieces:trash    starting to empty trash {"Process": "storagenode"}
2024-05-01T06:40:43Z    INFO    pieces:trash    emptying trash started  {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-05-01T06:40:43Z    INFO    lazyfilewalker.trash-cleanup-filewalker starting subprocess     {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-05-01T06:40:43Z    INFO    lazyfilewalker.trash-cleanup-filewalker subprocess started      {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-05-01T06:40:43Z    INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      trash-filewalker started        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "dateBefore": "2024-04-24T06:40:43Z", "Process": "storagenode"}
2024-05-01T06:40:43Z    INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      Database started        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Process": "storagenode"}
2024-05-01T06:40:43Z    INFO    lazyfilewalker.trash-cleanup-filewalker subprocess exited with status   {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "status": 1, "error": "exit status 1"}
2024-05-01T06:40:43Z    ERROR   pieces:trash    emptying trash failed   {"Process": "storagenode", "error": "pieces error: lazyfilewalker: exit status 1", "errorVerbose": "pieces error: lazyfilewalker: exit status 1\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:85\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkCleanupTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:423\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1.1:84\n\tstorj.io/common/sync2.(*Workplace).Start.func1:89"}

The exit code and error message are different here, even though this node also has the nested folders:

/storage/trash/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/2024-04-20/2024-04-20/2024-04-20/2024-04-20/2024-04-20/2a/

The first step in removing an entire folder is to clean up its contents. At the programming level, operating systems do not provide a single “remove the entire folder” operation; they only provide “remove this single file” and “remove an empty directory”.

Maybe storage nodes just forgot about this last step…
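
As a rough illustration of those two primitives with GNU find (the path is just a placeholder):

dir=/path/to/storage/trash/SATELLITE_ID/2024-04-13    # placeholder day-folder

find "$dir" ! -type d -delete                 # unlink() every non-directory entry
find "$dir" -depth -type d -exec rmdir {} +   # rmdir() each directory, children before parents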

2 Likes

Do you happen to have the logs from around the time the .trash-uses-day-dirs-indicator file was created? It must have hit some error trying to migrate that satellite.

That is the idea, but “deleting folders” still involves iterating over all files, and deleting still involves reading from and writing to each file’s inode (or the MFT, on NTFS). The performance benefit is just that we don’t have to iterate over all trashed files, only the ones in the day-folder we are deleting. If files are trashed at a mostly constant rate, and we keep 7 days of trash, then deleting each day-folder should take about 1/7 the time it would have taken to iterate over all trash files and check the creation time for each one.

But yes, there are clearly bugs with trash cleanup. Multiple ones. We’re going to be working on it.

3 Likes

Unfortunately not. Logs are gone.

I see. Well, the plan sounded as if performing something like rm -rf on the date folder would be enough.

The only reason directories aren’t being deleted, and the reason we are currently in this mess, is that the filewalker apparently has to run over every trashed file and update the database so that a value can be displayed on a pie chart.

What I propose we do to finally fix this situation:

  1. First of all, get rid of any logic that counts space based on filewalkers. It should be replaced with the following (a rough shell sketch follows this list):
    a. Total space = config.yaml’s storage.allocated-disk-space
    b. Free space = (current free space as reported by the operating system) - (a new config.yaml value, storage.free-over-provision-space). This value should be displayed as-is on the dashboard and reported to the satellite the same way the current value is reported (hourly?).
    c. Used space = Total space - Free space (b above) - Trash (see below)
    d. Trash = For every date-named directory, create a file inside it at the start of the bloom-filter (garbage collection) run. For every file moved to trash, take the current value from this file, add the space occupied by the piece, store the new value back, and move on to the next piece. The dashboard should look for this file within the satellites’ date directories, add up the values, and display the total (in simpler words, add up 4 (satellites) x 7 (days) worth of values).
  2. Now we get to the interesting part: the trash cleanup routine should run once per day per satellite, and its task should be limited to asking the operating system to remove the directories older than X days.
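
Here is the rough sketch of that accounting. It is purely illustrative: the path, the 8 TB / 100 GB figures, and the .bytes-trashed per-day counter file are all assumptions, and GNU df is assumed.

STORAGE=/path/to/storage
ALLOCATED=$((8 * 1000**4))          # 1a. total: storage.allocated-disk-space, e.g. 8 TB
OVERPROVISION=$((100 * 1000**3))    # e.g. keep 100 GB free for the OS

os_free=$(df -B1 --output=avail "$STORAGE" | tail -1)
free=$((os_free - OVERPROVISION))   # 1b. free space

trash=0                             # 1d. sum the hypothetical per-day counter files
for counter in "$STORAGE"/trash/*/20??-??-??/.bytes-trashed; do
    [ -f "$counter" ] && trash=$((trash + $(cat "$counter")))
done

used=$((ALLOCATED - free - trash))  # 1c. used space
echo "total=$ALLOCATED free=$free trash=$trash used=$used"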

Migrating to this would get rid of the logic the current trash cleanup uses, which, to my non-programmer understanding, is to go over every single piece within every date-named (or other) directory for every satellite and update the databases with usage/piece ID, just so we can have a nice pie chart.

Now, for the case where SHTF and trash has to be restored: this could not be handled any more simply, even if you tried to come up with a simpler solution. There would be a forum announcement saying “SHTF, we needed to restore pieces from trash. There is nothing you need to do, since we have migrated to a new method of calculating space and the dashboard will be updated the next time you hit refresh on it”.

My $0.02. Some will not like them, some will. Some will disagree, some will agree. That’s life, tomorrow is another day.

It uses the same API as the dashboard…
It should be fixed, sure; I just do not have information on which version that will be in.