Why trash not deleted

jammerdan · July 15, 2024, 3:31am

Today is the 15th.

I have no idea why this trash folder is still there:

ls /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/
2024-07-06

When I grep the logs for trash-cleanup I see the last process for this satellite on 13th:

2024-07-13T16:58:05Z    INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      trash-filewalker started        {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Process": "storagenode", "dateBefore": "2024-07-06T16:58:05Z"}
2024-07-13T16:58:05Z    INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      Database started        {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Process": "storagenode"}
2024-07-13T16:58:05Z    INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      trash-filewalker completed      {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Process": "storagenode", "bytesDeleted": 0, "numKeysDeleted": 0}
2024-07-13T16:58:05Z    INFO    lazyfilewalker.trash-cleanup-filewalker subprocess finished successfully        {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}

So it looks like it has not run on the 14th.

On another node I have the same situation with a date folder from the 5th that is still there.

Why are these not deleted after more than 7 days?

Alexey · July 15, 2024, 3:41am

Does it have data or is it just an empty folder?

I think it tries to delete the data from before this date (not including this day).

Did your node restart on 2024-07-14?
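
To make the "not including this day" reading concrete, here is a minimal sketch of the date arithmetic in Python. It assumes a 7-day retention (inferred from the gap between the log timestamp and `dateBefore`) and assumes a date folder only becomes eligible once its day is strictly earlier than the day of the cutoff. `parse_log_time`, `cutoff_for_run` and `folder_is_eligible` are hypothetical names for illustration, not storagenode functions.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: 7-day retention, inferred from the logs (run time minus dateBefore).
RETENTION = timedelta(days=7)


def parse_log_time(stamp: str) -> datetime:
    """Parse a timestamp such as 2024-07-13T16:58:05Z as UTC."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def cutoff_for_run(run_started: datetime) -> datetime:
    """The dateBefore value a run started at this time would log."""
    return run_started - RETENTION


def folder_is_eligible(folder_name: str, cutoff: datetime) -> bool:
    """Assumed rule: the folder's day must be strictly before the cutoff's day."""
    return date.fromisoformat(folder_name) < cutoff.date()


if __name__ == "__main__":
    for started in ("2024-07-13T16:58:05Z", "2024-07-14T22:54:25Z"):
        cutoff = cutoff_for_run(parse_log_time(started))
        print(started, "->", cutoff.isoformat(), folder_is_eligible("2024-07-06", cutoff))
```

If that assumption holds, the run on the 13th would skip the 2024-07-06 folder, and the first run that could pick it up is one started on the 14th or later.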

jammerdan · July 15, 2024, 3:58am

The folder has data in it.

Yes, but that was on the 13th. It has not restarted since then. For the other satellite I have:

2024-07-14T22:54:25Z    INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      trash-filewalker started        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Process": "storagenode", "dateBefore": "2024-07-07T22:54:25Z"}
2024-07-14T22:54:25Z    INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      Database started        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Process": "storagenode"}

So why is there no trash cleanup on the 14th deleting everything before 7th?

And as I said, I have another node where the date folder from the 5th is still there:

ls /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/
2024-07-05  2024-07-09  2024-07-10  2024-07-12  2024-07-13  2024-07-14  2024-07-15

There was one big Bloom filter on the node and moving to trash is still ongoing. I thought old trash folders would get deleted even while retain is running.

Alexey · July 15, 2024, 4:05am

On my node it’s still in progress:

2024-07-13T04:24:26Z    INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      trash-filewalker started       {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Process": "storagenode", "dateBefore": "2024-07-06T04:24:26Z"}
2024-07-13T04:24:26Z    INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      trash-filewalker completed     {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Process": "storagenode", "bytesDeleted": 0, "numKeysDeleted": 0}
2024-07-14T13:15:08Z    INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      trash-filewalker started       {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Process": "storagenode", "dateBefore": "2024-07-07T13:15:08Z"}
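
A run that is still in progress, like the last line above, shows up as a "started" line with no later "completed" line for the same satellite. A minimal Python sketch for pairing them up follows. It assumes the log layout shown in this thread (timestamp first, JSON fields at the end of the line), and `unfinished_trash_runs` is a hypothetical helper name, not part of storagenode.

```python
import json


def unfinished_trash_runs(log_lines):
    """Return {satelliteID: start timestamp} for runs with no later 'completed' line."""
    running = {}
    for line in log_lines:
        if "trash-cleanup-filewalker" not in line or "{" not in line:
            continue
        # Everything before the first "{" is timestamp, level, logger and message.
        head, brace, tail = line.partition("{")
        parts = head.split()
        if not parts:
            continue
        try:
            fields = json.loads(brace + tail)
        except json.JSONDecodeError:
            continue  # skip lines whose JSON part is truncated
        satellite = fields.get("satelliteID")
        if satellite is None:
            continue
        if "trash-filewalker started" in head:
            running[satellite] = parts[0]
        elif "trash-filewalker completed" in head:
            running.pop(satellite, None)
    return running
```

It accepts any iterable of lines, so the saved output of `docker logs storagenode` or a redirected log file can be passed in line by line. Anything left in the result is a satellite whose cleanup started but has not logged completion yet.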

jammerdan · July 15, 2024, 4:18am

Are we running one satellite at a time?

So maybe the other satellites start when this one has finished, which is currently the last in my log:

2024-07-14T22:54:25Z    INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      trash-filewalker started        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Process": "storagenode", "dateBefore": "2024-07-07T22:54:25Z"}
2024-07-14T22:54:25Z    INFO    lazyfilewalker.trash-cleanup-filewalker.subprocess      Database started        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Process": "storagenode"}

I don’t see any progress in the date folder for this satellite. There are still 1024 subfolders even though it has been running for quite some time now.

ls /storage/trash/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/* | wc -l
1024

So if this takes long all other deletions will be delayed as well? So you may have 7 days for the first satellite but 10 or 12 days when deletions for the last satellite start?

Alexey · July 15, 2024, 4:23am

I think yes, they run one by one.

jammerdan · July 15, 2024, 4:41am

Oh I see progress here. I thought the subfolders do get deleted when they are empty but it appears they are not.
I see most of them empty and only some at the end still being full. So it seems folder contents are being deleted.
Ok let’s see if then the other satellites will follow.

I checked the other node that still has the folder from the 5th and it appears to be the same. Many subfolders empty already some are not.
But this shows that deletions can take ages and files stay in trash much, much longer than the expected 7 days.

striker43 · July 15, 2024, 5:51am

I am seeing the same. My HDDs have a hard time keeping up with all the load from huge ingress, a lot of data in the trash that has to be deleted, deleting the TTL data and also processing new bloom filters at the same time…

jammerdan · July 17, 2024, 7:09am

Why don’t the subfolders get deleted when they are empty?
It would be much easier to monitor deletion progress. Instead of checking which folders are empty you would only have to keep track of the number of subfolders.
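
Until then, checking which folders are empty can at least be scripted. Here is a rough Python sketch that counts empty and non-empty subfolders directly under one date folder. `subfolder_progress` is a hypothetical helper, and it assumes the layout seen in this thread (date folder, then two-character subfolders, then files).

```python
import os


def subfolder_progress(date_folder):
    """Count empty and non-empty subfolders directly under one trash date folder."""
    empty = non_empty = 0
    with os.scandir(date_folder) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            # Peek at the first child only, so full folders are not listed completely.
            with os.scandir(entry.path) as children:
                if next(children, None) is None:
                    empty += 1
                else:
                    non_empty += 1
    return empty, non_empty
```

Called repeatedly on the same date folder, a growing first number and a shrinking second number would indicate that the cleanup is making progress there.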

Alexey · July 17, 2024, 7:12am

I do not know. But it seems they are deleted only if they had files in them.
At least my nodes do not have empty subfolders (except for the Stefan satellite from April, when the node got this update, I suppose). And I didn’t delete anything manually there.

jammerdan · July 17, 2024, 7:41am

On retain it is easy, as this creates the subfolders one after another.
For deletion it would be great, if after all files in a subfolder have been deleted, the corresponding subfolder gets deleted as well.
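
As an illustration of that behaviour only, here is a small Python sketch. It is not what the storagenode does; `remove_emptied_subfolders` is a hypothetical helper, and it should only be tried on a test directory. It relies on `os.rmdir` refusing to remove a directory that still contains files.

```python
import os


def remove_emptied_subfolders(date_folder):
    """Remove subfolders that contain nothing; return their names in sorted order."""
    removed = []
    for name in sorted(os.listdir(date_folder)):
        path = os.path.join(date_folder, name)
        if not os.path.isdir(path) or os.path.islink(path):
            continue
        try:
            os.rmdir(path)  # succeeds only if the directory is empty
        except OSError:
            continue  # still has files (or cannot be removed): leave it alone
        removed.append(name)
    return removed
```

With something like this in the cleanup loop, the number of remaining subfolders would directly reflect how far the deletion has progressed.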

jammerdan · July 18, 2024, 5:10am

@Alexey
So what is wrong with this one:

ls /storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/
2024-06-23  2024-06-24  2024-06-27  2024-07-12

The node has been up for 5 days in a row. I can see retain processes running. But when I grep for trash-cleanup, not a single line is returned.

docker logs storagenode | grep trash-cleanup | wc -l
0

It seems that trash cleanup is not running which seems to be consistent with the old trash date folder from 23rd of June still being there.

What could I check?

Alexey · July 18, 2024, 7:59am

They are deleted if there was at least one file. Otherwise they will not be deleted, I believe.

I have no idea. Do they contain any files? My nodes don’t have these older folders for any trusted satellites, so it is likely a local issue, e.g. a restart before the process finished (when the files had already been deleted, but the folders had not).

Since you do not have log redirection (I assume that because you use the docker logs command), these lines were likely lost with one of the container’s recreations.
If you had older logs, we could troubleshoot.

jammerdan · July 18, 2024, 8:02am

I need to check again but my impression was that the subdirectories get only emptied. Maybe deleted at the end, when all of them are empty.

Alexey · July 18, 2024, 8:04am

They are deleted, but as part of the trash-filewalker process. If it was interrupted (when it had been able to delete the files, but not the folders yet), they will remain forever. Maybe they would be deleted with the next run, but I’m really not sure, because I do not know how to reproduce this.
I would repeat: my nodes don’t have empty trash folders in the folders of the trusted satellites.

P.S. All three are still on version 1.105.4.

jammerdan · July 18, 2024, 10:59pm

More observations:

Take this path from one of my nodes, for example.
It has the following subfolders:

ls /trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-07-10
hs  hu  hw  hy  ia  ic  ie  ig  ii  ik  im
ht  hv  hx  hz  ib  id  if  ih  ij  il

The trash cleanup is running on folder hw while I am writing.
The folders before hw are empty but not (yet?) deleted: hs, ht, hu and hv are all empty but still present. So to me it looks like they do not get deleted once they have been emptied.

One more thing on the same node: As said, the trash cleanup is currently running on the date folder above, 10th of July. But this is not the oldest of the date folders.
These are the date folders:

ls /trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
2024-07-05  2024-07-08  2024-07-11  2024-07-14  2024-07-17
2024-07-06  2024-07-09  2024-07-12  2024-07-15  2024-07-18
2024-07-07  2024-07-10  2024-07-13  2024-07-16

The oldest date folder is the 5th. The trash cleanup should work on that folder, not on the 10th.
When I enter the 5th, I can see that some subfolders are empty, some are still full of files.

ls /trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-07-05
a2  a5  aa  ad  ag  aj  am  ap  as  av  ay  bb  be  bh  bk  bn  bq  bt
a3  a6  ab  ae  ah  ak  an  aq  at  aw  az  bc  bf  bi  bl  bo  br  bu
a4  a7  ac  af  ai  al  ao  ar  au  ax  ba  bd  bg  bj  bm  bp  bs  bv

So for example: Folder az is already emptied, folder ba still has files in it.
It might be that the cleanup was interrupted. But why did it not resume with the 5th July folder? Why is the cleanup now running on the 10th July date folder instead? This does not make sense to me. I cannot see any logic here as this now leaves me with half emptied folders the trash cleanup is not working on and therefore they don’t yet get deleted.
I don’t even know if they will ever get deleted.

Please don’t tell me that I have to clean up manually after the trash cleanup. This does not make any sense.

jammerdan · July 19, 2024, 5:22am

I can now confirm that after being interrupted, the cleanup does not resume with the correct date folder but picks the next one.

Exactly the node I was talking about before got interrupted. Now instead of resuming with the 10th of July date folder it is working in the 11th of July date folder. So the subfolders in the 10th are partially empty while some are not.

I don’t think this is how it should be. The cleanup should start or re-start with the oldest date folder and not with some date folder in between.
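
To spell out the order I would expect, here is a short Python sketch. This is not the storagenode code, just the behaviour I am describing; `cleanup_order` is a hypothetical function, and the cutoff day is assumed to be the day of `dateBefore` from the logs.

```python
from datetime import date


def cleanup_order(folder_names, cutoff_day):
    """Date folders strictly older than cutoff_day, oldest first."""
    days = []
    for name in folder_names:
        try:
            days.append(date.fromisoformat(name))
        except ValueError:
            continue  # ignore anything that is not a YYYY-MM-DD folder
    return [d.isoformat() for d in sorted(days) if d < cutoff_day]
```

Because the list is rebuilt from the folders that actually exist on disk, a restart would begin again with the oldest leftover folder without having to remember any state.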

jammerdan · July 20, 2024, 1:04am

Cleanup of 10th is finished. While it was running no subfolder was deleted, only emptied.
After all subfolders have been emptied, the date folder got deleted.

After this, it seems that the cleanup for this satellite is considered finished, even though numerous subfolders in earlier date folders have not been emptied and therefore not been deleted.
Currently it is working on the next satellite. Which probably means that now all other satellites get their cleanup in a row until it is the turn of the US1 satellite again. It may then start again with the oldest date folder, which is the 5th of July.
I don’t know if such an implementation makes sense. Why does it not resume on the same date folder when interrupted?

Alexey · July 20, 2024, 7:05am

I believe it doesn’t keep state and always starts with the closest date (because that is usually what is expected). Your case seems to be special.

donald.m.motsinger · July 20, 2024, 12:01pm

It would be good if each xx folder would get deleted immediately after it got emptied.

Also, the trash size in the dashboard should get updated after the deletion of each xx folder. What happens now if the trash cleanup gets interrupted halfway through? I assume that the trash size doesn’t get updated in this case. At the next run it will delete the remaining half of the trash. Does the trash cleanup then reduce the trash size in the dashboard correctly, or does it forget about the previously interrupted run and only lower it by the amount of the 2nd pass?