Why trash is not deleted

But which date folder does it pick?
When a trash cleanup starts with "dateBefore": "2024-07-14T03:46:45Z", is it supposed to work on the oldest date folder or the youngest?
Because today I see in the logs that a trash cleanup has started with this date, and it is working on the date folder 2024-07-13 instead of 2024-07-12.

Edit:
Maybe some more context, because I found an older log line.
Yesterday the node started a trash cleanup with "dateBefore": "2024-07-13T05:51:55Z"; I can see it in an older log line.
That cleanup started with the date folder of the 12th, which is the day before the 13th.
It was working on it all day but did not finish it on the 20th.
Now today's date has switched to the 21st. The node started a trash cleanup with "dateBefore": "2024-07-14T03:46:45Z" and, instead of picking the date folder of the 12th to continue, it started with the date folder of the 13th.

So it seems that if there is more than one date folder, it does not pick the oldest one, it picks the most recent. Which means that after a new start, like in this case, it does not continue with the old folder but starts on a new one, abandoning the old folder in the middle of deletion. If it selected the oldest folder, it would always finish that one first before starting on a more recent one.

This could also explain why I see nodes that still have date folders from June: the deletion from most recent to oldest simply has not reached the oldest folders yet, which means that this data has been resting in the trash for almost 4 weeks already.


I believe it will lose it, because the information is initially collected in memory and then flushed to the database every 1h by default. If you reset in the middle, there is a high probability that this information will be lost.
I would suggest decreasing the flush interval if you expect that your node may restart randomly.
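If I remember the setting name correctly (please verify against your own config.yaml, this is from memory), that would be lowering this default:

# how often the in-memory used-space information is synced to the databases
storage2.cache-sync-interval: 1h0m0s

to something like 10m0s.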

You explained it better than me:

Ok, I cannot verify in the code whether my observation is correct.

But it would explain an outcome like this:

ls /trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
2024-06-23  2024-06-24  2024-06-27  2024-07-12

Which is very bad.

If we always picked the oldest first, we would not see such old date folders.

I even think that we do not specify any kind of ordering, and this is simply whatever the OS returns…
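Just to illustrate what a plain listing looks like from the node's side, here is a minimal Go sketch (not the actual storagenode code). Go's os.ReadDir sorts the entries by filename before returning them, so YYYY-MM-DD folder names would already come back oldest first; a raw, unsorted OS-level listing, by contrast, comes back in whatever order the filesystem stores the entries.

package main

import (
	"fmt"
	"os"
)

func main() {
	// hypothetical trash path taken from the listing above
	dir := "/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa"
	// os.ReadDir returns the entries sorted by filename,
	// which for YYYY-MM-DD names is oldest to newest
	entries, err := os.ReadDir(dir)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, e := range entries {
		fmt.Println(e.Name())
	}
}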

Yes, maybe. I have no idea where this sort order is coming from.

For the date folders I still have no idea what sort order the process is following:

ls /trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
2024-07-05  2024-07-14  2024-07-23  2024-08-05  2024-08-15  2024-08-22
2024-07-06  2024-07-15  2024-07-24  2024-08-06  2024-08-16  2024-08-23
2024-07-07  2024-07-17  2024-07-26  2024-08-07  2024-08-17  2024-08-24
2024-07-08  2024-07-19  2024-07-29  2024-08-08  2024-08-18  2024-08-25
2024-07-09  2024-07-20  2024-07-31  2024-08-09  2024-08-19
2024-07-12  2024-07-21  2024-08-03  2024-08-10  2024-08-20
2024-07-13  2024-07-22  2024-08-04  2024-08-14  2024-08-21

It deleted them in the following order:
2024-08-12, 2024-08-11, then went to 2024-07-28, 2024-07-27, 2024-07-18, and it is currently deleting in 2024-07-14.
If there is some logic behind that I fail to see it.

I do not think that there is any logic for this trash-filewalker. It just should delete pieces from these folders if they are older than 7 days and update the databases. And the sort order doesn't seem to matter.
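As a rough illustration of that rule only (a minimal sketch, not the actual storagenode code, which also updates the databases and works on the real blob layout), per date folder it would boil down to something like this:

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// deleteOldPieces removes every file under dateDir whose modification
// time is older than 7 days. This only illustrates the rule described
// above; it is not the real trash-filewalker implementation.
func deleteOldPieces(dateDir string) error {
	cutoff := time.Now().Add(-7 * 24 * time.Hour)
	return filepath.Walk(dateDir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if info.IsDir() {
			return nil
		}
		if info.ModTime().Before(cutoff) {
			return os.Remove(path)
		}
		return nil
	})
}

func main() {
	// hypothetical path based on the listings in this thread
	if err := deleteOldPieces("/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-07-05"); err != nil {
		fmt.Println(err)
	}
}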

Ok, then it would be sooooo good if it deleted the oldest first. There is now 7 weeks of trash in that folder.

Why does it matter, if it removes the same amount of trash data in a random sort order?


It matters a lot. Because SNOs are paid in TB / HOUR.
That’s 1,176+ hours (7 weeks) behind…
It’s an incredibly obvious problem, particularly if it happens again and again, perpetuating what could be a permanent bias of undeleted data.

1/2 cent

1 cent.

Trash is not paid, so it doesn’t matter which date folder gets deleted first. It frees up the same amount of storage per hour.


It will delete the same amount of data from the trash in the same amount of time, independently of the sort order. The trash filewalker deletes all pieces older than 7 days. So it really doesn’t matter whether it deletes 100GB from 2024-08-18 first, then 100GB from 2024-08-16, and then 100GB from 2024-08-14 (300GB in total), or the same 300GB in the reverse order, or at random. It will still be the same 300GB.

…or it errors out, which it has obviously been doing for 7 weeks, perpetuating the worst of the cycle. What if there were 5000 GB in the 2024-07-05 folder? Thus the bias, because it obviously never got completed for over 6 weeks.
As a closed loop, yes, it’s going to delete data at the same rate; that’s the obvious given.
With the same case in point, jammerdan should just have deleted the data himself and saved his own sanity, but he’s pointing out a pointless yet obvious flaw which you are oblivious to.

Y’all make for crappy programmers.

And without actually freeing trashed data, how are you ever going to get more paid data?

ffs
1/50 of a cent.

The sort order wouldn’t help the node delete more data; the real problem is the failure of the deletion process. That needs to be addressed instead of making the deletion process more complicated and likely slower (any ordering requires additional time).
I guess disabling the lazy mode could solve the problem with crashes of the lazy trash-filewalker.
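If you want to try that, it should be a single setting (please double-check the exact name against the config.yaml or command-line flags of your node version):

# run the filewalkers in the main process instead of the low-priority lazy subprocess
pieces.enable-lazy-filewalker: false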

I dunno about inodes, but on NTFS a sort order is a processing necessity / built-in requirement of a directory call; it’s embedded in the code. And on 10-20 or even 256+ directories it is a totally moot point, as it’s practically instantaneous; every dir listing gets the same processing order requested, be it a default OS call or a sort by a special attribute call.
That’s kinda hard to explain, but suffice it to say that even a default OS call is sort ordered.

No, sorry. For me it is a failure to implement a structured deletion process, relying instead on a seemingly random order of deletion (if it even is random).

I’ve observed this issue on multiple nodes; here are 2 more:

ls  trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
2024-07-05  2024-07-26  2024-07-27  2024-07-29  2024-07-31  2024-08-03  2024-08-06  2024-08-16  2024-08-18  2024-08-19  2024-08-23  2024-08-24

ls trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/
2024-06-24  2024-07-26  2024-08-06  2024-08-14  2024-08-15

These nodes have been up for 9 and 11 days, respectively, so it’s not a matter of crashing.

The latter listing shows that there is still data left that customers wanted deleted over 2 months ago. This brings up questions of ethics and governance, because the oldest data is the most urgent to delete and should go first. Leaving it up to the nodes to delete customer data in a random order looks like a very bad idea to me. I mean, the subfolders within a date folder are also not deleted at random but in a structured order. And obviously it does not matter that this may be a bit slower.

Deleting oldest first would also be straightforward and logical, and it would be much easier to follow the deletion process and spot errors if it followed a logic and not randomness.

What I also observe is that I have date folders where the deletion went halfway through and obviously did not finish but moved on to another date folder. This leaves behind a clutter of empty and full subfolders and date folders that are half processed. This could simply be avoided if deletion always started with the oldest folder first.


I still don’t get your concern. The node can delete X amount of data per hour, independently of the order. The sort order will not change that, but it may slow it down, because first the node would have to run stat recursively (to get a modification date), order the pieces by modification date and then process the deletion, still doing a stat (to get the size) for every piece to update the databases with the correct used space (reducing the trash). So it would do the stat call twice per piece. Twice as long to process. For slow systems, like yours by the way, that sounds like a very bad idea. Any restart and it would have to start the stat (not the deletion!) from scratch.

Why do you want to delete the older data before the newer one? What’s the difference? It would delete it all eventually. Since it’s obviously a too-slow process on your node, it would take days (weeks?) in any order.

Why would it have to do that? I thought:

I am talking about the order of processing the date folders, and it seems easy to tell the system to delete the oldest folder first.
So in my example above, it should start with date folder 2024-07-05 until it is completely deleted, then move on to 2024-07-06, and so on. That is the sort order I am talking about.
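For illustration, a minimal Go sketch of that ordering (not the actual storagenode code): list the date folders below the dateBefore value and process them oldest first, which for YYYY-MM-DD names is just a plain string sort, no extra stat calls needed.

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// dateFoldersOldestFirst returns the YYYY-MM-DD folders under trashDir
// that are older than dateBefore (the date part of the dateBefore
// timestamp from the log), sorted oldest first. For this name format
// a plain string sort is already chronological.
func dateFoldersOldestFirst(trashDir, dateBefore string) ([]string, error) {
	entries, err := os.ReadDir(trashDir)
	if err != nil {
		return nil, err
	}
	var folders []string
	for _, e := range entries {
		if e.IsDir() && len(e.Name()) == len("2006-01-02") && e.Name() < dateBefore {
			folders = append(folders, e.Name())
		}
	}
	sort.Strings(folders) // oldest first
	return folders, nil
}

func main() {
	// hypothetical trash path from the listings above
	trashDir := "/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa"
	folders, err := dateFoldersOldestFirst(trashDir, "2024-07-14")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, name := range folders {
		// finish one folder completely before moving to the next
		fmt.Println("would empty", filepath.Join(trashDir, name))
	}
}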
