But what is the closest date?
When a trash cleanup starts with "dateBefore": "2024-07-14T03:46:45Z", is it supposed to work on the oldest date folder or the youngest?
Because today I see in the logs that a trash cleanup has started with this date, and it is working on the date folder 2024-07-13 instead of 2024-07-12.
Edit:
Some more context, because I found an older log line.
Yesterday the node started a trash cleanup with "dateBefore": "2024-07-13T05:51:55Z"; I can see it in that older log line.
That cleanup started with the date folder for the 12th, which is the day before the 13th.
It worked on it all day but did not finish it on the 20th.
Now today's date has switched to the 21st. The node started a trash cleanup with "dateBefore": "2024-07-14T03:46:45Z", and instead of picking the 2024-07-12 date folder to continue, it started with the 2024-07-13 date folder.
So it seems that if there is more than one date folder, it does not pick the oldest but the most recent one. That means that after a new start, like in this case, it does not continue with the old folder but starts on a new one, abandoning the old folder in the middle of deletion. If it selected the oldest folder, it would always finish that one first before starting a more recent one.
This could also explain why I see nodes that still have date folders from June: the deletion, running from most recent to oldest, simply has not reached the oldest folders yet, which means this data has been sitting in the trash for almost 4 weeks now.
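Just to illustrate the point (this is not the actual storagenode code, only a sketch, and it assumes the per-day trash folders are named YYYY-MM-DD under the satellite's trash directory): because the names are ISO dates, a plain lexicographic sort is already chronological, so picking the first entry would give the oldest folder, while what I observe matches picking the last one.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

func main() {
	// List the per-day trash folders; the path is a placeholder.
	entries, err := os.ReadDir("trash/<satellite-id>")
	if err != nil {
		fmt.Println(err)
		return
	}
	var days []string
	for _, e := range entries {
		if e.IsDir() {
			days = append(days, e.Name())
		}
	}
	if len(days) == 0 {
		return
	}
	// ISO date names (YYYY-MM-DD) sort chronologically when sorted lexicographically.
	sort.Strings(days)
	fmt.Println("oldest (what I would expect):", days[0])
	fmt.Println("newest (what the node seems to pick):", days[len(days)-1])
}
```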
I believe it will lose it, because the information is initially collected in memory and then flushed to the database every 1h by default. If you restart in the middle, there is a high probability that this information will be lost.
I would suggest decreasing the flush interval if you expect that your node may restart randomly.
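For example, in config.yaml (if I remember the option name correctly it is storage2.cache-sync-interval, with a default of 1h0m0s; please check your own config before relying on this):

```yaml
# flush the in-memory used-space information to the database more often,
# so less is lost on an unexpected restart (default: 1h0m0s)
storage2.cache-sync-interval: 15m0s
```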
It deleted them in the following order:
2024-08-12, 2024-08-11, then it went to 2024-07-28, 2024-07-27, 2024-07-18, and it is currently deleting in the 2024-07-14 folder.
If there is some logic behind that, I fail to see it.
I do not think that there is any logic to the order for this trash-filewalker. It just deletes pieces from these folders if they are older than 7 days and updates the databases. The sort order doesn't seem to matter.
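Something like this simplified sketch (not the actual implementation, just the rule as I understand it):

```go
package trash

import (
	"io/fs"
	"time"
)

// trashTTL is the 7-day retention for trashed pieces.
const trashTTL = 7 * 24 * time.Hour

// shouldDelete reports whether a trashed piece is old enough to be
// removed, regardless of which date folder it sits in.
func shouldDelete(info fs.FileInfo, now time.Time) bool {
	return now.Sub(info.ModTime()) > trashTTL
}
```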
It matters a lot, because SNOs are paid in TB/hour.
That's 1,176+ hours behind…
It's an incredibly obvious problem, particularly if it happens again and again, perpetuating what could be a permanent bias of undeleted data.
It will delete the same amount of data from the trash in the same amount of time independently of the sort order. The trash filewalker deletes all pieces older than 7 days. So it really doesn't matter if it deletes 100GB from 2024-08-18 first, then 100GB from 2024-08-16, and then 100GB from 2024-08-14, 300GB in total, or the same 300GB in the reverse order, or in a random order. It will still be the same 300GB.
…or it errors out, which it has obviously been doing for 7 weeks, perpetuating the worst of the cycle. What if there were 5000 GB in the 2024-07-05 folder? Thus the bias, because it obviously never got completed for over 6 weeks.
As a closed loop, yes, it's going to delete data at the same rate; that's the obvious given.
With the same case in point, jammerdan should just have deleted the data himself and saved his own sanity, but he's pointing out a pointless yet obvious flaw which you are oblivious to.
The sort order wouldn't help the node to delete more data; the real problem is the failure of the deletion process. That needs to be addressed instead of making the deletion process more complicated and likely slower (any ordering requires additional time).
I guess disabling the lazy mode could solve the problem with crashes of the lazy trash-filewalker.
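i.e. setting this in config.yaml (or the equivalent command line flag):

```yaml
# run the filewalkers, including the trash cleanup, in the main process
# instead of the low-priority lazy subprocess
pieces.enable-lazy-filewalker: false
```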
I dunno about inodes, but on NTFS a sort order is a processing necessity/built-in requirement on a directory call; it's embedded in the code. And on 10-20 or even 256+ directories it is a totally moot point, as it's practically instantaneous; every dir listing gets the same processing order requested, be it a default OS call or a sort order by special attribute call.
That's kinda hard to explain, but suffice it to say that even a default OS call is sort ordered.
These nodes have been up for 9 and 11 days, respectively, so it's not a matter of crashing.
The latter data shows that there is still data left that customers wanted to have deleted over 2 months ago. This brings up questions of ethics and governance, because the oldest data is the most urgent to delete and should go first. Leaving it up to the nodes to delete customer data in random order looks like a very bad idea to me. I mean, you also don't delete the subfolders in a date folder at random but in a structured order. And obviously it does not matter that this may be a bit slower.
Deleting oldest first would also be straightforward and logical, and it would be much easier to follow the deletion process and spot errors if it followed a logic instead of randomness.
What I also observe is that I have date folders where the deletion went halfway through, obviously did not finish, and started on another date folder instead. This leaves behind a clutter of empty and full folders and half-processed date folders. That could simply be avoided if deletion always started with the oldest folder.
I still don't get your concern. The node can delete X amount of data per hour, independently of the order. The sort order will not change that, but it may slow things down, because the node would first have to run a stat recursively (to get the modification date), order the pieces by modification date, and then process the deletion, still doing a stat (to get the size) for every piece to update the databases with the correct used space (reducing the trash). So it would do the stat call twice per piece, taking twice as long to process. For slow systems, like yours by the way, that sounds like a very bad idea. After any restart it would have to start the stat pass (not the deletion!) from scratch.
Why do you want to delete the older data before the newer data? What's the difference? It would delete it all eventually. Since it is obviously too slow a process on your node, it would take days (weeks?) in any order.
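A sketch of what I mean (not the storagenode code; it also assumes the walker would not keep the results of the first pass in memory, because caching stat results for millions of pieces has its own cost):

```go
package trash

import (
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// deleteOldestPiecesFirst shows why sorting pieces by modification time
// roughly doubles the stat calls: pass 1 stats every piece just to learn
// its mtime for sorting, and pass 2 stats again to get the size that has
// to be subtracted from the trash usage in the database.
func deleteOldestPiecesFirst(dir string) error {
	type piece struct {
		path  string
		mtime int64
	}
	var pieces []piece

	// Pass 1: stat everything to get modification times (the extra cost).
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		info, err := d.Info() // one stat call per piece
		if err != nil {
			return err
		}
		pieces = append(pieces, piece{path, info.ModTime().UnixNano()})
		return nil
	})
	if err != nil {
		return err
	}
	sort.Slice(pieces, func(i, j int) bool { return pieces[i].mtime < pieces[j].mtime })

	// Pass 2: delete, stat-ing again for the size to update the database.
	for _, p := range pieces {
		info, err := os.Stat(p.path) // the second stat call per piece
		if err != nil {
			continue // the piece may already be gone
		}
		_ = info.Size() // would be subtracted from the trash usage in the DB
		if err := os.Remove(p.path); err != nil {
			return err
		}
	}
	return nil
}
```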
I am talking about the order of processing the date folders, and it seems easy to tell the system to delete the oldest folder first?
So in my example above, it should start with date folder 2024-07-05 until it is completely deleted, then move on to 2024-07-06, and so on. That is the sort order I am talking about.
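Roughly like this (again only a sketch, not the actual code; deleteFolder stands in for whatever the trash-filewalker does inside one date folder):

```go
package trash

import (
	"os"
	"path/filepath"
	"sort"
)

// cleanupOldestFirst processes the date folders in ascending (chronological)
// order and finishes each one completely before touching the next, so no
// folder is left half processed after a restart.
func cleanupOldestFirst(trashDir string, deleteFolder func(path string) error) error {
	entries, err := os.ReadDir(trashDir)
	if err != nil {
		return err
	}
	var days []string
	for _, e := range entries {
		if e.IsDir() {
			days = append(days, e.Name())
		}
	}
	sort.Strings(days) // YYYY-MM-DD names sort chronologically

	for _, day := range days {
		path := filepath.Join(trashDir, day)
		if err := deleteFolder(path); err != nil { // finish this folder first
			return err
		}
		if err := os.Remove(path); err != nil { // drop the now-empty date folder
			return err
		}
	}
	return nil
}
```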