Next trash problem: Trash not deleting

Here is what I’ve gathered over the past couple of weeks (I’ve never monitored my nodes this closely before):

  1. Bloom filters (BFs) come in, get saved to disk, and are processed one by one:

         pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa-1716130689348472000
         qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa-1716141599172348000
         ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa-1715795999994095000
         ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa-1715968799997895000
         v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa-1716141599968771000

  2. Trash cleanup doesn’t delete prefix directories until the entire run is done. It starts with the oldest date directory and goes through its prefix directories, letters first, then digits (e.g. 2024-05-03/aa before 2024-05-03/22). Only when all of the prefix directories have been cleared (no files left in them) are they removed. I haven’t caught this while it’s happening, so I can’t tell whether it removes them one by one with rmdir or all at once.

  3. If the lazy filewalkers are enabled, they only get “spare” IOPS. Since testing is currently being done, that traffic takes priority. As a result, multiple filewalkers can end up running at the same time (e.g. used-space, GC and trash cleanup all at once).

     A small suggestion here: it would be ideal if a filewalker could detect that another one is running, e.g. by touching a `.trash-cleanup` file somewhere and checking for it, with the file removed on node startup. IMNSHO, the filewalkers should run in this order: trash cleanup first, then GC, and used-space only when neither of those is running. There is no point in iterating through a satellite’s entire directory if those files are about to be moved to the trash or deleted. In sequence:

     node restart → `.trash-cleanup` touched → other filewalkers wait → `.trash-cleanup` removed → `.gc-running` touched → used-space waits → `.gc-running` removed → used-space starts

     Regardless, I haven’t seen any issues with them running concurrently. They are all slowly but surely working through directories (verified with `lsof` multiple times).
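The BF file names listed above appear to follow a `<satellite ID>-<timestamp>` pattern. Assuming the suffix is a Unix timestamp in nanoseconds (an assumption, not something the node documents here), a short Python sketch can decode them to show when each filter was created. `parse_bf_filename` is a hypothetical helper, not part of the storagenode.

```python
from datetime import datetime, timedelta, timezone


def parse_bf_filename(name: str) -> tuple[str, datetime]:
    """Split an assumed '<satellite-id>-<unix-nanoseconds>' name into its parts."""
    sat_id, _, ns_str = name.rpartition("-")
    if not sat_id or not ns_str.isdigit():
        raise ValueError(f"unexpected bloom filter file name: {name!r}")
    seconds, rem_ns = divmod(int(ns_str), 1_000_000_000)
    # Integer math keeps the microseconds that float division would round away.
    created = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        microseconds=rem_ns // 1000
    )
    return sat_id, created


if __name__ == "__main__":
    names = [
        "pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa-1716130689348472000",
        "ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa-1715795999994095000",
    ]
    for name in names:
        sat, created = parse_bf_filename(name)
        print(sat[:8], created.isoformat())
```

Under that assumption, the first name decodes to 2024-05-19 14:58:09 UTC and the older `ukfu…` filter to 2024-05-15 17:59:59 UTC.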
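For comparison with the cleanup behaviour described above, here is a minimal Python sketch of a cleanup that removes each prefix directory as soon as it has been emptied, rather than at the end of the run. It assumes a simplified layout of `trash/<YYYY-MM-DD>/<prefix>/<piece files>` and a hypothetical `cutoff` date. It is not the storagenode’s actual code. It also uses plain `sorted()` order, so digit prefixes come before letters, unlike the order observed on the node.

```python
import tempfile
from datetime import date
from pathlib import Path


def clean_trash(trash_root: Path, cutoff: date) -> list[Path]:
    """Delete date directories older than `cutoff`, rmdir-ing each prefix dir once empty."""
    removed: list[Path] = []
    # ISO dates (YYYY-MM-DD) sort chronologically, so this visits the oldest first.
    for date_dir in sorted(p for p in trash_root.iterdir() if p.is_dir()):
        try:
            day = date.fromisoformat(date_dir.name)
        except ValueError:
            continue  # not a date directory; leave it alone
        if day >= cutoff:
            break  # every remaining date directory is at least as new
        for prefix_dir in sorted(p for p in date_dir.iterdir() if p.is_dir()):
            for piece in prefix_dir.iterdir():
                piece.unlink()  # assumes prefix dirs contain only plain files
            prefix_dir.rmdir()  # removed right away instead of at the end of the run
            removed.append(prefix_dir)
        date_dir.rmdir()
        removed.append(date_dir)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "2024-05-03" / "aa").mkdir(parents=True)
        (root / "2024-05-03" / "aa" / "piece.sj1").write_bytes(b"x")
        print(clean_trash(root, date(2024, 5, 13)))
```

Removing each prefix directory immediately means an interrupted run leaves fewer empty directories behind. The trade-off is one extra `rmdir` per prefix while the walk is still in progress.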

There are significant gains to be had by not letting the filewalkers trample on each other. Cached data is evicted when the host is low on memory, so it’s better to leave that cache for normal downloads and uploads. When the walkers all run at once, trash cleanup asks for metadata (so it gets cached), then GC asks for metadata too (evicting trash cleanup’s entries to cache GC’s), then used-space comes along (evicting both). Running used-space last lets it finish warming the cache for normal downloading and uploading.
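The marker-file ordering proposed above (trash cleanup, then GC, then used-space) can be sketched in Python as follows. Marker names `.trash-cleanup` and `.gc-running` come from the post. `.used-space-running`, `walker_marker` and `clear_stale_markers` are hypothetical names for illustration. This is a polling sketch, not a real lock. The check-then-create step is not atomic across walkers, and it does not pause a lower-priority walker that is already running. It relies on trash cleanup claiming its marker first at node startup, as in the sequence above.

```python
import os
import tempfile
import threading
import time
from contextlib import contextmanager
from pathlib import Path

# Walkers in the proposed priority order, each with its marker file.
# ".used-space-running" is a hypothetical name; the post names only the first two.
PRIORITY = [
    ("trash-cleanup", ".trash-cleanup"),
    ("gc", ".gc-running"),
    ("used-space", ".used-space-running"),
]
MARKERS = dict(PRIORITY)
NAMES = [name for name, _ in PRIORITY]


def clear_stale_markers(state_dir: Path) -> None:
    """On node startup, delete markers left over from a crash or restart."""
    for marker in MARKERS.values():
        (state_dir / marker).unlink(missing_ok=True)


@contextmanager
def walker_marker(state_dir: Path, name: str, poll_seconds: float = 1.0):
    """Wait while any higher-priority walker's marker exists, then hold our own."""
    higher = [MARKERS[n] for n in NAMES[: NAMES.index(name)]]
    while any((state_dir / m).exists() for m in higher):
        time.sleep(poll_seconds)
    marker = state_dir / MARKERS[name]
    # O_EXCL raises FileExistsError if the marker already exists,
    # so a second copy of the same walker cannot claim it.
    os.close(os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
    try:
        yield
    finally:
        marker.unlink()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp)
        clear_stale_markers(state)
        (state / ".trash-cleanup").touch()  # pretend trash cleanup is running
        started = threading.Event()

        def used_space() -> None:
            with walker_marker(state, "used-space", poll_seconds=0.05):
                started.set()  # the real directory walk would happen here

        t = threading.Thread(target=used_space)
        t.start()
        time.sleep(0.2)
        print("started during trash cleanup:", started.is_set())  # False
        (state / ".trash-cleanup").unlink()  # trash cleanup finished
        t.join()
        print("started after trash cleanup:", started.is_set())  # True
```

Polling for a file keeps the walkers independent of each other, which fits the lazy filewalkers running as separate processes. A shared in-process scheduler would avoid the race but is a larger change.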
