I checked the database and can assure you that on this particular node, all records were correctly deleted from it by the TTL collector almost immediately (with a few minutes' delay at most) after the corresponding files were deleted from the disk. And it has been this way for many days, even before the update from v1.105 to v1.108.
You don’t seem to understand exactly what the problem with the TTL collector is in the current (version <= 1.108) storage nodes, i.e. the problem that the above-mentioned patch is supposed to fix.
The problem is not that the TTL collector in these versions allegedly does not delete records from the database at all after deleting the files. It does delete them, but only after completing each pass, and it starts this work anew if the pass is interrupted before it finishes.
That can cause serious problems on large nodes (and I do see them, but only on my other, LARGE nodes). This is what the batch-processing patch mentioned above will have to fix.
However, if the TTL collector completes the pass successfully, it correctly deletes all processed records from the database at the end of each pass.
This is the case both in version 1.108.x and in version 1.105.x, and in some earlier versions too. And it is true for my small node from the example above, where the collector correctly completed its work every hour, deleting all records processed in each pass from the database.
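To make the difference concrete: today the cleanup is effectively one bulk delete at the very end of the pass, while the batched approach (as I understand the patch) removes the already-processed records in limited chunks as it goes, so an interrupted pass does not lose all of its progress. A rough sqlite-level sketch of the chunked variant, just as an illustration with the usual satellite_id / piece_id columns (the actual patch of course does this in the node code, not in raw SQL):
-- repeat until no rows are affected: remove up to 1000 processed expired records per step
DELETE FROM piece_expirations
WHERE (satellite_id, piece_id) IN (
    SELECT satellite_id, piece_id FROM piece_expirations
    WHERE piece_expiration < datetime('now')
    LIMIT 1000
);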
That’s where you were right. I randomly selected 3 piece IDs from the collector warning log and they were all found in the folder
\trash\ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa\2024-07-27\
So it looks like they were actually deleted by the GC. This is the latest run (it is still running now, about 80% done):
2024-07-27T20:53:00+03:00 INFO retain Prepared to run a Retain request. {"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Created Before": "2024-07-21T17:59:59Z", "Filter Size": 4624470, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
But this is not due to problems with the TTL collector, because the GC is currently deleting files created no later than July 21, as indicated in the Bloom filter parameters. And the TTL collector on this node has already successfully processed and deleted all records from the database up to the beginning of July 28:
Here is a SQL query to count the records in piece_expirations.db that match that criterion:
sqlite> SELECT count(*) FROM piece_expirations WHERE piece_expiration < datetime('2024-07-28T04:00:00');
0
sqlite> SELECT count(*) FROM piece_expirations WHERE piece_expiration < datetime('2024-07-28T10:00:00');
5043
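By the way, it is easy to check from the same sqlite prompt how many already-expired records the collector has not cleaned up yet, and how the remaining records are spread over time (purely illustrative queries against the same table and columns as above):
sqlite> SELECT count(*) FROM piece_expirations WHERE piece_expiration < datetime('now');
sqlite> SELECT strftime('%Y-%m-%dT%H:00', piece_expiration) AS hour, count(*) FROM piece_expirations GROUP BY hour ORDER BY hour LIMIT 24;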
Although I forgot that right now we don't have any direct deletion of data on the storage nodes after it is deleted on the satellite at the end client's command, do we? And all the data deleted by clients remains on the node until the garbage collector finds it sooner or later?
Then maybe that is what is happening here: some large client recently (15-20 Jul) uploaded a large amount of data with a TTL set to the US1 satellite, but then did not wait for the specified period to expire and deleted this data manually instead. This can lead to a situation where the GC deletes the data earlier than the TTL collector does, even with completely correct and timely operation of both.
So maybe it was a false alarm. But there is still a need for further improvement, because this is not a one-off situation and it can repeat on a regular basis (as long as the nodes do not process deletes directly and rely on the GC instead).
I think I should create another suggestion on GitHub for an appropriate improvement: when the GC deletes files, it should check whether the IDs of the files it deletes exist in piece_expiration.db and, if they do, delete those records from the database itself. Then the TTL collector would not try to delete them again later and would not pour out thousands of useless warnings (which will provoke operators to simply disable or filter out these warnings, and then an important problem can easily be missed).
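On the database side such a cleanup would conceptually boil down to a trivial per-piece delete like the one below (just a sketch with sqlite named parameters; the real change would of course be made in the GC code of the storagenode, and I am assuming the existing satellite_id / piece_id columns):
-- hypothetical cleanup the GC could perform for every piece it moves to trash,
-- so that the TTL collector never sees a record for an already-removed file
DELETE FROM piece_expirations
WHERE satellite_id = :satellite_id AND piece_id = :piece_id;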