But something has to change: the log for one node has grown from 0 to 32 GB since yesterday (yesterday was the logrotate). That really is a bit much. Or is it foreseeable that this happens once and then no longer, or only rarely, in the future?
I am also surprised that apparently no files could be deleted, or are these bloom filters only meant for TTL data?
2024-08-26T01:00:40+02:00 INFO retain Moved pieces to trash during retain {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 10807581, "Failed to delete": -10807581, "Pieces failed to read": 0, "Pieces count": 11078571, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "15h30m45.931357867s", "Retain Status": "enabled"}
It will start a new cycle today; I expect the next SLC BFs (which exclude expired pieces) to be sent out by the end of this week / early next week… (barring errors, the generation is supposed to be finished by Sunday…)
And if I understand this correctly, the TTL data should not be affected at all.
If I understand correctly, the BFs were changed to exclude expired pieces. So now it is possible your node wants to trash an already expired piece. This is only a warning and there is no issue as the file was already deleted by the TTL process.
Then I don’t understand correctly.
Doesn’t “excludes” mean that the TTL data is not touched at all and only everything else is? That’s why I don’t understand the warnings.
If only the TTL data were included, that would make sense, because in my case all the TTL data was deleted: 10807581 pieces.
Ah, I see where the confusion might be coming from! The BF tells the node which pieces to hold. By excluding expired pieces from the BF generation, expired pieces won’t be added to the BF. This results in your node wanting to delete these expired pieces. But as the TTL process already removed the files, GC now gives a warning that the file cannot be found. This just means the TTL process was quicker to delete the file than GC, which is a good thing.
# custom level overrides for specific loggers in the format NAME1=ERROR,NAME2=WARN,... Only level increment is supported, and only for selected loggers!
log.custom-level: "piecestore=FATAL,collector=FATAL,orders=WARN,reputation=WARN,bandwidth=WARN"
GC does not delete pieces; it collects them in batches (1,000 by default), and then the retain process moves them to the trash. However, if the TTL collector was faster and removed a piece before retain came to move it to the trash, it will print a WARN and continue. So this outcome is actually the better one, because the TTL collector deletes expired pieces directly, while GC+retain would only move them to the trash, where they would start to be deleted only 7 days later (and the deletion itself can sometimes take days).
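For anyone who prefers to see that flow in code, here is a rough sketch. It is not the actual storagenode code: the function names, the callbacks and the log messages are made up purely to illustrate the batching and the harmless WARN on an already-deleted piece.

package main

import (
	"errors"
	"io/fs"
	"log"
)

// retainBatchSize mirrors the "1,000 by default" batch mentioned above.
const retainBatchSize = 1000

// gcAndRetain walks all pieces, collects those NOT in the bloom filter,
// and hands each full batch to the retain step, which moves them to the trash.
func gcAndRetain(pieces []string, inBloomFilter func(string) bool, moveToTrash func(string) error) {
	batch := make([]string, 0, retainBatchSize)

	flush := func() {
		for _, id := range batch {
			err := moveToTrash(id)
			switch {
			case err == nil:
				// moved to trash; the real deletion only happens ~7 days later
				log.Printf("retain: moved %s to trash", id)
			case errors.Is(err, fs.ErrNotExist):
				// the TTL collector was faster and already deleted the file
				log.Printf("WARN retain: piece %s already gone", id)
			default:
				log.Printf("ERROR retain: %v", err)
			}
		}
		batch = batch[:0]
	}

	for _, id := range pieces {
		if inBloomFilter(id) {
			continue // the satellite still wants this piece: keep it
		}
		batch = append(batch, id)
		if len(batch) == retainBatchSize {
			flush()
		}
	}
	flush()
}

func main() {
	// tiny demo with fake pieces; "expired-piece" is already gone from disk
	onDisk := map[string]bool{"kept-piece": true, "garbage-piece": true}
	gcAndRetain(
		[]string{"kept-piece", "garbage-piece", "expired-piece"},
		func(id string) bool { return id == "kept-piece" },
		func(id string) error {
			if !onDisk[id] {
				return fs.ErrNotExist
			}
			delete(onDisk, id)
			return nil
		},
	)
}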
But if this is the case, won’t we always have such huge logs in the future when data is deleted via TTL and the GC then runs over it again later?
Or is that because you really want to clean everything?
I had always understood the bloom filter to contain information about which files should be kept, with all others being deleted, and that it holds no information about files to be deleted, only about those that should be kept.
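In code terms, that “keep set” idea looks roughly like this toy sketch (my own illustration, not Storj’s actual filter implementation). A bloom filter can only answer “definitely not wanted” or “possibly wanted”, so false positives only ever cause some garbage to be kept for one more cycle, never the loss of a wanted piece.

package main

import (
	"fmt"
	"hash/fnv"
)

type bloom struct {
	bits []bool
	k    int // number of hash functions
}

func newBloom(m, k int) *bloom { return &bloom{bits: make([]bool, m), k: k} }

// positions derives k bit positions for a piece ID.
func (b *bloom) positions(id string) []int {
	pos := make([]int, b.k)
	for i := 0; i < b.k; i++ {
		h := fnv.New64a()
		fmt.Fprintf(h, "%d:%s", i, id)
		pos[i] = int(h.Sum64() % uint64(len(b.bits)))
	}
	return pos
}

func (b *bloom) Add(id string) {
	for _, p := range b.positions(id) {
		b.bits[p] = true
	}
}

// MightContain returns false only when the piece is definitely not in the
// keep set; true means "possibly wanted", so the node keeps the piece.
func (b *bloom) MightContain(id string) bool {
	for _, p := range b.positions(id) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

func main() {
	keep := newBloom(1<<16, 5)
	keep.Add("piece-to-keep")

	for _, id := range []string{"piece-to-keep", "expired-piece"} {
		if keep.MightContain(id) {
			fmt.Println(id, "-> keep")
		} else {
			fmt.Println(id, "-> garbage, move to trash")
		}
	}
}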
I think it should be fixed with the next release. The TTL database would be replaced with plain append-only files, so it should be more reliable and we wouldn’t have missing pieces with TTL, so the GC shouldn’t try to move them to the trash.
Also, it’s likely the race condition I have described many times: the TTL collector removes pieces right away, while at the same time the garbage collector collects all pieces that shouldn’t exist; once it has collected 1,000 of them, it calls the retainer, which moves this list of pieces to the trash. Since these are parallel processes, it’s possible that the TTL collector removes a piece in the list from the disk before the retainer tries to move it to the trash.
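A toy reproduction of that race, with made-up file names and plain os.Remove / os.Rename standing in for the real piece store, shows why the resulting WARN is harmless: whichever side loses the race just sees fs.ErrNotExist for a file that is already gone.

package main

import (
	"errors"
	"io/fs"
	"log"
	"os"
	"path/filepath"
	"sync"
)

func main() {
	dir, err := os.MkdirTemp("", "pieces")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)

	piece := filepath.Join(dir, "expired-piece.sj1")
	trash := filepath.Join(dir, "expired-piece.sj1.trash")
	if err := os.WriteFile(piece, []byte("data"), 0o600); err != nil {
		log.Fatal(err)
	}

	var wg sync.WaitGroup
	wg.Add(2)

	// "TTL collector": deletes the expired piece outright.
	go func() {
		defer wg.Done()
		if err := os.Remove(piece); err != nil && !errors.Is(err, fs.ErrNotExist) {
			log.Println("collector ERROR:", err)
		}
	}()

	// "Retainer": tries to move the same piece to the trash in parallel.
	go func() {
		defer wg.Done()
		err := os.Rename(piece, trash)
		switch {
		case errors.Is(err, fs.ErrNotExist):
			log.Println("retain WARN: piece already deleted by the TTL collector")
		case err != nil:
			log.Println("retain ERROR:", err)
		}
	}()

	wg.Wait()
}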
So the easiest solution would be to also add a maximum size to your log rotation config.
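For example, with logrotate something along these lines would cap how big a single log can get. The path and values are placeholders, adjust them to your setup; maxsize triggers an early rotation whenever logrotate runs (typically once a day), even if the weekly schedule hasn’t been reached yet.

# Hypothetical example - adjust the path, size and retention to your own setup.
/var/log/storagenode/node.log {
    # keep the existing weekly schedule...
    weekly
    # ...but rotate early whenever the file grows past 1G
    maxsize 1G
    # keep roughly a month of rotated logs
    rotate 5
    compress
    missingok
    notifempty
    # let the storagenode keep writing to the same file handle
    copytruncate
}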
OK, then I’ll just wait and see, but whenever such unusual things happen I’m always a bit irritated at first.
It also seems that I had misunderstood the bloom filter, the GC, and the retain process.
I currently rotate weekly and keep logs for a month, which has always worked so far; maybe I’ll switch to rotating daily or something. Since I want to be able to look back over a certain period, the size is secondary, but right now it’s already a real problem, because I can’t even run my script: it loads the logs into RAM to evaluate them ^^