Extremely large logs

I hope I haven’t overlooked anything, but do you also have such extremely large logs? For me, the GC throws a lot of these errors:


and also the GC says that it can’t delete files:

2024-08-26T19:15:43+02:00       INFO    lazyfilewalker.gc-filewalker    subprocess finished successfully        {"Process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-08-26T19:15:43+02:00       INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 7839, "Failed to delete": -7839, "Pieces failed to read": 0, "Pieces count": 356549, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Duration": "34.175440815s", "Retain Status": "enabled"}
2024-08-26T01:00:40+02:00       INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 10807581, "Failed to delete": -10807581, "Pieces failed to read": 0, "Pieces count": 11078571, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "15h30m45.931357867s", "Retain Status": "enabled"}

Surely that’s not how it should be? The log size is extremely large for all nodes.

Is there anything I can/must do?

Hi @CubeFan, see:

As GC was moving a lot of data to trash, your log might be larger than usual.

ok thank you very much!

But something has to be changed. The log for one node has grown from 0 to 32 GB since yesterday (because logrotate ran yesterday). That’s really a bit much. Or is it foreseeable that this is a one-time thing and will no longer, or only rarely, happen in the future?

I am also surprised that no file could be deleted. Or are these bloom filters only made for TTL data?

2024-08-26T01:00:40+02:00       INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 10807581, "Failed to delete": -10807581, "Pieces failed to read": 0, "Pieces count": 11078571, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "15h30m45.931357867s", "Retain Status": "enabled"}

Yes, something did change. The SLC satellite was updated, see:

But the GC does not seem to delete properly, if I understand the log entry below correctly:

2024-08-26T01:00:40+02:00       INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 10807581, "Failed to delete": -10807581, "Pieces failed to read": 0, "Pieces count": 11078571, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "15h30m45.931357867s", "Retain Status": "enabled"}

It will start a new cycle today. I expect the next SLC BFs (which exclude expired pieces) to be sent out by the end of this week / early next week… (if there are no errors, the generation is supposed to be finished by Sunday…)

And if I understand this correctly, the TTL data should not be affected at all.

If I understand correctly, the BFs were changed to exclude expired pieces. So now it is possible your node wants to trash an already expired piece. This is only a warning and there is no issue as the file was already deleted by the TTL process.

Does this answer your questions?

It affects only the expired pieces that were not cleaned up.

Then I don’t understand it correctly.
“Excludes” means that the TTL data is not touched and only everything else is, or not? That’s why I don’t understand the warnings.
If only TTL data were targeted, that would make sense, because in my case all TTL data was deleted: 10,807,581 pieces.

I would submit a PR to change that message level to INFO. I don’t think it deserves to be WARN; it’s expected behavior by design.

Ah, I see where the confusion might be coming from! The BF tells the node which pieces to hold. By excluding expired pieces from the BF generation, expired pieces won’t be added to the BF. This results in your node wanting to delete these expired pieces. But as the TTL process already removed the files, GC now gives a warning that the file cannot be found. This just means the TTL process was quicker to delete the file than GC, which is a good thing.
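
To picture that decision, here is a minimal Go sketch (made-up names, not the actual storagenode code): the node keeps a piece only if the bloom filter still contains it, and a piece that is no longer in the BF but was already removed by the TTL collector only produces a warning.

package main

import "fmt"

// bloomFilter is an assumed, simplified interface: Contains reports whether
// the satellite still wants a piece. This is NOT the storagenode implementation.
type bloomFilter interface {
	Contains(pieceID string) bool
}

// gcDecision sketches what the node does with a piece it finds on disk.
func gcDecision(bf bloomFilter, pieceID string, alreadyDeletedByTTL bool) string {
	if bf.Contains(pieceID) {
		return "keep" // the satellite still wants this piece
	}
	if alreadyDeletedByTTL {
		// The TTL collector already removed the file; retain just logs
		// a warning that the file cannot be found and moves on.
		return "warn: already gone"
	}
	return "move to trash"
}

type fakeBF map[string]bool

func (f fakeBF) Contains(id string) bool { return f[id] }

func main() {
	bf := fakeBF{"piece-A": true}                 // only piece-A is still wanted
	fmt.Println(gcDecision(bf, "piece-A", false)) // keep
	fmt.Println(gcDecision(bf, "piece-B", true))  // warn: already gone
	fmt.Println(gcDecision(bf, "piece-C", false)) // move to trash
}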

As @arrogantrabbit mentioned (and @BrightSilence mentioned for warnings when the opposite happens), these should really be INFO messages instead of WARNING ones. So no need to worry—your node’s just getting a little ahead of itself. :blush:

That’s why I have this in my config.yaml

# custom level overrides for specific loggers in the format NAME1=ERROR,NAME2=WARN,... Only level increment is supported, and only for selected loggers!
log.custom-level: "piecestore=FATAL,collector=FATAL,orders=WARN,reputation=WARN,bandwidth=WARN"

Otherwise there is too much noise in the logs.

Won’t do much though…

GC does not delete pieces; it collects them in batches (1,000 by default), then the retain process moves them to the trash. However, if the TTL collector was faster and deleted a piece before retain came to move it to the trash, it will print a WARN and continue. So this outcome is actually better than the alternative, because the TTL collector deletes expired pieces directly, while GC+retain would only move them to the trash, and their deletion would start only 7 days later (the deletion itself can sometimes take days).
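
A rough illustration of that flow (a sketch with made-up names, not the real storagenode code): the retain step tries to move each collected piece of a batch to the trash and only logs a WARN when the file has already disappeared, e.g. because the TTL collector got there first.

package main

import (
	"errors"
	"io/fs"
	"log"
	"os"
	"path/filepath"
)

// moveBatchToTrash sketches the retain step: it moves each collected piece
// into a trash directory and only warns when the file is already gone,
// e.g. because the TTL collector deleted it first. Names and the directory
// layout are made up for illustration.
func moveBatchToTrash(pieces []string, trashDir string) (moved, alreadyGone int) {
	for _, p := range pieces {
		dst := filepath.Join(trashDir, filepath.Base(p))
		if err := os.Rename(p, dst); err != nil {
			if errors.Is(err, fs.ErrNotExist) {
				// Same situation as the "Failed to delete" warnings above:
				// the piece vanished between collection and the move.
				log.Printf("WARN retain: piece already deleted: %s", p)
				alreadyGone++
				continue
			}
			log.Printf("ERROR retain: %v", err)
			continue
		}
		moved++
	}
	return moved, alreadyGone
}

func main() {
	pieceDir, _ := os.MkdirTemp("", "pieces")
	trashDir, _ := os.MkdirTemp("", "trash")
	a := filepath.Join(pieceDir, "piece-A")
	b := filepath.Join(pieceDir, "piece-B")
	_ = os.WriteFile(a, []byte("data"), 0o644)
	_ = os.WriteFile(b, []byte("data"), 0o644)

	_ = os.Remove(b) // simulate the TTL collector deleting piece-B first

	moved, gone := moveBatchToTrash([]string{a, b}, trashDir)
	log.Printf("moved=%d, already gone=%d", moved, gone)
}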

But if this is the case, won’t we always have such huge logs in the future when data is deleted via TTL and then the GC runs over it again later?
Or is that only because you really want to clean up everything right now?

I had always understood the bloom filter to contain information about which files should be kept, and that all others are deleted; there is no information about which files to delete, only about which to keep.

I think it should be fixed with the next release. The TTL database would be replaced with plain append-only files, so it should be more reliable and we wouldn’t have missing pieces with TTL, so the GC shouldn’t try to move them to the trash.
Also, it’s likely the race condition which I have described many times: the TTL collector removes pieces right away, while the garbage collector collects all pieces which shouldn’t exist; once it has collected 1,000, it calls the retainer, which moves this list of pieces to the trash. Since these are parallel processes, it’s possible that the TTL collector removes a piece in the list from the disk before the retainer tries to move it to the trash.
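
For illustration only, the general idea of an append-only TTL log could look like the sketch below; the record format and names are my own assumptions, not Storj’s actual on-disk layout. The point is that appending a line per expiring piece is simpler and harder to corrupt than updating a database.

package main

import (
	"fmt"
	"os"
	"time"
)

// appendTTLRecord appends one "<unix-expiration> <pieceID>" line per expiring
// piece. A crash can at worst truncate the last line instead of corrupting a
// database file. This format is invented for illustration.
func appendTTLRecord(path, pieceID string, expires time.Time) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = fmt.Fprintf(f, "%d %s\n", expires.Unix(), pieceID)
	return err
}

func main() {
	// Record a piece that expires in 30 days.
	if err := appendTTLRecord("ttl.log", "piece-A", time.Now().Add(30*24*time.Hour)); err != nil {
		fmt.Println("append failed:", err)
	}
}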

So the easiest solution would be to also add a maximum size to your log rotation config.
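
With logrotate, that could look roughly like this (path and values are placeholders; maxsize makes logrotate rotate the file as soon as it exceeds the limit, even between the scheduled weekly runs):

/var/log/storagenode/*.log {
        weekly
        maxsize 1G
        rotate 4
        compress
        missingok
        notifempty
}

With size-based rotations added, you may want to raise the rotate count so the look-back window still covers roughly a month.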

Do you mean in version 1.111 or in 1.112?

Ok, then I’ll just wait and see, but whenever such unusual things happen, I’m always irritated at first.
It also seems that I misunderstood the bloom filter, the GC and retain.

I currently rotate weekly and retain logs for a month, which has always worked so far; maybe I’ll see if I rotate daily or something. Since I want to be able to look back a certain amount of time, the size is secondary, but right now it is already quite extreme, because I can’t even run my script: it loads the logs into RAM to evaluate them ^^
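
If the script currently loads the whole log into memory, one workaround is to stream it line by line; a minimal Go sketch (the file name and the matched substring are just examples):

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("node.log") // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	// Retain/GC lines can be long, so raise the default 64 KiB line limit.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)

	count := 0
	for sc.Scan() {
		if strings.Contains(sc.Text(), "Moved pieces to trash during retain") {
			count++
		}
	}
	if err := sc.Err(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("retain summary lines:", count)
}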

Thanks for the help and clarification

You can track when and in which release it will be included here: