Name of trash folder does not correspond to the actual time of collection

Funny thing: the name of the trash folder does not correspond to the actual (and, as I understand it, correct) time of collection (especially on big nodes, 10 TB+).
[screenshot: trash folder names with their last-modified dates]
As you can see, 2024-04-13 was last modified on 15-04-2024, and 2024-04-21 on 22-04-2024.
So what is the correct time from which we should start counting the 7 days: the name of the folder (when the trash collection started) or the last modified time (when it finished)?

The name of the folder should have been the day or date when it is going to be deleted.

Seems like it's just based on the date in the name of the folder: storj/storagenode/blobstore/filestore/dir.go at 8d6eed6169659aee3388ee68cbff6b1cb4a69c54 · storj/storj · GitHub

What's the size of that 2024-04-13 folder? Maybe you can check your logs for the daily trash cleanup and see what occurred. There is definitely an attempt to remove that folder, but only if all the subfiles and folders have already been deleted, as seen here: storj/storagenode/blobstore/filestore/dir.go at 8d6eed6169659aee3388ee68cbff6b1cb4a69c54 · storj/storj · GitHub

The delete day should just be the labelled folder date + 7 days.
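
For illustration, a minimal Go sketch of that calculation (the folder-name layout is taken from the screenshot above; the helper name and the 7-day constant are just assumptions for the example, not the actual storagenode code):

```go
package main

import (
	"fmt"
	"time"
)

// deleteDay is a hypothetical helper, not the real storagenode code: given a
// trash folder named after the day the bloom filter was received (YYYY-MM-DD),
// it returns the earliest day the folder becomes eligible for deletion,
// assuming the current 7-day retention.
func deleteDay(folderName string, retention time.Duration) (time.Time, error) {
	received, err := time.Parse("2006-01-02", folderName)
	if err != nil {
		return time.Time{}, err
	}
	return received.Add(retention), nil
}

func main() {
	day, err := deleteDay("2024-04-13", 7*24*time.Hour)
	if err != nil {
		panic(err)
	}
	fmt.Println(day.Format("2006-01-02")) // 2024-04-20
}
```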


That's what it is now. But it could be changed at any time, and could be different for every satellite, I guess.
So to check whether it is working properly, it would have been much easier if the name of the folder were the day when deletion is expected to happen.

The deletion period is hardcoded to 7 days: storj/storagenode/peer.go at 8d6eed6169659aee3388ee68cbff6b1cb4a69c54 · storj/storj · GitHub

Either way it’s not a big deal.


Great, thanks.
So at least it would be the same for all satellites.
However, this is also a bit of a strange decision. Wasn't the idea of the trash to be able to roll back in case something goes wrong on the satellite side? Having the same period for all satellites implies the same risk for every satellite. But if there are different satellite operators one day, the risk is probably not the same, but different for every satellite that is not operated by Storj.
Like this:

But still, the hardcoded value could change with every update.
I would have made the name of the folder the deletion date. That's the easiest way, and no calculation or lookups in the code would be required.

They said that some time ago a bug was introduced in the code that deleted the wrong pieces, and they could recover them from the trash. Imagine if there was no trash.


The directory name reflects the day when the bloom filter was received. It seems that it took your node around two days to process the bloom filter.

Whether it took your node a few seconds or 5 days to process the bloom filter, the Storj network starts counting the time in which it believes trash can be recovered from the moment the bloom filter was sent to the node.

By marking the date of collection, as opposed to the date of removal, you can flexibly change the trash expiration period even after garbage collection, which is a good idea in case something really bad happens to the network.


Do you mean by changing the hardcoded 7 day value in the storagenode code to a different value?
Or is there another way?

Now that confuses me.
Don’t we have a bloom filter roughly only once per week? What about the other deletions then?

Yeah, in case Storj decides 7 days is a bad value, they can release a new binary to change it.

There are three timestamps to discuss here.

  1. Timestamp of the database backup used to generate bloom filters.
  2. Timestamp of the moment when a storage node received the bloom filter from the satellite.
  3. The time when the storage node actually moved a file to trash while processing the bloom filter.

(1) is important because a bloom filter cannot be applied to files that were uploaded after the database backup was made. Those files will not be in the backup, hence the bloom filter will consider them non-existing, hence the node would potentially remove them. To guard against this, each time a satellite sends a bloom filter this timestamp is sent as well, and the node only garbage-collects files uploaded before it.
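
Sketched in Go (illustrative names only, not the storagenode API), the guard looks roughly like this:

```go
package main

import (
	"fmt"
	"time"
)

// isGCCandidate is an illustrative sketch: a piece is a garbage-collection
// candidate only if it was written before the database backup the bloom
// filter was generated from, and the filter does not contain it.
func isGCCandidate(pieceModTime, filterCreatedAt time.Time, inFilter bool) bool {
	if !pieceModTime.Before(filterCreatedAt) {
		// Uploaded after the backup: the filter cannot know about it, keep it.
		return false
	}
	return !inFilter
}

func main() {
	filterCreatedAt := time.Date(2024, 4, 10, 0, 0, 0, 0, time.UTC)
	newPiece := time.Date(2024, 4, 12, 0, 0, 0, 0, time.UTC)
	oldPiece := time.Date(2024, 4, 1, 0, 0, 0, 0, time.UTC)

	fmt.Println(isGCCandidate(newPiece, filterCreatedAt, false)) // false: too new to judge
	fmt.Println(isGCCandidate(oldPiece, filterCreatedAt, false)) // true: old and not in the filter
}
```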

(3) is not important, as it simply depends on how fast the storage node's I/O is. A fast node will delete files the same day as (2). Slow nodes may delete files even a few days later. So what? The outcome of garbage collection should not differ between those cases.

(2) is important. This is the moment the bloom filter is communicated to a node by a satellite. A satellite does not know how quickly the bloom filter will be applied. As such, it makes sense to start counting the 7 days from this point, and it makes sense to assume that, in case the node turns out to be so fast that the filter was applied instantly, after 7 days the files will not be recoverable.
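
A hedged sketch of that idea (made-up names, not storagenode code; the 7-day constant is the hardcoded one from the peer.go link above):

```go
package main

import (
	"fmt"
	"time"
)

// recoverableUntil anchors the recovery window to the moment the satellite
// sent the bloom filter, so how long the node needs to process it does not
// change the outcome: a node that finished in minutes and a node that needed
// two days both end up with the same deadline.
func recoverableUntil(filterSentAt time.Time) time.Time {
	const retention = 7 * 24 * time.Hour // the hardcoded value linked above
	return filterSentAt.Add(retention)
}

func main() {
	sent := time.Date(2024, 4, 13, 0, 0, 0, 0, time.UTC)
	fmt.Println(recoverableUntil(sent).Format("2006-01-02")) // 2024-04-20
}
```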


I thought you maybe had a simpler way in mind. Yes, of course, but it would still require releasing a new binary and updating all nodes. So they could make other code changes as well, like halting GC altogether or stopping sending bloom filters.
But I agree that changing the value from 7 to 10 would be a simple change to carry out.

Yes, but we are only talking about bloom filters here. I am confused: are the folders only used/created by the bloom filter garbage collection process? That would lead to the conclusion that the trash is only used for files deleted by bloom filter garbage collection. Due to the code changes in the past, I simply don't know anymore whether that is the only process using the trash at the moment.

Sorry for the delay. 2024-04-13 is 700 GB and still counting… 2024-04-21 is 26 GB.

Yes, that’s correct.

Right now there are only two ways in which pieces are deleted: through explicit expiration or through garbage collection. And given that explicit expiration removes files immediately, the trash directory is only used by garbage collection.
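
As a rough illustration of those two paths (purely illustrative, with invented names, not the actual piece-deletion code):

```go
package main

import (
	"fmt"
	"time"
)

// decide sketches the two ways a piece currently leaves the node: expired
// pieces are removed directly, pieces flagged by a bloom filter go into the
// dated trash folder first.
func decide(expiresAt *time.Time, inBloomFilter bool, now time.Time) string {
	if expiresAt != nil && now.After(*expiresAt) {
		return "delete immediately (expiration), trash not involved"
	}
	if !inBloomFilter {
		return "move to trash/<date filter received>, purge after 7 days"
	}
	return "keep"
}

func main() {
	now := time.Date(2024, 4, 28, 0, 0, 0, 0, time.UTC)
	expired := time.Date(2024, 4, 1, 0, 0, 0, 0, time.UTC)

	fmt.Println(decide(&expired, true, now)) // explicitly expired piece
	fmt.Println(decide(nil, false, now))     // garbage-collected piece
	fmt.Println(decide(nil, true, now))      // still referenced, keep
}
```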

I see. It is still weird for me to follow.

So does that mean:

  1. We don't have the case of direct deletes anymore? Every deletion is handled by the bloom filter and garbage collector? Previously this was meant only for the case when a node was not online when the delete command was issued, to clean up the remaining waste?

  2. We will not see directories for every day of deletion, but normally we could expect only 1 or 2 date directories in the trash, as the bloom filter currently only gets created every 5-7 days?
    I can currently see 2 folders in the trash on one of my nodes, one with date 17th and one with 22nd. This would mean they should get deleted on the 24th and 29th?

  3. Given these actual folders, does this also mean that if a customer, for example, deleted a file on the 23rd, it was not included in the 22nd bloom filter? So it will reside in the storage folder, unpaid, until the next bloom filter has been applied. Let's say the next filter gets created on the 27th and arrives on the 28th. The bloom filter creates a trash dir with date 27th on the 28th. The file deleted on the 23rd gets moved into the 27th trash folder. It is now the 28th and the file has been unpaid for 5 days. Deletion time for the 27th trash folder is + 7 days = 4th of May? So the file is unpaid for 5 days + 7 days = 12 days.

Where is my mistake?
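
To make the numbers concrete, here is the arithmetic from point 3 as a sketch; the roughly one-day difference depends on whether the 7 days count from the filter's creation date or from its arrival at the node (the dates are from the scenario above, nothing else is implied):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	deleted := time.Date(2024, 4, 23, 0, 0, 0, 0, time.UTC)  // customer deletes the file
	created := time.Date(2024, 4, 27, 0, 0, 0, 0, time.UTC)  // bloom filter created
	received := time.Date(2024, 4, 28, 0, 0, 0, 0, time.UTC) // filter arrives, piece moves to trash

	const retention = 7 * 24 * time.Hour
	fromCreation := created.Add(retention) // 2024-05-04
	fromReceipt := received.Add(retention) // 2024-05-05

	fmt.Println(fromCreation.Sub(deleted).Hours() / 24) // 11 unpaid days
	fmt.Println(fromReceipt.Sub(deleted).Hours() / 24)  // 12 unpaid days
}
```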


Correct.

Indeed no longer true. This was communicated here on the forum by @Alexey, I can’t find the thread now though.

Correct.

Probably on the 25th and 30th to account for potential time zone differences, but yeah.

This is my understanding as well.


Please use a script - it checks the modification date, not the name of the folder.
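
For reference, a minimal sketch of that kind of check (this is not the linked script, just an illustration; the trash path and the trash/<satellite>/<YYYY-MM-DD> layout are assumptions based on this thread):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// List the dated trash folders with their modification times, so they can be
// compared against the dates in the folder names.
func main() {
	trash := "/mnt/storagenode/storage/trash" // example path, adjust to your node

	satellites, err := os.ReadDir(trash)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, sat := range satellites {
		if !sat.IsDir() {
			continue
		}
		days, err := os.ReadDir(filepath.Join(trash, sat.Name()))
		if err != nil {
			continue
		}
		for _, day := range days {
			info, err := day.Info()
			if err != nil {
				continue
			}
			fmt.Printf("%s/%s  modified %s\n",
				sat.Name(), day.Name(), info.ModTime().Format("2006-01-02 15:04"))
		}
	}
}
```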

Yes

It's now the default.

Yes, but I suspect that we do not actually remove the folders even when they are empty. Perhaps it's a bug.

I think you are correct.
We are discussing the possibility of sending BFs more often to keep up with the deletion queue.

Likely not; I believe that the trash chore should still collect all pieces older than 7 days, otherwise this is not an improvement. I honestly do not support the idea of date-named folders; it is so confusing, especially when you do not know what that date means.

Hmm… 7 out of 13 of my nodes are still busy with the last bloom filter.
I hope they will finish in 1-3 days, before the next one.

It doesn't really matter if your nodes are on 1.10x.x. They should take the next one and proceed further.