Trash used space too big

On one node the trash is 160 GB and on another one it is 120 GB. What can we do about that data? Can we delete it?

We can just wait 7 days and then the trash will get cleaned up by the storage node.

Why is the time so long? 7 days of frozen space = losing the ability to get new data.

Worst case, we would need 7 days to fix garbage collection and recover the data. Without that safety feature, garbage collection has the potential to destroy the entire network with just one bad bloom filter.

Garbage collection is plan B. If you are online, the space will get cleaned up immediately. The interesting question is why you have a larger amount in the trash folder. My trash folder on a full 6 TB node is currently 1.69 KB. On a second node that is still collecting new data the trash folder is 7 GB. That is a side effect of high uptime.

The garbage collection launched a few days ago filled the trash folder. Mine is over 1TB atm.
Next Saturday it will be empty, so don’t worry.

Mine is 407 GB :slight_smile: on a 6 TB node.

TBH my suspicion is that stefan-benten doesn’t delete the data the normal way. My oldest node had a lot of data from that satellite with 4.9TB filled and 1.1TB in trash this morning. And this server is online virtually 100%.

I noticed today, that all subfolders in stefan-benten’s blobs folder starting with a number had ~1GB each in them, then from aa down to xz only a few MB, from ya to yz some had ~1GB and some a few MB and again from za to zz all had ~1GB.

You can’t tell me that these are leftovers from normal delete operations from the client. It looks like the satellite just takes a shortcut and deletes pieces in alphabetical order of the pieceIDs. I have ~250GB in the blobs and 830GB in the corresponding trash folder for stefan-benten. My server is online almost 100% with the odd maintenance shutdown/reboot. It’s impossible that this many delete operations failed.
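
If anyone wants to repeat the per-subfolder comparison above on their own node, here is a minimal sketch. It only sums file sizes under each direct subfolder of a directory you pass in. The function name `folder_sizes` is made up for this example, and the path you give it (a satellite's blobs or trash folder) depends on your own setup.

```python
import os


def folder_sizes(root):
    """Return {subfolder_name: total_bytes} for each direct subfolder of root."""
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    pass  # file disappeared during the scan; skip it
        sizes[entry.name] = total
    return sizes


def print_report(root):
    """Print one line per subfolder, sorted by name, with sizes in GB."""
    for name, size in sorted(folder_sizes(root).items()):
        print(f"{name}\t{size / 1e9:.2f} GB")
```

Calling `print_report` on a blobs folder and again on the matching trash folder should make a pattern like the one described above easy to spot. Scanning millions of small files can take a while and adds disk load, so it may be better to run it when the node is quiet.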

It was a data wipe, of course. 1TB of zombie segments is not possible.

I have a single folder in /trash with 1TB filled with aa… xz folders

Is trash paid or unpaid during the 7 days it sits in the trash folder?

I am going to have to assume that we are not being paid for the trash sitting in the trash folder since Storj is not getting paid by the customer once they request the file to be deleted from the nodes.

From reading the Garbage Collection design, it seems that the bloom filter is not 100% accurate and can generate false positives. So Storj is making SNOs store trash for up to 7 days, hoping that later bloom filters can catch any false positives and restore any files that might have been thrown into the trash by accident.

Correct me if I’m wrong, but it seems like SNOs are on the short end of the stick here because of an implementation that is not 100%.

I once heard it’s the opposite. I believe the bloom filter tells the node which files it should keep, and a false positive results in garbage being kept on the node instead of going to the trash folder. The missed files are cleaned up during the next garbage collection, when a new bloom filter is sent out, assuming they don’t become a false positive again, which is unlikely.
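
To illustrate the "keep list" idea from this post, here is a minimal sketch of a Bloom filter used that way. It is not the Storj implementation: the class, the function `collect_garbage`, the filter size, and the SHA-256 based hashing are all assumptions chosen for readability. The point it shows is that a piece added to the filter is always reported as present, so a live piece is never trashed, while a deleted piece may occasionally slip through and stay on disk until a later run.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; size and hash scheme are illustrative only."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, piece_id):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{piece_id}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, piece_id):
        for pos in self._positions(piece_id):
            self.bits[pos] = True

    def might_contain(self, piece_id):
        # True for every added ID; may also be True for an ID never added.
        return all(self.bits[pos] for pos in self._positions(piece_id))


def collect_garbage(stored_pieces, keep_filter):
    """Split stored pieces into (kept, trashed) using the keep filter."""
    kept, trashed = [], []
    for piece_id in stored_pieces:
        if keep_filter.might_contain(piece_id):
            kept.append(piece_id)
        else:
            trashed.append(piece_id)
    return kept, trashed


live = ["piece-a", "piece-b"]            # pieces the satellite still tracks
keep_filter = BloomFilter()
for piece_id in live:
    keep_filter.add(piece_id)

kept, trashed = collect_garbage(live + ["piece-deleted"], keep_filter)
print(set(live) <= set(kept))            # True: live pieces are never trashed
```

Whether `piece-deleted` ends up in `trashed` or in `kept` depends on the hash positions, which is exactly the false positive case described above.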

The risk with the bloomfilter is not the false positive rate. The risk has to do with the way the satellite generates the bloomfilter. The satellite goes through all pointers and adds the pieceIDs to a bloomfilter. We call that part the metainfo loop. We use the same metainfo loop for the repair checker, accounting, audit reservoir sampling and a few more. For performance it is great to have only one metainfo loop, but the downside is the additional complexity. Every time we touch the metainfo loop we risk breaking the bloomfilter creation. Worst case, the satellite will send out an empty bloomfilter. We had a bug like that a few months ago on the master branch, but we noticed it before deploying to production. Next time we might not be lucky and it gets deployed to production. An empty bloomfilter would mean all nodes delete all pieces, and by the time we notice it, it is already too late to stop it. We can close the business and go home. The 7 days are needed to mitigate that risk. That gives us 7 days to send a rollback command to all storage nodes, disable garbage collection for the moment and, one release later, roll out a bugfix. The 7 days are just there to have an escape plan for the worst case.
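
The worst case described here can be sketched in a few lines. This is a toy model, not storage node code: the `Node` class, its method names, and the use of a plain `set` as a stand-in for the bloomfilter are assumptions for illustration, and the start date is arbitrary. It shows that with a retention window an empty keep list only moves pieces into trash, so a rollback that arrives within 7 days brings everything back.

```python
from datetime import datetime, timedelta

TRASH_RETENTION = timedelta(days=7)  # the 7-day window discussed in this thread


class Node:
    """Hypothetical node model: live pieces plus a timestamped trash."""

    def __init__(self, pieces):
        self.pieces = set(pieces)
        self.trash = {}  # piece_id -> time it was moved to trash

    def garbage_collect(self, keep, now):
        # Anything not listed as "keep" goes to trash, not straight to deletion.
        for piece_id in list(self.pieces):
            if piece_id not in keep:
                self.pieces.remove(piece_id)
                self.trash[piece_id] = now

    def restore_trash(self):
        # Rollback: move everything still in trash back to live storage.
        self.pieces.update(self.trash)
        self.trash.clear()

    def empty_trash(self, now):
        # Permanently delete only pieces older than the retention window.
        expired = [p for p, t in self.trash.items() if now - t >= TRASH_RETENTION]
        for piece_id in expired:
            del self.trash[piece_id]
        return expired


start = datetime(2020, 1, 1)  # arbitrary date for the example
node = Node({"piece-a", "piece-b", "piece-c"})

node.garbage_collect(keep=set(), now=start)        # buggy, empty keep list
node.empty_trash(now=start + timedelta(days=3))    # nothing has expired yet
node.restore_trash()                               # rollback arrives in time
print(sorted(node.pieces))  # ['piece-a', 'piece-b', 'piece-c']
```

If `empty_trash` ran only after the window had passed, `restore_trash` would have nothing left to bring back, which is why the window has to be long enough for someone to notice the bug and react.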

It is if it was never cleaned up before. This satellite was used for a LOT of testing with random noise. I can imagine that upload errors wouldn’t be such a big deal, since nobody cares about the specific data anyway. Ignore those long enough and don’t clean the zombie segments and you end up with this.
A while ago this satellite went through tons of normal deletes, so most of the data was cleaned up the normal way. Most of what remained after that was apparently zombie segments and stuff. I can see that happening.

No, but there normally shouldn’t be a large amount in there anyway. This is kind of an outlier situation that probably won’t happen again. If you look at it another way, you’ve been paid for zombie data for many months. I’ll forgive them these 7 days as a trade off.

Hi @littleskunk, can I point you to this thread… I have 850GB of trash :frowning:
ERROR pieces:trash emptying trash failed

I have a lot of trash as well (700GB+). We could not all have been offline for long enough to have generated this many missed deletes.

I think the assumption that SNOs were offline is flawed; otherwise this would be reflected in the uptime metric on the dashboard. Also, you would be suspended very quickly if this were the case.

There has to be a more logical explanation. It might point to the stefan-benten satellite not performing the correct delete procedure, OR to this new thing they’ve discovered, “zombie segments”, OR to something else. But it’s definitely not the SNO being offline when they’re online.

You are not. Please read the blueprint for suspension.

Well, my trash just dropped from over 160 GB to almost nothing on all my nodes…

Mine still has 400 GB…

Mine got deleted earlier today. Congratulations on having a minimum of 20 characters to post.