We are aware of multiple issues related to drive space inconsistencies. The engineers are currently heads down working on these issues.
A fix for the TTL issue, where data is cleared but ingress does not resume, is being tested now.
It looks like testing uploads will be paused over the holiday to allow garbage collection to catch up. They are discussing this tomorrow, as I understand it.
There are various other conversations going on around work and ideas to address the additional issues that SNOs are seeing.
The team cares about these issues and is working to resolve them as soon as possible. We ask for your patience, as testing and changes can take time to make sure they don’t introduce other unforeseen issues.
If you have any general concerns about drive space being freed up, garbage collection, or other delete processes, please comment here, especially if you feel there are areas the team is not aware of.
Thank you for being SNOs and for participating and working with us as we stress test parts of Storj.
Garbage collection doesn’t benefit from an upload pause. Repair is the one that is competing with the current uploads. Pausing the uploads would allow the repair worker to drain the repair queue faster.
Edit: The rest of your statement is correct. It will be discussed tomorrow. I would give this a 50/50 chance.
Thank you for the update, I share the frustrations.
Hopefully the complaints include:
Large amounts of data considered deleted by the satellite but still “used” by our nodes, aka “uncollected rubbish”. Speculated to be a problem with bloom filters that are inadequate or too few.
Large numbers of filewalker runs failing on setups (I’ve seen it with just “context canceled” as the only error message).
More of a design choice than a bug, but trash remaining on the node too damn long. Between the delay before it is garbage collected and the week of retention in trash, it seems excessive.
Roll the fixes out faster to the Docker nodes. Make storagenode-only versions if necessary. It should not take weeks to reach them.
If fixes require a full used space filewalker run to correct the numbers, nodes may not be able to do so until the save state and resume feature has been deployed: https://review.dev.storj.io/c/storj/storj/+/12806 . So don’t delay this feature any further.
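To illustrate the first point about “uncollected rubbish”, here is a minimal sketch of how a keep-list bloom filter can leave deleted pieces on a node when it is too small. This is not Storj's actual implementation: the `BloomFilter` class, `collect_garbage`, `uncollected_garbage`, the hash scheme, and all sizes are assumptions made up for the example. The idea is only that the filter records the pieces to keep, the node trashes whatever does not match, and every false positive is a deleted piece that stays “used”.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter; sizes and hashing are illustrative assumptions."""

    def __init__(self, size_bits, num_hashes):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def collect_garbage(stored_pieces, keep_filter):
    """Return the pieces a node would trash: those the filter does not match."""
    return [piece for piece in stored_pieces if piece not in keep_filter]


def uncollected_garbage(size_bits, live, deleted, num_hashes=3):
    """Return deleted pieces that survive one garbage collection pass."""
    keep_filter = BloomFilter(size_bits, num_hashes)
    for piece in live:
        keep_filter.add(piece)
    trashed = set(collect_garbage(live + deleted, keep_filter))
    return [piece for piece in deleted if piece not in trashed]


live = [f"live-{i}" for i in range(1000)]
deleted = [f"deleted-{i}" for i in range(1000)]

# A generously sized filter versus one that is far too small for 1000 pieces.
print(len(uncollected_garbage(1 << 20, live, deleted)))
print(len(uncollected_garbage(8, live, deleted)))
```

Live pieces are never trashed, because a bloom filter has no false negatives. The cost of an undersized filter is entirely on the other side: deleted pieces that happen to match stay on disk until a later, better filter arrives.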