Bloom filter (strange behavior)

https://review.dev.storj.io/c/storj/storj/+/13214

storagenode/retain: do not remove bloomfilter on retain failure

:thinking:

I am not a coder, and I don't know much about how all of this hangs together.
But from what we have seen, a node restart led to deletion of the bloomfilter.

Given the commit title above, is it safe to assume that a node restart resulted in a retain failure?
If so, why? An interruption, yes. A failure, no.
With the change to not remove the bloomfilter on failure, I hope we won't see a different bad outcome: a real failure, caused by the bloomfilter or by other defects such as inaccessible files, makes the node try to process it over and over again, always erroring out.
Again: if the processing is working, keep resuming after a node restart.
But if the processing is not working and keeps quitting with the same error over and over again without making any progress, then at some point it should stop resuming, so it doesn't waste IOPS.
And again, my suggestion is not to delete the bloomfilter but to move it to trash, so it could be recovered and inspected if necessary, at least for some time until GC finally deletes it.

That's correct. It was the behavior from before the bloom filter was saved to disk, and it was never changed until now.

As far as I understand, if the retain fails or the node is restarted, it shouldn't delete the BF.

I'm not sure that this is needed. But if it turns out to be required, I'm sure it would be implemented.

We will continue to try and improve our code thanks to your feedback!

Keep in mind there are two conversations happening:

  • How the code is today.
  • How the code should be.

To solve your problem, we can only discuss how the code is today. For how the code is today, it is a simple fact that if your node isn't able to run very long, it will not be able to complete many tasks. So, please don't be upset with answers that say this; they are simple facts. When I say something like "there's your problem", I am not saying that it is your fault, but I am identifying what seems to be unique about your situation that is causing more urgent complaints than from other node operators.

For how the code should be, we agree that we need improvements! We are working on making nodes better able to handle shutdown. There are a number of changes coming soon to hopefully improve this situation. The storage node software remains a work in progress. It's much better than it used to be but still has room for improvement. Thanks for your patience!

Hopefully we see dynamic adjustments on node load:

I can confirm that on version v1.105.0-rc, when the node is restarted, the bloom filter file is NOT DELETED!

Can all nodes be switched to version v1.105.0-rc, or should we wait for version v1.105.1?

Wait for the final version. No need to rush. There are steps to be taken.

Can we safely delete BFs stored on disk?

Why not? The next one will do the same thing… more or less.

Likely yes, but the garbage would stay on your node at least half a week longer.