Storage node accepts more data than allocated resulting in "upload failed" log messages

Since update 0.28.4, my storage node seems to accept more data even when it is completely full.
This results in lots of “upload failed” error messages in the logs:

2020-01-07T20:54:36.427Z INFO piecestore upload started {"Piece ID": "KDLKPUKQFFRFIU7LKAJXJ6OPOPM6V6DIP76AMRFGESQF7NXHOXAQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT"}
2020-01-07T20:54:36.998Z INFO piecestore upload failed {"Piece ID": "KDLKPUKQFFRFIU7LKAJXJ6OPOPM6V6DIP76AMRFGESQF7NXHOXAQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "piecestore protocol: out of space", "errorVerbose": "piecestore protocol: out of space\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:422\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:268\n\tstorj.io/storj/pkg/pb.DRPCPiecestoreDescription.Method.func1:1064\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\n\tstorj.io/drpc/drpcserver.(*Server).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:114\n\tstorj.io/drpc/

The Storage Node Dashboard shows negative available disk space (e.g. -12 MB).
The node is configured with 600GB of disk space, which is completely used.
The hard drive the node uses to store data is not full and still has about 140 GB of free space.

Restarting the node’s docker container fixes the problem.
I assume that is because the calculation of available storage space is incorrect.
I also expect the node to run into this error again once garbage collection has run and free space becomes available.

My temporary solution is to reduce the node’s allocated storage space to 500 GB once the 600 GB has been reached. I hope this way I can avoid reputation loss for my node.

Pretty sure this is intended behavior and won’t cause reputation loss at all. Your node doesn’t immediately update the satellites with new storage usage numbers, so satellites will try to have data sent to your node for a little while. Your node is just rejecting those transfers because the storage is full. I think the -12MB is just because when your node reached the limit it still finished uploads that were ongoing. In this case that’s probably 5 pieces. After that it starts rejecting and once the satellites have updated info on your node they stop sending uploads your way until you have space again. A restart prevents this error because it forces your node to send updated storage stats to the satellites, but it isn’t necessary. This problem should normally resolve itself.
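The timing described above can be illustrated with a small toy model. This is a minimal sketch, not the actual storagenode code: the `node` type and its methods are hypothetical, and it assumes (as suggested above) that the "enough space?" check only looks at pieces already stored, not at uploads still in flight. The piece size of 2.6 MB is an assumption chosen so that five in-flight pieces reproduce the -12 MB seen on the dashboard.

```go
package main

import "fmt"

// node is a HYPOTHETICAL, simplified model of a storage node's space accounting.
// It is not the storagenode implementation; it only illustrates the timing.
type node struct {
	allocated int64 // bytes the operator allocated
	used      int64 // bytes of pieces that have finished uploading
}

// available mirrors the dashboard figure: allocated minus used. It can go negative.
func (n *node) available() int64 { return n.allocated - n.used }

// startUpload reports whether a new upload is accepted. In this model the check
// only considers space already used, not uploads that are still in progress.
func (n *node) startUpload() bool { return n.available() > 0 }

// finishUpload records a completed piece, even if it pushes usage past the limit.
func (n *node) finishUpload(size int64) { n.used += size }

func main() {
	const mb = int64(1_000_000)
	// 600 GB allocated, with only 1 MB left.
	n := &node{allocated: 600_000 * mb, used: 600_000*mb - 1*mb}

	// Five uploads start while there is still (a little) space left...
	accepted := 0
	for i := 0; i < 5; i++ {
		if n.startUpload() {
			accepted++
		}
	}
	// ...and all of them complete afterwards, at an assumed 2.6 MB per piece.
	for i := 0; i < accepted; i++ {
		n.finishUpload(26 * mb / 10)
	}

	fmt.Printf("available: %d MB\n", n.available()/mb) // available: -12 MB
	fmt.Println("new upload accepted:", n.startUpload()) // new upload accepted: false
}
```

From that point on, every new upload is rejected with "out of space" until the satellites receive updated storage stats and stop selecting the node, or until deletions bring the available space back above zero.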


Thank you for your answer. This sounds plausible to me.

Could a developer please confirm that these thoughts are correct?

  • The error message does not mean that an unexpected situation happened, and it does not need any attention from my side.
  • The node’s reputation is not negatively affected by failing to store the data in this scenario.

@BrightSilence is right and both sentences are true.


I am getting -1.91 GB. Is this normal?

If you are referring to the free space, yes this is normal. It just means that your node is full and has used more space than allocated, which can happen for several reasons. Once enough pieces are deleted from your node, it will show a positive free space again. The system has been tweaked since OP’s post, so their reason for negative space is a bit different but related to what you are seeing.


This seems like the wrong approach: what if I don’t have more space than I allocated?
Do I need to take into account the possibility of the node using more than allocated?
In that case I would allocate less space in the first place, and that should be documented somewhere.
I assumed from the start that only what I allocated would be used and no more.

According to the documentation, yes. You should keep a reserve of 10% free space.
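As a rough sizing aid, here is a minimal sketch assuming the 10% reserve is meant as 10% of the disk's total capacity (check the documentation for the exact wording). The helper names `maxAllocation` and `overshootFits` are hypothetical, and the 1 TB disk is an example value, not something from this thread.

```go
package main

import "fmt"

// maxAllocation returns an upper bound for the node's allocated space, ASSUMING
// the recommended reserve is 10% of the disk's total capacity.
func maxAllocation(diskBytes int64) int64 {
	return diskBytes - diskBytes/10
}

// overshootFits reports whether a node allocated at maxAllocation can exceed its
// allocation by overshootBytes (like the -1.91 GB above) without filling the disk.
func overshootFits(diskBytes, overshootBytes int64) bool {
	return maxAllocation(diskBytes)+overshootBytes <= diskBytes
}

func main() {
	const gb = int64(1_000_000_000)
	disk := 1000 * gb // hypothetical 1 TB disk

	fmt.Printf("allocate at most %d GB\n", maxAllocation(disk)/gb)        // allocate at most 900 GB
	fmt.Println("1.91 GB overshoot fits:", overshootFits(disk, 1_910_000_000)) // 1.91 GB overshoot fits: true
}
```

With this rule the reserve scales with the disk, so an overshoot of a few GB, like the ones reported in this thread, stays well inside it.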
