Storage node accepts more data than allocated resulting in "upload failed" log messages

Since update 0.28.4, my storage node seems to accept more data than allocated, even when the node is completely full.
This results in lots of “upload failed” error messages in the logs:

2020-01-07T20:54:36.427Z INFO piecestore upload started {"Piece ID": "KDLKPUKQFFRFIU7LKAJXJ6OPOPM6V6DIP76AMRFGESQF7NXHOXAQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT"}
2020-01-07T20:54:36.998Z INFO piecestore upload failed {"Piece ID": "KDLKPUKQFFRFIU7LKAJXJ6OPOPM6V6DIP76AMRFGESQF7NXHOXAQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "piecestore protocol: out of space", "errorVerbose": "piecestore protocol: out of space\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:422\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:268\n\tstorj.io/storj/pkg/pb.DRPCPiecestoreDescription.Method.func1:1064\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\n\tstorj.io/drpc/drpcserver.(*Server).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:114\n\tstorj.io/drpc/
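
To see how many of these failures are out-of-space rejections rather than some other error, the log can be tallied with a short script. This is a minimal sketch, assuming every line has the layout shown above (timestamp, level, logger, message, then a JSON object of fields with straight quotes). `classify_uploads` is a hypothetical helper, not part of the storagenode tooling, and truncated lines like the one above are simply skipped.

```python
import json
import re
from collections import Counter

# Assumed layout: timestamp, level, logger, message, then a JSON object of fields.
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s+(.*?)\s+(\{.*\})\s*$")


def classify_uploads(lines):
    """Count piecestore upload outcomes; failures are keyed by their 'error' field."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not in the assumed layout (or truncated before the closing brace)
        _ts, _level, logger, message, payload = m.groups()
        if logger != "piecestore" or not message.startswith("upload"):
            continue
        try:
            fields = json.loads(payload)
        except json.JSONDecodeError:
            continue  # malformed JSON payload
        if message == "upload failed":
            counts["failed: " + fields.get("error", "unknown")] += 1
        else:
            counts[message] += 1
    return counts


sample = [
    '2020-01-07T20:54:36.427Z INFO piecestore upload started '
    '{"Piece ID": "KDLKPUKQFFRFIU7LKAJXJ6OPOPM6V6DIP76AMRFGESQF7NXHOXAQ", "Action": "PUT"}',
    '2020-01-07T20:54:36.998Z INFO piecestore upload failed '
    '{"Piece ID": "KDLKPUKQFFRFIU7LKAJXJ6OPOPM6V6DIP76AMRFGESQF7NXHOXAQ", "Action": "PUT", '
    '"error": "piecestore protocol: out of space"}',
]
print(classify_uploads(sample))
```

If nearly all failures are keyed as "piecestore protocol: out of space", the node is simply turning uploads away because it is full.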

The Storage Node Dashboard shows negative available disk space (e.g. -12 MB).
The node is configured with 600GB of disk space, which is completely used.
The hard drive the node uses to store data is not full and still has about 140GB of free space.
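
The negative dashboard figure and the 140GB of free disk space are two different numbers. A minimal sketch of the arithmetic, assuming the dashboard shows "allocated minus used by pieces" and does not look at the drive's real free space; the 12 MB overshoot is a hypothetical value chosen to match the example above.

```python
MB = 10**6
GB = 10**9


def dashboard_available(allocated: int, used: int) -> int:
    """Space the node is still willing to accept, judged only by its allocation.

    Assumption: this mirrors the dashboard figure; real disk free space is ignored.
    """
    return allocated - used


allocated = 600 * GB
used = allocated + 12 * MB  # hypothetical: 12 MB written past the limit
disk_free = 140 * GB        # what the OS reports for the drive itself

available = dashboard_available(allocated, used)
print(f"dashboard available: {available // MB} MB")  # dashboard available: -12 MB
print(f"disk free: {disk_free // GB} GB")             # disk free: 140 GB
```

So the node can be "full" by its own allocation while the drive still has plenty of room.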

Restarting the node’s Docker container fixes the problem.
I assume that is because the calculation of available storage space is incorrect.
I expect the node to run into this error again once garbage collection has run and free space becomes available.

My temporary solution is to reduce the node’s allocated storage space to 500GB once the 600GB is reached. I hope this way I can avoid reputation loss for my node.

Pretty sure this is intended behavior and won’t cause reputation loss at all. Your node doesn’t immediately update the satellites with new storage usage numbers, so the satellites will keep sending uploads to your node for a little while. Your node is just rejecting those transfers because its allocated storage is full. I think the -12MB is because, when your node reached the limit, it still finished the uploads that were already in progress. In this case that’s probably 5 pieces. After that it starts rejecting new uploads, and once the satellites have updated info on your node they stop sending uploads your way until you have space again. A restart prevents this error because it forces your node to send updated storage stats to the satellites, but it isn’t necessary. This problem should normally resolve itself.
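
The sequence described above (stale numbers on the satellite side, in-progress uploads finishing past the limit, then rejections until the next refresh) can be illustrated with a toy simulation. This is a sketch under loud assumptions, not how the storagenode or satellite code is actually written: the node is assumed to check free space only when an upload starts, the satellite is assumed to refresh its copy of the node's free space every `refresh_every` ticks, and `Node`, `simulate`, and all sizes (in MB) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    allocated: int                                # allocated space (MB, hypothetical units)
    used: int = 0                                 # space used by completed pieces
    in_flight: list = field(default_factory=list) # sizes of accepted, unfinished uploads
    rejected: int = 0                             # uploads refused as "out of space"

    def start_upload(self, size: int) -> bool:
        # Assumption: free space is only checked when an upload starts.
        if self.used >= self.allocated:
            self.rejected += 1  # would show up as "upload failed ... out of space"
            return False
        self.in_flight.append(size)
        return True

    def finish_all(self) -> None:
        # Uploads that were already accepted complete even if they overshoot the limit.
        self.used += sum(self.in_flight)
        self.in_flight.clear()


def simulate(allocated, piece, uploads_per_tick, ticks, refresh_every):
    node = Node(allocated)
    reported_free = allocated  # the satellite's possibly stale view of the node
    for t in range(ticks):
        if t % refresh_every == 0:
            reported_free = node.allocated - node.used  # node reports fresh stats
        if reported_free > 0:  # satellite still believes there is room
            for _ in range(uploads_per_tick):
                node.start_upload(piece)
        node.finish_all()
    return node


node = simulate(allocated=100, piece=2, uploads_per_tick=7, ticks=20, refresh_every=10)
print(node.used, node.rejected, node.allocated - node.used)  # 112 14 -12
```

In this toy run the node overshoots its allocation with the last batch it accepted, rejects uploads until the satellite's next refresh, and then receives no more uploads at all, which is the self-resolving pattern described in the reply.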


Thank you for your answer. This sounds plausible to me.

Could a developer please confirm that these thoughts are correct?

  • The error message does not indicate an unexpected situation and does not need any attention from my side.
  • The node’s reputation is not negatively affected by failing to store the data in this scenario.

@BrightSilence is right and both sentences are true.
