I’ve started the graceful exit (GE) process on one of my nodes because of a predicted hardware failure (I will be rejoining the network with new hardware in a few months). I’ve started with a GE on only the stefan-benten satellite, just to see how it goes before I start on the rest.
Most piece transfers have been successful; only a small number have failed with the errors below.
I know graceful exit is a relatively new feature that hasn’t seen much use yet, so I am a little uncertain about these issues:
1. A few pieces fail to transfer because the “database is locked”, yet when I grep the logs for the Piece ID, the piece had already been transferred successfully earlier:
2020-05-12T19:14:54.659Z INFO gracefulexit:chore piece transferred to new storagenode {"Storagenode ID": "1YwgqyPxA6n7enqJ3dRPhEasHMbuCSBacoFJeMerGr3DMgBmYz", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Piece ID": "4XZO7Y7HD3P63J7V7RAJMGEVY23GSHH3P75ZKFJIX6TX2E76MFVQ"}
2020-05-12T19:26:40.846Z ERROR gracefulexit:chore failed to put piece. {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Piece ID": "4XZO7Y7HD3P63J7V7RAJMGEVY23GSHH3P75ZKFJIX6TX2E76MFVQ", "error": "protocol: usedserialsdb error: database is locked", "errorVerbose": "protocol: usedserialsdb error: database is locked\n\tstorj.io/uplink/private/piecestore.(*Upload).Write:160\n\tbufio.(*Writer).Flush:593\n\tbufio.(*Writer).Write:629\n\tstorj.io/uplink/private/piecestore.(*BufferedUpload).Write:32\n\tstorj.io/uplink/private/piecestore.(*LockingUpload).Write:89\n\tio.copyBuffer:404\n\tio.Copy:364\n\tstorj.io/common/sync2.Copy:22\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:240\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}
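For anyone who wants to repeat the grep check across the whole log, here is a rough Python sketch of the same idea. It assumes every relevant line looks like the two above (timestamp, level, component, message, then a JSON object containing "Piece ID"). The function names are my own and not part of the storagenode software.

```python
import json
from collections import defaultdict


def parse_line(line):
    """Return (timestamp, message, fields) for a log line, or None if it does not fit."""
    brace = line.find("{")
    if brace == -1:
        return None
    # Assumed layout: timestamp, level, component, free-text message, JSON object.
    head = line[:brace].split(None, 3)
    if len(head) < 4:
        return None
    try:
        fields = json.loads(line[brace:])
    except json.JSONDecodeError:
        return None
    timestamp, _level, _component, message = head
    return timestamp, message.strip(), fields


def pieces_failed_after_success(lines):
    """Map each Piece ID to the 'failed to put piece' events logged after its first success."""
    first_success = {}
    failures = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        timestamp, message, fields = parsed
        piece_id = fields.get("Piece ID")
        if not piece_id:
            continue
        if message.startswith("piece transferred to new storagenode"):
            # Timestamps share one ISO 8601 format, so string comparison orders them.
            if piece_id not in first_success or timestamp < first_success[piece_id]:
                first_success[piece_id] = timestamp
        elif message.startswith("failed to put piece"):
            failures[piece_id].append((timestamp, fields.get("error", "")))

    result = {}
    for piece_id, events in failures.items():
        succeeded_at = first_success.get(piece_id)
        if succeeded_at is None:
            continue
        later = [event for event in events if event[0] > succeeded_at]
        if later:
            result[piece_id] = later
    return result
```

Passing it an open log file (or any iterable of lines) returns only the pieces that show a failure after an earlier successful transfer, which is the pattern in the two lines above.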
2. I am also curious about the “storage node overloaded” error when transferring pieces for GE. Is it hitting the request limit on my node, or on the receiving node?
2020-05-12T19:28:18.128Z ERROR gracefulexit:chore failed to transfer piece. {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "error": "protocol: storage node overloaded, request limit: 8", "errorVerbose": "protocol: storage node overloaded, request limit: 8\n\tstorj.io/uplink/private/piecestore.(*Upload).Write:160\n\tbufio.(*Writer).Flush:593\n\tbufio.(*Writer).Write:629\n\tstorj.io/uplink/private/piecestore.(*BufferedUpload).Write:32\n\tstorj.io/uplink/private/piecestore.(*LockingUpload).Write:89\n\tio.copyBuffer:404\n\tio.Copy:364\n\tstorj.io/common/sync2.Copy:22\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:240\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}
3. I’ve also noticed a few failures whose common factor seems to be a “dial tcp” failure to an IPv6 address, but I’m not really sure what this means. If it matters, my network and ISP both support IPv6, and the node does have an IPv6 address assigned.
2020-05-12T19:46:51.426Z ERROR gracefulexit:chore failed to put piece. {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Piece ID": "XPRSKNTQB25EQMGLSIJIJ2VDG3GMYETYDKKNX7UXRAG4EQYOZPBA", "error": "piecestore: rpccompat: dial tcp [2a03:10c3:259a:4::10]:28967: connect: cannot assign requested address", "errorVerbose": "piecestore: rpccompat: dial tcp [2a03:10c3:259a:4::10]:28967: connect: cannot assign requested address\n\tstorj.io/common/rpc.Dialer.dialTransport:264\n\tstorj.io/common/rpc.Dialer.dial:241\n\tstorj.io/common/rpc.Dialer.DialNode:140\n\tstorj.io/uplink/private/piecestore.Dial:51\n\tstorj.io/uplink/private/ecclient.(*ecClient).dialPiecestore:68\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:198\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}
2020-05-12T19:46:51.429Z ERROR gracefulexit:chore failed to transfer piece. {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "error": "piecestore: rpccompat: dial tcp [2a03:10c3:259a:4::10]:28967: connect: cannot assign requested address", "errorVerbose": "piecestore: rpccompat: dial tcp [2a03:10c3:259a:4::10]:28967: connect: cannot assign requested address\n\tstorj.io/common/rpc.Dialer.dialTransport:264\n\tstorj.io/common/rpc.Dialer.dial:241\n\tstorj.io/common/rpc.Dialer.DialNode:140\n\tstorj.io/uplink/private/piecestore.Dial:51\n\tstorj.io/uplink/private/ecclient.(*ecClient).dialPiecestore:68\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:198\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}
Would any of these transfer failures affect whether the graceful exit is successful or not?
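To put the failures in proportion, here is a small Python sketch that tallies successful transfers against failures grouped by error text. It assumes each failed transfer logs exactly one "failed to transfer piece" line, as the pair at 19:46:51 above suggests, so the "failed to put piece" lines are skipped to avoid double counting. The function name is my own.

```python
import json
from collections import Counter


def tally_transfers(lines):
    """Count successful GE transfers and failures, grouped by the logged error text."""
    counts = Counter()
    for line in lines:
        if "gracefulexit:chore" not in line:
            continue
        brace = line.find("{")
        if brace == -1:
            continue
        try:
            fields = json.loads(line[brace:])
        except json.JSONDecodeError:
            continue
        message = line[:brace]
        if "piece transferred to new storagenode" in message:
            counts["transferred"] += 1
        elif "failed to transfer piece" in message:
            # Group by the short "error" field, not the long "errorVerbose" trace.
            counts[fields.get("error", "unknown error")] += 1
    return counts
```

Errors that embed a peer address, like the "dial tcp" ones, end up as one bucket per address, so those may need to be added together by hand.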
Thanks for everyone’s help, and I apologize in advance if I have misinterpreted any of these errors, or if they aren’t important.