Context cancelled on all uploads/downloads

encabn · February 4, 2020, 11:44am

Hello,

My node had been running perfectly for a few months, but a few days ago it started throwing a massive number of “Context cancelled” errors, for both uploads and downloads. I tried reinstalling and rebuilding the databases, but that did not help; the problem is still the same.

Are there any known bugs or problems related to this?

I couldn’t find a paste service for the logs, so I used Google Drive: https://drive.google.com/open?id=1ZHmVlHBEGF2MU1ZJqkgstNX6bPjta78Z

nerdatwork · February 4, 2020, 12:12pm

Welcome to the forum @encabn!

It’s how the system is designed. It’s perfectly normal. Your node lost the race for the piece, hence ‘context canceled’.
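To illustrate what “losing the race” means, here is a minimal simulation, not the actual uplink code. It assumes the uploader sends pieces to several nodes at once and cancels the remaining transfers as soon as enough of them have finished. The node names, delays, and threshold are all made up for the example.

```python
import asyncio


async def upload_piece(node, delay, outcomes):
    """Simulate one node receiving a piece; `delay` stands in for transfer time."""
    try:
        await asyncio.sleep(delay)
        outcomes[node] = "uploaded"
    except asyncio.CancelledError:
        # The uploader gave up on this transfer: the node sees a canceled context.
        outcomes[node] = "context canceled"
        raise


async def race(delays, threshold):
    """Start all uploads, keep the first `threshold` to finish, cancel the rest."""
    outcomes = {}
    tasks = [
        asyncio.create_task(upload_piece(node, delay, outcomes))
        for node, delay in delays.items()
    ]
    finished = 0
    for next_done in asyncio.as_completed(tasks):
        await next_done
        finished += 1
        if finished >= threshold:
            break
    for task in tasks:
        task.cancel()  # no effect on tasks that already finished
    await asyncio.gather(*tasks, return_exceptions=True)
    return outcomes


# Hypothetical nodes: two fast ones and two slow ones, with a threshold of 2.
DELAYS = {"node-a": 0.01, "node-b": 0.02, "node-c": 0.5, "node-d": 1.0}
outcomes = asyncio.run(race(DELAYS, threshold=2))
for node, result in outcomes.items():
    print(node, result)
```

In this sketch the two slow nodes did nothing wrong; they were simply not among the first to finish, so their transfers were cut off.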

Don’t jeopardize your node by messing with the databases. Whenever you face an issue, visit the forum and search for it first. If you search for ‘context canceled’, you will find many posts that explain it in detail.

If your node is not failing audits then you are good.

kaloyan · February 4, 2020, 5:14pm

@encabn If you see log messages like this:

2020-02-04T01:30:19.306+0100 INFO piecestore upload failed {"Piece ID": "OQQBKVMCGVFXTHVSRTHQJINZZUXBWM5DPL327IJRP7WPEALDWRUQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:79\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:483\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:257\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:1066\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\n\tstorj.io/drpc/drpcserver.(*Server).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:114\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:147\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

it does not always mean that your node failed to get the piece.

Some time ago, we changed the uplink to aggressively close the connection after upload without waiting for the last acknowledgment from the storage node. This significantly improved the upload performance.

As a result, the storagenode logs are polluted with “upload failed” messages due to “context canceled”. For this specific piece, OQQBKVMCGVFXTHVSRTHQJINZZUXBWM5DPL327IJRP7WPEALDWRUQ, you can see in the logs that it is deleted later. That could not happen if it had not been uploaded successfully in the first place.
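If you want to run the same check on your own logs, something along these lines should work as a rough sketch. It assumes each log line ends with a JSON payload like the one above, and that a later deletion is logged as a line containing “deleted” with the same “Piece ID” field. That second format is an assumption, so adjust the match to whatever your node actually writes. The sample lines and piece IDs below are made up.

```python
import json


def parse_line(line):
    """Split a log line into (text before the JSON, parsed JSON or None)."""
    start = line.find("{")
    if start == -1:
        return line, None
    try:
        return line[:start], json.loads(line[start:])
    except json.JSONDecodeError:
        return line[:start], None


def failed_then_deleted(lines):
    """Return piece IDs logged as 'upload failed / context canceled' and deleted later."""
    canceled = set()
    confirmed = []
    for line in lines:
        prefix, payload = parse_line(line)
        if not payload or "Piece ID" not in payload:
            continue
        piece_id = payload["Piece ID"]
        if "upload failed" in prefix and payload.get("error") == "context canceled":
            canceled.add(piece_id)
        elif "deleted" in prefix and piece_id in canceled:
            # Assumed format: a later line mentioning "deleted" for the same piece.
            confirmed.append(piece_id)
            canceled.discard(piece_id)
    return confirmed


# Made-up sample; PIECE_A and PIECE_B are placeholders, not real piece IDs.
SAMPLE_LOG = [
    '2020-02-04T01:30:19.306+0100 INFO piecestore upload failed {"Piece ID": "PIECE_A", "Action": "PUT", "error": "context canceled"}',
    '2020-02-04T01:31:02.000+0100 INFO piecestore upload failed {"Piece ID": "PIECE_B", "Action": "PUT", "error": "context canceled"}',
    '2020-02-04T03:10:44.120+0100 INFO piecestore deleted {"Piece ID": "PIECE_A"}',
]
stored_anyway = failed_then_deleted(SAMPLE_LOG)
print(stored_anyway)
```

A piece that shows up in the result was stored despite the “upload failed” message. A piece that does not show up may still have been stored; it just has not been deleted yet.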

We are now working on cleaning up the logs. There is a change in review: https://review.dev.storj.io/c/storj/storj/+/821

BrightSilence · February 4, 2020, 5:36pm

It’s not entirely clear to me whether this is just about the remaining transfers when the success threshold is reached or actual finished transfers as well. Has this been a recent change?

The change in the code seems to be mostly a change in log terminology for all cancelled transfers, which suggests it would change the message for both incomplete and complete uploads, like the example you gave with the piece that was later deleted. Could you clarify this?

encabn · February 4, 2020, 11:55pm

Thanks for taking the time to review.

I’m not running on Docker either (regarding the linked post).

cdhowie · February 5, 2020, 6:52am

Isn’t this also incredibly dangerous? Like issuing a COMMIT statement to a database server and then immediately closing the connection, and assuming your data is safe? Or is there some sort of three-way final acknowledgement and the uploader received the first one then doesn’t wait to hear the third?

kaloyan · February 5, 2020, 9:33am

This change is still a work in progress. It will be improved to clearly distinguish between successful and canceled uploads.

After the piece is completely uploaded, the storage node returns a signed hash of the piece to the uplink to confirm it got the expected data. The optimization I mentioned is related to closing the connection afterwards. So, it’s not dangerous at this stage.
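The ordering can be sketched roughly like this. It is a toy model, not the real protocol code: the class and function names are made up, and an HMAC with a shared key stands in for the node’s real signature, which the uplink would verify differently.

```python
import hashlib
import hmac

NODE_KEY = b"hypothetical-node-key"  # stand-in for the node's signing identity


class Connection:
    def __init__(self):
        self.open = True

    def close(self):
        self.open = False


class StorageNode:
    def __init__(self):
        self.pieces = {}
        self.log = []

    def upload(self, piece_id, data):
        """Store the piece, then return its hash and a signature over that hash."""
        self.pieces[piece_id] = data
        digest = hashlib.sha256(data).digest()
        signature = hmac.new(NODE_KEY, digest, hashlib.sha256).digest()
        return digest, signature

    def finish(self, conn, piece_id):
        """Last step on the node side; it only affects what gets logged."""
        if conn.open:
            self.log.append(f"uploaded {piece_id}")
        else:
            self.log.append(f"upload failed {piece_id}: context canceled")


def uplink_upload(node, piece_id, data):
    conn = Connection()
    digest, signature = node.upload(piece_id, data)
    # The uplink checks the signed hash BEFORE closing, so it knows the data arrived.
    expected = hmac.new(NODE_KEY, digest, hashlib.sha256).digest()
    confirmed = (
        digest == hashlib.sha256(data).digest()
        and hmac.compare_digest(signature, expected)
    )
    conn.close()  # close right away, without waiting for the last acknowledgment
    node.finish(conn, piece_id)
    return confirmed


node = StorageNode()
confirmed = uplink_upload(node, "PIECE_A", b"piece data")
print(confirmed, node.log)
```

In this model the uplink already holds the confirmation when it hangs up, and the piece is stored on the node. Only the node’s final log entry is affected by the early close.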

encabn · February 10, 2020, 10:36pm

So it closes the CONNECTION (TCP), not the internal commit. That explains it all.