Workaround for order sender errors (execute this before they expire)

Step 1: Am I affected?

If you see something like this in your logfile you should worry.

orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S	sending	{"count": 1582}
orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW	sending	{"count": 7666}
orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6	sending	{"count": 2611}
orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S	gRPC client when sending new orders settlements	{"error": "order: sending settlement agreements returned an error: EOF", "errorVerbose": "order: sending settlement agreements returned an error: EOF\n\tstorj.io/storj/storagenode/orders.(*Service).settle.func2:283\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57", "request": {"limit":{"serial_number":"...","satellite_id":"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S","uplink_public_key":{},"storage_node_id":"...","piece_id":"...","limit":1303808,"action":2,"piece_expiration":"0001-01-01T00:00:00Z","order_expiration":"2019-09-05T16:35:52.596900042Z","order_creation":"2019-08-29T16:35:52.714474718Z","satellite_signature":"...","satellite_address":{"address":"us-central-1.tardigrade.io:7777"}},"order":{"serial_number":"...","amount":1303552,"uplink_signature":"..."}}}
orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S	gRPC client error when receiveing new order settlements	{"error": "order: failed to receive settlement response: rpc error: code = Unknown desc = pq: deadlock detected", "errorVerbose": "order: failed to receive settlement response: rpc error: code = Unknown desc = pq: deadlock detected\n\tstorj.io/storj/storagenode/orders.(*Service).settle:310\n\tstorj.io/storj/storagenode/orders.(*Service).Settle:190\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrders.func2:169\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S	finished
orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S	failed to settle orders	{"error": "order: sending settlement agreements returned an error: EOF; order: failed to receive settlement response: rpc error: code = Unknown desc = pq: deadlock detected", "errorVerbose": "group:\n--- order: sending settlement agreements returned an error: EOF\n\tstorj.io/storj/storagenode/orders.(*Service).settle.func2:283\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- order: failed to receive settlement response: rpc error: code = Unknown desc = pq: deadlock detected\n\tstorj.io/storj/storagenode/orders.(*Service).settle:310\n\tstorj.io/storj/storagenode/orders.(*Service).Settle:190\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrders.func2:169\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW	gRPC client when sending new orders settlements	{"error": "order: sending settlement agreements returned an error: EOF", "errorVerbose": "order: sending settlement agreements returned an error: EOF\n\tstorj.io/storj/storagenode/orders.(*Service).settle.func2:283\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57", "request": {"limit":{"serial_number":"...","satellite_id":"118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW","uplink_public_key":{},"storage_node_id":"...","piece_id":"...","limit":2317056,"action":1,"piece_expiration":"0001-01-01T00:00:00Z","order_expiration":"2019-09-04T20:28:24.492326966Z","order_creation":"2019-08-28T20:28:24.498725899Z","satellite_signature":"...","satellite_address":{"address":"satellite.stefan-benten.de:7777"}},"order":{"serial_number":"...","amount":2317056,"uplink_signature":"..."}}}
orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW	gRPC client error when receiveing new order settlements	{"error": "order: failed to receive settlement response: rpc error: code = Unavailable desc = transport is closing", "errorVerbose": "order: failed to receive settlement response: rpc error: code = Unavailable desc = transport is closing\n\tstorj.io/storj/storagenode/orders.(*Service).settle:310\n\tstorj.io/storj/storagenode/orders.(*Service).Settle:190\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrders.func2:169\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW	finished
orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW	failed to settle orders	{"error": "order: sending settlement agreements returned an error: EOF; order: failed to receive settlement response: rpc error: code = Unavailable desc = transport is closing", "errorVerbose": "group:\n--- order: sending settlement agreements returned an error: EOF\n\tstorj.io/storj/storagenode/orders.(*Service).settle.func2:283\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- order: failed to receive settlement response: rpc error: code = Unavailable desc = transport is closing\n\tstorj.io/storj/storagenode/orders.(*Service).settle:310\n\tstorj.io/storj/storagenode/orders.(*Service).Settle:190\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrders.func2:169\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6	gRPC client when sending new orders settlements	{"error": "order: sending settlement agreements returned an error: EOF", "errorVerbose": "order: sending settlement agreements returned an error: EOF\n\tstorj.io/storj/storagenode/orders.(*Service).settle.func2:283\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57", "request": {"limit":{"serial_number":"...","satellite_id":"121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6","uplink_public_key":{},"storage_node_id":"...","piece_id":"...","limit":362240,"action":2,"piece_expiration":"0001-01-01T00:00:00Z","order_expiration":"2019-09-05T07:07:52.399750425Z","order_creation":"2019-08-29T07:07:52.507404358Z","satellite_signature":"...","satellite_address":{"address":"asia-east-1.tardigrade.io:7777"}},"order":{"serial_number":"...","amount":361984,"uplink_signature":"..."}}}
orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6	gRPC client error when receiveing new order settlements	{"error": "order: failed to receive settlement response: rpc error: code = Unavailable desc = transport is closing", "errorVerbose": "order: failed to receive settlement response: rpc error: code = Unavailable desc = transport is closing\n\tstorj.io/storj/storagenode/orders.(*Service).settle:310\n\tstorj.io/storj/storagenode/orders.(*Service).Settle:190\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrders.func2:169\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6	finished
orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6	failed to settle orders	{"error": "order: sending settlement agreements returned an error: EOF; order: failed to receive settlement response: rpc error: code = Unavailable desc = transport is closing", "errorVerbose": "group:\n--- order: sending settlement agreements returned an error: EOF\n\tstorj.io/storj/storagenode/orders.(*Service).settle.func2:283\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- order: failed to receive settlement response: rpc error: code = Unavailable desc = transport is closing\n\tstorj.io/storj/storagenode/orders.(*Service).settle:310\n\tstorj.io/storj/storagenode/orders.(*Service).Settle:190\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrders.func2:169\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6	sending	{"count": 2667}
orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S	sending	{"count": 1613}
orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW	sending	{"count": 7793}
...
orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S	sending	{"count": 1645}
orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW	sending	{"count": 7958}
orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6	sending	{"count": 2731}

Step 2: Do I need to panic?

You have 7 days to submit the orders before they expire. The expire date is part of the error message. In this example it is "order_expiration":"2019-09-04T20:28:24.492326966Z" which means 2 days remaining until the orders expire, can’t be submitted and will not get paid. In this case it would be reasonable to panic.

Step 3: What is the workaround?

The current issue is that the order sender has a timeout of 20 seconds. We are working on a fix but we might not be able to deploy it in time. So lets take about a possible workaround you can execute on your end.
By default the storage node will submit orders only once per hour and for some reason it is hitting the 20 second timeout all the time. If you restart the storage node it will try to submit the orders as well and this time with a different timing. If you are lucky you can submit at least a few of them. Repeat it a few times and hopefully they can all be submitted.

3 Likes

Thanks a lot @littleskunk !
I successfully applied this workaround for one of my node (restarting node twice time and look into log immediately), all orders successfully sent!

1 Like

if we are not currently seeing the issue, how often should we check?
I did the following command and got nothing:
docker logs storagenode 2>&1 | grep “failed to settle orders”

I did the following command and everything appears to have finished:
docker logs storagenode 2>&1 | grep “orders”

but I had to restart my node this weekend so I only have 40 hours of logs

1 Like

I hope the fix is issued soon because the workaround does not seem to help my node due to the slowness of the hard drive, yes it is connected via SATA and not USB. I noticed that whenever the node is doing the order settlement, HDD I/O gets maxed out and stays like that until the process timesout. I have tried restarting about 4 times and all cases are the same, the order settlement never completes on time.

Every 5 days is fine. The orders expire after 7 days.

The 20s timeout is fixed with v0.19.6 but lets keep this topic open just in case.

2 Likes

How do you check for still not settled orders?

Restart your node and look at the logs. It should send unsettled orders right away.

Thanks, but it’s sometimes very hard to find the right lines in the log.

grep -E "sending|finished" /mnt/storagenode/node/node.log

is what I was looking for.

The lines for order sending happen right after the restart. I usually use tail to follow what’s happening. In your case.

tail -f /mnt/storagenode/node/node.log

I know the tail command. The problem is, that the “finished” doesn’t always come immediately after the “sending”. With all the other logs in-between it’s easy to miss these messages. For this the output of grep is much easier to read.

What do you get when you grep for "order_expiration" ?

With v0.19.7 we also increase the dial timeout (v0.19.6 was the request timeout only) and we reduce the batch size on the satellite side to stop the deadlock errors.

2 Likes
2019-09-03T09:41:30.719Z	ERROR	orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW	gRPC client when sending new orders settlements	{"error": "order: sending settlement agreements returned an error: EOF", "errorVerbose": "order: sending settlement agreements returned an error: EOF\n\tstorj.io/storj/storagenode/orders.(*Service).settle.func2:284\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57", "request": {"limit":{"serial_number":"IIDSL3SGEFEOXKUQKI272WPE3M","satellite_id":"118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW","uplink_public_key":{},"storage_node_id":"12PYwN5yzFr9Jsqy1qKdYQd8BvGENKczALx3QtakkwMCU5v55BM","piece_id":"ZH72OSEQNHK74436IALQW5GYTCYBTXV4TKTBSD54LOLUI5NC6RCQ","limit":2317056,"action":1,"piece_expiration":"0001-01-01T00:00:00Z","order_expiration":"2019-09-04T11:13:20.870510971Z","order_creation":"2019-08-28T11:13:20.878207756Z","satellite_signature":"MEQCIDGZqZJS/Qgp1Zu4d43CGnnPDlndEceCTX85GNY9zB0SAiBi0S1upSiJkaGi8dEKsEVE1SB/iE9lIgkbUBTLkn9W/Q==","satellite_address":{"address":"satellite.stefan-benten.de:7777"}},"order":{"serial_number":"IIDSL3SGEFEOXKUQKI272WPE3M","amount":2317056,"uplink_signature":"E2k/UI9pCVw07PebZiLjWDslOKpRF7ywoPXSNWx2fS0dUXdZdxYRosujuJ1quPwS0d4nwlrvOPebl+HQjgKbAQ=="}}}

This is the last one from this morning

Looks like a good reason to panic. Meanwhile you submitted all orders?

Yes, the amount of lines containing “sending” is followed by the same amount of “finished” lines.

@Dylan Well I just got upgraded to v0.20.1 and now orders are again failing to submit after one minute, with:

2019-09-05T21:14:31.980Z        ERROR   orders.118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW       failed to settle orders        {"error": "order: unable to connect to the satellite: transport error: context deadline exceeded", "errorVerbose": "order: unable to connect to the satellite: transport error: context deadline exceeded\n\tstorj.io/storj/storagenode/orders.(*Service).settle:260\n\tstorj.io/storj/storagenode/orders.(*Service).Settle:192\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrders.func2:171\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

So this problem isn’t solved just yet… or a new problem was created.

Edit: I’ve restarted my node a dozen times and it won’t get past this error submitting orders. The current workaround doesn’t work.

@cdhowie Can you please check again?
That specific satellite was under maintenance (setup changes) which can produce that error as well, simply if its offline.

4 Likes

@stefanbenten I just looked at my logs for the last few hours and everything looks good now. Thanks!

1 Like