One of my nodes is getting suspension warnings

Hi.
One of my nodes is behaving suspiciously.
I am getting the following warning in the logs:
WARN ordersfilestore Corrupted order detected in orders file {"error": "ordersfile corrupt entry: proto: pb.Order: illegal tag 0 (wire type 0)", "errorVerbose": "ordersfile corrupt entry: proto: pb.Order: illegal tag 0 (wire type 0)\n\tstorj.io/storj/storagenode/orders/ordersfile.(*fileV0).ReadOne:115\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite.func1:239\n\tpath/filepath.walk:360\n\tpath/filepath.walk:384\n\tpath/filepath.Walk:406\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite:193\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrdersFromFileStore:389\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders:183\n\tstorj.io/storj/storagenode/orders.(*Service).Run.func1:134\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
panic: runtime error: makeslice: len out of range [recovered]
panic: runtime error: makeslice: len out of range [recovered]
panic: runtime error: makeslice: len out of range [recovered]
panic: runtime error: makeslice: len out of range

The CLI interface does not keep counting the node's uptime; it gets stuck after a few seconds.
[Screenshot of the CLI interface, taken 2020-11-21]

The web interface shows a suspension warning.

I have rebooted the machine and restarted the node several times over the last few days.
Any suggestions on how to avoid disqualification?
Thanks
Ilan

What version is your node running?

Node Version: v1.16.1

Thanks
Ilan

Tagging @ifraixedes

OK.
Where do I report the issue? Is there a thread for bugs? I did not find one.
Thanks
Ilan

Wait for a Storjling to reply here.

I’ve posted this internally for attention, thank you!

Thanks.
I appreciate that.
Ilan

I have the exact same issue with my node. Setup seems to be similar (same version, etc.).

> 2020-11-22T10:21:28.171Z        WARN    ordersfilestore Corrupted order detected in orders file {"error": "ordersfile corrupt entry: proto: pb.OrderLimit: illegal tag 0 (wire type 0)", "errorVerbose": "ordersfile corrupt entry: proto: pb.OrderLimit: illegal tag 0 (wire type 0)\n\tstorj.io/storj/storagenode/orders/ordersfile.(*fileV0).ReadOne:98\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite.func1:239\n\tpath/filepath.walk:360\n\tpath/filepath.walk:384\n\tpath/filepath.Walk:406\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite:193\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrdersFromFileStore:389\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders:183\n\tstorj.io/storj/storagenode/orders.(*Service).Run.func1:134\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
> panic: runtime error: makeslice: len out of range [recovered]
>         panic: runtime error: makeslice: len out of range [recovered]
>         panic: runtime error: makeslice: len out of range [recovered]
>         panic: runtime error: makeslice: len out of range

The storagenode keeps restarting every ~30 seconds. This seems to be why the satellites sometimes cannot reach the node and the suspension score is going down.

Assuming that your node runs on a 32-bit architecture, the issue should be fixed in the next release, v1.17.4, which we’ll start rolling out today or tomorrow.

We had already fixed some related errors, but not these ones. Because of a big refactoring between the previous version and v1.16.1, we couldn’t know until v1.16.1 that this error could still exist (ref: Raspberry Pi4 - Node crashes since today... weird GO error - #8 by ifraixedes). With v1.16.1, we have seen that the problem still persists for version 0 order files. Version 0 is an old format that isn’t used anymore, and every day there are fewer and fewer orders in that format across all the storage nodes.

@nerdatwork :point_up:

Until your node gets updated to v1.17.4, you can keep it from crashing by applying what’s mentioned in Why my container is restarting all the time? - #10 by ifraixedes

Patch reference: After upgrading to version 1.16.1, my node stopped working correctly. Raspberry pi 4 - #21 by ifraixedes

Our apologies for the inconvenience.

Thanks for the feedback! I tested the workaround, and the node seems to be stable again, at least for now. I will wait for v1.17.4.
