V1.16.1 crashed after few seconds

Hi guys,

I’m running the v1.16.1 binary directly on a Armbian device.

Welcome to ARMBIAN 5.67 user-built Debian GNU/Linux 9 (stretch) 3.10.108
System load:   2.31 2.53 2.06   Up time:       55 min
Memory usage:  12 % of 792MB    IP:            192.168.1.6
CPU temp:      52°C
Usage of /:    14% of 6.7G      storage/:      84% of 3.6T

The binary will crash after few seconds. It runs very well after I downgrade to v1.15.3.

Pls point me a direction of troubleshooting. I think the first thing I need to do is a log file but I don’t know how.

Thanks in advance
Chen Wei

Below are the errors displayed before crash.

2020-11-04T10:31:16.354Z WARN ordersfilestore Corrupted order detected in orders file {"error": "ordersfile corrupt entry: ordersfile: unexpected EOF", "errorVerbose": "ordersfile corrupt entry: ordersfile: unexpected EOF\n\tstorj.io/storj/storagenode/orders/ordersfile.(*fileV0).ReadOne:109\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite.func1:239\n\tpath/filepath.walk:360\n\tpath/filepath.walk:384\n\tpath/filepath.Walk:406\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite:193\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrdersFromFileStore:389\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders:183\n\tstorj.io/storj/storagenode/orders.(*Service).Run.func1:134\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"} 2020-11-04T10:31:16.467Z WARN ordersfilestore Corrupted order detected in orders file {"error": "ordersfile corrupt entry: ordersfile: unexpected EOF", "errorVerbose": "ordersfile corrupt entry: ordersfile: unexpected EOF\n\tstorj.io/storj/storagenode/orders/ordersfile.(*fileV0).ReadOne:109\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite.func1:239\n\tpath/filepath.walk:360\n\tpath/filepath.walk:384\n\tpath/filepath.Walk:406\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite:193\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrdersFromFileStore:389\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders:183\n\tstorj.io/storj/storagenode/orders.(*Service).Run.func1:134\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"} 2020-11-04T10:31:16.538Z WARN ordersfilestore Corrupted order detected in orders file {"error": "ordersfile corrupt entry: proto: pb.Order: illegal tag 0 (wire type 0)", "errorVerbose": "ordersfile corrupt entry: proto: pb.Order: illegal tag 0 (wire type 0)\n\tstorj.io/storj/storagenode/orders/ordersfile.(*fileV0).ReadOne:115\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite.func1:239\n\tpath/filepath.walk:360\n\tpath/filepath.walk:384\n\tpath/filepath.Walk:406\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite:193\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrdersFromFileStore:389\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders:183\n\tstorj.io/storj/storagenode/orders.(*Service).Run.func1:134\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"} fatal error: runtime: out of memory

Pls enlighten me how to fix this error? Only v1.16.1 shows this “illegal tag”. v1.5.3 does not have this at all. I have to switch back to v1.15.3 for the time being.

i have similar problem with 1.16.1 on Windows GUI, the error is instoragenode-updater.log:
"INFO Invalid configuration file key {“Key”: “kademlia.operator.email”}
"INFO Invalid configuration file key {“Key”: “kademlia.external-address”}
"INFO Invalid configuration file key {“Key”: “kademlia.operator.wallet”}

also in storagenode.log
The problem is my node ran out of HDD space. Now it has it.
Restarted the computer. But the error still remains and it stops the node short after:

2020-11-10T15:42:20.111+0100 INFO Private server started on 127.0.0.1:7778

2020-11-10T15:42:20.287+0100 INFO trust Scheduling next refresh {“after”: “7h40m40.270676065s”}

2020-11-10T15:42:29.293+0100 WARN console:service unable to get Satellite URL {“Satellite ID”: “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW”, “error”: “storage node dashboard service error: trust: satellite “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW” is untrusted”, “errorVerbose”: “storage node dashboard service error: trust: satellite “118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW” is untrusted\n[tstorj.io/storj/storagenode/trust.(*Pool).getInfo:228\n\tstorj.io/storj/storagenode/trust.(*Pool).GetNodeURL:167\n\tstorj.io/storj/storagenode/console.(*Service).GetDashboardData:168\n\tstorj.io/storj/storagenode/console/consoleapi.(*StorageNode).StorageNode:44\n\tnet/http.HandlerFunc.ServeHTTP:2042\n\tgithub.com/gorilla/mux.(*Router).ServeHTTP:210\n\tnet/http.serverHandler.ServeHTTP:2843\n\tnet/http.(*conn).serve:1925](http://tstorj.io/storj/storagenode/trust.(*Pool).getInfo:228\n\tstorj.io/storj/storagenode/trust.(*Pool).GetNodeURL:167\n\tstorj.io/storj/storagenode/console.(*Service).GetDashboardData:168\n\tstorj.io/storj/storagenode/console/consoleapi.(*StorageNode).StorageNode:44\n\tnet/http.HandlerFunc.ServeHTTP:2042\n\tgithub.com/gorilla/mux.(*Router).ServeHTTP:210\n\tnet/http.serverHandler.ServeHTTP:2843\n\tnet/http.(*conn).serve:1925)”}

2020-11-10T15:42:41.147+0100 WARN ordersfilestore Corrupted order detected in orders file {“error”: “ordersfile corrupt entry: node ID error: not enough bytes to make a node id; have 1, need 32”, “errorVerbose”: “ordersfile corrupt entry: node ID error: not enough bytes to make a node id; have 1, need 32\n[tstorj.io/common/storj.NodeIDFromBytes:91\n\tstorj.io/common/storj.(*NodeID).Unmarshal:178\n\tgithub.com/gogo/protobuf/proto.makeUnmarshalCustom.func1:160\n\tgithub.com/gogo/protobuf/proto.(*unmarshalInfo).unmarshal:175\n\tgithub.com/gogo/protobuf/proto.(*InternalMessageInfo).Unmarshal:63\n\tstorj.io/common/pb.(*OrderLimit).XXX_Unmarshal:163\n\tgithub.com/gogo/protobuf/proto.Unmarshal:337\n\tstorj.io/common/pb.Unmarshal:9\n\tstorj.io/storj/storagenode/orders/ordersfile.(*fileV0).ReadOne:95\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite.func1:239\n\tpath/filepath.walk:360\n\tpath/filepath.walk:384\n\tpath/filepath.Walk:406\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite:193\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrdersFromFileStore:389\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders:183\n\tstorj.io/storj/storagenode/orders.(*Service).Run.func1:134\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57](http://tstorj.io/common/storj.NodeIDFromBytes:91\n\tstorj.io/common/storj.(*NodeID).Unmarshal:178\n\tgithub.com/gogo/protobuf/proto.makeUnmarshalCustom.func1:160\n\tgithub.com/gogo/protobuf/proto.(*unmarshalInfo).unmarshal:175\n\tgithub.com/gogo/protobuf/proto.(*InternalMessageInfo).Unmarshal:63\n\tstorj.io/common/pb.(*OrderLimit).XXX_Unmarshal:163\n\tgithub.com/gogo/protobuf/proto.Unmarshal:337\n\tstorj.io/common/pb.Unmarshal:9\n\tstorj.io/storj/storagenode/orders/ordersfile.(*fileV0).ReadOne:95\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite.func1:239\n\tpath/filepath.walk:360\n\tpath/filepath.walk:384\n\tpath/filepath.Walk:406\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite:193\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrdersFromFileStore:389\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders:183\n\tstorj.io/storj/storagenode/orders.(*Service).Run.func1:134\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57)”}Preformatted text

How much free space do you have now ?

around 400MB at C:/
In prevous version ioften come to situation when i bang on 0 MB. and even then node didnt stop. i remember such case.

Edit: i noticed that it was precided by ERRORs “contact:service ping satelite faild” to that exact satelite 118UWpMCHzs6… dated as far as 10-10-2020. Between, the log showes “picestore donwload” and “downloaded” as normal, looked like normal, so not sure was it 1,16,1 back then. But now i dont get that ping satelite error anymore. Just cant run the node.

Edit2:

this is what it does when i try to start the storagenode service:

2020-11-10T16:15:18.871+0100 INFO Public server started on [::]:28888

2020-11-10T16:15:18.871+0100 INFO Private server started on 127.0.0.1:7778

2020-11-10T16:15:18.871+0100 INFO trust Scheduling next refresh {“after”: “5h28m3.043704751s”}

2020-11-10T16:15:19.841+0100 INFO piecestore download started {“Piece ID”: “TM323O6DLSHWZQAEEEFK26WCEG2V5BPLUDBZETAG3XX5VZUJMH6A”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Action”: “GET”}

2020-11-10T16:15:20.322+0100 ERROR piecestore download failed {“Piece ID”: “TM323O6DLSHWZQAEEEFK26WCEG2V5BPLUDBZETAG3XX5VZUJMH6A”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Action”: “GET”, “error”: “trust: rpc: context canceled”, “errorVerbose”: “trust: rpc: context canceled\n[tstorj.io/common/rpc.TCPConnector.DialContext:92\n\tstorj.io/common/rpc.Dialer.dialTransport:146\n\tstorj.io/common/rpc.Dialer.dial:116\n\tstorj.io/common/rpc.Dialer.DialNodeURL:80\n\tstorj.io/storj/storagenode/trust.Dialer.func1:51\n\tstorj.io/storj/storagenode/trust.IdentityResolverFunc.ResolveIdentity:43\n\tstorj.io/storj/storagenode/trust.(*Pool).GetSignee:143\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimitSignature:134\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:62\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:462\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:1004\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:56\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51](http://tstorj.io/common/rpc.TCPConnector.DialContext:92\n\tstorj.io/common/rpc.Dialer.dialTransport:146\n\tstorj.io/common/rpc.Dialer.dial:116\n\tstorj.io/common/rpc.Dialer.DialNodeURL:80\n\tstorj.io/storj/storagenode/trust.Dialer.func1:51\n\tstorj.io/storj/storagenode/trust.IdentityResolverFunc.ResolveIdentity:43\n\tstorj.io/storj/storagenode/trust.(*Pool).GetSignee:143\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimitSignature:134\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:62\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:462\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:1004\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:56\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51)”}

2020-11-10T16:15:46.857+0100 WARN ordersfilestore Corrupted order detected in orders file {“error”: “ordersfile corrupt entry: node ID error: not enough bytes to make a node id; have 1, need 32”, “errorVerbose”: “ordersfile corrupt entry: node ID error: not enough bytes to make a node id; have 1, need 32\n[tstorj.io/common/storj.NodeIDFromBytes:91\n\tstorj.io/common/storj.(*NodeID).Unmarshal:178\n\tgithub.com/gogo/protobuf/proto.makeUnmarshalCustom.func1:160\n\tgithub.com/gogo/protobuf/proto.(*unmarshalInfo).unmarshal:175\n\tgithub.com/gogo/protobuf/proto.(*InternalMessageInfo).Unmarshal:63\n\tstorj.io/common/pb.(*OrderLimit).XXX_Unmarshal:163\n\tgithub.com/gogo/protobuf/proto.Unmarshal:337\n\tstorj.io/common/pb.Unmarshal:9\n\tstorj.io/storj/storagenode/orders/ordersfile.(*fileV0).ReadOne:95\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite.func1:239\n\tpath/filepath.walk:360\n\tpath/filepath.walk:384\n\tpath/filepath.Walk:406\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite:193\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrdersFromFileStore:389\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders:183\n\tstorj.io/storj/storagenode/orders.(*Service).Run.func1:134\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57](http://tstorj.io/common/storj.NodeIDFromBytes:91\n\tstorj.io/common/storj.(*NodeID).Unmarshal:178\n\tgithub.com/gogo/protobuf/proto.makeUnmarshalCustom.func1:160\n\tgithub.com/gogo/protobuf/proto.(*unmarshalInfo).unmarshal:175\n\tgithub.com/gogo/protobuf/proto.(*InternalMessageInfo).Unmarshal:63\n\tstorj.io/common/pb.(*OrderLimit).XXX_Unmarshal:163\n\tgithub.com/gogo/protobuf/proto.Unmarshal:337\n\tstorj.io/common/pb.Unmarshal:9\n\tstorj.io/storj/storagenode/orders/ordersfile.(*fileV0).ReadOne:95\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite.func1:239\n\tpath/filepath.walk:360\n\tpath/filepath.walk:384\n\tpath/filepath.Walk:406\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite:193\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrdersFromFileStore:389\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders:183\n\tstorj.io/storj/storagenode/orders.(*Service).Run.func1:134\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57)”}

and then node shuts down.

i reinstaled newest version from documentation.storj.io/setup/gui-windows/storage-node. its 1.16.1 and still

edit:
Nevermind, i had over 3200 unset orders older than 48h, as old as a month, and i deleted them and now my node is online. so yeah, what caused that? ughhh…

1 Like

Dear all,

Just upgraded to v1.17.4, the problem solved. Saw the recent email from the project team, seems a BUG in v1.16.1 due to corrupted order files.

Cheers

4 Likes