0.15.3 arm keeps restarting

2019-07-22T12:09:31.243Z        DEBUG   kademlia:endpoint       Successfully connected with storj.tk.gg:28967
2019-07-22T12:09:34.083Z        INFO    piecestore:monitor      Remaining Bandwidth     {"bytes": 19499148417024}
2019-07-22T12:09:34.083Z        WARN    piecestore:monitor      Disk space is less than requested. Allocating space    {"bytes": 932236040960}
2019-07-22T12:09:48.073Z        DEBUG   kademlia:endpoint       connection to node failed       {"error": "node error: transport error: connection error: desc = \"transport: authentication handshake failed: tls peer certificate verification error: not signed by any CA in the whitelist: CA cert\"", "errorVerbose": "node error: transport error: connection error: desc = \"transport: authentication handshake failed: tls peer certificate verification error: not signed by any CA in the whitelist: CA cert\"\n\tstorj.io/storj/pkg/transport.(*Transport).DialNode:108\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).dialNode:175\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).PingNode:95\n\tstorj.io/storj/pkg/kademlia.(*Kademlia).Ping:253\n\tstorj.io/storj/pkg/kademlia.(*Endpoint).pingback:57\n\tstorj.io/storj/pkg/kademlia.(*Endpoint).Query:42\n\tstorj.io/storj/pkg/pb._Nodes_Query_Handler.func1:535\n\tstorj.io/storj/pkg/server.logOnErrorUnaryInterceptor:39\n\tstorj.io/storj/pkg/pb._Nodes_Query_Handler:537\n\tgoogle.golang.org/grpc.(*Server).processUnaryRPC:998\n\tgoogle.golang.org/grpc.(*Server).handleStream:1278\n\tgoogle.golang.org/grpc.(*Server).serveStreams.func1.1:717", "nodeID": "1QzDKGHDeyuRxbvZhcwHU3syxTYtU1jHy5duAKuPxja3XC8ttk"}
2019-07-22T12:10:01.737Z        INFO    Got a signal from the OS: "terminated"
2019-07-22T12:10:06.088Z        DEBUG   kademlia:endpoint       connection to node failed       {"error": "node error: transport error: context deadline exceeded", "errorVerbose": "node error: transport error: context deadline exceeded\n\tstorj.io/storj/pkg/transport.(*Transport).DialNode:108\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).dialNode:175\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).PingNode:95\n\tstorj.io/storj/pkg/kademlia.(*Kademlia).Ping:253\n\tstorj.io/storj/pkg/kademlia.(*Endpoint).pingback:57\n\tstorj.io/storj/pkg/kademlia.(*Endpoint).Query:42\n\tstorj.io/storj/pkg/pb._Nodes_Query_Handler.func1:535\n\tstorj.io/storj/pkg/server.logOnErrorUnaryInterceptor:39\n\tstorj.io/storj/pkg/pb._Nodes_Query_Handler:537\n\tgoogle.golang.org/grpc.(*Server).processUnaryRPC:998\n\tgoogle.golang.org/grpc.(*Server).hand

Im also getting
2019-07-22T12:11:40.413Z INFO vouchers Voucher request denied. Vetting process not yet complete

has my certificate been revoked?

Is this a new node, or if it was previously working how long has your node been running?

its been running for about two months

Okay. I would suggest you try removing the container and creating it again. Before you recreate the container please check the latest setup guide to make sure your run command includes the latest recommended commands. After you recreate the node, if you are still having issues, please post the complete log from the beginning showing the startup sequence.

I don’t see any updates to the docker container startup options. I run on arm, so I’ve already added the explicit connection specifier storage2.max-concurrent-requests: 7. When I can catch the crash debug output b4 the container restarts, it goes like:
2019-07-22T16:02:08.888Z INFO running on version v0.15.3
2019-07-22T16:02:08.892Z DEBUG Initialized telemetry batcher with id = “1JQ4fDfUDViEjUprEenGjEwXLNgsc7W8aVdfJgQzbrtTvrTmgq”
2019-07-22T16:02:08.898Z INFO db.migration Latest Version {“version”: 13}
2019-07-22T16:02:08.900Z INFO vouchers Checking vouchers
2019-07-22T16:02:08.901Z INFO Node 1JQ4fDfUDViEjUprEenGjEwXLNgsc7W8aVdfJgQzbrtTvrTmgq started
2019-07-22T16:02:08.901Z INFO Public server started on [::]:28967
2019-07-22T16:02:08.901Z INFO Private server started on 127.0.0.1:7778
2019-07-22T16:02:08.904Z DEBUG piecestore:orderssender sending
2019-07-22T16:02:08.905Z INFO vouchers Requesting voucher {“satellite”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
2019-07-22T16:02:08.909Z DEBUG piecestore:orderssender no orders to send
2019-07-22T16:02:09.007Z DEBUG allowed minimum version from control server is: v0.15.1
2019-07-22T16:02:09.007Z INFO running on version v0.15.3
2019-07-22T16:02:09.146Z INFO piecestore:monitor Remaining Bandwidth {“bytes”: 19499065291008}
2019-07-22T16:02:09.146Z WARN piecestore:monitor Disk space is less than requested. Allocating space {“bytes”: 932234794752}
2019-07-22T16:02:09.216Z DEBUG kademlia:endpoint connection to node failed {“error”: “node error: transport error: connection error: desc = “transport: authentication handshake failed: tls peer certificate verification error: not signed by any CA in the whitelist: CA cert””, “errorVerbose”: “node error: transport error: connection error: desc = “transport: authentication handshake failed: tls peer certificate verification error: not signed by any CA in the whitelist: CA cert”\n\tstorj.io/storj/pkg/transport.(*Transport).DialNode:108\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).dialNode:175\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).PingNode:95\n\tstorj.io/storj/pkg/kademlia.(*Kademlia).Ping:253\n\tstorj.io/storj/pkg/kademlia.(*Endpoint).pingback:57\n\tstorj.io/storj/pkg/kademlia.(*Endpoint).Query:42\n\tstorj.io/storj/pkg/pb._Nodes_Query_Handler.func1:535\n\tstorj.io/storj/pkg/server.logOnErrorUnaryInterceptor:39\n\tstorj.io/storj/pkg/pb._Nodes_Query_Handler:537\n\tgoogle.golang.org/grpc.(*Server).processUnaryRPC:998\n\tgoogle.golang.org/grpc.(*Server).handleStream:1278\n\tgoogle.golang.org/grpc.(*Server).serveStreams.func1.1:717”, “nodeID”: “1QzDKGHDeyuRxbvZhcwHU3syxTYtU1jHy5duAKuPxja3XC8ttk”}
2019-07-22T16:02:09.319Z DEBUG kademlia:endpoint Successfully connected with 163.172.98.61:28967
2019-07-22T16:02:13.355Z INFO Got a signal from the OS: “terminated”
2019-07-22T16:02:13.359Z ERROR vouchers Error requesting voucher {“satellite”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “error”: “voucher: unable to find satellite on the network: context canceled”, “errorVerbose”: “voucher: unable to find satellite on the network: context canceled\n\tstorj.io/storj/storagenode/vouchers.(*Service).request:127\n\tstorj.io/storj/storagenode/vouchers.(*Service).Request:116\n\tstorj.io/storj/storagenode/vouchers.(*Service).RunOnce.func1:103\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57”}

Okay. Try stopping the node
docker stop -t 300 storagenode

Then rename the kademlia file (located in your specified storage location) to kademlia-old and then start the node again.
docker start storagenode

This will force the node to rebuild the kademlia routing table. Your neighborhood size start out small and increase as the table is rebuilt.

If this doesn’t work, make sure you don’t have any overly restrictive outbound firewall rules. The devs recently changed the way that the nodes reach the network during the initial contact, so perhaps there is a problem there. If neither of these options help, I am out of ideas and we will have to wait for a member of the dev team to chime in.

I start it up with:
docker run -d --restart unless-stopped -p 28967:28967
-e WALLET="-----"
-e EMAIL="----"
-e ADDRESS="----:28967"
-e BANDWIDTH=“20TB”
-e STORAGE=“2TB”
–mount type=bind,source="/mnt/storagenode",destination=/app/identity
–mount type=bind,source="/mnt/storagenode_appConfig",destination=/app/config
–name storagenode storjlabs/storagenode:arm

I already did the kademila rm/move trick. I’m still getting the same

Is there a reason you called the container “storagenode2”? Are you sure there aren’t two containers running the node software?

no, there arent 2. just my documentation

i guess ill enable connection debugging as well. can you please put in your documentation in plain text what changes outbound connections using ephemeral port-ranges have had? Furthermore, do you leave it to the OS running the container to choose which ephemeral port ranges are used, or is that locked in the app?

They didn’t change any ports so far as I know. Most firewalls by default allow all outgoing traffic. I did not have to set up any outgoing rules on my system (Debian 9 - arm). I think the port is locked for the connection to the Sats, as they listen on a specific port. From the recent change log:

Find trusted satellites without kademlia Every upload request must be signed by a trusted satellite. To verify the signature the storage node needs the satellite identity. Open a direct connection to the satellite instead of searching it in the kademlia network. This will fix GetSignee Trust Error

I’m not sure if this change caused your problems or not. The errors you see lead me to believe that the node cannot make the initial connection to a satellite.

If your node doesn’t have all (non-prescribed) outgoing ports open, you node will probably not be able to make connections to customers since those could happen on any port. Maybe @Alexey can take a look at these logs, as I am not sure how to proceed with troubleshooting from this point.

edit: typo

The inability to connect to satellites popped up last week after an update. Before that, I didn’t have the issue.

Now, the issue has fixed itself and the node is up and running, receiving data. Apparently after trying to connect 1100x times…

2019-07-22T17:25:20.881Z DEBUG kademlia:endpoint connection to node failed {“error”: “node error: transport error: connection error: desc = “transport: error while dialing: dial tcp: address martinbase.duckdns.org: missing port in address””, “errorVerbose”: “node error: transport error: connection error: desc = “transport: error while dialing: dial tcp: address martinbase.duckdns.org: missing port in address”\n\tstorj.io/storj/pkg/transport.(*Transport).DialNode:108\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).dialNode:175\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).PingNode:95\n\tstorj.io/storj/pkg/kademlia.(*Kademlia).Ping:253\n\tstorj.io/storj/pkg/kademlia.(*Endpoint).pingback:57\n\tstorj.io/storj/pkg/kademlia.(*Endpoint).Query:42\n\tstorj.io/storj/pkg/pb._Nodes_Query_Handler.func1:535\n\tstorj.io/storj/pkg/server.logOnErrorUnaryInterceptor:39\n\tstorj.io/storj/pkg/pb._Nodes_Query_Handler:537\n\tgoogle.golang.org/grpc.(*Server).processUnaryRPC:998\n\tgoogle.golang.org/grpc.(*Server).handleStream:1278\n\tgoogle.golang.org/grpc.(*Server).serveStreams.func1.1:717”, “nodeID”: “11MueRU1FRSkMXCUXcHAW8YAV3iE1W8kmfSuBLCkExTEuXEhe8”}
2019-07-22T17:25:29.895Z DEBUG kademlia:endpoint connection to node failed {“error”: “node error: transport error: context deadline exceeded”, “errorVerbose”: “node error: transport error: context deadline exceeded\n\tstorj.io/storj/pkg/transport.(*Transport).DialNode:108\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).dialNode:175\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).PingNode:95\n\tstorj.io/storj/pkg/kademlia.(*Kademlia).Ping:253\n\tstorj.io/storj/pkg/kademlia.(*Endpoint).pingback:57\n\tstorj.io/storj/pkg/kademlia.(*Endpoint).Query:42\n\tstorj.io/storj/pkg/pb._Nodes_Query_Handler.func1:535\n\tstorj.io/storj/pkg/server.logOnErrorUnaryInterceptor:39\n\tstorj.io/storj/pkg/pb._Nodes_Query_Handler:537\n\tgoogle.golang.org/grpc.(*Server).processUnaryRPC:998\n\tgoogle.golang.org/grpc.(*Server).handleStream:1278\n\tgoogle.golang.org/grpc.(*Server).serveStreams.func1.1:717”, “nodeID”: “1JWKAstYbmFsdajWcQSvAhr6WBVURpgyD61rCMM7zjUE6PmPoQ”}
2019-07-22T17:25:35.456Z DEBUG kademlia connecting to node failed {“target”: “1GCzFjYg3xqYSHYvhf26op2CTDSWCpqHgV3YXLqR1cyStfQgKw”, “dial-node”: “1GkrWQnkjQRqCM4bcHSrHGTLu4aRZgQcGnXzmLWhR12LbhYCxN”, “dial-address”: “storj.neel.ch:28967”, “error”: “rpc error: code = Unavailable desc = transport is closing”}
2019-07-22T17:25:40.711Z DEBUG kademlia:endpoint Successfully connected with 92.255.183.53:4033 1100x times
2019-07-22T17:26:18.464Z DEBUG kademlia connecting to node failed {“target”: “1KSBN8SGrrWdEF4jDyk5tnqiypkjnfC9gXeXU2ciaarcNgCdQN”, “dial-node”: “1KnPinNLGpMaLCns5NWmR3upifiGt2NiAGAz2Z1yueH9tkBRac”, “dial-address”: “35.228.94.135:28967”, “error”: “transport error: context deadline exceeded”, “errorVerbose”: “transport error: context deadline exceeded\n\tstorj.io/storj/pkg/transport.(*Transport).DialNode:108\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).dialNode:175\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).Lookup:71\n\tstorj.io/storj/pkg/kademlia.(*peerDiscovery).Run.func1:88”}
2019-07-22T17:26:19.043Z DEBUG kademlia:endpoint connection to node failed {“error”: “node error: transport error: context deadline exceeded”, “errorVerbose”: “node error: transport error: context deadline exceeded\n\tstorj.io/storj/pkg/transport.(*Transport).DialNode:108\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).dialNode:175\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).PingNode:95\n\tstorj.io/storj/pkg/kademlia.(*Kademlia).Ping:253\n\tstorj.io/storj/pkg/kademlia.(*Endpoint).pingback:57\n\tstorj.io/storj/pkg/kademlia.(*Endpoint).Query:42\n\tstorj.io/storj/pkg/pb._Nodes_Query_Handler.func1:535\n\tstorj.io/storj/pkg/server.logOnErrorUnaryInterceptor:39\n\tstorj.io/storj/pkg/pb._Nodes_Query_Handler:537\n\tgoogle.golang.org/grpc.(*Server).processUnaryRPC:998\n\tgoogle.golang.org/grpc.(*Server).handleStream:1278\n\tgoogle.golang.org/grpc.(*Server).serveStreams.func1.1:717”, “nodeID”: “1Lg44DBVr7ezX8GHN3KfphZAN25RjKWdY2Sbyptswd3HSxau3h”}
2019-07-22T17:26:21.942Z DEBUG kademlia connecting to node failed {“target”: “1KSBN8SGrrWdEF4jDyk5tnqiypkjnfC9gXeXU2ciaarcNgCdQN”, “dial-node”: “1L5GX8rnbm8z3kp9sgLrxEKNEoUd1RDF5YsU58zK6Go3d86qqm”, “dial-address”: “123.184.42.219:28967”, “error”: “rpc error: code = Unavailable desc = transport is closing”}
2019-07-22T17:26:34.391Z DEBUG kademlia connecting to node failed {“target”: “1Qy6n9P8bXJ59YRjSby31gnZsVMSdNAZtdEc4Xtz243vJicDZ7”, “dial-node”: “1QzDKGHDeyuRxbvZhcwHU3syxTYtU1jHy5duAKuPxja3XC8ttk”, “dial-address”: “51.15.216.154:28967”, “error”: “transport error: connection error: desc = “transport: authentication handshake failed: tls peer certificate verification error: not signed by any CA in the whitelist: CA cert””, “errorVerbose”: “transport error: connection error: desc = “transport: authentication handshake failed: tls peer certificate verification error: not signed by any CA in the whitelist: CA cert”\n\tstorj.io/storj/pkg/transport.(*Transport).DialNode:108\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).dialNode:175\n\tstorj.io/storj/pkg/kademlia/kademliaclient.(*Dialer).Lookup:71\n\tstorj.io/storj/pkg/kademlia.(*peerDiscovery).Run.func1:88”}
2019-07-22T17:26:54.255Z DEBUG kademlia connecting to node failed {“target”: “1Qy6n9P8bXJ59YRjSby31gnZsVMSdNAZtdEc4Xtz243vJicDZ7”, “dial-node”: “1QVxXcgVUKu8xKc6gtb9YAFsgo6mjQkLS3FqoAjsU9iPkPTugs”, “dial-address”: “dev.sprinkle.cloud:28967”, “error”: “rpc error: code = Unavailable desc = transport is closing”}
2019-07-22T17:27:11.141Z INFO piecestore upload started {“Piece ID”: “HUKV3WUJKWRAP2U5ZPLO575433GIIMVEVED4M6YINCL47JRUC3JA”, “SatelliteID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Action”: “PUT”}
2019-07-22T17:27:15.532Z ERROR untrusted: trust:: context canceled
storj.io/storj/storagenode/trust.(*Pool).FetchPeerIdentity:108
storj.io/storj/storagenode/trust.(*Pool).GetSignee:89
storj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimitSignature:127
storj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimit:59
storj.io/storj/storagenode/piecestore.(*Endpoint).Upload:176
storj.io/storj/pkg/pb._Piecestore_Upload_Handler:701
storj.io/storj/pkg/server.logOnErrorStreamInterceptor:23
google.golang.org/grpc.(*Server).processStreamingRPC:1209
google.golang.org/grpc.(*Server).handleStream:1282
google.golang.org/grpc.(*Server).serveStreams.func1.1:717
2019-07-22T17:27:15.971Z INFO piecestore upload started {"P

So the question is, how long does it have before it gets into a bootloop. The behavior is extremely sporadic.

here’s the full log of the one container in 60 restarts, which remained online, kept on logging the same debug cannot connects for 40 minutes, did that for a 1100x times, then decided to go with an untrusted connection and now im getting data???

https://pastebin.com/raw/rrEwsRTc

Why am I receiving data, given these two are present??

2019-07-22T16:59:41.964Z        INFO    vouchers        Voucher request denied. Vetting process not yet complete


2019-07-22T17:27:15.532Z        ERROR   untrusted: trust:: context canceled

Very strange. Do you see any Put or Get messages in the logs now? Maybe your node has been DQ’d? @stefanbenten or @Knowledge Any ideas on this one?