Docker Container restarting every X seconds after 0.15.2 upgrade

Chris21788 · July 17, 2019, 4:08pm

Hello,

I’m currently getting the following in my docker container logs:

2019-07-17T15:54:43.733898500Z 2019-07-17T15:54:43.733Z INFO Public server started on [::]:28967
2019-07-17T15:54:43.733902000Z 2019-07-17T15:54:43.733Z INFO Private server started on 127.0.0.1:7778
2019-07-17T15:54:43.764094100Z 2019-07-17T15:54:43.764Z INFO running on version v0.15.2
2019-07-17T15:54:58.608310000Z 2019-07-17T15:54:58.608Z INFO piecestore:monitor Remaining Bandwidth {"bytes": 96429487211008}
2019-07-17T15:55:03.568580900Z 2019-07-17T15:55:03.568Z INFO Configuration loaded from: /app/config/config.yaml
2019-07-17T15:55:03.579248000Z 2019-07-17T15:55:03.579Z INFO Operator email: MYEMAIL@gmail.com
2019-07-17T15:55:03.579265700Z 2019-07-17T15:55:03.579Z INFO Operator wallet: WALLETADDRESS
2019-07-17T15:55:04.004487200Z 2019-07-17T15:55:04.004Z INFO running on version v0.15.2
2019-07-17T15:55:04.047156500Z 2019-07-17T15:55:04.047Z INFO db.migration Latest Version {"version": 13}
2019-07-17T15:55:04.047306600Z 2019-07-17T15:55:04.047Z INFO vouchers Checking vouchers
2019-07-17T15:55:04.047693300Z 2019-07-17T15:55:04.047Z INFO Node MYNODEID started
2019-07-17T15:55:04.047739700Z 2019-07-17T15:55:04.047Z INFO Public server started on [::]:28967
2019-07-17T15:55:04.047790900Z 2019-07-17T15:55:04.047Z INFO Private server started on 127.0.0.1:7778
2019-07-17T15:55:04.080621800Z 2019-07-17T15:55:04.080Z INFO running on version v0.15.2
2019-07-17T15:55:17.353479300Z 2019-07-17T15:55:17.353Z INFO piecestore:monitor Remaining Bandwidth {"bytes": 96429487211008}
2019-07-17T15:55:22.700088800Z 2019-07-17T15:55:22.699Z INFO Configuration loaded from: /app/config/config.yaml
2019-07-17T15:55:22.710743100Z 2019-07-17T15:55:22.710Z INFO Operator email: MYEMAIL@gmail.com
2019-07-17T15:55:22.710872800Z 2019-07-17T15:55:22.710Z INFO Operator wallet: WALLETADDRESS
2019-07-17T15:55:23.006364000Z 2019-07-17T15:55:23.005Z INFO running on version v0.15.2
2019-07-17T15:55:23.075473900Z 2019-07-17T15:55:23.075Z INFO db.migration Latest Version {"version": 13}
2019-07-17T15:55:23.076533500Z 2019-07-17T15:55:23.076Z INFO vouchers Checking vouchers
2019-07-17T15:55:23.078147400Z 2019-07-17T15:55:23.078Z INFO Node MYNODEID started
2019-07-17T15:55:23.078166700Z 2019-07-17T15:55:23.078Z INFO Public server started on [::]:28967
2019-07-17T15:55:23.078171500Z 2019-07-17T15:55:23.078Z INFO Private server started on 127.0.0.1:7778
2019-07-17T15:55:23.109352200Z 2019-07-17T15:55:23.109Z INFO running on version v0.15.2
2019-07-17T15:55:37.764466200Z 2019-07-17T15:55:37.764Z INFO piecestore:monitor Remaining Bandwidth {"bytes": 96429487211008}
2019-07-17T15:55:43.793401000Z 2019-07-17T15:55:43.793Z INFO Configuration loaded from: /app/config/config.yaml
2019-07-17T15:55:43.812908300Z 2019-07-17T15:55:43.812Z INFO Operator email: MYEMAIL@gmail.com
2019-07-17T15:55:43.813027300Z 2019-07-17T15:55:43.812Z INFO Operator wallet: WALLETADDRESS
2019-07-17T15:55:44.126199300Z 2019-07-17T15:55:44.126Z INFO running on version v0.15.2
2019-07-17T15:55:44.182618400Z 2019-07-17T15:55:44.182Z INFO db.migration Latest Version {"version": 13}
2019-07-17T15:55:44.183293000Z 2019-07-17T15:55:44.183Z INFO vouchers Checking vouchers
2019-07-17T15:55:44.184615100Z 2019-07-17T15:55:44.184Z INFO Node MYNODEID started
2019-07-17T15:55:44.184638900Z 2019-07-17T15:55:44.184Z INFO Public server started on [::]:28967
2019-07-17T15:55:44.184645200Z 2019-07-17T15:55:44.184Z INFO Private server started on 127.0.0.1:7778
2019-07-17T15:55:44.217831200Z 2019-07-17T15:55:44.217Z INFO running on version v0.15.2
2019-07-17T15:56:00.573527400Z 2019-07-17T15:56:00.573Z INFO piecestore:monitor Remaining Bandwidth {"bytes": 96429487211008}
2019-07-17T15:56:05.375435000Z 2019-07-17T15:56:05.375Z INFO Configuration loaded from: /app/config/config.yaml
2019-07-17T15:56:05.399833500Z 2019-07-17T15:56:05.399Z INFO Operator email: MYEMAIL@gmail.com
2019-07-17T15:56:05.399886300Z 2019-07-17T15:56:05.399Z INFO Operator wallet: WALLETADDRESS
2019-07-17T15:56:05.694238500Z 2019-07-17T15:56:05.694Z INFO running on version v0.15.2
2019-07-17T15:56:05.738579900Z 2019-07-17T15:56:05.737Z INFO db.migration Latest Version {"version": 13}
2019-07-17T15:56:05.738607100Z 2019-07-17T15:56:05.738Z INFO vouchers Checking vouchers
2019-07-17T15:56:05.751971200Z 2019-07-17T15:56:05.751Z INFO Node MYNODEID started
2019-07-17T15:56:05.751989900Z 2019-07-17T15:56:05.751Z INFO Public server started on [::]:28967
2019-07-17T15:56:05.751993300Z 2019-07-17T15:56:05.751Z INFO Private server started on 127.0.0.1:7778
2019-07-17T15:56:05.770000500Z 2019-07-17T15:56:05.769Z INFO running on version v0.15.2
2019-07-17T15:56:21.284246900Z 2019-07-17T15:56:21.284Z INFO piecestore:monitor Remaining Bandwidth {"bytes": 96429487211008}
2019-07-17T15:56:29.032344300Z 2019-07-17T15:56:29.032Z INFO Configuration loaded from: /app/config/config.yaml
2019-07-17T15:56:29.045259000Z 2019-07-17T15:56:29.045Z INFO Operator email: MYEMAIL@gmail.com
2019-07-17T15:56:29.045350600Z 2019-07-17T15:56:29.045Z INFO Operator wallet: WALLETADDRESS
2019-07-17T15:56:29.415674700Z 2019-07-17T15:56:29.415Z INFO running on version v0.15.2
2019-07-17T15:56:29.577496500Z 2019-07-17T15:56:29.569Z INFO db.migration Latest Version {"version": 13}
2019-07-17T15:56:29.594719100Z 2019-07-17T15:56:29.594Z INFO vouchers Checking vouchers
2019-07-17T15:56:29.605761200Z 2019-07-17T15:56:29.605Z INFO Node MYNODEID started
2019-07-17T15:56:29.605793700Z 2019-07-17T15:56:29.605Z INFO Public server started on [::]:28967
2019-07-17T15:56:29.605802500Z 2019-07-17T15:56:29.605Z INFO Private server started on 127.0.0.1:7778
2019-07-17T15:56:29.649829300Z 2019-07-17T15:56:29.649Z INFO running on version v0.15.2

I’m also seeing this pretty frequently:

2019-07-17T15:52:01.810385700Z ERROR: 2019/07/17 15:52:01 pickfirstBalancer: failed to NewSubConn: rpc error: code = Canceled desc = grpc: the client connection is closing

Essentially, I launch my dashboard, it runs for 3-5 seconds, and then I’m dropped back to the PowerShell prompt. When this happens, I don’t see a matching error in the logs, apart from the one above, which comes up occasionally. Any ideas?

I’ve moved my info.db out of my Data directory, and it has recreated one. Now my node seems stable.

I’m running Windows Server 2016 Datacenter
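For anyone wanting to put a number on the "every X seconds" in the title, the gap between successive startups can be measured from timestamped log output. This is only a minimal Python sketch: it assumes each line starts with the Docker timestamp (as `docker logs -t` prints it) and that `Configuration loaded from` marks the beginning of a start. The sample lines reuse timestamps from the log above; `START_MARKER`, `parse_ts`, and `restart_intervals` are hypothetical names, not part of any Storj tooling.

```python
from datetime import datetime

# Assumption: this message is the first line the node prints on each start.
START_MARKER = "Configuration loaded from"

SAMPLE = [
    "2019-07-17T15:55:03.568580900Z 2019-07-17T15:55:03.568Z INFO Configuration loaded from: /app/config/config.yaml",
    "2019-07-17T15:55:04.047693300Z 2019-07-17T15:55:04.047Z INFO Node MYNODEID started",
    "2019-07-17T15:55:22.700088800Z 2019-07-17T15:55:22.699Z INFO Configuration loaded from: /app/config/config.yaml",
    "2019-07-17T15:55:43.793401000Z 2019-07-17T15:55:43.793Z INFO Configuration loaded from: /app/config/config.yaml",
]


def parse_ts(line):
    """Parse the leading Docker timestamp of a log line."""
    token = line.split()[0].rstrip("Z")
    date_part, frac = token.split(".")
    # Docker prints nanoseconds; datetime only holds microseconds, so trim.
    return datetime.strptime(f"{date_part}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")


def restart_intervals(lines):
    """Return the seconds elapsed between consecutive startup lines."""
    starts = [parse_ts(line) for line in lines if START_MARKER in line]
    return [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]


if __name__ == "__main__":
    for gap in restart_intervals(SAMPLE):
        print(f"restart after {gap:.1f} s")
```

On the sample lines this reports gaps of roughly 19 and 21 seconds, which matches the restart loop described above.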

Chris21788 · July 17, 2019, 4:11pm

As I’ve deleted my info.db (as it was 5GB+), is there any issue with doing so? I see my bandwidth reset, but as a network wipe has occurred, this should be one of the rare cases where that is ok, right?

Chris21788 · July 17, 2019, 4:24pm

Been up for 20 minutes, and haven’t had the “pickfirstBalancer: failed to NewSubConn: rpc error: code = Canceled desc = grpc: the client connection is closing” failure at all. I’ll report back if I get it, but I’m guessing it’s because of the info.db being corrupt.
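A guess about corruption can be checked rather than assumed. The sketch below is a hedged illustration, not an official procedure: it assumes info.db is an SQLite file and that it is run against a copy (or with the node stopped). `integrity_report` is a hypothetical helper name; `PRAGMA integrity_check` is a standard SQLite command that returns a single `ok` row when it finds no problems.

```python
import sqlite3
from pathlib import Path


def integrity_report(db_path):
    """Return the rows of PRAGMA integrity_check; ['ok'] means no problems found."""
    # Open read-only through a file: URI so the check cannot create or alter the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    except sqlite3.DatabaseError as exc:
        # A badly damaged file may fail before the check can even run.
        return [f"error: {exc}"]
    finally:
        conn.close()


if __name__ == "__main__":
    # Assumption: a copy of the database sits in the current directory.
    print(integrity_report("info.db"))
```

Anything other than a single `ok` row would support the corruption theory. An `ok` result does not rule out other causes of the restarts.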

vedalken254 · July 17, 2019, 4:37pm

I assume you meant you moved the original info.db out rather than flat out deleting it, yes? If you’ve deleted it, I believe that could have a lasting impact for your node regardless of the data wipe.

BrightSilence · July 17, 2019, 4:47pm

In the future, ask before acting. I can’t tell you if there is any lasting effect. Even just moving it and rebuilding one can make it near impossible to merge databases again. So I advise taking a little more caution in these scenarios.
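In that spirit, a copy is less risky than a move, because the original stays where the node expects it. The following is a minimal Python sketch of a timestamped backup. It assumes the node is stopped first, so the file is not being written to mid-copy; `backup_copy` and the `.bak` naming are hypothetical choices for illustration.

```python
import shutil
import time
from pathlib import Path


def backup_copy(db_path, backup_dir):
    """Copy db_path into backup_dir under a timestamped name and return the new path."""
    src = Path(db_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.name}.{stamp}.bak"
    shutil.copy2(src, dest)  # copy2 also preserves modification times
    # Cheap sanity check: the copy should be exactly as large as the original.
    if dest.stat().st_size != src.stat().st_size:
        raise OSError(f"backup size mismatch for {dest}")
    return dest


if __name__ == "__main__":
    # Assumption: paths are placeholders; adjust to the actual data directory.
    print(backup_copy("info.db", "backup"))
```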

Chris21788 · July 17, 2019, 5:00pm

Yes, I meant moved. I still have the old one, but my node is acting normally now. Who would be able to tell me whether this could cause problems for my node in the future?

Chris21788 · July 17, 2019, 7:58pm

Well that didn’t resolve it. 3.5 hours later, the container restarted.

BrightSilence · July 17, 2019, 8:35pm

Can you give us some info on what kind of hardware you are using?

Alexey · July 17, 2019, 8:52pm

Please update to the latest version and post your new logs: docker logs --tail 20 storagenode

Chris21788 · July 17, 2019, 9:38pm

I was already on 0.15.3, but ran my upgrade script to ensure I was on the latest. Here’s the log snippet after the node came back online: https://drive.google.com/file/d/1rP-_x2QZWS74OAbVRZy4nWi7pehUXlOV/view?usp=sharing

As for system specs:
Windows Server 2016 Datacenter
MSI Z170A Gaming M5
Intel i7-6700k
Drives are connected through an LSI HBA card (16e)
Storj Storage is using StableBit DrivePool disks (2x4TB).

The server also runs other storage through the HBA (roughly 80TB raw), with multiple SSD caches and an NVMe OS disk.

Not sure what else I can provide. Thanks for investigating.

Alexey · July 17, 2019, 9:41pm

Looks like your problem is solved with 0.15.3

Alexey · July 17, 2019, 11:21pm

2 posts were merged into an existing topic: Error Codes: What they mean and Severity Level [READ FIRST]

Alexey · July 17, 2019, 11:22pm

2 posts were merged into an existing topic: Error Codes: What they mean and Severity Level [READ FIRST]