Node suspended - need help

I have tried restarting containers, checking docker logs, but without knowing what to look for I’m not sure how to analyze my problem. I suspect a borked container update. This node is several years old has has been running without a hiccup. I had actually just scored some new drives for free to try and expand the array and storage pool.

It is a Dell PowerEdge tower chassis with a Xeon, 16GB RAM, redundant power, and a large hardware RAID5 for the storage pool. Ubuntu Server 22.04. As far as I know, nothing has changed on the system. I do have unattended upgrades on for security updates.

If someone can tell me what logs to gather etc I will be happy to provide in detail. I had sort of hoped it would resolve itself by now. Also, the node container seems to be pegging out 2 CPU cores nonstop.

I’m also very busy and am pressed for time to fix it so sorry if this has already been covered. Kind of a noob to Docker too so bear with me.

sudo docker logs storagenode --tail 50

output

Summary
farmer@farm1:~$ sudo docker logs storagenode --tail 50
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00006e790 sp=0xc00006e770 pc=0x43be36
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:369
runtime.bgsweep(0x0?)
        /usr/local/go/src/runtime/mgcsweep.go:278 +0x8e fp=0xc00006e7c8 sp=0xc00006e790 pc=0x4264ee
runtime.gcenable.func1()
        /usr/local/go/src/runtime/mgc.go:178 +0x26 fp=0xc00006e7e0 sp=0xc00006e7c8 pc=0x41b1a6
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00006e7e8 sp=0xc00006e7e0 pc=0x46d421
created by runtime.gcenable
        /usr/local/go/src/runtime/mgc.go:178 +0x6b

goroutine 18 [GC scavenge wait]:
runtime.gopark(0xc000092000?, 0x1474ca0?, 0x1?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00006ef70 sp=0xc00006ef50 pc=0x43be36
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:369
runtime.(*scavengerState).park(0x1b99480)
        /usr/local/go/src/runtime/mgcscavenge.go:389 +0x53 fp=0xc00006efa0 sp=0xc00006ef70 pc=0x424593
runtime.bgscavenge(0x0?)
        /usr/local/go/src/runtime/mgcscavenge.go:617 +0x45 fp=0xc00006efc8 sp=0xc00006efa0 pc=0x424b65
runtime.gcenable.func2()
        /usr/local/go/src/runtime/mgc.go:179 +0x26 fp=0xc00006efe0 sp=0xc00006efc8 pc=0x41b146
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00006efe8 sp=0xc00006efe0 pc=0x46d421
created by runtime.gcenable
        /usr/local/go/src/runtime/mgc.go:179 +0xaa

goroutine 33 [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xc000072770?)
        /usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc000072628 sp=0xc000072608 pc=0x43be36
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:369
runtime.runfinq()
        /usr/local/go/src/runtime/mfinal.go:180 +0x10f fp=0xc0000727e0 sp=0xc000072628 pc=0x41a2af
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x46d421
created by runtime.createfing
        /usr/local/go/src/runtime/mfinal.go:157 +0x45
2023-03-19 18:33:13,077 INFO exited: storagenode (exit status 2; not expected)
2023-03-19T18:33:13.262Z        INFO    Command output. {"Process": "storagenode-updater", "Output": "runtime: newstack sp=0xc0000737d0 stack=[0xc000073800, 0xc000073800]\n\tmorebuf={pc:0x46d421 sp:0xc0000737e0 lr:0x0}\n\tsched={pc:0x586455 sp:0xc0000737d8 lr:0x0 ctxt:0xc000114f10}\nruntime: gp=0xc000007380, goid=20, gp->status=0x2\n runtime: split stack overflow: 0xc0000737d0 < 0xc000073800\nfatal error: runtime: split stack overflow\n\nruntime stack:\nruntime.throw({0x11319cd?, 0x44650c?})\n\t/usr/local/go/src/runtime/panic.go:1047 +0x5d fp=0xc000083e10 sp=0xc000083de0 pc=0x43921d\nruntime.newstack()\n\t/usr/local/go/src/runtime/stack.go:1047 +0x765 fp=0xc000083fc8 sp=0xc000083e10 pc=0x453805\nruntime.morestack()\n\t/usr/local/go/src/runtime/asm_amd64.s:570 +0x8b fp=0xc000083fd0 sp=0xc000083fc8 pc=0x46b34b\n\ngoroutine 20 [running]:\ntext/template/parse.lex.func1()\n\t/usr/local/go/src/text/template/parse/lex.go:233 +0x35 fp=0xc0000737e0 sp=0xc0000737d8 pc=0x586455\nruntime.goexit()\n\t/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0000737e8 sp=0xc0000737e0 pc=0x46d421\ncreated by text/template/parse.lex\n\t/usr/local/go/src/text/template/parse/lex.go:233 +0x1dd\n\ngoroutine 1 [chan receive, locked to thread]:\nruntime.gopark(0xc0000b00b8?, 0xc0000b4120?, 0xb8?, 0xe5?, 0xc000159170?)\n\t/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc000159100 sp=0xc0001590e0 pc=0x43be36\nruntime.chanrecv(0xc00011aa20, 0xc000159430, 0x1)\n\t/usr/local/go/src/runtime/chan.go:583 +0x49b fp=0xc000159190 sp=0xc000159100 pc=0x40807b\nruntime.chanrecv1(0x1021380?, 0x0?)\n\t/usr/local/go/src/runtime/chan.go:442 +0x18 fp=0xc0001591b8 sp=0xc000159190 pc=0x407b78\ntext/template/parse.(*lexer).nextItem(...)\n\t/usr/local/go/src/text/template/parse/lex.go:203\ntext/template/parse.(*Tree).peek(...)\n\t/usr/local/go/src/text/template/parse/parse.go:106\ntext/template/parse.(*Tree).parse(0xc00017e5a0)\n\t/usr/local/go/src/text/template/parse/parse.go:299 +0xe5 fp=0xc000159468 sp=0xc0001591b8 pc=0x591b45\ntext/template/parse.(*Tree).Parse(0xc00017e5a0, {0x1160419, 0x3a4}, {0x0, 0x0}, {0x0, 0x0}, 0xc0001aefc0, {0xc000114f00, 0x2, ...})\n\t/usr/local/go/src/text/template/parse/parse.go:250 +0x51c fp=0xc000159568 sp=0xc000159468 pc=0x5914dc\ntext/template/parse.Parse({0x11162ef, 0x4}, {0x1160419, 0x3a4}, {0x0, 0x0}, {0x0, 0x0}, {0xc000114f00, 0x2, ...})\n\t/usr/local/go/src/text/template/parse/parse.go:66 +0x130 fp=0xc0001595d8 sp=0xc000159568 pc=0x590050\ntext/template.(*Template).Parse(0xc000137340, {0x1160419, 0x3a4})\n\t/usr/local/go/src/text/template/template.go:210 +0x7ca fp=0xc0001596c8 sp=0xc0001595d8 pc=0x5b138a\n\ngoroutine 2 [force gc (idle)]:\nruntime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)\n\t/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc000072fb0 sp=0xc000072f90 pc=0x43be36\nruntime.goparkunlock(...)\n\t/usr/local/go/src/runtime/proc.go:369\nruntime.forcegchelper()\n\t/usr/local/go/src/runtime/proc.go:302 +0xad fp=0xc000072fe0 sp=0xc000072fb0 pc=0x43bccd\nruntime.goexit()\n\t/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc000072fe8 sp=0xc000072fe0 pc=0x46d421\ncreated by runtime.init.6\n\t/usr/local/go/src/runtime/proc.go:290 +0x25\n\ngoroutine 3 [GC sweep wait]:\nruntime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)\n\t/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc000073790 sp=0xc000073770 pc=0x43be36\nruntime.goparkunlock(...)\n\t/usr/local/go/src/runtime/proc.go:369\nruntime.bgsweep(0x0?)\n\t/usr/local/go/src/runtime/mgcsweep.go:278 +0x8e fp=0xc0000737c8 sp=0xc000073790 pc=0x4264ee\nruntime.gcenable.func1()\n\t/usr/local/go/src/runtime/mgc.go:178 +0x26 fp=0xc0000737e0 sp=0xc0000737c8 pc=0x41b1a6\nruntime.goexit()\n\t/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0000737e8 sp=0xc0000737e0 pc=0x46d421\ncreated by runtime.gcenable\n\t/usr/local/go/src/runtime/mgc.go:178 +0x6b\n\ngoroutine 17 [GC scavenge wait]:\nruntime.gopark(0xc000092000?, 0x1474ca0?, 0x1?, 0x0?, 0x0?)\n\t/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc00006e770 sp=0xc00006e750 pc=0x43be36\nruntime.goparkunlock(...)\n\t/usr/local/go/src/runtime/proc.go:369\nruntime.(*scavengerState).park(0x1b99480)\n\t/usr/local/go/src/runtime/mgcscavenge.go:389 +0x53 fp=0xc00006e7a0 sp=0xc00006e770 pc=0x424593\nruntime.bgscavenge(0x0?)\n\t/usr/local/go/src/runtime/mgcscavenge.go:617 +0x45 fp=0xc00006e7c8 sp=0xc00006e7a0 pc=0x424b65\nruntime.gcenable.func2()\n\t/usr/local/go/src/runtime/mgc.go:179 +0x26 fp=0xc00006e7e0 sp=0xc00006e7c8 pc=0x41b146\nruntime.goexit()\n\t/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00006e7e8 sp=0xc00006e7e0 pc=0x46d421\ncreated by runtime.gcenable\n\t/usr/local/go/src/runtime/mgc.go:179 +0xaa\n\ngoroutine 18 [finalizer wait]:\nruntime.gopark(0x0?, 0xc000072670?, 0x4b?, 0x86?, 0xc000072770?)\n\t/usr/local/go/src/runtime/proc.go:363 +0xd6 fp=0xc000072628 sp=0xc000072608 pc=0x43be36\nruntime.goparkunlock(...)\n\t/usr/local/go/src/runtime/proc.go:369\nruntime.runfinq()\n\t/usr/local/go/src/runtime/mfinal.go:180 +0x10f fp=0xc0000727e0 sp=0xc000072628 pc=0x41a2af\nruntime.goexit()\n\t/usr/local/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x46d421\ncreated by runtime.createfing\n\t/usr/local/go/src/runtime/mfinal.go:157 +0x45\n"}
2023-03-19T18:33:13.262Z        ERROR   Error updating service. {"Process": "storagenode-updater", "Service": "storagenode", "error": "exit status 2", "errorVerbose": "exit status 2\n\tmain.update:19\n\tmain.loopFunc:27\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tmain.cmdRun:136\n\tstorj.io/private/process.cleanup.func1.4:377\n\tstorj.io/private/process.cleanup.func1:395\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfigAndLogger:92\n\tmain.main:20\n\truntime.main:250"}
2023-03-19T18:33:13.278Z        INFO    Current binary version  {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.73.4"}
2023-03-19T18:33:13.278Z        INFO    New version is being rolled out but hasn't made it to this node yet     {"Process": "storagenode-updater", "Service": "storagenode-updater"}
2023-03-19 18:33:14,280 INFO success: processes-exit-eventlistener entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-03-19 18:33:14,283 INFO spawned: 'storagenode' with pid 43
2023-03-19 18:33:14,284 INFO success: storagenode-updater entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-03-19 18:33:15,285 INFO success: storagenode entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-03-19T18:48:13.034Z        INFO    Downloading versions.   {"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}

sudo docker logs watchtower --tail 50

output

Summary
farmer@farm1:~$ sudo docker logs watchtower --tail 50
time="2022-07-05T10:26:48Z" level=info msg="Creating /storagenode"
time="2022-07-15T22:14:29Z" level=info msg="Waiting for running update to be finished..."
time="2022-07-15T22:15:22Z" level=info msg="Waiting for running update to be finished..."
time="2022-07-17T14:01:20Z" level=info msg="Found new storjlabs/storagenode:latest image (sha256:e70bf82ab67eca98bff27796ddd4befe7d9872fd172dff6cbd04f8e506d9b4c4)"
time="2022-07-17T14:01:21Z" level=info msg="Stopping /storagenode (76b981e86d8f04a25a5596afa7c39103c1d1c0e8e51eb60cd361b19fa14ac3be) with SIGTERM"
time="2022-07-17T14:01:24Z" level=info msg="Creating /storagenode"
time="2022-07-30T20:07:51Z" level=info msg="Found new storjlabs/storagenode:latest image (sha256:892135ee7acb00211330aba7b6dd070509ca0c754ae4a2d8b0c55a5e6f238453)"
time="2022-07-30T20:07:52Z" level=info msg="Stopping /storagenode (b53c92de8b02402c6b51fd3f0d93005c5bd77ac670e5f0ec76e2ea753d6ca062) with SIGTERM"
time="2022-07-30T20:07:56Z" level=info msg="Creating /storagenode"
time="2022-08-08T17:41:00Z" level=info msg="Waiting for running update to be finished..."
time="2022-08-08T18:05:05Z" level=info msg="Waiting for running update to be finished..."
time="2022-08-09T19:39:41Z" level=info msg="Found new storjlabs/storagenode:latest image (sha256:bbf9d146dbd16152fe779391a78cc57945dc94f692befef4fd1f7556ecf45da8)"
time="2022-08-09T19:39:42Z" level=info msg="Stopping /storagenode (0eb34e635208dc9df50fd91ac182c2e6c8007947ede05993a05cdd84506b99a2) with SIGTERM"
time="2022-08-09T19:39:44Z" level=info msg="Creating /storagenode"
time="2022-08-14T18:24:48Z" level=info msg="Waiting for running update to be finished..."
time="2022-08-19T22:52:04Z" level=info msg="Found new storjlabs/storagenode:latest image (sha256:4c62d7559e9d83ba2d2f8dd2cb90c1d11bf92e8d0bd06095ccda7e639c9a8a74)"
time="2022-08-19T22:52:05Z" level=info msg="Stopping /storagenode (991e7af42d1f1a0d3866c0a8982176ce66967003bc92c27594e00fe7e0e991c2) with SIGTERM"
time="2022-08-19T22:52:10Z" level=info msg="Creating /storagenode"
time="2022-09-01T21:30:54Z" level=info msg="Found new storjlabs/storagenode:latest image (sha256:551282d6da5fc0cac219acd05c7ea86b0b07742df1058def236e6e372b53448d)"
time="2022-09-01T21:30:55Z" level=info msg="Stopping /storagenode (0ee12dc712a9c82aaf576ea89e6d00ffaced99b647493c0f15773441ea33a206) with SIGTERM"
time="2022-09-01T21:30:59Z" level=info msg="Creating /storagenode"
time="2022-09-20T18:19:53Z" level=info msg="Found new storjlabs/storagenode:latest image (sha256:ad6421abebe71a16a71474c698b56f908f65c2b7cd19b560b3884ecf13ee05f0)"
time="2022-09-20T18:19:54Z" level=info msg="Stopping /storagenode (7361b027dd328022164afd86a9be8097edac7b21b03fda0edfcfe5bf12982955) with SIGTERM"
time="2022-09-20T18:19:59Z" level=info msg="Creating /storagenode"
error from daemon in stream: Error grabbing logs: invalid character '\x00' looking for beginning of value

I admit I’m not really experienced in debugging golang problems, but this is definitely not a problem commonly encounted here, your go runtime is crashing hard. Might be anything from misbehaving docker after some system upgrade to a hardware error. I’d probably start from purging docker images and redownloading them from scratch. What exactly model of CPU is in this machine?

1 Like

I’ll try redownloading the images.

It is an E5-2450. According to OpenManage there are no hardware issues. Maybe it’s my boot drive, it’s just booting from a thumb drive.

Need to see the beginning of this go panic dump.
But in general it is usually mean that some system libraries are corrupted or at least an image.
So @Toyoo is right - you need to purge images. But perhaps you also need to reinstall all packages on your thumb drive or re-flash it. Make sure that you moved your identity to the disk with data before wiping your thumb drive.

Out of curiosity, did redownloading the images help?

Unfortunately no. I think my thumb drive is toast.

I have a new, much larger node running now. Using Rocky Linux this time

1 Like

I’ve heard that consumer-grade USB flash drives only survive 20-30 full drive write cycles. Apparently if you want to set up an OS on a device of this form factor, it’s good to look for “SLC USB”, they have high durability.

2 Likes

Good to know! The host OS is now in a RAID1 mirror of some spare HDDs I had