I run three nodes in Docker containers on this machine. Two of the three are working fine. A couple of days ago, the oldest (and largest) node suddenly started having a problem: it stays up for only about 10 to 15 minutes before crashing with a thread allocation error. Docker restarts it after about 10 minutes, but it crashes again 10 to 15 minutes later. So I end up with roughly 10 minutes of uptime followed by 10 minutes of downtime, in a loop.
Relevant log entries:
2024-04-29T12:37:48Z INFO piecestore upload started {"Process": "storagenode", "Piece ID": "XXX", "Satellite ID": "XXX", "Action": "PUT", "Remote Address": "XXX:35465", "Available Space": 582921609103}
2024-04-29T12:37:48Z INFO piecestore upload started {"Process": "storagenode", "Piece ID": "XXX", "Satellite ID": "XXX", "Action": "PUT", "Remote Address": "XXX:33830", "Available Space": 582921546127}
2024-04-29T12:37:48Z INFO piecestore upload started {"Process": "storagenode", "Piece ID": "XXX", "Satellite ID": "XXX", "Action": "PUT", "Remote Address": "XXX:44408", "Available Space": 582921546127}
runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0xf7903f m=8655 sigcode=18446744073709551610
goroutine 0 [idle]:
runtime: g 0: unknown pc 0xf7903f
stack: frame={sp:0x7897114b2508, fp:0x0} stack=[0x789711492e50,0x7897114b2a50)
0x00007897114b2408: 0x0000000000f80422 0x00007897114b24d0
0x00007897114b2418: 0x0000000000000037 0x0000000000000000
0x00007897114b2428: 0x0000000000000000 0x0000000001d079a0
0x00007897114b2438: 0x0000000001d82bd8 0x0000000000000001
0x00007897114b2448: 0x0000000000000037 0x0000000001cf87e0
0x00007897114b2458: 0x00007897114b2520 0x00007897114b24a8
0x00007897114b2468: 0x0000000000f7b5e2 0x00000000004724e0 <runtime.goexit+0x0000000000000000>
0x00007897114b2478: 0x00000000015a121a 0x00007897114b2490
0x00007897114b2488: 0x0000000000000000 0x0000003000000010
0x00007897114b2498: 0x00007897114b26f0 0x00007897114b2620
0x00007897114b24a8: 0x0000000000000000 0x0000000000000000
0x00007897114b24b8: 0x0000000000000000 0x0000000000000000
0x00007897114b24c8: 0x0000000000000000 0x5f64616572687470
0x00007897114b24d8: 0x6620657461657263 0x52203a64656c6961
0x00007897114b24e8: 0x20656372756f7365 0x7261726f706d6574
0x00007897114b24f8: 0x76616e7520796c69 0x00656c62616c6961
0x00007897114b2508: <0x0000000000f79084 0x00007897114b2588
0x00007897114b2518: 0x0000000000000000 0x0000000000000000
0x00007897114b2528: 0x0000000000f76451 0x0000000000023000
0x00007897114b2538: 0x000078971146d000 0x0000000001d83830
0x00007897114b2548: 0x0000000000f80422 0x0000000001d82bd8
0x00007897114b2558: 0x0000000000f80422 0x0000000001d82bd8
0x00007897114b2568: 0x0000000000000000 0x00007897114b25cf
0x00007897114b2578: 0x0000000000000001 0x0000000000000001
0x00007897114b2588: 0x0000000001cf87e0 0x0000000001cf886c
0x00007897114b2598: 0x000000000000000a 0x0000000000000000
0x00007897114b25a8: 0x0000000001cf87e0 0x00000000015a121a
0x00007897114b25b8: 0x0000000000f745f8 0x0000000000000000
0x00007897114b25c8: 0x0a00000001cf87e0 0x0000000001cf87e0
0x00007897114b25d8: 0x0000000000f79a53 0x0000000001cf87e0
0x00007897114b25e8: 0x00000000015a121a 0x0000789760ab7740
0x00007897114b25f8: 0x0000000000f736ec 0x0000000000462938 <runtime.(*unwinder).resolveInternal+0x0000000000000158>
runtime: g 0: unknown pc 0xf7903f
stack: frame={sp:0x7897114b2508, fp:0x0} stack=[0x789711492e50,0x7897114b2a50)
0x00007897114b2408: 0x0000000000f80422 0x00007897114b24d0
0x00007897114b2418: 0x0000000000000037 0x0000000000000000
0x00007897114b2428: 0x0000000000000000 0x0000000001d079a0
0x00007897114b2438: 0x0000000001d82bd8 0x0000000000000001
0x00007897114b2448: 0x0000000000000037 0x0000000001cf87e0
0x00007897114b2458: 0x00007897114b2520 0x00007897114b24a8
0x00007897114b2468: 0x0000000000f7b5e2 0x00000000004724e0 <runtime.goexit+0x0000000000000000>
0x00007897114b2478: 0x00000000015a121a 0x00007897114b2490
0x00007897114b2488: 0x0000000000000000 0x0000003000000010
0x00007897114b2498: 0x00007897114b26f0 0x00007897114b2620
0x00007897114b24a8: 0x0000000000000000 0x0000000000000000
runtime/cgo: pthread_create failed: Resource temporarily unavailable
0x00007897114b24b8: 0x0000000000000000 0x0000000000000000
0x00007897114b24c8: 0x0000000000000000 0x5f64616572687470
0x00007897114b24d8: 0x6620657461657263 0x52203a64656c6961
0x00007897114b24e8: 0x20656372756f7365 0x7261726f706d6574
0x00007897114b24f8: 0x76616e7520796c69 0x00656c62616c6961
0x00007897114b2508: <0x0000000000f79084 0x00007897114b2588
0x00007897114b2518: 0x0000000000000000 0x0000000000000000
0x00007897114b2528: 0x0000000000f76451 0x0000000000023000
0x00007897114b2538: 0x000078971146d000 0x0000000001d83830
0x00007897114b2548: 0x0000000000f80422 0x0000000001d82bd8
0x00007897114b2558: 0x0000000000f80422 0x0000000001d82bd8
0x00007897114b2568: 0x0000000000000000 0x00007897114b25cf
0x00007897114b2578: 0x0000000000000001 0x0000000000000001
0x00007897114b2588: 0x0000000001cf87e0 0x0000000001cf886c
0x00007897114b2598: 0x000000000000000a 0x0000000000000000
0x00007897114b25a8: 0x0000000001cf87e0 0x00000000015a121a
0x00007897114b25b8: 0x0000000000f745f8 0x0000000000000000
0x00007897114b25c8: 0x0a00000001cf87e0 0x0000000001cf87e0
0x00007897114b25d8: 0x0000000000f79a53 0x0000000001cf87e0
0x00007897114b25e8: 0x00000000015a121a 0x0000789760ab7740
0x00007897114b25f8: 0x0000000000f736ec 0x0000000000462938 <runtime.(*unwinder).resolveInternal+0x0000000000000158>
goroutine 1 [semacquire, 7 minutes]:
runtime.gopark(0x4?, 0xc000044480?, 0xa0?, 0x85?, 0xb06980?)
/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0005e3fd0 sp=0xc0005e3fb0 pc=0x43f10e
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:404
runtime.semacquire1(0xc000462390, 0x0?, 0x1, 0x0, 0x10?)
/usr/local/go/src/runtime/sema.go:160 +0x218 fp=0xc0005e4038 sp=0xc0005e3fd0 pc=0x450658
sync.runtime_Semacquire(0x1?)
/usr/local/go/src/runtime/sema.go:62 +0x25 fp=0xc0005e4070 sp=0xc0005e4038 pc=0x46e2e5
sync.(*WaitGroup).Wait(0xc000390b10?)
/usr/local/go/src/sync/waitgroup.go:116 +0x48 fp=0xc0005e4098 sp=0xc0005e4070 pc=0x47e7c8
golang.org/x/sync/errgroup.(*Group).Wait(0xc000462380)
/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:56 +0x25 fp=0xc0005e40b8 sp=0xc0005e4098 pc=0x91b1a5
storj.io/storj/storagenode.(*Peer).Run(0xc0005dc000, {0x15601b8, 0xc0002bc730})
/go/src/storj.io/storj/storagenode/peer.go:957 +0x42b fp=0xc0005e4248 sp=0xc0005e40b8 pc=0xd8ac4b
main.cmdRun(0x789760de19b0?, 0xc00020fb00)
/go/src/storj.io/storj/cmd/storagenode/cmd_run.go:123 +0xd65 fp=0xc0005e4e38 sp=0xc0005e4248 pc=0xea45c5
main.newRunCmd.func1(0x1058ec0?, {0xc00021e270?, 0xc000005200?, 0x447440?})
/go/src/storj.io/storj/cmd/storagenode/cmd_run.go:33 +0x17 fp=0xc0005e4e58 sp=0xc0005e4e38 pc=0xea3817
storj.io/common/process.cleanup.func1.4({0x1560430?, 0xc0002bfae0})
/go/pkg/mod/storj.io/common@v0.0.0-20240329051534-e16d36937e83/process/exec_conf.go:393 +0x149 fp=0xc0005e4ee0 sp=0xc0005e4e58 pc=0xab7e49
storj.io/common/process.cleanup.func1(0xc000005200, {0xc000184e10, 0x0, 0x9})
/go/pkg/mod/storj.io/common@v0.0.0-20240329051534-e16d36937e83/process/exec_conf.go:411 +0x1c88 fp=0xc0005e5bc8 sp=0xc0005e4ee0 pc=0xab7448
github.com/spf13/cobra.(*Command).execute(0xc000005200, {0xc000184d80, 0x9, 0x9})
/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983 +0xabc fp=0xc0005e5d68 sp=0xc0005e5bc8 pc=0x5d37bc
github.com/spf13/cobra.(*Command).ExecuteC(0xc000004300)
/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115 +0x3ff fp=0xc0005e5e40 sp=0xc0005e5d68 pc=0x5d407f
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039
storj.io/common/process.ExecWithCustomOptions(0xc000004300, {0x1, 0x1, 0x1, 0x0, 0x144c4e8, 0xc00022b350})
/go/pkg/mod/storj.io/common@v0.0.0-20240329051534-e16d36937e83/process/exec_conf.go:112 +0x1c9 fp=0xc0005e5e90 sp=0xc0005e5e40 pc=0xab46e9
main.main()
/go/src/storj.io/storj/cmd/storagenode/main.go:34 +0x2bf fp=0xc0005e5f40 sp=0xc0005e5e90 pc=0xea657f
runtime.main()
/usr/local/go/src/runtime/proc.go:267 +0x2bb fp=0xc0005e5fe0 sp=0xc0005e5f40 pc=0x43ec9b
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0005e5fe8 sp=0xc0005e5fe0 pc=0x4724e1
goroutine 2 [force gc (idle), 2 minutes]:
runtime.gopark(0x20cec756722?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000064fa8 sp=0xc000064f88 pc=0x43f10e
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:404
runtime.forcegchelper()
/usr/local/go/src/runtime/proc.go:322 +0xb3 fp=0xc000064fe0 sp=0xc000064fa8 pc=0x43ef73
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000064fe8 sp=0xc000064fe0 pc=0x4724e1
created by runtime.init.6 in goroutine 1
/usr/local/go/src/runtime/proc.go:310 +0x1a
goroutine 3 [GC sweep wait, 2 minutes]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000065778 sp=0xc000065758 pc=0x43f10e
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:404
runtime.bgsweep(0x0?)
/usr/local/go/src/runtime/mgcsweep.go:321 +0xdf fp=0xc0000657c8 sp=0xc000065778 pc=0x42925f
runtime.gcenable.func1()
/usr/local/go/src/runtime/mgc.go:200 +0x25 fp=0xc0000657e0 sp=0xc0000657c8 pc=0x41e3a5
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000657e8 sp=0xc0000657e0 pc=0x4724e1
created by runtime.gcenable in goroutine 1
/usr/local/go/src/runtime/mgc.go:200 +0x66
(MANY SIMILAR ENTRIES SNIPPED)
goroutine 56934 [select]:
runtime.gopark(0xc06cae4ee0?, 0x3?, 0xfa?, 0xe6?, 0xc06cae4e82?)
/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc06cae4d20 sp=0xc06cae4d00 pc=0x43f10e
runtime.selectgo(0xc06cae4ee0, 0xc06cae4e7c, 0xffffffffffffffff?, 0x0, 0x1?, 0x1)
/usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc06cae4e40 sp=0xc06cae4d20 pc=0x44f625
storj.io/drpc/drpcmanager.(*Manager).manageStream(0xc0e50b52c0, {0x1560180, 0xc0e50ecde0}, 0xc0e2462900)
/go/pkg/mod/storj.io/drpc@v0.0.34/drpcmanager/manager.go:332 +0xf1 fp=0xc06cae4f20 sp=0xc06cae4e40 pc=0xaedc71
storj.io/drpc/drpcmanager.(*Manager).manageStreams(0xc0e50b52c0)
/go/pkg/mod/storj.io/drpc@v0.0.34/drpcmanager/manager.go:321 +0x85 fp=0xc06cae4fc8 sp=0xc06cae4f20 pc=0xaeda25
storj.io/drpc/drpcmanager.NewWithOptions.func2()
/go/pkg/mod/storj.io/drpc@v0.0.34/drpcmanager/manager.go:122 +0x25 fp=0xc06cae4fe0 sp=0xc06cae4fc8 pc=0xaeca05
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc06cae4fe8 sp=0xc06cae4fe0 pc=0x4724e1
created by storj.io/drpc/drpcmanager.NewWithOptions in goroutine 56932
/go/pkg/mod/storj.io/drpc@v0.0.34/drpcmanager/manager.go:122 +0x456
rax 0x0
rbx 0x6
rcx 0xf7903f
rdx 0x0
rdi 0x2
rsi 0x7897114b2520
rbp 0x7897114b2520
rsp 0x7897114b2508
r8 0xa
r9 0x1cf886c
r10 0x8
r11 0x246
r12 0x789760ab7740
r13 0x0
r14 0xc0e5445040
r15 0x10
rip 0xf7903f
rflags 0x246
cs 0x33
fs 0x0
gs 0x0
I’ll confess, the server is rather heavily loaded. I did try shutting down the VM that also runs on the server, but it didn’t help. I’ve also tried:
- Rebooting the server
- Increasing the kernel thread limit (Ubuntu Server, Linux kernel 6.5.0-28-generic)
- Deleting the Docker image and recreating it
None of these made a difference.
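
For reference, here is a minimal Python sketch of how the limits that usually govern `pthread_create` could be inspected, either on the host or inside the container. `pthread_create` reports "Resource temporarily unavailable" (EAGAIN) when a thread or process limit is hit, so the kernel-wide setting is only one of several candidates. The helper names (`parse_limit`, `read_limits`, `thread_count`) are my own illustration, and the cgroup paths are assumptions about a typical cgroup v1/v2 layout that may not match every host. None of this comes from the storagenode tooling.

```python
from pathlib import Path
import resource

# Files that commonly cap thread creation on Linux. The cgroup paths are
# assumptions about a typical layout and may differ between hosts.
LIMIT_FILES = [
    "/proc/sys/kernel/threads-max",
    "/proc/sys/kernel/pid_max",
    "/sys/fs/cgroup/pids.max",       # cgroup v2, as seen from inside a container
    "/sys/fs/cgroup/pids.current",   # cgroup v2, tasks currently in the cgroup
    "/sys/fs/cgroup/pids/pids.max",  # cgroup v1
]


def parse_limit(text):
    """Return the limit as an int, or None when the file says 'max' (unlimited)."""
    value = text.strip()
    return None if value == "max" else int(value)


def read_limits(paths=LIMIT_FILES):
    """Read each limit file that exists and is readable; skip the rest."""
    limits = {}
    for path in paths:
        try:
            limits[path] = parse_limit(Path(path).read_text())
        except (OSError, ValueError):
            continue
    return limits


def thread_count(pid="self"):
    """Return the 'Threads:' value from /proc/<pid>/status, or None if unavailable."""
    try:
        for line in Path(f"/proc/{pid}/status").read_text().splitlines():
            if line.startswith("Threads:"):
                return int(line.split()[1])
    except OSError:
        pass
    return None


if __name__ == "__main__":
    for path, limit in read_limits().items():
        print(f"{path}: {'unlimited' if limit is None else limit}")
    # Per-user process/thread limit (what `ulimit -u` shows); -1 means unlimited.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print(f"RLIMIT_NPROC: soft={soft} hard={hard}")
    print("threads in this process:", thread_count())
```

Running it both on the host and inside the failing container, and passing the storagenode PID to `thread_count`, would show whether the thread count is climbing toward one of these limits before each crash. The `m=8655` in the trace above suggests the process had already created several thousand OS threads by the time it aborted.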