Filewalker crashes node

The node was running for about 24h without problems. I guess it was the same problem before, but I did not check the log. Restarting the docker container works for now, but why does the filewalker crash the node?

2024-07-11T19:35:49Z    INFO    piecestore      upload canceled {"Process": "storagenode", "Piece ID": "AC3P4EDR4M44ARJU7MWSFDLH5EAW4JYUHGF2MPN3H6I65TS3OTGA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "109.61.92.77:32806", "Size": 65536}
2024-07-11T19:35:49Z    INFO    lazyfilewalker.used-space-filewalker    starting subprocess     {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-07-11T19:35:49Z    ERROR   lazyfilewalker.used-space-filewalker    failed to start subprocess      {"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "error": "context canceled"}
2024-07-11T19:35:49Z    ERROR   pieces  failed to lazywalk space used by satellite      {"Process": "storagenode", "error": "lazyfilewalker: context canceled", "errorVerbose": "lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:73\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:130\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:707\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-07-11T19:35:49Z    ERROR   piecestore:cache        error getting current used space:       {"Process": "storagenode", "error": "filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled", "errorVerbose": "group:\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:716\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:716\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:716\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context 
canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:716\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-11T19:35:49Z    INFO    lazyfilewalker.gc-filewalker    subprocess exited with status   {"Process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "status": -1, "error": "signal: killed"}
2024-07-11T19:35:49Z    ERROR   pieces  lazyfilewalker failed   {"Process": "storagenode", "error": "lazyfilewalker: signal: killed", "errorVerbose": "lazyfilewalker: signal: killed\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:85\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkSatellitePiecesToTrash:160\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePiecesToTrash:561\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:373\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:259\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-11T19:35:49Z    ERROR   filewalker      failed to get progress from database    {"Process": "storagenode"}
2024-07-11T19:35:49Z    ERROR   retain  retain pieces failed    {"Process": "storagenode", "cachePath": "config/retain", "error": "retain: filewalker: context canceled", "errorVerbose": "retain: filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePiecesToTrash:181\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePiecesToTrash:568\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:373\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:259\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-11T19:35:50Z    WARN    services        service takes long to shutdown  {"Process": "storagenode", "name": "retain"}
2024-07-11T19:35:50Z    INFO    services        slow shutdown   {"Process": "storagenode", "stack": "goroutine 1133 [running]:\nstorj.io/storj/private/lifecycle.(*Group).logStackTrace.func1()\n\t/go/src/storj.io/storj/private/lifecycle/group.go:107 +0x71\nsync.(*Once).doSlow(0xc00c1a20d0?, 0xc0163a7680?)\n\t/usr/local/go/src/sync/once.go:74 +0xbf\nsync.(*Once).Do(...)\n\t/usr/local/go/src/sync/once.go:65\nstorj.io/storj/private/lifecycle.(*Group).logStackTrace(0xc0004516b8?)\n\t/go/src/storj.io/storj/private/lifecycle/group.go:104 +0x3c\nstorj.io/storj/private/lifecycle.(*Group).Run.func1({0x15801c0?, 0xc000661980?})\n\t/go/src/storj.io/storj/private/lifecycle/group.go:77 +0x2a5\nruntime/pprof.Do({0x1580470?, 0xc000dae960?}, {{0xc000da2840?, 0x0?, 0x15801f8?}}, 0xc000db6c80)\n\t/usr/local/go/src/runtime/pprof/runtime.go:51 +0x9d\ncreated by storj.io/storj/private/lifecycle.(*Group).Run in goroutine 1\n\t/go/src/storj.io/storj/private/lifecycle/group.go:64 +0x509\n\ngoroutine 1 [semacquire, 872 minutes]:\nsync.runtime_Semacquire(0x1?)\n\t/usr/local/go/src/runtime/sema.go:62 +0x25\nsync.(*WaitGroup).Wait(0xc0000132f0?)\n\t/usr/local/go/src/sync/waitgroup.go:116 +0x48\ngolang.org/x/sync/errgroup.(*Group).Wait(0xc000db6440)\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:56 +0x25\nstorj.io/storj/storagenode.(*Peer).Run(0xc000138840, {0x15801f8, 0xc000da4550})\n\t/go/src/storj.io/storj/storagenode/peer.go:967 +0x42b\nmain.cmdRun(0x7f2405b92360?, 0xc0001f5200)\n\t/go/src/storj.io/storj/cmd/storagenode/cmd_run.go:123 +0xdc5\nmain.newRunCmd.func1(0x1072260?, {0xc0001fe150?, 0xc000218f00?, 0x447440?})\n\t/go/src/storj.io/storj/cmd/storagenode/cmd_run.go:33 +0x17\nstorj.io/common/process.cleanup.func1.4({0x1580470?, 0xc0002b28c0})\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/process/exec_conf.go:393 +0x149\nstorj.io/common/process.cleanup.func1(0xc000218f00, {0xc000150e10, 0x0, 
0x9})\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/process/exec_conf.go:411 +0x1c88\ngithub.com/spf13/cobra.(*Command).execute(0xc000218f00, {0xc000150d80, 0x9, 0x9})\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983 +0xabc\ngithub.com/spf13/cobra.(*Command).ExecuteC(0xc000218000)\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115 +0x3ff\ngithub.com/spf13/cobra.(*Command).Execute(...)\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\nstorj.io/common/process.ExecWithCustomOptions(0xc000218000, {0x1, 0x1, 0x1, 0x0, 0x146a4b0, 0xc000205960})\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/process/exec_conf.go:112 +0x1c9\nmain.main()\n\t/go/src/storj.io/storj/cmd/storagenode/main.go:34 +0x2bf\n\ngoroutine 4 [select]:\nstorj.io/monkit-jaeger.(*ThriftCollector).Run(0xc000150cf0, {0x15801f8, 0xc000216730})\n\t/go/pkg/mod/storj.io/monkit-jaeger@v0.0.0-20240221095020-52b0792fa6cd/thrift.go:174 +0x2cf\nstorj.io/common/process.cleanup.func1.2()\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/process/exec_conf.go:351 +0x1f\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x56\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x96\n\ngoroutine 5 [semacquire, 873 minutes]:\nsync.runtime_Semacquire(0xc0000cc1e0?)\n\t/usr/local/go/src/runtime/sema.go:62 +0x25\nsync.(*WaitGroup).Wait(0x11280a0?)\n\t/usr/local/go/src/sync/waitgroup.go:116 +0x48\ngolang.org/x/sync/errgroup.(*Group).Wait(0xc0001d2140)\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:56 +0x25\nstorj.io/common/debug.(*Server).Run(0xc000400000, {0x1580188, 0x1db7fa0})\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/debug/server.go:203 
+0x33c\nstorj.io/common/process.initDebug.func1()\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/process/debug.go:40 +0xf4\ncreated by storj.io/common/process.initDebug in goroutine 1\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/process/debug.go:37 +0xf0\n\ngoroutine 50 [syscall, 873 minutes]:\nos/signal.signal_recv()\n\t/usr/local/go/src/runtime/sigqueue.go:152 +0x29\nos/signal.loop()\n\t/usr/local/go/src/os/signal/signal_unix.go:23 +0x13\ncreated by os/signal.Notify.func1.1 in goroutine 1\n\t/usr/local/go/src/os/signal/signal.go:151 +0x1f\n\ngoroutine 66 [chan receive, 873 minutes]:\nstorj.io/common/process.Ctx.func1()\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/process/exec_conf.go:139 +0x39\ncreated by storj.io/common/process.Ctx in goroutine 1\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/process/exec_conf.go:138 +0x2c7\n\ngoroutine 27 [chan receive, 873 minutes]:\nstorj.io/common/debug.(*Server).Run.func3()\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/debug/server.go:182 +0x35\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x56\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 5\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x96\n\ngoroutine 28 [chan receive, 873 minutes]:\nstorj.io/drpc/drpcmigrate.(*ListenMux).Run(0xc00011c070, {0x15801f8?, 0xc000410000?})\n\t/go/pkg/mod/storj.io/drpc@v0.0.34/drpcmigrate/mux.go:90 +0x12e\nstorj.io/common/debug.(*Server).Run.func4()\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/debug/server.go:186 +0x31\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x56\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 5\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x96\n\ngoroutine 29 [select, 873 
minutes]:\nstorj.io/drpc/drpcmigrate.(*listener).Accept(0xc0001d2100)\n\t/go/pkg/mod/storj.io/drpc@v0.0.34/drpcmigrate/listener.go:37 +0x7e\nnet/http.(*Server).Serve(0xc000400040, {0x157a760, 0xc0001d2100})\n\t/usr/local/go/src/net/http/server.go:3056 +0x364\nstorj.io/common/debug.(*Server).Run.func5()\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/debug/server.go:195 +0x53\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x56\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 5\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x96\n\ngoroutine 82 [chan receive, 873 minutes]:\nstorj.io/drpc/drpcmigrate.(*ListenMux).monitorContext(0xc00011c070, {0x15801f8?, 0xc000622000?})\n\t/go/pkg/mod/storj.io/drpc@v0.0.34/drpcmigrate/mux.go:106 +0x2d\ncreated by storj.io/drpc/drpcmigrate.(*ListenMux).Run in goroutine 28\n\t/go/pkg/mod/storj.io/drpc@v0.0.34/drpcmigrate/mux.go:87 +0xd6\n\ngoroutine 83 [IO wait, 873 minutes]:\ninternal/poll.runtime_pollWait(0x7f2405d6de88, 0x72)\n\t/usr/local/go/src/runtime/netpoll.go:343 +0x85\ninternal/poll.(*pollDesc).wait(0xc000250580?, 0x0?, 0x0)\n\t/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27\ninternal/poll.(*pollDesc).waitRead(...)\n\t/usr/local/go/src/internal/poll/fd_poll_runtime.go:89\ninternal/poll.(*FD).Accept(0xc000250580)\n\t/usr/local/go/src/internal/poll/fd_unix.go:611 +0x2ac\nnet.(*netFD).accept(0xc000250580)\n\t/usr/local/go/src/net/fd_unix.go:172 +0x29\nnet.(*TCPListener).accept(0xc0004a3fa0)\n\t/usr/local/go/src/net/tcpsock_posix.go:152 +0x1e\nnet.(*TCPListener).Accept(0xc0004a3fa0)\n\t/usr/local/go/src/net/tcpsock.go:315 +0x30\nstorj.io/drpc/drpcmigrate.(*ListenMux).monitorBase(0xc00011c070)\n\t/go/pkg/mod/storj.io/drpc@v0.0.34/drpcmigrate/mux.go:115 +0x3a\ncreated by storj.io/drpc/drpcmigrate.(*ListenMux).Run in goroutine 28\n\t/go/pkg/mod/storj.io/drpc@v0.0.34/drpcmigrate/mux.go:88 +0x11b\n\ngoroutine 
274 [select, 872 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc000d8c410, {0x15801f8, 0xc000d880a0})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 128 [select, 872 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc000d8c1a0, {0x15801f8, 0xc000d88050})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 284 [select, 872 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc000d8cc30, {0x15801f8, 0xc000d88230})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 49 [select]:\nstorj.io/common/time2.Clock.Sleep({{0x0?, 0x0?}}, {0x15801f8, 0xc000c02000}, 0xc005111f40?)\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/time2/clock.go:60 +0xec\nstorj.io/common/time2.Sleep({0x15801f8, 0xc000c02000}, 0xc0207c26e0?)\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/time2/context.go:40 +0x6d\nstorj.io/common/sync2.Sleep(...)\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/sync2/sleep.go:15\nstorj.io/common/telemetry.(*Reporter).Run(0xc000b59c20, {0x15801f8?, 0xc000216780?})\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/telemetry/reporter.go:43 +0x137\nstorj.io/common/telemetry.(*Client).Run(0x7696e5?, {0x15801f8?, 0xc000216780?})\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/telemetry/client.go:124 +0x25\ncreated by storj.io/common/process.InitMetrics in goroutine 1\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/process/metrics.go:121 +0x3a5\n\ngoroutine 130 [select]:\nstorj.io/eventkit.(*UDPClient).Run(0xc00062e6c0, {0x15801f8?, 0xc000216780})\n\t/go/pkg/mod/storj.io/eventkit@v0.0.0-20240306141230-6cb545e5f892/client.go:209 
+0x439\ncreated by storj.io/common/process.UDPDestination in goroutine 1\n\t/go/pkg/mod/storj.io/common@v0.0.0-20240425113201-9815a85cbc32/process/metrics.go:149 +0x85\n\ngoroutine 8 [select]:\nstorj.io/eventkit/utils.(*JitteredTicker).Run(0xc00004e180, {0x15801f8, 0xc000216780})\n\t/go/pkg/mod/storj.io/eventkit@v0.0.0-20240306141230-6cb545e5f892/utils/jitter.go:27 +0x206\nstorj.io/eventkit.(*UDPClient).Run.func2()\n\t/go/pkg/mod/storj.io/eventkit@v0.0.0-20240306141230-6cb545e5f892/client.go:185 +0x1f\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x56\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 130\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x96\n\ngoroutine 146 [select, 872 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc000d8c000, {0x15801f8, 0xc000d88000})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 311 [select, 872 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc000d8cdd0, {0x15801f8, 0xc000d88a50})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 285 [select, 27 minutes]:\ndatabase/sql.(*DB).connectionCleaner(0xc000d8cc30, 0xc00014d880?)\n\t/usr/local/go/src/database/sql/sql.go:1061 +0x9c\ncreated by database/sql.(*DB).startCleanerLocked in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:1048 +0x105\n\ngoroutine 281 [select, 3 minutes]:\ndatabase/sql.(*DB).connectionCleaner(0xc000d8c8f0, 0xc00014dd80?)\n\t/usr/local/go/src/database/sql/sql.go:1061 +0x9c\ncreated by database/sql.(*DB).startCleanerLocked in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:1048 +0x105\n\ngoroutine 282 [select, 872 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc000d8ca90, {0x15801f8, 
0xc000d881e0})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 280 [select, 872 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc000d8c8f0, {0x15801f8, 0xc000d88190})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 278 [select, 107 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc000d8c750, {0x15801f8, 0xc000d88140})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 279 [select, 27 minutes]:\ndatabase/sql.(*DB).connectionCleaner(0xc000d8c750, 0xc000b90280?)\n\t/usr/local/go/src/database/sql/sql.go:1061 +0x9c\ncreated by database/sql.(*DB).startCleanerLocked in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:1048 +0x105\n\ngoroutine 1090 [sleep]:\ntime.Sleep(0x8bb2c97000)\n\t/usr/local/go/src/runtime/time.go:195 +0x125\ngithub.com/spacemonkeygo/monkit/v3.(*ticker).run(0x1d86800)\n\t/go/pkg/mod/github.com/spacemonkeygo/monkit/v3@v3.0.22/meter.go:203 +0x26\ncreated by github.com/spacemonkeygo/monkit/v3.(*ticker).register in goroutine 1074\n\t/go/pkg/mod/github.com/spacemonkeygo/monkit/v3@v3.0.22/meter.go:195 +0x89\n\ngoroutine 276 [select, 872 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc000d8c5b0, {0x15801f8, 0xc000d880f0})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 270 [select, 872 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc0006a21a0, {0x15801f8, 0xc000b905f0})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 271 [select, 24 
minutes]:\ndatabase/sql.(*DB).connectionCleaner(0xc0006a21a0, 0x0?)\n\t/usr/local/go/src/database/sql/sql.go:1061 +0x9c\ncreated by database/sql.(*DB).startCleanerLocked in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:1048 +0x105\n\ngoroutine 272 [select, 872 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc0006a2340, {0x15801f8, 0xc000b90640})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 273 [select, 16 minutes]:\ndatabase/sql.(*DB).connectionCleaner(0xc0006a2340, 0x0?)\n\t/usr/local/go/src/database/sql/sql.go:1061 +0x9c\ncreated by database/sql.(*DB).startCleanerLocked in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:1048 +0x105\n\ngoroutine 333 [select, 872 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc000d8cf70, {0x15801f8, 0xc000d890e0})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 350 [select, 872 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc000d8d110, {0x15801f8, 0xc000d895e0})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 351 [select, 5 minutes]:\ndatabase/sql.(*DB).connectionCleaner(0xc000d8d110, 0xc000d89590?)\n\t/usr/local/go/src/database/sql/sql.go:1061 +0x9c\ncreated by database/sql.(*DB).startCleanerLocked in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:1048 +0x105\n\ngoroutine 407 [select, 872 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc000d8d2b0, {0x15801f8, 0xc000410c30})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 461 [select, 872 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc000d8d5f0, {0x15801f8, 
0xc000411db0})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 454 [select, 872 minutes]:\ndatabase/sql.(*DB).connectionOpener(0xc000d8d450, {0x15801f8, 0xc000411bd0})\n\t/usr/local/go/src/database/sql/sql.go:1218 +0x87\ncreated by database/sql.OpenDB in goroutine 1\n\t/usr/local/go/src/database/sql/sql.go:791 +0x165\n\ngoroutine 1134 [semacquire, 872 minutes]:\nsync.runtime_Semacquire(0xc0004b55a8?)\n\t/usr/local/go/src/runtime/sema.go:62 +0x25\nsync.(*WaitGroup).Wait(0x1104d40?)\n\t/usr/local/go/src/sync/waitgroup.go:116 +0x48\ngolang.org/x/sync/errgroup.(*Group).Wait(0xc000338080)\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:56 +0x25\nstorj.io/storj/storagenode/retain.(*Service).Run(0xc000338000, {0x15801c0, 0xc0006619e0})\n\t/go/src/storj.io/storj/storagenode/retain/retain.go:276 +0x52f\nstorj.io/storj/private/lifecycle.(*Group).Run.func2.1({0x15801c0?, 0xc0006619e0?})\n\t/go/src/storj.io/storj/private/lifecycle/group.go:87 +0x2a\nruntime/pprof.Do({0x1580470?, 0xc000dae960?}, {{0xc00064aa00?, 0xc000451dc8?, 0x44edd0?}}, 0xc0004ece70)\n\t/usr/local/go/src/runtime/pprof/runtime.go:51 +0x9d\nstorj.io/storj/private/lifecycle.(*Group).Run.func2()\n\t/go/src/storj.io/storj/private/lifecycle/group.go:86 +0x285\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x56\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x96\n\ngoroutine 64 [sync.Mutex.Lock]:\nsync.runtime_SemacquireMutex(0xc000665c20?, 0x20?, 0xc00040e120?)\n\t/usr/local/go/src/runtime/sema.go:77 +0x25\nsync.(*Mutex).lockSlow(0xc0000140a0)\n\t/usr/local/go/src/sync/mutex.go:171 +0x15d\nsync.(*Mutex).Lock(0xc000451f38?)\n\t/usr/local/go/src/sync/mutex.go:90 
+0x32\nstorj.io/storj/storagenode/retain.(*Service).Run.func1()\n\t/go/src/storj.io/storj/storagenode/retain/retain.go:217 +0x102\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x56\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1134\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x96\n\ngoroutine 65 [sync.Mutex.Lock, 353 minutes]:\nsync.runtime_SemacquireMutex(0xc000056a00?, 0x38?, 0x46e685?)\n\t/usr/local/go/src/runtime/sema.go:77 +0x25\nsync.(*Mutex).lockSlow(0xc0000140a0)\n\t/usr/local/go/src/sync/mutex.go:171 +0x15d\nsync.(*Mutex).Lock(0xc000338040?)\n\t/usr/local/go/src/sync/mutex.go:90 +0x32\nsync.(*Cond).Wait(0x0?)\n\t/usr/local/go/src/sync/cond.go:71 +0x97\nstorj.io/storj/storagenode/retain.(*Service).Run.func2()\n\t/go/src/storj.io/storj/storagenode/retain/retain.go:246 +0x40c\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x56\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1134\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x96\n\ngoroutine 1170 [runnable]:\nstorj.io/storj/storagenode/retain.(*Service).next(0xc000338000)\n\t/go/src/storj.io/storj/storagenode/retain/retain.go:297 +0x111\nstorj.io/storj/storagenode/retain.(*Service).Run.func2()\n\t/go/src/storj.io/storj/storagenode/retain/retain.go:242 +0x148\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x56\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1134\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x96\n\ngoroutine 1171 [sync.Mutex.Lock, 353 minutes]:\nsync.runtime_SemacquireMutex(0xc000056a00?, 0x38?, 0x46e685?)\n\t/usr/local/go/src/runtime/sema.go:77 +0x25\nsync.(*Mutex).lockSlow(0xc0000140a0)\n\t/usr/local/go/src/sync/mutex.go:171 
+0x15d\nsync.(*Mutex).Lock(0xc000338040?)\n\t/usr/local/go/src/sync/mutex.go:90 +0x32\nsync.(*Cond).Wait(0x0?)\n\t/usr/local/go/src/sync/cond.go:71 +0x97\nstorj.io/storj/storagenode/retain.(*Service).Run.func2()\n\t/go/src/storj.io/storj/storagenode/retain/retain.go:246 +0x40c\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x56\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1134\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x96\n\ngoroutine 1172 [sync.Mutex.Lock, 353 minutes]:\nsync.runtime_SemacquireMutex(0xc000058f00?, 0x38?, 0x46e685?)\n\t/usr/local/go/src/runtime/sema.go:77 +0x25\nsync.(*Mutex).lockSlow(0xc0000140a0)\n\t/usr/local/go/src/sync/mutex.go:171 +0x15d\nsync.(*Mutex).Lock(0xc000338040?)\n\t/usr/local/go/src/sync/mutex.go:90 +0x32\nsync.(*Cond).Wait(0x0?)\n\t/usr/local/go/src/sync/cond.go:71 +0x97\nstorj.io/storj/storagenode/retain.(*Service).Run.func2()\n\t/go/src/storj.io/storj/storagenode/retain/retain.go:246 +0x40c\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x56\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1134\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x96\n\ngoroutine 1173 [sync.Mutex.Lock]:\nsync.runtime_SemacquireMutex(0x0?, 0x0?, 0x0?)\n\t/usr/local/go/src/runtime/sema.go:77 +0x25\nsync.(*Mutex).lockSlow(0xc0000140a0)\n\t/usr/local/go/src/sync/mutex.go:171 +0x15d\nsync.(*Mutex).Lock(0x5f932e2cd7e92d7b?)\n\t/usr/local/go/src/sync/mutex.go:90 +0x32\nstorj.io/storj/storagenode/retain.(*Service).Run.func2()\n\t/go/src/storj.io/storj/storagenode/retain/retain.go:267 +0x3a7\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x56\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1134\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x96\n\ngoroutine 
2230207 [select, 15 minutes]:\ndatabase/sql.(*DB).connectionCleaner(0xc000d8c410, 0x1d11430?)\n\t/usr/local/go/src/database/sql/sql.go:1061 +0x9c\ncreated by database/sql.(*DB).startCleanerLocked in goroutine 1148\n\t/usr/local/go/src/database/sql/sql.go:1048 +0x105\n\ngoroutine 432777 [select, 19 minutes]:\ndatabase/sql.(*DB).connectionCleaner(0xc000d8c1a0, 0x1d11430?)\n\t/usr/local/go/src/database/sql/sql.go:1061 +0x9c\ncreated by database/sql.(*DB).startCleanerLocked in goroutine 1148\n\t/usr/local/go/src/database/sql/sql.go:1048 +0x105\n"}

What version are you on? It sounds like 107 was pulled due to crashing issues, and 105 is now upgrading directly to 108 instead.

It is on 105, but it looks like the USB HDD has a problem; I forgot to set the reserved block count when setting up the node, and now tune2fs -m 0 is stuck… nvm, soon I will take the HDD out of the USB enclosure and connect it directly to the mainboard. For now I moved the DBs to an SSD, which looks like it helped a bit… will post again if the node still crashes.

It’s unlikely the filewalker crashed your node; some event before this log excerpt requested the node to shut down. All these messages are just a consequence of that shutdown.
Please search for FATAL and/or Unrecoverable errors in your logs; the culprit could be there.

Too bad, it crashed again and there was not a single error in the log; it just stopped logging, maybe because the logfile is not at the standard location (it is on a ramdisk).
I am using this Python script for now to restart the container if there is no log event for 30 seconds; it also sends the last 50 log lines by mail (using ssmtp):

import os
import time
from datetime import datetime, timedelta
import subprocess

# Configuration parameters
LOG_FILE_PATH = "/path/to/your/logfile.log"
DOCKER_CONTAINER_NAME = "your_docker_container"
EMAIL_RECIPIENT = "recipient_email@example.com"
EMAIL_SENDER = "your_email@example.com"

# Return the log file's last modification time
def get_last_modification_time(file_path):
    return datetime.fromtimestamp(os.path.getmtime(file_path))

# Send an e-mail via ssmtp
def send_email(subject, body):
    email_text = f"From: {EMAIL_SENDER}\nTo: {EMAIL_RECIPIENT}\nSubject: {subject}\n\n{body}"
    
    process = subprocess.Popen(['ssmtp', EMAIL_RECIPIENT], stdin=subprocess.PIPE)
    process.communicate(email_text.encode('utf-8'))

# Restart the Docker container
def restart_docker_container(container_name):
    subprocess.run(["docker", "restart", container_name])

# Return the last 50 lines of the log file
def get_last_log_lines(file_path, num_lines=50):
    with open(file_path, 'r') as file:
        lines = file.readlines()
    return ''.join(lines[-num_lines:])

def main():
    last_checked = datetime.now()
    last_modification = get_last_modification_time(LOG_FILE_PATH)

    while True:
        time.sleep(10)  # Check every 10 seconds

        current_modification = get_last_modification_time(LOG_FILE_PATH)
        if current_modification > last_modification:
            last_modification = current_modification
            last_checked = datetime.now()

        if datetime.now() - last_checked > timedelta(seconds=30):
            # Fetch the last 50 lines of the log file
            last_log_lines = get_last_log_lines(LOG_FILE_PATH)
            
            # Restart the Docker container
            restart_docker_container(DOCKER_CONTAINER_NAME)
            
            # Send a notification e-mail
            email_body = f"The Docker container '{DOCKER_CONTAINER_NAME}' was restarted due to inactivity.\n\nLast 50 lines of the log:\n{last_log_lines}"
            send_email(f"Docker Container '{DOCKER_CONTAINER_NAME}' Restart Notification", email_body)

            # Update the last-check timestamps
            last_checked = datetime.now()
            last_modification = get_last_modification_time(LOG_FILE_PATH)

if __name__ == "__main__":
    main()

Don’t do that. Fix the underlying issue instead of covering up one problem by creating another.

Your disk subsystem is likely incapable of keeping up with the I/O pressure, causing the node to stockpile data and get killed by docker or the OOM watcher. Check your system log.
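If the OOM watcher is the culprit, the kernel log will usually say so explicitly. A minimal sketch (the function name and the assumption that the log contains the usual "Out of memory: Killed process …" style kernel messages are mine, for illustration) that filters such lines out of a syslog or dmesg dump:

```python
import re

# Patterns the kernel typically emits when the OOM killer acts,
# e.g. "Out of memory: Killed process 1234 (storagenode)".
OOM_PATTERN = re.compile(r"(Out of memory|oom-kill|oom_reaper)", re.IGNORECASE)

def find_oom_events(lines):
    """Return the log lines that mention OOM-killer activity."""
    return [line for line in lines if OOM_PATTERN.search(line)]

if __name__ == "__main__":
    # In practice, feed this the output of `dmesg -T`
    # or the contents of /var/log/syslog.
    sample = [
        "Jul 12 09:47:08 store ddclient[1036]: WARNING: found neither ipv4 nor ipv6 address",
        "Jul 12 09:50:01 store kernel: Out of memory: Killed process 1234 (storagenode)",
    ]
    for hit in find_oom_events(sample):
        print(hit)
```

If this finds nothing, the OOM killer is probably not what stopped the node.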


I agree, but without an error it's hard to fix; syslog shows almost no errors around the time of the crash:

Jul 12 09:47:08 store ddclient[1036]: WARNING:  found neither ipv4 nor ipv6 address
Jul 12 10:50:25 store snapd[964]: storehelpers.go:923: cannot refresh: snap has no updates available: "core20", "lxd", "snapd"
Jul 12 11:42:21 store ddclient[1036]: WARNING:  found neither ipv4 nor ipv6 address

Maybe the node stopped working because the internet was down, by the looks of it?
Something is definitely wrong with the HDDs; the I/O load is very high:
load average: 32.90, 96.79, 91.23
I have no other idea how to fix it, but I hope connecting the HDDs directly to the mainboard instead of USB will help.
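For context on those numbers: load averages far above the CPU core count on a small box usually mean many processes stuck in disk wait. A quick way to keep an eye on it from Python (the threshold value here is an arbitrary example, not a recommendation):

```python
import os

def load_status(threshold=8.0):
    """Return the 1/5/15-minute load averages and whether any exceeds threshold."""
    one, five, fifteen = os.getloadavg()  # same values as `uptime` reports
    overloaded = max(one, five, fifteen) > threshold
    return (one, five, fifteen), overloaded

if __name__ == "__main__":
    loads, overloaded = load_status()
    print("load average: {:.2f}, {:.2f}, {:.2f}{}".format(
        loads[0], loads[1], loads[2], " (overloaded)" if overloaded else ""))
```

This could be folded into the watchdog script above to log load alongside restarts.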

Maybe docker logs will have something.

Or you can look at historical memory usage graphs on your system to get indirect evidence.

The HDDs are now internal, but they are Seagate BarraCuda 8TB SMR drives… I will try storage2.max-concurrent-requests and see what the load average does.

You may also try to reduce the allocation below the current usage (reported by the dashboard), enable the scan on startup, and restart the node. When the filewalkers have finished their scans for all trusted satellites, increase the allocation back, disable the scan on startup, save the config, and restart the node.
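Both of those settings live as flat `key: value` lines in the node's config.yaml. A sketch of flipping such a line programmatically (the key names shown are the commonly used storagenode options; verify them against your own config before relying on this, and restart the container afterwards):

```python
def set_config_option(config_text, key, value):
    """Set `key: value` in a flat YAML config, uncommenting or appending as needed."""
    out, found = [], False
    for line in config_text.splitlines():
        stripped = line.lstrip()
        # Match both an active line and a commented-out default like "# key: ...".
        if stripped.startswith(key + ":") or stripped.startswith("# " + key + ":"):
            out.append("{}: {}".format(key, value))
            found = True
        else:
            out.append(line)
    if not found:
        out.append("{}: {}".format(key, value))
    return "\n".join(out)

if __name__ == "__main__":
    cfg = "storage.allocated-disk-space: 8.00 TB\n# storage2.piece-scan-on-startup: true"
    print(set_config_option(cfg, "storage2.piece-scan-on-startup", "true"))
```

For example, `set_config_option(text, "storage2.piece-scan-on-startup", "false")` would disable the scan again once the filewalkers have finished.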

Today I woke up and saw a node offline mail on my phone :frowning:
This time the whole system rebooted and somehow did not mount the ram disks, so both nodes could not start.
Will try now with lowered allocation and scan on startup.
edit: is storage2.piece-scan-on-startup: true the right option?
edit: changed storage2.max-concurrent-requests: to 1; if it's not working now, it's over I guess