Hello there.
I run a storj node since a few months.
I run the node with podman quadlets, and it is setup to restart automatically if it crashs.
The problem I encounter is that sometimes it randomly crash, and each restart try fails with a timeout and no app logs are present, and that occurs in a restart loop during ~1-2 hours, and then suddenly the next restart is a success and no more starting timeout.
Some error logs just before the crash of this morning:
févr. 22 07:35:21 fedora-server storj[9206]: 2025-02-22T06:35:21Z ERROR services unexpected shutdown of a runner {"Process": "storagenode", "name": "piecestore:monitor", "error": "piecestore monitor: timed out after 1m0s while verifying writability of stor>
févr. 22 07:35:21 fedora-server storj[9206]: 2025-02-22T06:35:21Z ERROR piecestore error sending hash and order limit {"Process": "storagenode", "Piece ID": "HRQ4SPX7ECCX3EXUYOUYULWFAUAWPOAUGQX4FFIQYJR7U5GQWCCQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1nt>
févr. 22 07:35:21 fedora-server storj[9206]: 2025-02-22T06:35:21Z ERROR piecestore download failed {"Process": "storagenode", "Piece ID": "HRQ4SPX7ECCX3EXUYOUYULWFAUAWPOAUGQX4FFIQYJR7U5GQWCCQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk>
...
févr. 22 07:37:13 fedora-server storj[9206]: 2025-02-22T06:37:13Z ERROR failure during run {"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed >
févr. 22 07:37:13 fedora-server storj[9206]: Error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory
févr. 22 07:37:13 fedora-server storj[9206]: 2025-02-22 06:37:13,348 WARN exited: storagenode (exit status 1; not expected)
févr. 22 07:37:14 fedora-server storj[9206]: 2025-02-22 06:37:14,397 INFO spawned: 'storagenode' with pid 3858
févr. 22 07:37:14 fedora-server storj[9206]: 2025-02-22T06:37:14Z INFO Configuration loaded {"Process": "storagenode", "Location": "/app/config/config.yaml"}
févr. 22 07:37:14 fedora-server storj[9206]: 2025-02-22 06:37:14,516 WARN received SIGQUIT indicating exit request
févr. 22 07:37:14 fedora-server storj[9206]: 2025-02-22 06:37:14,517 INFO waiting for processes-exit-eventlistener, storagenode, storagenode-updater to die
févr. 22 07:37:14 fedora-server storj[9206]: 2025-02-22T06:37:14Z INFO Anonymized tracing enabled {"Process": "storagenode"}
févr. 22 07:37:14 fedora-server storj[9206]: 2025-02-22T06:37:14Z INFO Operator email {"Process": "storagenode", "Address": "XXX"}
févr. 22 07:37:14 fedora-server storj[9206]: 2025-02-22T06:37:14Z INFO Operator wallet {"Process": "storagenode", "Address": "XXX"}
févr. 22 07:37:14 fedora-server storj[9206]: 2025-02-22T06:37:14Z INFO Got a signal from the OS: "terminated" {"Process": "storagenode-updater"}
févr. 22 07:37:14 fedora-server storj[9206]: 2025-02-22 06:37:14,583 INFO stopped: storagenode-updater (exit status 0)
févr. 22 07:37:14 fedora-server storj[9206]: 2025-02-22T06:37:14Z INFO Got a signal from the OS: "terminated" {"Process": "storagenode"}
févr. 22 07:37:14 fedora-server storj[9206]: 2025-02-22T06:37:14Z ERROR failure during run {"Process": "storagenode", "error": "Error opening database on storagenode: database: info opening file \"config/storage/info.db\" failed: context canceled\n\tstorj.io/stor>
févr. 22 07:37:14 fedora-server storj[9206]: Error: Error opening database on storagenode: database: info opening file "config/storage/info.db" failed: context canceled
The restart timeouts during 2 hours:
févr. 22 07:37:15 fedora-server systemd[1499]: storj.service: Consumed 27min 55.252s CPU time, 2.9G memory peak, 56.8M memory swap peak.
févr. 22 07:37:15 fedora-server systemd[1499]: storj.service: Scheduled restart job, restart counter is at 4.
févr. 22 07:37:15 fedora-server systemd[1499]: Starting storj.service - Podman Storj container...
févr. 22 07:37:15 fedora-server podman[724841]: 2025-02-22 07:37:15.714024054 +0100 CET m=+0.082077070 container create 89f411b3c06286f5408a2dab845574bc9a2480203827821cb272f9ce92ecb1c9 (image=docker.io/storjlabs/storagenode:latest, name=storj, PODMAN_SYSTEMD_UNIT=storj.service)
févr. 22 07:37:15 fedora-server podman[724841]: 2025-02-22 07:37:15.668384797 +0100 CET m=+0.036437824 image pull 4c1e65b147c42b70a513cadd66d660871aa7f3d644fb2dde89a5210535d48c9d docker.io/storjlabs/storagenode:latest
févr. 22 07:42:15 fedora-server systemd[1499]: storj.service: start operation timed out. Terminating.
févr. 22 07:42:16 fedora-server systemd[1499]: storj.service: Main process exited, code=exited, status=1/FAILURE
févr. 22 07:42:16 fedora-server podman[726106]: 2025-02-22 07:42:16.328401658 +0100 CET m=+0.258244429 container remove 89f411b3c06286f5408a2dab845574bc9a2480203827821cb272f9ce92ecb1c9 (image=docker.io/storjlabs/storagenode:latest, name=storj, PODMAN_SYSTEMD_UNIT=storj.service)
févr. 22 07:42:16 fedora-server storj[726106]: 89f411b3c06286f5408a2dab845574bc9a2480203827821cb272f9ce92ecb1c9
févr. 22 07:42:16 fedora-server systemd[1499]: storj.service: Failed with result 'timeout'.
févr. 22 07:42:16 fedora-server systemd[1499]: Failed to start storj.service - Podman Storj container.
févr. 22 07:42:16 fedora-server systemd[1499]: storj.service: Consumed 26.778s CPU time, 3.9G memory peak.
févr. 22 07:42:16 fedora-server systemd[1499]: storj.service: Scheduled restart job, restart counter is at 5.
févr. 22 07:42:16 fedora-server systemd[1499]: Starting storj.service - Podman Storj container...
févr. 22 07:42:16 fedora-server podman[726204]: 2025-02-22 07:42:16.697181174 +0100 CET m=+0.070753503 container create b3baa5447a2276b448b6184278468445219b609c6cc986ed1e00aeed6450703a (image=docker.io/storjlabs/storagenode:latest, name=storj, PODMAN_SYSTEMD_UNIT=storj.service)
févr. 22 07:42:16 fedora-server podman[726204]: 2025-02-22 07:42:16.646537597 +0100 CET m=+0.020109926 image pull 4c1e65b147c42b70a513cadd66d660871aa7f3d644fb2dde89a5210535d48c9d docker.io/storjlabs/storagenode:latest
févr. 22 07:47:16 fedora-server systemd[1499]: storj.service: start operation timed out. Terminating.
févr. 22 07:47:16 fedora-server systemd[1499]: storj.service: Main process exited, code=exited, status=1/FAILURE
févr. 22 07:47:17 fedora-server podman[727469]: 2025-02-22 07:47:17.306881096 +0100 CET m=+0.334068440 container remove b3baa5447a2276b448b6184278468445219b609c6cc986ed1e00aeed6450703a (image=docker.io/storjlabs/storagenode:latest, name=storj, PODMAN_SYSTEMD_UNIT=storj.service)
févr. 22 07:47:17 fedora-server storj[727469]: b3baa5447a2276b448b6184278468445219b609c6cc986ed1e00aeed6450703a
févr. 22 07:47:17 fedora-server systemd[1499]: storj.service: Failed with result 'timeout'.
févr. 22 07:47:17 fedora-server systemd[1499]: Failed to start storj.service - Podman Storj container.
févr. 22 07:47:17 fedora-server systemd[1499]: storj.service: Consumed 37.374s CPU time, 4.6G memory peak.
I put the full logs (huge) in the link below, that contains logs before the crash, the 2 hours restart tries, and the logs when the service succeed to get up again:
https://pastebin.com/raw/zk7t7vep
Thanks a lot for your help