Docker container stops every day and nodes go offline

But you need to update parameters accordingly your setup.
Well, try a different one, when the node would stop again:

sudo docker logs --tail 20 storagenode

and copy result here between two new lines with three backticks like this:

```
logs lines here
```

I’m not an expert with docker and commands Still learning :grinning:. But I really appreciate your efforts to help Alexey.

So yes that one works but in my case it doesn’t make sense. Let’s say the fatal error happens at 3 am and I check the logs at 8am, it’ already 5h. 5h is about 10 or 20 thousands lines in the logs already.
That’s why I’m trying to find a command that isolates a time between h and h.
There is nothing like this can do it.

Thank you,
Pascal

Edit your config.yaml, change “log.level” to “error”, save it and restart your node, this will make storagenode to log only errors.

If you using Linux then when then error occurs you can use “tail -f 100 <your_log_file>” to view last 100 lines of your logs file.

If you using Docker then you can change the log.output in config.yaml to stdout or comment the line to make it go default setting, restart the container, then when the error happens you can use command “docker logs --tail 100 <your_docker_name_or_id>” which also loads the last 100 lines of your container logs.

But I think you can go to the Fatal Error topic, I think you may get the same error as many people.

the docker version of storagenode uses log.output=stderr by default, so if it’s not changed, then docker logs storagenode will show logs out of the box.
The writeable and readable timeouts from the mentioned thread seems affect mostly slow Windows nodes with RAID, low RAM, with USB or network connected drives.

@nyancodex you are correct, the docker node will be restarted after crash.
Let’s try to filter errors in a different way:

docker logs storagenode 2>&1 | grep -i error | grep -vE "GET|PUT" | tail

Yes this one seems working !! But I tried a few things before and restarted the container, now I lost the previous log and have to wait for the next fatal error.

2023-04-16T03:08:34.449Z        INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "error"}
2023-04-16T03:08:34.589Z        INFO    Invalid configuration file key  {"Process": "storagenode", "Key": "error"}
2023-04-16T03:18:17.532Z        ERROR   piecestore:cache        error getting current used space:       {"Process": "storagenode", "error": "context canceled; context canceled; context canceled; context canceled; context canceled; context canceled", "errorVerbose": "group:\n--- context canceled\n--- context canceled\n--- context canceled\n--- context canceled\n--- context canceled\n--- context canceled"}
2023-04-16T03:18:17.547Z        ERROR   pieces:trash    emptying trash failed   {"Process": "storagenode", "error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:156\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:316\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:400\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1.1:83\n\tstorj.io/common/sync2.(*Workplace).Start.func1:89"}

Those errors are normal ?

Thanks,
Pascal

you should look for error with “FATAL”

No, they are not normal: your node cannot get an information about free space from your disk. It either corrupted or too slow to respond.

However, since there are no errors about unexpected stop of the runner, they are likely not fatal.

You have Synology NAS. It’s a piece of cake to check the logs. Access DSM in your browser, go to Docker > Containers > select the storagenode you want to check > Details > Logs. Then you have a search field top right. Enter any word you want to search for and the log lines will be filtered by that word.

Yes but usually it doesnt show me enough to get my log. I finally found it.

2023-04-16T10:33:30.273118575Z	stdout	2023-04-16T10:33:30.272Z ERROR piecestore download failed {"Process": "storagenode", "Piece ID": "5JSW3E7IPRDFYHTHZ74DLLRJX3NX4VN6VTTHZSDJ3AFKNJM4CTKQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET", "Remote Address": "172.17.0.1:47304", "error": "untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup ap1.storj.io: operation was canceled", "errorVerbose": "untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup ap1.storj.io: operation was canceled\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimitSignature:140\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:62\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:599\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
2023-04-16T10:33:08.728936422Z	stdout	2023-04-16T10:33:03.499Z INFO Version is up to date {"Process": "storagenode-updater", "Service": "storagenode-updater"}
2023-04-16T10:33:08.728894626Z	stdout	2023-04-16T10:33:03.499Z INFO Current binary version {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.76.2"}
2023-04-16T10:33:08.728854932Z	stdout	2023-04-16T10:33:03.490Z INFO Version is up to date {"Process": "storagenode-updater", "Service": "storagenode"}
2023-04-16T10:33:08.728806217Z	stdout	2023-04-16T10:33:03.490Z INFO Current binary version {"Process": "storagenode-updater", "Service": "storagenode", "Version": "v1.76.2"}
2023-04-16T10:33:08.728755391Z	stdout	2023-04-16 10:33:01,846 INFO success: storagenode-updater entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-04-16T10:33:08.728671823Z	stdout	2023-04-16 10:33:01,846 INFO success: storagenode entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-04-16T10:33:01.846197334Z	stdout	2023-04-16 10:33:01,845 INFO success: processes-exit-eventlistener entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-04-16T10:33:00.845814356Z	stdout	2023-04-16T10:33:00.845Z INFO Downloading versions. {"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}
2023-04-16T10:33:00.845205578Z	stdout	2023-04-16T10:33:00.844Z INFO Running on version {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.76.2"}
2023-04-16T10:33:00.825020974Z	stdout	2023-04-16T10:33:00.824Z INFO Anonymized tracing enabled {"Process": "storagenode-updater"}
2023-04-16T10:33:00.824979435Z	stdout	2023-04-16T10:33:00.824Z INFO Invalid configuration file value for key {"Process": "storagenode-updater", "Key": "log.stack"}
2023-04-16T10:33:00.824941450Z	stdout	2023-04-16T10:33:00.823Z INFO Invalid configuration file value for key {"Process": "storagenode-updater", "Key": "log.output"}
2023-04-16T10:33:00.824887527Z	stdout	2023-04-16T10:33:00.823Z INFO Invalid configuration file value for key {"Process": "storagenode-updater", "Key": "log.development"}
2023-04-16T10:33:00.824849413Z	stdout	2023-04-16T10:33:00.823Z INFO Invalid configuration file key {"Process": "storagenode-updater", "Key": "contact.external-address"}
2023-04-16T10:33:00.824811694Z	stdout	2023-04-16T10:33:00.823Z INFO Invalid configuration file key {"Process": "storagenode-updater", "Key": "healthcheck.details"}
2023-04-16T10:33:00.824766809Z	stdout	2023-04-16T10:33:00.823Z INFO Invalid configuration file key {"Process": "storagenode-updater", "Key": "server.address"}
2023-04-16T10:33:00.824728955Z	stdout	2023-04-16T10:33:00.823Z INFO Invalid configuration file key {"Process": "storagenode-updater", "Key": "storage.allocated-bandwidth"}
2023-04-16T10:33:00.824690266Z	stdout	2023-04-16T10:33:00.823Z INFO Invalid configuration file key {"Process": "storagenode-updater", "Key": "console.address"}
2023-04-16T10:33:00.824650093Z	stdout	2023-04-16T10:33:00.823Z INFO Invalid configuration file key {"Process": "storagenode-updater", "Key": "operator.wallet-features"}
2023-04-16T10:33:00.824611539Z	stdout	2023-04-16T10:33:00.823Z INFO Invalid configuration file key {"Process": "storagenode-updater", "Key": "operator.wallet"}
2023-04-16T10:33:00.824570343Z	stdout	2023-04-16T10:33:00.823Z INFO Invalid configuration file key {"Process": "storagenode-updater", "Key": "operator.email"}
2023-04-16T10:33:00.824528181Z	stdout	2023-04-16T10:33:00.823Z INFO Invalid configuration file key {"Process": "storagenode-updater", "Key": "healthcheck.enabled"}
2023-04-16T10:33:00.824475331Z	stdout	2023-04-16T10:33:00.823Z INFO Invalid configuration file key {"Process": "storagenode-updater", "Key": "storage.allocated-disk-space"}
2023-04-16T10:33:00.824424111Z	stdout	2023-04-16T10:33:00.823Z INFO Invalid configuration file key {"Process": "storagenode-updater", "Key": "server.private-address"}
2023-04-16T10:33:00.824263681Z	stdout	2023-04-16T10:33:00.823Z INFO Configuration loaded {"Process": "storagenode-updater", "Location": "/app/config/config.yaml"}
2023-04-16T10:33:00.812256698Z	stdout	2023-04-16 10:33:00,807 INFO spawned: 'storagenode-updater' with pid 14
2023-04-16T10:33:00.812220125Z	stdout	2023-04-16 10:33:00,805 INFO spawned: 'storagenode' with pid 13
2023-04-16T10:33:00.812178457Z	stdout	2023-04-16 10:33:00,803 INFO spawned: 'processes-exit-eventlistener' with pid 12
2023-04-16T10:33:00.812102643Z	stdout	2023-04-16 10:32:59,802 INFO supervisord started with pid 1
2023-04-16T10:32:57.973689213Z	stdout	2023-04-16 10:32:57,973 INFO RPC interface 'supervisor' initialized
2023-04-16T10:32:56.685465916Z	stdout	2023-04-16 10:32:56,685 INFO Set uid to user 0 succeeded
2023-04-16T10:30:39.006107934Z	stdout	2023-04-16 10:30:38,928 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)
2023-04-16T10:30:39.006069609Z	stdout	2023-04-16 10:30:38,927 INFO stopped: storagenode (terminated by SIGKILL)
2023-04-16T10:30:39.006018810Z	stdout	2023-04-16 10:30:38,574 WARN killing 'storagenode' (5895) with SIGKILL
2023-04-16T10:30:39.005890267Z	stdout	2023-04-16 10:30:37,573 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-04-16T10:30:34.570765566Z	stdout	2023-04-16 10:30:34,570 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-04-16T10:30:32.540785010Z	stdout	2023-04-16 10:30:31,566 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-04-16T10:30:32.540749885Z	stdout	2023-04-16 10:30:28,563 INFO stopped: storagenode-updater (exit status 0)
2023-04-16T10:30:32.540696374Z	stdout	2023-04-16T10:30:28.561Z INFO Got a signal from the OS: "terminated" {"Process": "storagenode-updater"}
2023-04-16T10:30:32.540654082Z	stdout	2023-04-16 10:30:28,561 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2023-04-16T10:30:32.540616139Z	stdout	2023-04-16 10:30:28,556 WARN received SIGQUIT indicating exit request
2023-04-16T10:30:32.540575030Z	stdout	2023-04-16 10:30:28,295 INFO spawned: 'storagenode' with pid 5895
2023-04-16T10:30:32.540490885Z	stdout	2023-04-16 10:30:27,235 INFO exited: storagenode (exit status 1; not expected)
2023-04-16T10:30:25.738449022Z	stdout	Error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory 

Thanks,
Pascal

This line gets you straight to this Thread Fatal Error on my Node

tl;dr try this

2 Likes

Hi,

I have a question, I changed the write buffer size to 4 MiB, has it some cons to increase this value ? I have 20Go ram in my Nas, so I guess it’s only good.
image

Thanks,
Pasc

I have 10GB and didn’t see any problems.

But what’s the best value to put ? min or max ? Like below 128 or more like 4MiB or even 8.

4 MiB is the best. If you are low on RAM, go lower. Don’t use more than 4. Vadim uses for exemple 2MiB.

1 Like

I turned all to 4, as i have 16 GB mine, and now 24-32 almost everywere

By the way don’t forget to remove the “♯”, otherwise it stays in comment and doesn’t change anything if I understand it right.

I don’t use the config file.

What command do you do ?

You can provide all additional options after the image name in your docker run command.

I have to add the new settings for timeouts.
And… I turned off the FW on all nodes, so you can change the setting in false.