It seems like the node restarts on its own

Hello Alexey,

It’s me again, and unfortunately I don’t bring good news this time.

I have been checking on the status of my node each day through:

docker ps -a

And today I saw this output:

CONTAINER ID   IMAGE                          COMMAND                  CREATED      STATUS        PORTS                                                                            NAMES     
22e3da993205   storjlabs/watchtower           "/watchtower node1 w…"   4 days ago   Up 4 days                                                                                      watchtower
12524c955e58   storjlabs/storagenode:latest   "/entrypoint"            4 days ago   Up 10 hours   127.0.0.1:14002->14002/tcp, 0.0.0.0:28967->28967/tcp, 0.0.0.0:28967->28967/udp   node1  

I checked the logs like this:

docker logs 12524c955e58

The entries that grabbed my attention were the following:

2022-04-11T06:37:13.920Z        INFO    piecestore      uploaded        {"Piece ID": "Y6LAR6PLVNJNJW6UN6KRRQK5C6BXCBE6F7MTTY55C7CY6YLVAO6A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"
, "Action": "PUT", "Size": 181504}
2022-04-11T06:37:21.911Z        INFO    Downloading versions.   {"Server Address": "https://version.storj.io"}
2022-04-11T06:37:22.437Z        INFO    Current binary version  {"Service": "storagenode", "Version": "v1.52.2"}
2022-04-11T06:37:22.437Z        INFO    Version is up to date   {"Service": "storagenode"}
2022-04-11T06:37:22.456Z        INFO    Current binary version  {"Service": "storagenode-updater", "Version": "v1.52.2"}
2022-04-11T06:37:22.456Z        INFO    Version is up to date   {"Service": "storagenode-updater"}

I assume that this must be the watchtower updating my node, or am I mistaken here?

Nevertheless, the following is what I believe caused the node to change status:

2022-04-11T06:42:00.487Z        ERROR   services        unexpected shutdown of a runner {"name": "trust", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.
io/storj/storagenode/storagenodedb.(*satellitesDB).SetAddress:40\n\tstorj.io/storj/storagenode/trust.(*Pool).Run:124\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.i
o/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-04-11T06:42:04.111Z        ERROR   piecestore      failed to add bandwidth usage   {"error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenod
e/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:723\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:435\n\tstorj.io/common/pb.DRPCPiec
estoreDescription.Method.func1:220\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drp
cserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-04-11T06:42:04.111Z        INFO    piecestore      uploaded        {"Piece ID": "CTJYU4BLHE6CXIY34PZH4UCW2ARDON5ONMGRQOKB3CU6XUQ55FTQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"
, "Action": "PUT", "Size": 768}
2022-04-11T06:42:15.488Z        WARN    services        service takes long to shutdown  {"name": "gracefulexit:chore"}
2022-04-11T06:42:15.488Z        WARN    servers service takes long to shutdown  {"name": "server"}
2022-04-11T06:42:15.512Z        INFO    servers slow shutdown

And after a long string of lines, it seems like the node restarts on its own:

Error: satellitesdb: database is locked
2022-04-11 06:44:46,378 INFO exited: storagenode (exit status 1; not expected)
2022-04-11 06:44:47,386 INFO spawned: 'storagenode' with pid 15276
2022-04-11 06:44:47,387 WARN received SIGQUIT indicating exit request
2022-04-11 06:44:47,387 INFO waiting for processes, storagenode, storagenode-updater to die
2022-04-11T06:44:47.388Z        INFO    Got a signal from the OS: "terminated"
2022-04-11 06:44:47,392 INFO stopped: storagenode-updater (exit status 0)
2022-04-11T06:44:49.002Z        INFO    Telemetry enabled       {"instance ID": "1goJaKCBNoXAcojEDrXqQDm4NJYRGEaPjjDadysXZSeT3syJJH"}
2022-04-11T06:44:49.985Z        INFO    db.migration    Database Version        {"version": 53}
2022-04-11T06:44:50.354Z        INFO    preflight:localtime     start checking local system clock with trusted satellites' system clock.
2022-04-11T06:44:50.449Z        INFO    failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-
Size for details.
2022-04-11 06:44:50,449 INFO waiting for processes, storagenode to die
2022-04-11T06:44:50.963Z        INFO    preflight:localtime     local system clock is in sync with trusted satellites' system clock.
2022-04-11T06:44:50.964Z        INFO    Node 1goJaKCBNoXAcojEDrXqQDm4NJYRGEaPjjDadysXZSeT3syJJH started
2022-04-11T06:44:50.965Z        INFO    Public server started on [::]:28967
2022-04-11T06:44:50.965Z        INFO    Private server started on 127.0.0.1:7778
2022-04-11T06:44:51.272Z        INFO    trust   Scheduling next refresh {"after": "1h32m26.648513425s"}
2022-04-11T06:44:51.272Z        INFO    bandwidth       Performing bandwidth usage rollups
2022-04-11T06:44:52.425Z        INFO    piecestore      upload started  {"Piece ID": "TKUFFQPMHBE6HZIFULY2QZCBVO5LXGK6E4T6NAZWTUC64ULANZCA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"
, "Action": "PUT", "Available Space": 837885647104}
2022-04-11 06:44:54,429 INFO waiting for processes, storagenode to die
2022-04-11T06:44:56.204Z        INFO    orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6      sending {"count": 26}
2022-04-11T06:44:56.204Z        INFO    orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S      sending {"count": 391}
2022-04-11T06:44:56.203Z        INFO    orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs      sending {"count": 103}
2022-04-11T06:44:56.204Z        INFO    orders.1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE       sending {"count": 2}
2022-04-11T06:44:56.204Z        INFO    orders.12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB      sending {"count": 8}
2022-04-11T06:44:56.355Z        INFO    orders.12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB      finished
2022-04-11T06:44:56.544Z        INFO    orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs      finished
2022-04-11T06:44:56.616Z        INFO    orders.1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE       finished
2022-04-11T06:44:56.693Z        INFO    piecestore      upload started  {"Piece ID": "4SLQZWDPB56OYTPQ3LQBWBI6QZ2KJY7LYB7UICU46YGSHKN2TETQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"
, "Action": "PUT", "Available Space": 837885647104}
2022-04-11T06:44:56.902Z        INFO    orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6      finished
2022-04-11T06:44:57.030Z        INFO    orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S      finished
2022-04-11T06:44:57.208Z        INFO    piecestore      uploaded        {"Piece ID": "4SLQZWDPB56OYTPQ3LQBWBI6QZ2KJY7LYB7UICU46YGSHKN2TETQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"
, "Action": "PUT", "Size": 42752}
2022-04-11 06:44:58,210 WARN killing 'storagenode' (15276) with SIGKILL
2022-04-11 06:44:58,210 INFO waiting for processes, storagenode to die
2022-04-11 06:44:58,435 INFO stopped: storagenode (terminated by SIGKILL)
2022-04-11 06:44:58,436 INFO stopped: processes (terminated by SIGTERM)
2022-04-11 06:44:59,893 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file 
to avoid this message.
2022-04-11 06:44:59,899 INFO RPC interface 'supervisor' initialized
2022-04-11 06:44:59,900 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2022-04-11 06:44:59,900 INFO supervisord started with pid 1
2022-04-11 06:45:00,903 INFO spawned: 'processes' with pid 9
2022-04-11 06:45:00,906 INFO spawned: 'storagenode' with pid 10
2022-04-11 06:45:00,909 INFO spawned: 'storagenode-updater' with pid 11

And afterwards, once it had “died”, it started storing data and accepting connections as usual, with the first one coming in at:

2022-04-11T06:45:07.578Z

Would you be able to provide me any insight based on the above? I really don’t understand what the cause may be. The internet connection runs over RJ-45 non-stop, and the electricity has been stable as well.

I really just want to have this setup done once and for all - looking forward to your prompt response as always.

Kind regards,

It’s not the watchtower, it’s the storagenode-updater. It is downloaded when the container is started.
We now publish a Docker image which downloads the latest storagenode-updater; it is then started and checks for a new version of storagenode allowed for your NodeID.
So, in short - the image is now self-updating.
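If you want to see what the updater is doing at any time, one quick check - just an example, assuming your container is named node1 and you run it from PowerShell - is to filter the container logs for the version lines you already quoted:

docker logs node1 2>&1 | sls "Current binary version"

This should print the “Current binary version” / “Version is up to date” entries for both storagenode and storagenode-updater.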

These errors are related to the locked databases.

This is due to the fact that Docker Desktop for Windows uses the network to mount your disks. SQLite databases cannot work normally on network drives when there is concurrent access.
From the provided excerpts I can only assume that, since the runner died, the storagenode has been restarted.
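If you want to rule out actual damage to the databases after such “database is locked” errors, one possible check - purely an example, assuming the node is stopped, the sqlite3 tool is available on the host, and the databases live under D:\node1\storage - is:

sqlite3 D:\node1\storage\bandwidth.db "PRAGMA integrity_check;"   # example path - adjust to where your databases actually live

If it prints “ok”, that particular database is not corrupted; the same check can be repeated for the other .db files.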

Hello Alexey,

I appreciate the response and the information on the storagenode-updater; however, I’m still not able to understand the reason for the errors I mentioned. I can confirm that the disk’s sole purpose is to run the node, as I have not taken any actions or made modifications related to that drive. No other data is being written to the drive except for the data hosted by the node.

And I’m not entirely sure what I have to do to ensure that my node won’t go down roughly every 3 days after being started.

I hope that you are not suggesting that running nodes on Windows is rather “different” in comparison to other OSs?

Kind regards,
Rumen A.

Just monitor it from time to time. It should not restart unless it needs to do so for an update.

I do appreciate the reply.

I will do.

Hey @Alexey,

Hope you are doing fine.

I recently checked my dashboard and I saw this:

I checked on the status of my node (up and running) and referred to the Docker documentation on statuses; this is what I ran:

docker events --since '1m' 

And the output shows that my container all of a sudden just self-destructs, for no apparent reason as far as I can tell:

2022-04-22T12:20:46.787963100+03:00 image pull storjlabs/watchtower:latest (io.storj.watchtower=true, name=storjlabs/watchtower)
2022-04-22T12:20:59.444477200+03:00 image pull storjlabs/storagenode:latest (name=storjlabs/storagenode)
2022-04-22T12:20:59.521497500+03:00 container kill 12524c955e58a430844284a1f9e5da257784a840abf384ac28e12b44152b98ee (desktop.docker.io/mounts/0/Source=C:\Users\Rumen\Desktop\identity-hosts\storagenode, desktop.docker.io/mounts/0/SourceKind=hostFile, desktop.docker.io/mounts/0/Target=/app/identity, desktop.docker.io/mounts/1/Source=D:\node1, desktop.docker.io/mounts/1/SourceKind=hostFile, desktop.docker.io/mounts/1/Target=/app/config, image=storjlabs/storagenode:latest, name=node1, signal=15)
2022-04-22T12:21:35.441720300+03:00 container die 12524c955e58a430844284a1f9e5da257784a840abf384ac28e12b44152b98ee (desktop.docker.io/mounts/0/Source=C:\Users\Rumen\Desktop\identity-hosts\storagenode, desktop.docker.io/mounts/0/SourceKind=hostFile, desktop.docker.io/mounts/0/Target=/app/identity, desktop.docker.io/mounts/1/Source=D:\node1, desktop.docker.io/mounts/1/SourceKind=hostFile, desktop.docker.io/mounts/1/Target=/app/config, exitCode=0, image=storjlabs/storagenode:latest, name=node1)
2022-04-22T12:21:35.803023100+03:00 network disconnect 553b6d8c1ed5024e49a255062015a2d0f9fc0319fda9ad546aac58217c9ab620 (container=12524c955e58a430844284a1f9e5da257784a840abf384ac28e12b44152b98ee, name=bridge, type=bridge)
2022-04-22T12:21:36.194041600+03:00 container destroy 12524c955e58a430844284a1f9e5da257784a840abf384ac28e12b44152b98ee (desktop.docker.io/mounts/0/Source=C:\Users\Rumen\Desktop\identity-hosts\storagenode, desktop.docker.io/mounts/0/SourceKind=hostFile, desktop.docker.io/mounts/0/Target=/app/identity, desktop.docker.io/mounts/1/Source=D:\node1, desktop.docker.io/mounts/1/SourceKind=hostFile, desktop.docker.io/mounts/1/Target=/app/config, image=storjlabs/storagenode:latest, name=node1)

Can you kindly clarify, or perhaps assist with, what the cause of this behavior may be? Let me know if any additional data is required.

Thank you for your time as always!

Regarding suspension score: https://support.storj.io/hc/en-us/articles/360042257912-Suspension-mode

Regarding the events - could you please also post the logs from the watchtower?

docker logs --tail 20 watchtower

And the storagenode logs from around that time:

docker logs --since "2022-04-22T09:20:00Z" --until "2022-04-22T09:22:00Z" storagenode

Hey Alexey,

The watchtower log comes up empty; there is no output from the mentioned command.

This is from the storage node:

2022-04-22 09:21:43 (11.0 MB/s) - '/tmp/storagenode.zip' saved [15675435/15675435]

2022-04-22 09:21:43,487 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
2022-04-22 09:21:43,495 INFO RPC interface 'supervisor' initialized
2022-04-22 09:21:43,495 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2022-04-22 09:21:43,496 INFO supervisord started with pid 1
2022-04-22 09:21:44,499 INFO spawned: 'processes' with pid 40
2022-04-22 09:21:44,505 INFO spawned: 'storagenode' with pid 41
2022-04-22 09:21:44,509 INFO spawned: 'storagenode-updater' with pid 42
2022-04-22 09:21:44,511 INFO exited: processes (exit status 0; not expected)
2022-04-22 09:21:45,516 INFO spawned: 'processes' with pid 56
2022-04-22 09:21:45,517 INFO success: storagenode entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2022-04-22 09:21:45,517 INFO success: storagenode-updater entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2022-04-22 09:21:45,526 INFO exited: processes (exit status 0; not expected)
2022-04-22T09:21:45.988Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
2022-04-22T09:21:45.988Z        INFO    Invalid configuration file key  {"Key": "operator.wallet"}
2022-04-22T09:21:45.988Z        INFO    Invalid configuration file key  {"Key": "contact.external-address"}
2022-04-22T09:21:45.988Z        INFO    Invalid configuration file key  {"Key": "storage.allocated-bandwidth"}
2022-04-22T09:21:45.988Z        INFO    Invalid configuration file key  {"Key": "operator.wallet-features"}
2022-04-22T09:21:45.988Z        INFO    Invalid configuration file key  {"Key": "operator.email"}
2022-04-22T09:21:45.988Z        INFO    Invalid configuration file key  {"Key": "server.address"}
2022-04-22T09:21:45.988Z        INFO    Invalid configuration file key  {"Key": "storage.allocated-disk-space"}
2022-04-22T09:21:45.988Z        INFO    Invalid configuration file key  {"Key": "server.private-address"}
2022-04-22T09:21:45.988Z        INFO    Invalid configuration file key  {"Key": "console.address"}
2022-04-22T09:21:45.988Z        INFO    Invalid configuration file value for key        {"Key": "log.level"}
2022-04-22T09:21:46.013Z        INFO    Running on version      {"Service": "storagenode-updater", "Version": "v1.52.2"}
2022-04-22T09:21:46.014Z        INFO    Downloading versions.   {"Server Address": "https://version.storj.io"}
2022-04-22T09:21:46.016Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
2022-04-22T09:21:46.027Z        INFO    Operator email  {"Address": "address@gmail.com"}
2022-04-22T09:21:46.027Z        INFO    Operator wallet {"Address": "WALLET"}
2022-04-22T09:21:46.518Z        INFO    Current binary version  {"Service": "storagenode", "Version": "v1.52.2"}
2022-04-22T09:21:46.518Z        INFO    New version is being rolled out but hasn't made it to this node yet     {"Service": "storagenode"}
2022-04-22T09:21:46.535Z        INFO    Current binary version  {"Service": "storagenode-updater", "Version": "v1.52.2"}
2022-04-22T09:21:46.535Z        INFO    New version is being rolled out but hasn't made it to this node yet     {"Service": "storagenode-updater"}
2022-04-22 09:21:47,539 INFO spawned: 'processes' with pid 78
2022-04-22 09:21:47,549 INFO exited: processes (exit status 0; not expected)
2022-04-22 09:21:50,556 INFO spawned: 'processes' with pid 79
2022-04-22 09:21:50,567 INFO exited: processes (exit status 0; not expected)
2022-04-22 09:21:50,567 INFO gave up: processes entered FATAL state, too many start retries too quickly

I believe that watchtower initiated the update: it pulled the watchtower and storagenode images, noticed the update in the storagenode image, killed the container, and started another one.
Our storagenode image contains only a downloader, so it downloaded the same v1.52.2 storagenode binary and started it, did the same for the storagenode-updater binary, and started that too.
Then both processes keep running and checking for when your node will be eligible to update to the next version.

The last part of the logs is related to this issue: “Processes entered FATAL state, too many start retries too quickly”, and it does not prevent the storagenode from running as far as I know.
Please confirm that your storagenode is running and that you did not start it manually.
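For example - assuming the container is still named node1 - something like this should show that it is up, when it was last started, and which restart policy it runs under:

docker ps --filter "name=node1"
docker inspect -f "{{ .State.StartedAt }} restart={{ .HostConfig.RestartPolicy.Name }}" node1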

Hello @Alexey

Thank you for the explanation.

Pardon my ignorance, but should I assume that the process of having the storagenode killed and then recreated is normal, given the watchtower update? Also, could that be a reason for the reputation to go down, as previously shared in my screenshot? I’m having difficulties understanding how that works exactly.

This is the status of my node as of today:

PS C:\Users\Rumen> docker ps -a
CONTAINER ID   IMAGE                          COMMAND                  CREATED      STATUS      PORTS                                                                             NAMES
34993f603333   storjlabs/watchtower           "/watchtower node1 w…"   5 days ago   Up 5 days                                                                                     watchtower
fa5b29cfba4e   storjlabs/storagenode:latest   "/entrypoint"            6 days ago   Up 6 days   127.0.0.1:14002->14002/tcp, 0.0.0.0:28967->28967/tcp, 0.0.0.0:28967->28967/udp    node1

Thank you for your time as always!

The update can affect the online score if the downtime is long enough to be noticed by the satellites.
The suspension score can be affected only when your node answers an audit request with an unknown error. Usually it’s related to your disk subsystem, when the node is unable to read or update something reliably. If your node answers audit requests (and repair requests) with the requested piece instead of an error, the suspension score will grow; if not, it will fall.

Yes, it’s a normal process.

Hey @Alexey,

I do appreciate the thoughtful explanation on this matter.

With this in mind, is it possible to have the suspension score increased? I would like to ensure that my node has a well-maintained and sustained reputation. Furthermore, is there anything I can do on my end to ensure that this value won’t drop again?

Thank you in advance.

You just need to fix the underlying issue; the suspension score will increase with each successful audit.
See https://support.storj.io/hc/en-us/articles/360042257912-Suspension-mode for how to search for errors.
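For reference, one way to search for failed audits - just an example, assuming the container is named node1 and PowerShell is used - could be:

docker logs node1 2>&1 | sls "GET_AUDIT" | sls "failed"

Any matches would point to the pieces (and the underlying disk errors) that are affecting the score.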

Hello @Alexey,

Will do - no more questions for now, thank you so much!
