Cut power to disk and now the node won't restart

bagabagabaga · March 24, 2023, 8:31am

I have a pretty unsophisticated setup for my node: an old PC running Linux and an external HDD, each with its own power plug. Last night I unplugged the HDD (but not the PC), which produced a series of errors I had not seen before. Then I tried to restart the node from Docker with

docker restart -t 300 storagenode

and watched the result with

docker ps

What I saw was that the node ran for about 10-11 seconds and then restarted.
I had a busy morning ahead, so I went to bed after trying the “docker run…” command (the one used to start the node for the first time), hoping that it would help.
Issuing the dashboard.sh command gives me this:

$ sudo docker exec -it storagenode /app/dashboard.sh

2023-03-24T08:30:43.279Z	INFO	Anonymized tracing enabled	{"Process": "storagenode"}
2023-03-24T08:30:43.283Z	INFO	Identity loaded.	{"Process": "storagenode", "Node ID": "..."}
Error: rpc: dial tcp 127.0.0.1:7778: connect: connection refused

Got tons of emails this morning saying my node is offline.
What do I do? Delete the image and start again with the same command and same Identity?
I can post the on-screen output that Linux produced when I unplugged the disk, but I have to get home first.

Thank you

padso · March 24, 2023, 10:06am

Can you start the container and post the last 50 log lines?
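
For reference, one way to grab those lines is sketched below. It assumes the container is named storagenode, as in the commands earlier in the thread; `last_log_lines` is a hypothetical helper name, not something shipped with the node.

```shell
#!/usr/bin/env bash
# Print the last N log lines of a container.
# Defaults: container "storagenode", 50 lines.
# stderr is merged into stdout so the result can be piped or pasted as one stream.
last_log_lines() {
    local container="${1:-storagenode}"
    local count="${2:-50}"
    docker logs --tail "$count" "$container" 2>&1
}

# Usage: last_log_lines storagenode 50
```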

bagabagabaga · March 24, 2023, 5:05pm

Here you go

2023-03-24 17:02:14,305 INFO gave up: storagenode entered FATAL state, too many start retries too quickly
2023-03-24 17:02:15,308 WARN received SIGQUIT indicating exit request
2023-03-24 17:02:15,310 INFO waiting for processes-exit-eventlistener, storagenode-updater to die
2023-03-24T17:02:15.309Z	INFO	Got a signal from the OS: "terminated"	{"Process": "storagenode-updater"}
2023-03-24 17:02:15,315 INFO stopped: storagenode-updater (exit status 0)
2023-03-24 17:02:16,319 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)
2023-03-24 17:02:20,083 INFO RPC interface 'supervisor' initialized
2023-03-24 17:02:20,085 INFO supervisord started with pid 1
2023-03-24 17:02:21,091 INFO spawned: 'processes-exit-eventlistener' with pid 12
2023-03-24 17:02:21,099 INFO spawned: 'storagenode' with pid 13
2023-03-24 17:02:21,107 INFO spawned: 'storagenode-updater' with pid 14
2023-03-24T17:02:21.147Z	INFO	Invalid configuration file value for key	{"Process": "storagenode-updater", "Key": "log.output"}
2023-03-24T17:02:21.148Z	INFO	Anonymized tracing enabled	{"Process": "storagenode-updater"}
2023-03-24T17:02:21.153Z	INFO	Running on version	{"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.74.1"}
2023-03-24T17:02:21.154Z	INFO	Downloading versions.	{"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}
2023-03-24T17:02:21.221Z	INFO	Anonymized tracing enabled	{"Process": "storagenode"}
2023-03-24T17:02:21.224Z	INFO	Operator email	{"Process": "storagenode", "Address": "federicobagattoni61@gmail.com"}
2023-03-24T17:02:21.224Z	INFO	Operator wallet	{"Process": "storagenode", "Address": "0xBC12375F4ba1eE1e4522890b963BFcd083e1620B"}
Error: Error starting master database on storagenode: group:
--- stat config/storage/blobs: no such file or directory
--- stat config/storage/temp: no such file or directory
--- stat config/storage/garbage: no such file or directory
--- stat config/storage/trash: no such file or directory
2023-03-24 17:02:21,228 INFO exited: storagenode (exit status 1; not expected)
2023-03-24T17:02:21.706Z	INFO	Current binary version	{"Process": "storagenode-updater", "Service": "storagenode", "Version": "v1.74.1"}
2023-03-24T17:02:21.706Z	INFO	New version is being rolled out but hasn't made it to this node yet	{"Process": "storagenode-updater", "Service": "storagenode"}
2023-03-24T17:02:21.725Z	INFO	Current binary version	{"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.74.1"}
2023-03-24T17:02:21.725Z	INFO	New version is being rolled out but hasn't made it to this node yet	{"Process": "storagenode-updater", "Service": "storagenode-updater"}
2023-03-24 17:02:22,727 INFO success: processes-exit-eventlistener entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-03-24 17:02:22,734 INFO spawned: 'storagenode' with pid 41
2023-03-24 17:02:22,736 INFO success: storagenode-updater entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-03-24T17:02:22.889Z	INFO	Anonymized tracing enabled	{"Process": "storagenode"}
2023-03-24T17:02:22.894Z	INFO	Operator email	{"Process": "storagenode", "Address": "federicobagattoni61@gmail.com"}
2023-03-24T17:02:22.894Z	INFO	Operator wallet	{"Process": "storagenode", "Address": "0xBC12375F4ba1eE1e4522890b963BFcd083e1620B"}
Error: Error starting master database on storagenode: group:
--- stat config/storage/blobs: no such file or directory
--- stat config/storage/temp: no such file or directory
--- stat config/storage/garbage: no such file or directory
--- stat config/storage/trash: no such file or directory
2023-03-24 17:02:22,900 INFO exited: storagenode (exit status 1; not expected)
2023-03-24 17:02:24,909 INFO spawned: 'storagenode' with pid 47
2023-03-24T17:02:25.047Z	INFO	Anonymized tracing enabled	{"Process": "storagenode"}
2023-03-24T17:02:25.051Z	INFO	Operator email	{"Process": "storagenode", "Address": "federicobagattoni61@gmail.com"}
2023-03-24T17:02:25.051Z	INFO	Operator wallet	{"Process": "storagenode", "Address": "0xBC12375F4ba1eE1e4522890b963BFcd083e1620B"}
Error: Error starting master database on storagenode: group:
--- stat config/storage/blobs: no such file or directory
--- stat config/storage/temp: no such file or directory
--- stat config/storage/garbage: no such file or directory
--- stat config/storage/trash: no such file or directory
2023-03-24 17:02:25,056 INFO exited: storagenode (exit status 1; not expected)
2023-03-24 17:02:28,067 INFO spawned: 'storagenode' with pid 54
2023-03-24T17:02:28.229Z	INFO	Anonymized tracing enabled	{"Process": "storagenode"}
2023-03-24T17:02:28.235Z	INFO	Operator email	{"Process": "storagenode", "Address": "federicobagattoni61@gmail.com"}
2023-03-24T17:02:28.235Z	INFO	Operator wallet	{"Process": "storagenode", "Address": "0xBC12375F4ba1eE1e4522890b963BFcd083e1620B"}
Error: Error starting master database on storagenode: group:
--- stat config/storage/blobs: no such file or directory
--- stat config/storage/temp: no such file or directory
--- stat config/storage/garbage: no such file or directory
--- stat config/storage/trash: no such file or directory
2023-03-24 17:02:28,244 INFO exited: storagenode (exit status 1; not expected)
2023-03-24 17:02:29,246 INFO gave up: storagenode entered FATAL state, too many start retries too quickly
2023-03-24 17:02:30,250 WARN received SIGQUIT indicating exit request
2023-03-24 17:02:30,251 INFO waiting for processes-exit-eventlistener, storagenode-updater to die
2023-03-24T17:02:30.252Z	INFO	Got a signal from the OS: "terminated"	{"Process": "storagenode-updater"}
2023-03-24 17:02:30,258 INFO stopped: storagenode-updater (exit status 0)
2023-03-24 17:02:31,261 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

bagabagabaga · March 24, 2023, 5:07pm

The reply I post keeps disappearing and I don’t know why.

BrightSilence · March 24, 2023, 5:40pm

bagabagabaga:

--- stat config/storage/blobs: no such file or directory
--- stat config/storage/temp: no such file or directory
--- stat config/storage/garbage: no such file or directory
--- stat config/storage/trash: no such file or directory

Looks like it’s unable to find the core folders for your storagenode storage. Make sure your HDD is mounted correctly and at the same mount point it used to be at. You may also need to run fsck on it since you had an unclean shutdown of the HDD which can cause corruption (and sometimes make it impossible to mount).
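
A quick way to see which of those folders are missing on the host is sketched below. It assumes the usual Docker setup in which a host data directory is bind-mounted to /app/config in the container, so the `config/storage/...` paths in the error correspond to `<data dir>/storage/...` on the host. `check_storage_dirs` and the example path are hypothetical.

```shell
#!/usr/bin/env bash
# Report which of the folders named in the error exist under a data directory.
# $1 is the host directory assumed to be mapped to /app/config in the container
# (hypothetical example: /mnt/storj/storagenode).
# Returns 0 if all four folders exist, 1 otherwise.
check_storage_dirs() {
    local data_dir="$1"
    local missing=0
    local name
    for name in blobs temp garbage trash; do
        if [ -d "$data_dir/storage/$name" ]; then
            echo "ok:      $data_dir/storage/$name"
        else
            echo "missing: $data_dir/storage/$name"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_storage_dirs /mnt/storj/storagenode
```

If every folder is reported missing while the disk is supposedly mounted, the mount point is most likely just an empty directory on the system disk.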

bagabagabaga · March 24, 2023, 9:59pm

I ran

$ findmnt --fstab --evaluate

and it returned the right mount point.
Then I ran

$ fsck /dev/sdc1

and it returned

fsck from util-linux 2.37.2

and nothing else.
I do not know how to check if the mount point is still valid.
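
One note on the check above: the `--fstab` option makes findmnt read /etc/fstab, so it shows where the disk is configured to be mounted, not whether it is mounted right now. A minimal sketch of a live check follows; it reads the kernel's mount table from /proc/mounts. `is_mounted` and the example path /mnt/storj are hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if $1 appears as a mount point in the mount table.
# $2 is the mount table to read; it defaults to /proc/mounts on Linux.
# Limitation: paths containing spaces are escaped as \040 in /proc/mounts
# and are not handled by this sketch.
is_mounted() {
    local target="$1"
    local table="${2:-/proc/mounts}"
    awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$table"
}

# Usage:
#   if is_mounted /mnt/storj; then echo "mounted"; else echo "NOT mounted"; fi
```

Running `findmnt` with the mount point as its only argument, or `mountpoint` from util-linux, answers the same question without a helper function.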

BrightSilence · March 24, 2023, 11:15pm

Make sure to unmount it first, then run fsck -C /dev/sdc1 to get output with a progress bar. By default it doesn’t show anything while it’s working.
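
The order of operations can be sketched as a small checklist. The function below only prints the commands so they can be reviewed and then run by hand as root; it does not execute anything. Stopping the container before the check and starting it afterwards is an assumption added here, not something stated above, and the mount point /mnt/storj is a hypothetical example. The device /dev/sdc1 is the one from this thread.

```shell
#!/usr/bin/env bash
# Print the repair steps in order (dry run only, nothing is executed).
# $1: partition to check; $2: mount point to remount it at.
repair_steps() {
    local device="$1"
    local mount_point="$2"
    echo "docker stop -t 300 storagenode"   # assumption: stop the node before touching the disk
    echo "umount $device"                   # fsck should not run on a mounted filesystem
    echo "fsck -C $device"                  # -C shows a progress bar
    echo "mount $device $mount_point"       # same mount point as before
    echo "docker start storagenode"
}

# Usage: repair_steps /dev/sdc1 /mnt/storj
```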

bagabagabaga · March 25, 2023, 10:15am

I managed to restart the node.
First I tried to unmount it, but it seemed to be unmounted already. Then I ran fsck and mounted it again at the same location.
After that the node restarted normally, without problems.

If you don’t mind, I have another question:
I searched for the maximum time the node can stay offline and found different values: Storj says I have to be online 99.5% of the time, whereas in this forum I found 220 hours or so. What is the actual value?
This month I am getting an FTTH installation and I don’t know how long I will be offline.

BrightSilence · March 25, 2023, 10:25am

Both? ToS still say you need to be online for 99.3% of time. But you don’t get suspended unless your uptime drops below 60% at the moment.
There is one other thing to keep in mind. If you’re offline for more than 4 consecutive hours, your data will be marked as unhealthy and repair will start. So you will start losing some data to repair. (about 4% at the start, which then slowly increases over time)
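
To turn those percentages into hours, here is a rough sketch. It assumes uptime is measured over a 30-day window, which is an assumption made for the arithmetic and not something stated above; `allowed_downtime_hours` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hours of downtime allowed in a period for a given minimum uptime percentage.
# $1: required uptime in percent; $2: length of the period in days (default 30, an assumption).
allowed_downtime_hours() {
    local uptime_pct="$1"
    local days="${2:-30}"
    # LC_ALL=C keeps the decimal separator a dot regardless of the system locale.
    LC_ALL=C awk -v p="$uptime_pct" -v d="$days" \
        'BEGIN { printf "%.1f\n", d * 24 * (100 - p) / 100 }'
}

# Usage:
#   allowed_downtime_hours 99.3   # ToS figure
#   allowed_downtime_hours 60     # suspension threshold
```

With a 30-day window, `allowed_downtime_hours 99.3` prints 5.0 and `allowed_downtime_hours 60` prints 288.0, so the ToS figure allows about 5 hours offline while suspension only comes into play after roughly 288 hours.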