I guess I made a big mistake!
Yesterday I asked when to update. I updated just one of my nodes to 1.16.1 and it looked good so far.
This morning I updated all my other nodes.
Now I check my dashboards and the node I updated first is not responding! The dashboard shows it as offline with no other info. The CLI script is not returning anything at all!
Another node crashed / unresponsive!
              total        used        free      shared  buff/cache   available
Mem:            977         617          93           2         267         381
Swap:             0           0           0
This is from another node, also updated to 1.16.1 this morning.
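To put numbers like these in context, here is a minimal sketch that reads `free -m` output and reports what share of total RAM is still available. The function name `mem_available_pct` is made up for illustration, and it assumes the column layout shown above, where `available` is the seventh field on the `Mem:` line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any Storj tooling): reads `free -m` style
# output on stdin and prints the "available" memory as a whole-number
# percentage of total RAM (fraction truncated).
mem_available_pct() {
    # On the "Mem:" line, field 2 is total and field 7 is available.
    awk '$1 == "Mem:" { printf "%d\n", ($7 * 100) / $2 }'
}

# Usage on a live system:
#   free -m | mem_available_pct
```

Fed the figures from this post, it prints 38, so a bit over a third of the 977 MB was still available when the output was captured. With no swap configured, there is nothing to absorb a spike beyond that.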
None of the commands related to the node return anything!
This time it “crashed” while running: for sat in $(docker exec -i storagenode wget -qO - localhost:14002/api/sno | jq -r '.satellites[].id'); do docker exec -i storagenode wget -qO - localhost:14002/api/sno/satellite/$sat | jq .id,.audit; done
@Alexey Why do I see no audits after a restart when I run docker stop -t 300 storagenode followed by docker rm storagenode?
Also @Alexey another node just restarted itself for no apparent reason.
docker rm also removes the container’s Docker logs, so your local log files get ‘reset’. Your Storj data is fine because it is stored in a separate volume outside of the Docker container.
The crash could be caused by an out-of-memory error from the file walker at startup. You can run on 2 GB, but it can get tight, especially if your hardware is slower. Can you provide more details on your setup?
What model of drives are you using (are they SMR)?
How many nodes are you running on the device?
What type of system is hosting this (Pi, netbook, etc.)?
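The out-of-memory theory can be checked directly. A rough sketch, assuming the container is named `storagenode` as elsewhere in this thread; the helper names `was_oom_killed` and `count_oom_lines` are made up for illustration. Note that `docker inspect` only works while the container still exists, so it has to be run before `docker rm`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "true" if Docker recorded an out-of-memory kill
# for the container, "false" otherwise. Defaults to the name "storagenode".
was_oom_killed() {
    name="${1:-storagenode}"
    # .State.OOMKilled is a boolean field in `docker inspect` output.
    docker inspect -f '{{.State.OOMKilled}}' "$name"
}

# Hypothetical helper: counts kernel log lines on stdin that mention the
# OOM killer. `|| true` keeps the exit status at 0 when nothing matches.
count_oom_lines() {
    grep -ciE 'out of memory|oom-kill' || true
}

# Usage on a live system:
#   was_oom_killed storagenode
#   dmesg | count_oom_lines
```

A non-zero count from the kernel log, or `true` from the first helper, would point to memory pressure rather than the update itself.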
i would be careful about using the sudo reboot command, it tends to just… well, reboot
without shutting anything down at all… i’m kinda new to linux, and tho my system hasn’t been hurt by this command, that’s more likely due to my setup having plp storage with all written data forced to be sync, basically making the system mostly unaffected by cuts in power or similar…
ofc it still loses power… so not totally unaffected, but i don’t really have a need for a UPS, so i did a more affordable version that would have basically the same end result.
but yeah, don’t use sudo reboot, use sudo shutdown so that everything gets shut down correctly…
then everything will boot up on its own at startup, which also matters in case of a power outage… if you have to tinker with the device to make it boot or reboot, then it’s just less likely to recover from whatever problems it may encounter
using sudo shutdown you can even see in the storagenode logs that the storagenode gets the termination signal from the OS and shuts down, and the OS will even wait for the storagenode / docker to shut down if need be…
so really you shouldn’t have to use anything but the run command when you start the node and then sudo shutdown to stop the system… the node shuts down, and when you boot, the node simply starts back up because the run command contains the start unless stopped parameter… i forget how it’s defined in the run command tho… something like that… you get the idea.
network drives are not recommended, it might work… but usually it won’t go well long term.
and yeah, REALLY REALLY don’t use sudo reboot if your system isn’t protected against random power loss, and even then you shouldn’t use it for convenience, even tho you in theory could… it’s a bad habit, and it has a very good chance of causing damage or loss of data in the system… sure, not much… but enough to make it act all weird.
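On the restart parameter mentioned above: as far as I know, the documented Storj run command passes it as `--restart unless-stopped`. A small sketch for checking what an existing container was actually started with, again assuming it is named `storagenode`; the helper names `restart_policy` and `has_unless_stopped` are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the restart policy name Docker stored for the
# container (for example "no", "always", "on-failure" or "unless-stopped").
restart_policy() {
    name="${1:-storagenode}"
    docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' "$name"
}

# Hypothetical helper: succeeds only if the policy is "unless-stopped".
has_unless_stopped() {
    [ "$(restart_policy "${1:-storagenode}")" = "unless-stopped" ]
}

# Usage on a live system:
#   has_unless_stopped storagenode && echo "node should come back after boot"
```

If this reports anything else, the node will not necessarily come back on its own after the machine boots, and the container would need to be recreated with the intended policy.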
Can you check that your server load looks okay?
And ensure your disk isn’t about to fail?
If your setup cannot keep up (I had this issue with an SMR drive), the node software keeps stacking requests in RAM until the OOM killer shuts down the node… Dunno if it could be a similar issue in your case, I have no experience with network drives.
Network-attached drives are not supported, even if they can work.
Any network-attached drive will always have higher latency than a locally connected one.
If this setup is working for you, you should increase the available RAM to 2-4 times what you would need with locally connected drives (because the storagenode will cache more data in RAM when the storage is slow).
However, you will have other problems as well, and a corrupted database is the smallest of them: https://forum.storj.io/tag/nfs https://forum.storj.io/tag/smb https://forum.storj.io/tag/iscsi
You are welcome!
If your NAS supports Docker, you can run the storagenode directly on your NAS.
The online score measures how much of the time your node was online. A score below 60% will suspend your node until you resolve the issue (you will have 7 days for that), and then your node will be under review for a month. If it gets suspended again during the review, it will be disqualified.
At the moment this should not be enabled yet, but it will be soon. However, the measurement is already in place and, as you can see, it is working. The score should recover over the next 30 days online.
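As a rough illustration of the 60% rule described above, here is a sketch that classifies an online score given as a percentage. The function name `online_score_status` and the labels it prints are made up, and it assumes that exactly 60% still counts as fine. The score itself would be read off the dashboard by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: takes an online score in percent (e.g. 58.3) and
# prints "at-risk" if it is below the 60% suspension threshold, else "ok".
# Assumption: a score of exactly 60 is treated as "ok".
online_score_status() {
    awk -v s="$1" 'BEGIN {
        if (s + 0 < 60) print "at-risk"
        else print "ok"
    }'
}

# Usage:
#   online_score_status 58.3
```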
I’d like to point out that with the binary you can run a storagenode on pretty much any NAS that has a terminal.
It’s fairly easy to set up as well. I tested it on a cheap NAS I got for some backups, which had 512 MB of RAM, and it worked. So you could probably do it as well.