Hello,
I guess I made a big mistake!
Yesterday I asked when to update. I had just updated one of my nodes to 1.16.1 and it looked good so far.
This morning I updated all my other nodes.
Now I check my Dashboards and the Node I updated first is not responding! The Dashboard shows offline and no other info. The CLI Script is not returning anything at all!
So I forced a reboot and now the node shows "online". I think it is only a question of time until the other nodes run into problems.
Is there any tool to check whether the nodes are "online", other than checking the dashboard from time to time? UptimeRobot reports online either way, as the port is reachable...
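One possible approach, as a minimal sketch rather than an official tool: treat the node as responsive only if a probe command finishes successfully within a time limit, instead of only checking that the port is open. The function name check_responsive and the 30-second limit are assumptions made for illustration; the probe in the comment reuses the dashboard API call quoted later in this thread and assumes the container is named storagenode. It relies on the timeout utility from GNU coreutils.

```shell
# check_responsive LIMIT_SECONDS COMMAND [ARGS...]
# Prints "responsive" and returns 0 if COMMAND succeeds within the limit.
# Prints "unresponsive" and returns 1 if COMMAND fails or is killed by the timeout.
check_responsive() {
  local limit="$1"
  shift
  if timeout "$limit" "$@" >/dev/null 2>&1; then
    echo "responsive"
  else
    echo "unresponsive"
    return 1
  fi
}

# Example probe (assumption: container named "storagenode", dashboard API on port 14002):
# check_responsive 30 docker exec -i storagenode wget -qO - localhost:14002/api/sno
```

Run from cron on the host, a check like this would have caught the hang described above, because a hung node keeps its port open but never answers the API request.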
Well as I restarted already I cannot check now, can I?
Will check next time it happens (guess it will). The node that "crashed" is running on 2 GB RAM.
Another node crashed / unresponsive!
              total        used        free      shared  buff/cache   available
Mem:            977         617          93           2         267         381
Swap:             0           0           0
This is another node. Updated this morning to 1.16.1
None of the commands related to the node return anything!
This time it "crashed" when running: for sat in $(docker exec -i storagenode wget -qO - localhost:14002/api/sno | jq .satellites[].id -r); do docker exec -i storagenode wget -qO - localhost:14002/api/sno/satellite/$sat | jq .id,.audit; done
@Alexey Why does running docker stop -t 300 storagenode followed by docker rm storagenode return no audits after a restart?
Also @Alexey another node just restarted itself for no apparent reason.
docker rm also removes the Docker logs for the container, so your local log files get "reset". Your Storj data is fine because it is stored in a separate volume outside of the Docker container.
The crash could be caused by an out-of-memory error from the file walker at startup. You can run on 2 GB, but it can get tight, especially if your hardware is slower. Can you provide more details on your setup?
What model of drives are you using (are they SMR)?
How many nodes are you running on the device?
What type of system is hosting this (Pi, netbook, etc.)?
This node runs on 1 GB RAM and is a little VM in my basement: one core, 1 GB. My other nodes have 2 GB. The drives for this one are in my NAS, so they are network drives.
I guess the problem was that I ran sudo reboot without unmounting the drives first.
i would be careful about using the sudo reboot command, it tends to just... well, reboot
without shutting anything down at all... i'm kinda new to linux, and though my system hasn't been hurt by this command, that's more likely due to my setup having PLP storage with all written data forced to be sync, basically making the system mostly unaffected by cuts in power or similar...
ofc it loses power... so not totally unaffected, but i don't really have a need for a UPS, so i did a more affordable version that would have basically the same end result.
but yeah, don't use sudo reboot, use sudo shutdown so that everything gets shut down correctly...
then everything will boot up on its own at startup, which also matters in case of a power outage... if you have to tinker with the device to make it boot or reboot, then it's just less likely it will recover from whatever problems it may encounter
using sudo shutdown you can even see in the storagenode logs that the storagenode gets the termination signal from the OS and shuts down, and the OS will even wait for the storagenode / docker to shut down if need be...
so really you shouldn't have to use anything but the run command when you start the node and then sudo shutdown to stop the system... the node shuts down, and when you boot, the node simply starts back up because the run command contains the start-unless-stopped parameter... i forget how it's defined in the run command though... something like that... you get the idea.
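For reference, the parameter half-remembered above is most likely Docker's restart policy, set with --restart unless-stopped on docker run. A minimal sketch for checking what policy an existing container actually has, assuming the container is named storagenode as elsewhere in this thread; the function name has_auto_restart is made up for illustration, while docker inspect and the .HostConfig.RestartPolicy.Name field are standard Docker.

```shell
# has_auto_restart CONTAINER
# Returns 0 if the container's restart policy brings it back after a reboot
# ("unless-stopped" or "always"), 1 for any other policy, 2 if inspect fails.
has_auto_restart() {
  local policy
  policy="$(docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' "$1")" || return 2
  case "$policy" in
    unless-stopped|always)
      echo "auto-restart: $policy"
      ;;
    *)
      echo "no auto-restart: ${policy:-none}"
      return 1
      ;;
  esac
}

# Example (assumption: container named "storagenode"):
# has_auto_restart storagenode
```

If this reports no auto-restart, the container was probably created without the --restart option and will stay down after the host boots.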
network drives are not recommended, it might work... but usually it won't go well long term.
and yeah, REALLY REALLY don't use sudo reboot if your system isn't protected against random power loss, and even then you shouldn't use it for convenience, even though in theory you could... it's a bad habit, and it has a very good chance of causing damage or loss of data in the system... sure, not much... but enough to make it act all weird.
Can you check that your server load looks okay?
And make sure your disk isn't about to fail?
If your setup cannot keep up (I had this issue with an SMR drive), the node software keeps stacking requests in RAM until the OOM killer shuts down the node... Dunno if it could be a similar issue in your case, I have no experience with network drives.
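A rough way to watch for this, sketched under the assumption that free -m prints the usual six-column Mem: row (total, used, free, shared, buff/cache, available), as in the output posted earlier in this thread. The function names and the threshold are made up for illustration, not recommended values.

```shell
# available_mib: reads `free -m` output on stdin and prints the "available"
# column of the Mem: row (7th whitespace-separated field).
available_mib() {
  awk '$1 == "Mem:" { print $7 }'
}

# low_memory THRESHOLD_MIB: reads `free -m` output on stdin.
# Returns 0 and prints a warning when available memory is below the threshold,
# returns 1 otherwise (including when the Mem: row could not be parsed).
low_memory() {
  local threshold="$1" avail
  avail="$(available_mib)"
  if [ -n "$avail" ] && [ "$avail" -lt "$threshold" ]; then
    echo "low memory: ${avail} MiB available (threshold ${threshold} MiB)"
    return 0
  fi
  return 1
}

# Example (the 256 MiB threshold is an arbitrary assumption):
# free -m | low_memory 256
```

After a crash, the kernel log usually shows whether the OOM killer was involved; on many systems a line containing "Out of memory: Killed process" in the dmesg output points to it.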
Network-attached drives are not supported, even if they may work.
Any network-attached drive will always have higher latency than a locally connected one.
If this setup is working for you, you should provide 2-4 times as much RAM as you would with locally connected drives (because the storagenode will cache more data in RAM when the storage is slow).
However, you will have other problems as well, and a corrupted database is the smallest of them: https://forum.storj.io/tag/nfs https://forum.storj.io/tag/smb https://forum.storj.io/tag/iscsi
You are welcome!
If your NAS supports Docker, you can run the storagenode directly on your NAS.
The online score measures how much of the time your node was online. A score below 60% will suspend your node until you resolve the issue (you will have 7 days for that); after that, your node will be under review for a month. If it gets suspended again, it will be disqualified.
At the moment this should not be enabled, but it will be soon. However, the measurement is already in place and, as you can see, it is working. The score should recover over the next 30 days online.
I'd like to point out that with the binary you can run a storagenode on pretty much any NAS that has a terminal.
It is fairly easy to set up as well. I tested it on a cheap NAS I got for some backups, and it worked even though it had only 512 MB of RAM. So you could probably do it as well.
next up smart watches, toasters, fridges... actually that might make some pretty good promotional stuff... a competition for who can run a storagenode on the most unexpected device. or something lol