Node Suspension

BrightSilence · August 12, 2020, 2:54pm

You should redirect them to a file on the storage location instead. It’s never a good idea to turn off logs.
Although you may not want to do this if your HDDs are SMR.

Make sure you are updated to 1.9.5. You should see two different scores now. The suspension score should be the one that dropped in your case. Neither is a lifetime percentage anymore; both show recent performance.

anon27637763 · August 12, 2020, 3:47pm

I have had this exact problem with about 50% of the RPis I’ve used over the years.

I had a RPi V1 that rebooted if I plugged in a USB thumb drive… after searching all the forums, I used a 3A power supply… and had exactly the same problem. Oddly enough, I’ve had the most stable experience with the RPi Zero W… that’s the $15.00 USD board. This might be because the RPi Zero W does not have an on-board USB hub. The wireless interface is separate and the OTG USB interface is a really neat thing to work with…

I know many people probably run RPi nodes with varying degrees of hardiness. But I, personally, would not recommend the RPi as a platform for any production level application. Even the RPi 4 has partially shared bandwidth through its USB 3 ports. The SoC is subject to overheat conditions if not properly heat sinked and ventilated. And the overall power architecture is not well designed to handle high bandwidth, high I/O conditions.

baker · August 12, 2020, 5:08pm

Are you using UptimeRobot to check HTTP GET requests for node monitoring? Or are you monitoring the node port (default 28967)? You should monitor the node port, not the web interface. But either way, if the node ends up having continuous restarts, UptimeRobot might not catch it for a while.
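For anyone who wants to run the same kind of port check locally, here is a minimal Python sketch. It only tests whether a TCP connection to the node port can be opened, which is roughly what an external port monitor does. The function name `port_is_open` and the default timeout are my own choices for illustration, not anything provided by Storj or UptimeRobot.

```python
import socket


def port_is_open(host: str, port: int = 28967, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    28967 is the default node port mentioned above; pass your own port
    if you changed it.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

Calling `port_is_open("your-node-host")` from a machine outside your network tests the same path that incoming traffic takes. A successful connection only shows that something is listening on the port, so it will not notice a node that keeps restarting between checks.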

andrew2.hart · August 12, 2020, 5:35pm

The Pi 4 can easily run a storagenode.
USB 3 has easily enough bandwidth to run a storagenode, like 10x enough.
But you only get a few watts across all the USB ports combined.
If you draw 4 watts, you lose your drives.

I’ve played with the limits with various setups, which you can see in the pictures of your storagenode thread. One option is to “backpower” the USB slots. Another is a Y-lead where the power comes from a separate supply.

striker43 · August 12, 2020, 8:56pm

Yes. If you want to run a pi with an hdd, you should use an external hdd with separate power supply.

Pac · August 13, 2020, 12:18pm

Thanks for your feedback, I did not even consider that as an option. If problems keep happening, I will try another enclosure to be sure.

I hear you. 2 out of my 3 disks are SMR though. So the only option for me would be to put all logs to the only CMR drive I have. Maybe I should do that, sounds like kind of a good idea in fact.

That is great news!!

I believe you, considering all the issues I faced for the past 9 months of trying out Storj nodes on my RPi 4… I find that platform to be quite unreliable in the end. Which is a real shame because on the other hand its low power consumption is very interesting.
With regard to the heat issue, mine is ventilated and never goes beyond 50°C, so it’s fine.

I’m monitoring the API call to “[my-node-host-and-web-port]/api/sno/satellites”. By monitoring the Node port (28967), would it catch an issue if a HDD disconnects?

@andrew2.hart & @striker43 :
“You only get a few watts on all the usb all together” => I learned that the hard way, yes. But now everything is powered by an external self-powered USB hub delivering more than enough power, so unless I’m out of luck and this hub is defective, this should not be the root cause of the issue.

Thanks all for your insightful feedback

baker · August 13, 2020, 1:11pm

The node will only register as offline if the node is down. So in terms of docker, I think the container has to be in a non-running state. If the container is shut down, or in a restart loop it would catch it.

From what I have seen in the forums (as you have also mentioned), most people find that the node continues to run and will respond to pings on the port even when the storage HDD disconnects. I feel like some set of circumstances specific to your node caused it to fail completely. Did you check the kernel logs from that time period?

Pac · August 13, 2020, 1:16pm

@baker : The first time it failed, I did not check anything before rebooting it.

The second time it failed, I did copy what dmesg returns and the content of the syslog, but it’s all Greek to me (no offense intended to any Greek speaker if any).
I still have them at hand if needed.

If you’re not talking about what dmesg returns, where can “kernel logs” be found?

baker · August 13, 2020, 1:24pm

You can check kernel logs in /var/log. On my system (Debian Stretch) there is kern.log, which should have info about drives disconnecting, and probably a lot of other info. On most systems the logs are rotated, so you might have to open an old log file to find what you need. These might be called kern.log.1 or kern.log.1.gz, etc. There are many other logs there which you could poke around in, including syslog.

Messages around disk disconnection should be pretty easy to find by searching for disk identifiers (sda1, sda2, etc) in the logs.
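If you don't want to open each rotated file by hand, the search can be sketched in Python like this. It assumes the logs live under /var/log/kern.log* as on my Debian system; your distro may use a different name. The function name `search_kernel_logs` and the default pattern (which matches identifiers such as sda, sda1 or sdb2) are only illustrative choices.

```python
import glob
import gzip
import re


def search_kernel_logs(pattern=r"\bsd[a-z]\d*\b", log_glob="/var/log/kern.log*"):
    """Return (path, line) pairs for log lines matching a disk identifier.

    Handles both plain-text logs and gzip-compressed rotated logs.
    The default path is an assumption based on Debian's layout.
    """
    regex = re.compile(pattern)
    hits = []
    for path in sorted(glob.glob(log_glob)):
        # Rotated logs are often gzip-compressed (kern.log.2.gz and so on).
        opener = gzip.open if path.endswith(".gz") else open
        try:
            with opener(path, "rt", errors="replace") as fh:
                for line in fh:
                    if regex.search(line):
                        hits.append((path, line.rstrip("\n")))
        except OSError:
            # Skip files we cannot read, e.g. because of permissions.
            continue
    return hits


if __name__ == "__main__":
    for path, line in search_kernel_logs():
        print(f"{path}: {line}")
```

Reading /var/log usually needs root or membership in the adm group, so you may have to run it with sudo. To narrow things down to one drive, pass something like `pattern=r"\bsda\d*\b"`.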

Alexey · August 13, 2020, 6:35pm

You can try journalctl. It exists on many distros, so yours may have it too.
Then you can search for any issues related to the disk or RAM; maybe the container was terminated.
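As a rough illustration, here is a small Python wrapper around journalctl that pulls kernel messages for a time window and keeps only lines that look like disk or memory trouble. The helper names and the keyword list are my own assumptions, so adjust them to whatever your logs actually contain. Messages from before a reboot are only available if your journal is stored persistently.

```python
import shutil
import subprocess

# Hypothetical keyword list for illustration; extend it to suit your system.
KEYWORDS = ("usb disconnect", "i/o error", "out of memory", "oom-kill")


def build_journalctl_cmd(since, until=None):
    """Build a journalctl command that selects kernel messages in a time window."""
    cmd = ["journalctl", "--no-pager", "--since", since]
    if until:
        cmd += ["--until", until]
    # Field match that selects messages which came from the kernel.
    cmd.append("_TRANSPORT=kernel")
    return cmd


def suspicious(lines, keywords=KEYWORDS):
    """Keep only lines containing one of the keywords (case-insensitive)."""
    return [ln for ln in lines if any(k in ln.lower() for k in keywords)]


def kernel_messages(since, until=None):
    """Run journalctl and return its output lines, or [] if it is not installed."""
    if shutil.which("journalctl") is None:
        return []
    result = subprocess.run(
        build_journalctl_cmd(since, until),
        capture_output=True, text=True, check=False,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in suspicious(kernel_messages("2020-08-12 00:00", "2020-08-13 00:00")):
        print(line)
```

The dates in the example are just placeholders taken from this thread; use the window around the time your node went down.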