Node Suspension

If you’re receiving data again, that means you’re out of suspension mode on that satellite.

Is there a problem with the dashboard? Because I see I am getting data from the IDs of the satellites I am suspended on, but I still get the error on the Windows GUI dashboard.

I couldn’t tell you; I’m not running the Windows GUI and I have not seen the new dashboard yet. Does it actually say it’s suspended?

Yep, here, see:

If you look at the sat IDs, some of the ones displayed are the ones that I am suspended on.

Sorry, I don’t see anything but the dark theme, which is nice. Honestly I couldn’t tell you this for sure, but maybe try refreshing it a few times or clearing the cache. Beyond that I don’t want to give you the wrong information.

But from your screenshot, you’re out of suspension mode.

Oh, it doesn’t show you the error right below “Choose your satellite”?

No, I’m running on Linux and we haven’t gotten the update yet. But as far as I can tell, you are already out of suspension mode.

Gotcha. Yeah, the Windows GUI dashboard still displays all the satellites I am suspended on, even after clearing the cache. However, I see in the latest logs that I am receiving from and sending to some of those satellites, if not all of them.

After about 4 hours the suspended errors are gone from the dashboard.

Why not run the node directly on the Synology NAS?

I was using it for multiple things at once and had already allocated all the space to a block LUN and attached it to a Windows server when I found Storj back in the very first beta tests. So I’ve just stayed in that configuration.

Despite the work it would take to change that, I highly recommend taking out that network latency over iSCSI. Synology devices are perfectly capable of running the node locally, and you would see better success rates on transfers, leading to more income, and would probably get rid of all the errors you were seeing.

That has been fixed with the satellite deployment last week. You need a few more audits to get into or to get out of suspension mode. The storage node version doesn’t matter in this case.

The dashboard asks the satellite every once in a while and then shows this information for a few hours. You can force an update by restarting the storage node.

Damn, my 3 disks got disconnected from my Raspberry Pi4 and my nodes got suspended.

It is nice to get suspended instead of disqualified (I’m not sure why though; I thought that when data on the disk was not available while a node was running it would lead to DQ. I hope I’m not being disqualified as I type :confused: Unless @BrightSilence’s idea of putting a test file on storage devices was implemented in 1.9.5 maybe? That’d be cool.)
That said, considering that disks disconnecting randomly is pretty bad for the network, I would get why my nodes would get DQed, but man, that would be a real shame…

With regards to the suspension score on the dashboard, I think there’s room for improvement:

  • The dashboard does not display any suspension score when we are on the “All satellites” entry. It would be really handy to have the worst score displayed when “All satellites” is selected (same for the audit score), so we can quickly glance at our node’s health.
  • The dashboard keeps responding correctly from an HTTP point of view, so uptimerobot does not detect any anomaly when the data is no longer available, and we don’t get any notice. Is there a way to get notified when something goes wrong (see the sketch below)? My nodes got disqualified around 7am GMT today and I still have not received any e-mail from Storj ~6 hours later; I’m not sure that’s normal…
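
In the meantime, something like the small script below could fill that gap by polling the node’s local dashboard API from cron and alerting when a score drops. This is just a sketch: the default port 14002, the /api/sno/satellites path and the JSON field names are assumptions that may differ between node versions, so check your own node’s response before relying on it.

```python
# Minimal watchdog sketch, assuming the node's dashboard API is reachable on
# the default port 14002. The endpoint path and JSON field names below
# ("audits", "suspensionScore", "satelliteName") are assumptions that may
# differ between node versions -- inspect your own node's response first.
import json
import urllib.request

DASHBOARD = "http://localhost:14002/api/sno/satellites"
THRESHOLD = 0.6  # hypothetical alert threshold, pick your own

def check_scores():
    with urllib.request.urlopen(DASHBOARD, timeout=10) as resp:
        data = json.load(resp)
    for sat in data.get("audits", []):
        name = sat.get("satelliteName", "unknown")
        score = sat.get("suspensionScore")
        if score is not None and score < THRESHOLD:
            # Swap print() for an e-mail, webhook or push notification.
            print(f"WARNING: suspension score {score:.2f} on {name}")

if __name__ == "__main__":
    check_scores()
```

Run from cron every few minutes it would look at the scores themselves instead of just the HTTP response, which is what uptimerobot is missing.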

Dunno what could cause all disks to disconnect like that at the same time; that’s kind of a big issue if this keeps happening randomly :frowning: I did not think of checking dmesg unfortunately; I should do so next time it happens…

I keep having tons of issues with my Raspberry Pi 4 really. I’m not sure it’s a great platform for running storage; it is the least robust computer I’ve ever had :cry:
I’ll try to update its firmware to the latest version, maybe it’ll improve things…

Oh wait, it did it again.

Basically my disks are all connected to the same hub.
So I thought I would disconnect the latest disk I added and connect it to a USB3 port directly on the Raspberry Pi 4 (as it has 2 USB3 ports), just in case it was the one causing the issue; I thought that would isolate it.

But unplugging it from the hub and then plugging it into the second USB3 port on the Raspberry Pi caused the issue again: all disks got disconnected and weren’t visible in the system anymore (df no longer listing them).

I took the time to copy dmesg and syslog just in case this time; I’ll need to find someone who knows how to read those now :confused:

Not yet, but it’s being worked on.

Look at this suggestion here: The most important information about the health - #14 by Pac
Perhaps it deserves your vote.

I think the focus right now is on making the node stop with a fatal error if the storage disappears. If that happens, you get the notification from uptime robot as well. The change is visible here: https://review.dev.storj.io/c/storj/storj/+/2272
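
For anyone curious what that looks like in practice, here is a minimal sketch of the idea only (not the actual change from the linked review), assuming the node verifies a known file inside its storage directory on a timer and dies loudly when the mount disappears. The paths and interval are made up for illustration.

```python
# Sketch of a storage-reachability check, not Storj's actual implementation.
# STORAGE_DIR and MARKER are hypothetical; point them at your own mount.
import os
import sys
import time

STORAGE_DIR = "/mnt/storagenode/storage"
MARKER = os.path.join(STORAGE_DIR, "storage-dir-verification")
CHECK_INTERVAL = 60  # seconds between checks

def storage_reachable():
    try:
        with open(MARKER, "rb") as f:
            f.read(1)
        return True
    except OSError:
        return False

def watchdog():
    while True:
        if not storage_reachable():
            # Exiting fatally lets the process supervisor and uptime monitors
            # notice the outage instead of the node silently failing audits.
            sys.exit("FATAL: storage directory is not reachable, shutting down")
        time.sleep(CHECK_INTERVAL)

if __name__ == "__main__":
    watchdog()
```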

Are your disks all using external power? If not, it could be that the USB ports don’t provide enough power.

I would also recommend checking your logs for lines with both GET_AUDIT and failed in them.
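
If your logs go to a file, a quick filter like the sketch below would pull those lines out; the log path is just an example, and on a docker node you can pipe the output of docker logs through a similar check.

```python
# Print log lines that mention both GET_AUDIT and failed.
# The default path is an example -- point it at your own node log.
import sys

LOG_PATH = sys.argv[1] if len(sys.argv) > 1 else "/mnt/storagenode/node.log"

with open(LOG_PATH, errors="replace") as log:
    for line in log:
        if "GET_AUDIT" in line and "failed" in line:
            print(line.rstrip())
```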

Hey @BrightSilence, thanks for the link to the idea. I’ve added my vote, as it does go in the right direction I think.

The hub is self-powered, and I never encountered such an issue before when I had only 2x 2.5" disks connected to it. It is supposed to deliver more than enough power for 5 or 6 disks; I did the math.

The third disk I added recently is a 3.5" HDD in an external casing that comes with its own power supply, so I doubt power is the issue.

With regards to logs, I disabled all of them to go easy on the SD card, but maybe I should re-enable them until I figure out what’s wrong. The audit score on the dashboard still shows 100% everywhere though. But… is it still the lifetime score? If it is, that’s pretty useless…

I had a similar issue with an externally powered enclosure. At first I thought it had to be the RPi not getting enough power, but then I put it on a 3 A USB supply and it still happened constantly. I eventually concluded that it was the enclosure itself that was failing; once I replaced it, the issues stopped. I had started a new node and it was DQed within hours of these issues happening.

It also only happened when the node was getting tons of ingress; otherwise it never happened.