Screwed up moving the node. Will this recover?

Still sort of new to this, been running my node for a month or so without much interaction with it. I recently moved the external drive I am using to a different computer and ran into some issues while setting it up. There was some downtime (a day) and I accidentally pointed it to the wrong folder for about an hour before I realized the error and corrected it.

I’m assuming that’s why my values here dropped, but I was curious about what this means. Will it recover over time, or does it reset? If I have to take it down again, is the 5hr window of downtime I read about still acceptable?


Hi bigonion,

I would expect it to go up - you should be able to watch the numbers rise slowly, so keep an eye on them and report back here.

However - could one of the SNO experts tell us what that average is based on? How many months back does the rolling statistic go? I had 100% on all figures even though I had some downtime months ago, so it must be a rolling average. But how long until it's back at 100% (assuming no problems on my end)?


I thought recent versions of the storage node were supposed to refuse to start if pointed at a wrong directory?


I think that only applies if the mount is lost and the node can't write anymore; if you point it to the wrong directory, there's no way for it to know whether that's right or wrong. When in doubt, take the node offline first, then start it and make sure all the data shows up.
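The "make sure all the data shows" step can be scripted. A minimal sketch of such a pre-start check, assuming a typical setup where the pieces live in a `blobs` subdirectory of the storage path (the path and container name in the usage line are placeholders for your own setup):

```shell
# Sanity check before (re)starting the node: refuse if the storage
# directory doesn't look like real node data.
check_node_data() {
    storage="$1"
    # pieces live under blobs/; it must exist and be non-empty
    if [ -d "$storage/blobs" ] && [ -n "$(ls -A "$storage/blobs" 2>/dev/null)" ]; then
        echo "data found under $storage - ok to start"
        return 0
    fi
    echo "WARNING: $storage/blobs missing or empty - check your mount/path!" >&2
    return 1
}

# example usage (placeholder path and container name):
# check_node_data /mnt/storagenode/storage && docker start storagenode
```

It won't catch every mistake (e.g. pointing at a different node's data), but it does catch the common "empty or unmounted directory" case before the node starts failing audits.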


If the new (wrong) path was writable while the node was online, this might have screwed you up.

Watch your numbers for a day or so, and if there are no signs of recovery, start over.


An 85% audit score isn't good, that's for sure… I think you get DQ'd at 60%, so I'd say this node's survival is basically a coin flip. I would start vetting a new node - it won't really hurt anything to spin one up, as long as you have a spare drive for it…

This node is damaged, and since it's new, it's debatable whether you'd want to continue with it: in theory you could get unlucky, hit missing or damaged files in six months or a year, and the node could get DQ'd then…

Of course that might not happen, but in my view I would just regard it as damaged and unreliable and move on, even if that means keeping it running for the six months until I could GE (graceful exit) it.

However, if it's a month-old node… then it might not even be worth tying up a drive while waiting for it to become eligible for GE…

Lots of factors to take into account; I can't give you an easy answer, sorry. But regard it as damaged and prepare to move on…


Just checked: some audit numbers have dropped, and I was disqualified by one satellite. Starting over then. Just as well - the machine I moved it to only has a USB 2.0 connection, and a USB 3.0 card is arriving Sunday. I'll start anew then. Thanks for the responses!


External USB drives are not always viable for running a storagenode; it depends a lot on the hardware. Some will work without a problem, others will need power management disabled on the USB side, the HDD, or both.

Keep in mind that external USB drives usually aren't made to run 24/7 and may be prone to overheating during extended heavy loads. Likewise, USB itself isn't designed for 24/7 operation: it's hardware built for easy connection, installation, and removal, and those features don't mix well with continuous operation, since a USB bus tends to disconnect and reconnect devices for various reasons.

When using a USB bus for 24/7 operation, I would recommend reserving that particular bus for the long-term connected devices only, and keeping all other USB devices on different buses. That doesn't just mean using another USB port: make sure the bus your miscellaneous devices connect to is a separate chip with its own drivers, so it's essentially an isolated system. That way, whatever issues come from plugging and unplugging devices stay confined to that bus and won't disrupt your storagenode.

I can't say with certainty whether a separate USB port on the same controller gives the same protection; it will mitigate some of the problems, maybe most of them, but I'm not sure, which is why I recommend a dedicated USB bus/controller.

Most motherboards have multiple USB controllers anyway, so it's just a matter of figuring out which ports belong to which controller and connecting your storagenode's USB storage only to a dedicated one. Doing this, together with disabling power management, should let most USB setups work without too much trouble.

It may never become a problem, but it's something to keep in mind if you run into trouble in the future. And since storagenodes potentially run for years, I would err on the side of caution and put in as many safety measures against disruptions as possible.
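On Linux, "disabling power management" for the drive usually means turning off USB autosuspend. A sketch of one way to do it with a udev rule - the file name is just a convention, and the vendor/product IDs below are placeholders you'd replace with the values `lsusb` reports for your own enclosure:

```
# /etc/udev/rules.d/50-usb-storagenode.rules
# Keep the external drive's USB link always powered (no autosuspend).
# Replace 1058/25a3 with your enclosure's idVendor/idProduct from `lsusb`.
ACTION=="add", SUBSYSTEM=="usb", ATTR{idVendor}=="1058", ATTR{idProduct}=="25a3", TEST=="power/control", ATTR{power/control}="on"
```

Alternatively, the kernel parameter `usbcore.autosuspend=-1` disables USB autosuspend globally, at the cost of slightly higher idle power draw for every USB device.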

Of course, these days disconnecting storage won't get your node DQ'd, but it will still shut down the storagenode when it happens, even if it's just for a blink of an eye.

Using SATA instead of USB is much preferable.


I made that suggestion at one point: that, at minimum, a node shouldn't start in an easy-to-check, easy-to-detect, absolutely wrong location. But I was assured it was a silly suggestion, because it's somehow better to have the node destroy itself and get close to DQ, and… well, I couldn't wrap my head around the reasoning at that point, so I stopped reading it.

All I suggested was that the node say: "Oops, you started in the wrong location, fix that! Node shutting down…" - but apparently that's extremely bad, stupid, and counterintuitive, despite the fact that IT STILL HAPPENS TODAY and the proposed 'fix' doesn't work.

Oh well, I guess I'm too basic to understand the apparently non-working fix that was implemented instead of the simple fix (suggested by more than just me) of simply preventing the node from starting up.

Sorry that a preventable problem also hit you.

edit > I guess there are interests in node churn, for reasons I don’t understand.


Remember this? It makes me somewhat sad.

You should make a feature request vote for it… it's a good and very useful idea, and it certainly has my vote. Things should be built to fail gracefully if at all possible - preferably in multiple failure stages, so that people acting in ignorance can't damage or destroy them, or at least have a hard time doing so.

When I last migrated my main 14 TB node, I ended up running the last rsync before shutting the node down, so it started complaining about a corrupt database. I ended up keeping the new files and doing another rsync to fix the storagenode before it would run correctly again…

It was my mistake for forgetting that the rsync had been started before the node was shut down… a rookie move, and that was something like my 4th migration in 8 months, so I should know the procedure…

Luckily I went straight into the log after starting it up, so it only ran for a few minutes at most - very minimal damage. I don't know what I would have done if I hadn't bothered to check the log… it would have been bad, that's for sure. Really, the storage node shouldn't have started at all, I guess, since it clearly had an obvious issue and was plastering the logs with errors…


Made a feature request for it:


So I have a USB 3.0 card coming to add to this machine (an older ProLiant DL380 G5). Should I not bother and instead shuck the WD Red drive that's in the WD Elements enclosure and connect it via SATA? I'll have to do some more research (not sure if the machine will support it), but I can certainly look into that. I've also been considering a headless Raspberry Pi to host the drive (something I also need to research), so that's a potential option too.

I should have done a bit of research here before moving it. That’s totally on me. Glad I found this out a month in rather than twelve!

Yeah, I would shuck it. If it's a white-label drive you might need to use an adapter cable for power, or desolder a resistor… some of them use the newer SATA power-disable feature (the 3.3 V pin), which is something to keep in mind, but it's often not a problem.

As far as I understand, anyway - and if it is a problem, worst case you have to remove a resistor.

Usually it would go here though… :smiley: sorry, I should have made that clearer.

Yep, that’s what you should do

there you have it, there ya go… :wink:


Kind of surprised no one else suggested it, but you could probably have prevented disqualification if you had moved the data from the blobs folder of the wrong location into the correct blobs folder. You'd miss some metadata, but your node would have survived and stopped failing audits.
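For future reference, that recovery could look something like this - a sketch only, with placeholder paths. Copying rather than moving, and never overwriting what the real location already has, keeps the operation safe to retry:

```shell
# Hypothetical recovery: merge pieces written to the wrong path back
# into the real blobs directory. -a preserves attributes and recurses,
# -n (GNU cp no-clobber) never overwrites a file already present in
# the real location.
recover_blobs() {
    wrong="$1"; real="$2"
    cp -an "$wrong/." "$real/"
}

# e.g.: recover_blobs /mnt/wrong/storage/blobs /mnt/real/storage/blobs
```

Run it with the node stopped, then start the node and watch the log for audit results before deleting anything from the wrong location.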



I thought the node only pointed at the wrong location for about an hour - that alone can't explain why it got disqualified, can it?
Surely something else was wrong?