NAS quits, SN keeps running - DQ x 3

Things were going too well for too long. Yesterday, my QNAP NAS decided it was a good idea to drop 2 disks out of a 4-disk array. Unfortunately, the SN software just kept on running, so no uptime monitor helps in this situation. The first I heard of the problem was 3 HOURS later, via 2 emails saying my node had been DQ'd on 2 sats.

Around that time I had noticed a couple of voltage spikes on the mains (265 V), which caused the UPS to switch to battery for a few seconds. I can only guess that this was the trigger.

As I suspected, a cold restart of the NAS came back with 4 good disks, but the array had to be reassembled manually. (Not the first time, I'm afraid…) After that, all is back to normal. All, except for Storj. I thought I was in the clear with europe-north, as I never got an email from that sat, but my status screen shows it has me DQ'd too. That's sad, as that sat held 80% of my 2.4 TB.

For a $1.70 held amount it's not even worth starting GE (graceful exit), so I will just pull the plug.

The equipment was originally bought for HD crypto mining. While all this was going on, the miner just logged a few errors and kept right on mining. But it only makes $0.25/week.

I am certain that this kind of thing will happen again; such is the unreliability of modern computer equipment. The question is: do I start over with Storj, or do I just plot all the drives for use with the mining rig and accept a very low but continuous return?

Compared to returns of 25 cents a week, Storj is going to be more profitable in no time, so I would definitely start over. You should also know that changes to prevent the issue you ran into are already in the works. After a future update, the node will monitor the storage location for a specific file containing the node ID. If it can't find the file, or the ID doesn't match, the node will be stopped. This would prevent disqualification if the storage becomes unavailable for some reason. It won't help you right now, but at least once it's released it will prevent repeat occurrences.
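
To illustrate the idea, a minimal sketch of such a check could look like the Go snippet below. To be clear, this is not the actual storagenode code, just the concept; the marker file name, paths, and check interval are all made-up assumptions.

```go
// Sketch only: poll the storage directory for a marker file and stop
// the node if it disappears or holds a different node ID. The file
// name, paths, and interval here are assumptions, not Storj's values.
package main

import (
	"log"
	"os"
	"path/filepath"
	"strings"
	"time"
)

const (
	storageDir    = "/mnt/storj/storage" // hypothetical mount point
	markerFile    = "node-id.txt"        // hypothetical marker file
	expectedID    = "12AbC-your-node-id" // placeholder node ID
	checkInterval = 30 * time.Second
)

func main() {
	for range time.Tick(checkInterval) {
		data, err := os.ReadFile(filepath.Join(storageDir, markerFile))
		if err != nil {
			// Mount gone or file unreadable: stop before audits fail.
			log.Fatalf("storage check failed, stopping node: %v", err)
		}
		if strings.TrimSpace(string(data)) != expectedID {
			log.Fatal("storage dir holds a different node ID, stopping node")
		}
	}
}
```

The point of stopping hard is that an offline node merely loses uptime, while a node answering audits from an empty mount gets disqualified.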

Thank you! That makes me very happy. Easy decision now.

Please check the internal SATA connectors of your NAS and run a manual SMART self-test on each drive. Also check the drives' SATA connectors for physical damage or burned contacts. The behaviour you described is not business as usual at all. My Syno with 4 HDDs has been running for 5 years straight without such issues.

Thanks, but unfortunately I already have some experience with these units. They, and many other QNAP models, have a design flaw in their backplanes: a MOSFET overheats and the NAS fails one or more drives. There is a long topic on that in their forum. Both of mine have been patched (by myself, as QNAP refuses to sell parts) by bypassing the component. I have also had a drive encounter a bad block the very moment I plugged in an adjacent drive. Inside, they look pristine.

My confidence in this brand is rapidly approaching zero.

I’ve been looking at the Helios64-based NAS for a little while now as an addition to my personal network:

https://shop.kobol.io/product/helios64-full-bundle/

Postscript: I wiped the storage and changed the config to keep the identity on the data drive. Using the instructions on running multiple Win GUI nodes, I tried to restart, but it kept complaining about missing DBs, so I ended up reinstalling from scratch. Working now.
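
In case it helps anyone doing the same, the change boils down to pointing the identity paths in the node's config.yaml at the data drive. A rough sketch, assuming a Windows GUI node with everything on D: (the key names are from the storagenode config; the paths are examples, not my real ones):

```yaml
# Example only: identity kept on the data drive next to the storage,
# so it fails together with the data instead of surviving it.
identity.cert-path: D:\identity\identity.cert
identity.key-path: D:\identity\identity.key
storage.path: D:\storage
```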

Thanks to all,
Peter.

PPS: A similar thing happened again; this time the NAS claimed all 4 disks had failed at the same time. After a reboot all disks came back green and without errors, but the SCSI array, RAID group, volume group, volume and iSCSI volume were all gone.

The big difference this time: the SN software had been upgraded, and it detected the loss of access to the drive and shut itself down. No DQ this time, but the node was offline for a week.

Using the instructions I had used before, I managed to reassemble the array, but all the other data was still inaccessible. QNAP support came to the rescue, though even that took a week to resolve. So, all good: I'm back online without losing anything but time.

Good to see that the implemented fallback saved your nodes this time! Unfortunate stuff with your QNAP. I hope it holds up better from here on.

The saga continues: disks 3 & 4 failed, while showing all green with no SMART errors. This NAS is definitely faulty. QNAP support tried to fix it, but this time they failed, saying disks 1 & 2 were out of sync with 3 & 4. A total loss.

In summary: in 3 years, 50% of my NAS drives and NAS boxes have failed. Even more if you count the other NAS, which suffered the same failure, though bypassing the faulty component seems to have fixed it.

I think I'll stick with standard PCs from now on. At least I can repair them.

Just when I thought I had to start all over again: figuring I had nothing to lose, I ran 3 shell commands I found on the net and, bang, the NAS tells me that my drive needs checking for errors. All data is intact and accessible. It took all of about 10 minutes, including all the typos, but not counting the disk error checks. Why the QNAP staff could not do this this time round, I don't know. The techie must have had a bad day.

Storj & HD miner running again. :slight_smile:

QNAP support have answered this last question for us:

There is a risk that the data cannot be recovered if we run the CfR command (about a 50% chance of getting the data back), as it will clear the superblock.
After I sent it to HQ to check, they say it is too risky to do CfR; it would be better to go to a recovery lab.

I can agree with that. However, they should let the customer make that choice.