RAID 5 failed. 3TB node dead. Partially recovered and going through GE thanks to the forum

Yeah, I got about 3k STORJ with this node during that time. Can’t complain; it paid for its datacenter hosting.
I am trying GE just to push data back into the network, and if it works, then it works. Not trying to game the system. Most certainly I’m not going to run a malfunctioning node just to cover my GE.

Yeah, but at $30 a month, two months is $60, whereas with GE he might get nothing… and it most likely takes about the same amount of time… and one will have to remake the node, which means a month of vetting… Well, I just think GE is a bit of a joke, at least in the form it will take once all the features are implemented…

I remember getting chills just from thinking about trying to pass a GE when I read through the requirements…

And if the node is a year old, isn’t it at 15 months that one gets 50% of the held amount paid out anyway…
Well, lots of math goes into whether it’s worthwhile or not… but since I don’t expect to use it in its current form, because I think the GE requirements are ridiculous, I won’t really check up on it for a good long time…

Hopefully it will be more reasonable by the time I do feel like using it.

I’m not sure where the joke is here… if GE works, he gets $200; if it doesn’t work, he needs a new node anyway. Sure, he might get $30-60 before getting DQed if he is extremely lucky… but then he would have to set up a new node anyway. And if GE takes a month, he’d get $30 for that month too.
So sure, it’s a gamble between $200 and $30, but the sooner you know, the sooner you can spin up a new node. (Well, you can spin up a new node in any case if you have a spare drive.)
In the long run I wouldn’t risk running a node that has possibly lost or corrupted lots of data. I’d rather get a new and reliable node up as quickly as possible.

I did 2 GEs on stefan-benten, both without a problem. But of course I did not lose files…

Well, because of that, the extra payout from GE is lower than the full held amount: since 50% of the withheld amount is paid out at 15 months anyway, one would only get $100 more for a successful graceful exit. And if downtime tracking were implemented, I believe the limit is one hour of downtime per month, plus the audit-failure restrictions before DQ go up, and you are not paid for the GE traffic, so in some cases it might even take away from what the node earns while in GE.

So really you should be comparing $100 plus a much higher chance of DQ and getting nothing at all… versus riding the dying node onward to earn reliable profit, putting less stress on the hardware and not being limited by GE bandwidth usage…

But yeah, presently it might be fine… but looking at how the final plan shapes up… well, I know I will never use it if GE ends up looking like that; IMO the risk vs. reward seems to favor crashing and burning…
But I guess this is another part of the network that isn’t finished yet…
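
To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch. Every figure in it ($30/month earnings, $200 held, 50% released at month 15) is just an assumption pulled from this thread, not an official payout:

```python
# Back-of-the-envelope comparison: attempt graceful exit vs. ride the dying node.
# All figures are assumptions taken from this thread, not official Storj numbers.

held_amount = 200.0      # total withheld so far ($)
released_anyway = 0.5    # 50% of the held amount comes back at month 15 regardless
monthly_earnings = 30.0  # what the node currently earns ($/month)

ge_bonus = held_amount * (1 - released_anyway)  # the extra $100 only a successful GE unlocks

def ev_graceful_exit(p_success, months=1):
    # Normal earnings still accrue during the exit; the bonus only pays if GE succeeds.
    # A GE that fails mid-way forfeits the held amount entirely.
    return months * monthly_earnings + p_success * ge_bonus

def ev_ride_it_out(months_until_dq):
    # Keep the failing node running until it gets disqualified; no held amount back.
    return months_until_dq * monthly_earnings

for p in (0.9, 0.5, 0.2):
    print(f"GE attempt, {p:.0%} chance of success: ~${ev_graceful_exit(p):.0f}")
print(f"Riding it out for 2 months:          ~${ev_ride_it_out(2):.0f}")
```

On those assumptions GE only clearly wins if you think a damaged node has a decent chance of completing it; with a low success probability the two options come out roughly even, which is basically the argument above.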

Damn, what are the odds of 2 drives failing so close together… Can you say whether these were new or used drives? Also, what brand and model?

It’s RAID5. It’s expected to fail with consumer disks during a rebuild.
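
For anyone curious where that expectation comes from: the usual back-of-the-envelope argument is that a RAID5 rebuild has to read every surviving disk in full, and consumer drives are only specified for about one unrecoverable read error (URE) per 10^14 bits. A minimal sketch of that math (the URE rate and sizes are illustrative assumptions; real drives vary):

```python
import math

# Commonly quoted consumer-drive spec: one unrecoverable read error per 1e14 bits.
# This is an illustrative assumption; real drives and real failure modes vary a lot.
URE_PER_BIT = 1e-14
BITS_PER_TB = 8e12  # decimal terabytes

def p_rebuild_hits_ure(tb_to_read: float) -> float:
    """Probability of at least one URE while reading tb_to_read TB during a rebuild."""
    bits = tb_to_read * BITS_PER_TB
    # 1 - (1 - p)^n, computed in a numerically stable way
    return -math.expm1(bits * math.log1p(-URE_PER_BIT))

# A RAID5 rebuild reads all surviving disks end to end.
for tb in (3, 6, 12):
    print(f"{tb:>2} TB read during rebuild -> ~{p_rebuild_hits_ure(tb):.0%} chance of hitting a URE")
```

Even ignoring a second outright drive failure, reading several terabytes at that error rate gives a sizeable chance the rebuild trips over a bad sector; RAID6 can correct that with its second parity, RAID5 cannot.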

If they were cheap or Seagate drives, they can be expected to fail easily. That’s especially true for drives bought as external HDDs; these are often very bad, as manufacturers count on them not being used much and possibly put the worst ones in there.

Do you have more than anecdotal evidence for that claim?

Four failed Seagates; no other brand has failed (yet). Also check the Backblaze stats.

So, anecdotal?

If I understand it right, they only used HGST, Seagate and Toshiba and no WD in 2019. I wonder why?

I know a person from a NAS forum who lost 30 of 30 purchased Seagate ST3000DM001 drives.
Personally, I lost 5 of 5 purchased disks of this model.

Western Digital acquired HGST in 2012. Same company now.

Not sure which definition you mean:

“(of an account) not necessarily true or reliable, because based on personal accounts rather than facts or research.”
It was based on personal and non-personal experience. It was based on facts and I did my research: Seagates fail much more often.

“based on or consisting of reports or observations of usually unscientific observers”
I am a very scientific person.

“of, relating to, or consisting of [anecdotes]”
That is right, any kind of answer consisting loosely of what I said would be considered an anecdote.

HGST is under WD.

Sadly this anecdote is hilarious. =/

You know that just losing a single file, or god forbid the DBs, is bad???

I realize this is way late for the data at hand, but I’ve actually had ZFS save my array - twice! - from failures like this. In both cases I was lucky enough that when I had two drives fail, at least one of them was still returning mostly valid data other than a string of bad blocks. ZFS recovered what it could, and flagged the specific files it could not successfully rebuild.

Seagate had one super bad model that led to class action lawsuits. I don’t think that’s a reason to ignore the entire brand, but I guess people have trouble letting that one go. Hey, it’s still their f up, so it’s not entirely unfair either. If you exclude this model, there is no significant difference in reliability between the major brands. Some models do a little better, some a little worse.

Losing a db is easy to recover from; the worst damage is some stats on the dashboard being wrong. They contain non-essential data. The only way your node could fail audits is if the databases are corrupt and can’t be accessed. But you don’t get disqualified for this, just suspended. So unless you ignore issues for at least a week, database loss is recoverable. Same for a single file. With the rate of audits, chances are you will never fail a single audit if you only lose one file. Though losing a significant chunk is obviously bad.
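
If you want to check whether the databases are actually damaged before doing anything drastic, a generic SQLite integrity check is enough. A minimal sketch (the storage path is a made-up example, and this is just plain SQLite, not an official Storj tool):

```python
import sqlite3
from pathlib import Path

# Hypothetical example path -- point this at your own node's storage directory.
storage_dir = Path("/mnt/storagenode/storage")

for db_file in sorted(storage_dir.glob("*.db")):
    conn = sqlite3.connect(db_file)
    try:
        # "ok" means the database passed SQLite's own consistency checks.
        result = conn.execute("PRAGMA integrity_check;").fetchone()[0]
        status = "ok" if result == "ok" else f"CORRUPT: {result}"
    except sqlite3.DatabaseError as exc:
        status = f"UNREADABLE: {exc}"
    finally:
        conn.close()
    print(f"{db_file.name}: {status}")
```

Anything that shows up as corrupt can then be dealt with as described above, losing at worst some dashboard stats.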

I was using a refurbished HP DL360p Gen8, along with some second-hand HP SAS (EG0900FBVFQ) drives I got from eBay. The drives had about 4 years of spin time on them; no wonder they failed.

@Sasha
Often people won’t know that their array is degraded and only become aware of it when a drive finally dies… but drives are most often bad long before they die, so people trust their system to keep everything working, and then when it breaks it’s like one disk dead and another half bad, or worn down so much that starting the rebuild kicks it over the edge.

RAID arrays aren’t backups; they are mitigation of the 2% or so odds that a data drive fails… mirror/RAID1 is a tank, RAID5 is the sports car, RAID6 is the SUV, and RAID0 would be the motorcycle.

Any of them will work for what it was intended for, but more often than not, when they start to break, people don’t notice. If you are running a RAID, you should do regular checks of all the data, which ensures that, at least on that day, it could all be read within a day and put onto a backup disk…

One should have spare drives on hand; it doesn’t have to be perfect… it could be one oversized drive that fits most of the RAID arrays you have… Of course, if you are running a RAID5 and have a spare… then you should be running RAID6, unless it’s due to very special performance considerations, and most often that won’t really hold up to logical deduction.

@fikros
Many of my drives are going to pass 10 years of spin time this year…
Of course that can’t keep up forever, but thus far I’m quite impressed… they are enterprise SATA drives, if memory serves.

@Alexey
You really hate RAID5, don’t you xD

@Krey
Well, you should have checked the Backblaze data before buying them…
Apparently that particular model excels at a 30-40% AFR.

An update on my GE progress:
It seems satellite.stefan-benten.de:7777 succeeded. It’s weird, and seems too good to be true, but I got a Completion Receipt and all. I don’t know how this was possible. I can provide the node ID if the devs want to debug, just ask.

Yes, for a reason. I was a system administrator and had 22 branches, each with a chunk of the central database, and all of them used RAID5. Too many failures. Even with enterprise drives. Even with small enough drives (less than 1TB). They failed during rebuild because of a second drive failure.
So, yes. Please do not use RAID5 or RAID0, unless you like adventures and thrills.
Every time a branch failed, the only way to recover was to unload that branch’s chunk of the database and transfer it to them. Of course we had backups. But they were useless because of the constant flow of data (like in Storj) between the branch database and the central database (unlike Storj).
But the result is the same: the backup is out of sync, and recovering the sync was a pain. So it was simpler to unload the chunk and send it to them.

As a result, I migrated all branches to RAID10. The problem is gone. Disks keep failing, but not one branch has broken since the migration (more than 5 years).

4 Likes