Since the unit is still under warranty, would this suffice to request a replacement drive?
This drive has been busy 100% of the time for the last few days, maybe once it gets some room to breathe it will be able to reallocate some of those C5 “pending sectors”?
well if you have bad sectors that’s usually pretty bad… also the temp is kinda high… 50 C is about where you start to feel pain… hdd’s should be like max 40… warm but not painfully so…
granted i have had harddrives run fine for extended periods over years at … hot enough that the damn metal got discolored and no issue… but only got one or two drives that will work in that old overheating laptop… it basically eats anything else i put in it… xD
i would certainly RMA that drive on warranty…
tho you should set up a second node to take the load off your SMR node… 5400RPM is not great for Storj… and 5400 SMR… well it might just work itself to death… or overheat from working too hard… it might not be built for 24/7 operation and especially not at those levels of work load…
if you want to be really good to it… you add 2 extra nodes … then each node will only carry 1/3 of the ingress… but even just two nodes will help a ton…
I’d call 44 deg just about fine, judging from experience
I am currently cloning the SMR drive to a CMR one anyway, I just noticed the bad sectors. Using this SMR HDD was kind of an experiment in the first place.
→ see my post from another thread
But it is advertised as such since it is a Seagate Exos series unit.
I find it curious to see both C5 and C6 at the same value; that led me to think both values might recover once the drive had time to clean things up. I will still opt for an RMA.
well then it just eliminates that reason for failure…
I just checked the 11 drives in my server, the highest i got is 32C(ssd) next one is 30C and then rest are below 25… so yeah i think 44 is pretty toasty… not like something that would usually worry me… but for a drive that has to run 24/7, it certainly won’t have helped its durability…
like you can see here…
apparently my drives are too cold… it’s always something lol… not sure how reliable the numbers in that study are tho… some of the graphs look weird… like why is there a big spike in year 3…
but still the trend across the board is higher failure rates in drives as temps go up… would have been interesting to see a bigger thermal spread on the drives tho… so we could see 50-60 maybe even 70 C and if the trend continues… but i suppose with my own drives in my old laptop … i know that at some point most hdd’s will just start to die from temp… even if a few rare ones will not…
think i damaged 4 hdds in the toasty laptop i have… two didn’t mind at all tho…
but still that kinda shows that if we go higher in temp most of them will cook…
my guess would be its the 100% utilization and high temp that is killing it…
interesting graph in the pdf also, look how high AFR becomes with high utilization within the first 3 months of using a new drive…
i suppose it’s a bit like an engine, it needs to run for a bit before everything runs smoothly… could very well be the exact same thing…
maybe the temps are why i had so many hdd’s die… they sort of got down to -17 last winter, seem to run fine tho… but i also set up my system so that it doesn’t have to put much load on either drive… to ensure better temps and durability.
i just found it interesting that most drives die because of the heat in that laptop… tried switching to ssd’s also but those get bad sectors within days or even hours… and it was the thing that kinda underlined for me just how important hdd temp was for hdd life…
ofc there will be very different temperature ratings of different tech and different design variations / use cases…
before that, even tho i worked in IT for a couple of decades with consumer gear, it was never really something i paid or had to pay any attention to… at least not until recently
ofc everybody knows heat is bad for electronics and generally bad for most stuff in excess… i mean stuff deforms and melts eventually…
nah it’s simply bad thermals and a fan i never bothered replacing + whatever crap that has collected over 8 years… eventually i figured out that one needs to replace the oil from time to time… then the fans will keep running forever it seems. had two of that specific laptop model… this surviving one is a combination of parts from the first one, which cooked itself on a hot summer day…
i think it’s just down to heat… but it gets insanely hot, haven’t measured it exactly… but in the furnace radiators i aim for 65 degrees, and it’s way beyond that level of temp, and that’s just the parts i can touch when the laptop is assembled and has been running for a while… so at least 70+ and then it’s most likely even hotter inside… but i kinda resigned it to the scrap pile… but it still works for some stuff, so if i need something to run some random machine like laser cutters and 3dp… then i hook it up… so i just want it to kinda work on minimal loads and maybe not burn… but most of the stuff in my workshop won’t burn… so meh… i leave it so it doesn’t catch anything else on fire if it decides to combust one day lol
it was a compaq so no surprise there… worst laptops i ever owned…
smartctl is pretty okay… nothing great tho…
i’m sure there are gui based stuff for ubuntu if you prefer that…
the data / system on the harddisk is called SMART…
and there will be 100 different flavors of tools for it… i run mine from a terminal usually, so i just apt install smartmontools (that’s the package smartctl comes in)
then you do like smartctl --all /dev/sda
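If the full `smartctl --all` dump is too noisy, here is a minimal Python sketch that pulls just the temperature and the C5/C6 counters (IDs 194, 197 and 198) out of the attribute table. It assumes the usual ten-column ATA layout that `smartctl -A` prints; the sample text and the names `parse_smart_attributes` and `read_smart_attributes` are made up for illustration, and some drives (NVMe, some SAS) print a different format that this will not handle.

```python
import subprocess

# Made-up sample in the usual `smartctl -A` table layout (not taken from any real drive).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   044   056   000    Old_age   Always       -       44 (0 18 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""


def parse_smart_attributes(text):
    """Map attribute ID -> (name, first token of RAW_VALUE)."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit() check.
        if len(parts) >= 10 and parts[0].isdigit():
            attrs[int(parts[0])] = (parts[1], parts[9])
    return attrs


def read_smart_attributes(device="/dev/sda"):
    """Run smartctl on a real drive (usually needs root). Not called in this sketch."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_smart_attributes(result.stdout)


attrs = parse_smart_attributes(SAMPLE)
for attr_id in (194, 197, 198):
    name, raw = attrs[attr_id]
    print(f"{attr_id} (0x{attr_id:02X}) {name}: {raw}")
```

On a real machine you would call `read_smart_attributes("/dev/sda")` instead of parsing the sample. The raw values are kept as strings on purpose, since some vendors pack extra data into that column.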
keep in mind that the data can be difficult to read in some programs, and some of the more advanced ones will try and estimate the health of the drive or predict failures…
some like backblaze have had good luck in using a few of the numbers… such as hdd temp, read errors and a few other metrics to predict failures…might have been ibm that did that tho…
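To make the "few numbers" idea concrete, here is a rough Python sketch of a warning check. It is not the method any of those companies actually use; the function name `smart_warnings` is hypothetical, the 40 C default is just the rule of thumb mentioned earlier in this thread, and treating any nonzero C5/C6 count as a warning is an assumption.

```python
def smart_warnings(temp_c, pending_sectors, offline_uncorrectable, max_temp_c=40):
    """Return a list of human-readable warnings for a few SMART readings.

    max_temp_c defaults to the thread's rule of thumb, not a vendor limit.
    """
    warnings = []
    if temp_c > max_temp_c:
        warnings.append(f"temperature {temp_c} C is above {max_temp_c} C")
    if pending_sectors > 0:
        warnings.append(f"{pending_sectors} pending sectors (C5)")
    if offline_uncorrectable > 0:
        warnings.append(f"{offline_uncorrectable} offline uncorrectable sectors (C6)")
    return warnings


# Made-up readings: a warm drive with some pending sectors.
for message in smart_warnings(temp_c=44, pending_sectors=8, offline_uncorrectable=8):
    print("WARNING:", message)
```

A check like this only flags slow degradation; it says nothing about a drive that dies from a sudden knock.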
not 100% reliable tho… but usually will give warning signs… ofc many hdd’s will die from traumatic damage… killed one a month or so ago because i rather forcefully walked into the big semi unstable table i have my server still located on…
but it was an old one… so… might have happened soon anyways… smart will ofc not give you any warning on that… but i saw that happening a mile away and yet i still haven’t gotten around to getting a proper rack mount for it…
but yeah long story short… look for stuff that deals with smart data or reads smart data… should be tons and tons of them around…
i would just take whatever is in the ubuntu package… i’m sure they got one either installed or half way there at least…