Recently there have been several reports of SNOs running into massive CPU wait issues. The common thread seems to be HDDs in which manufacturers have quietly started using SMR (shingled magnetic recording), a technique that writes tracks so they partially overlap, like roof shingles.
Because SMR tracks overlap, writing data forces the drive to rewrite several adjacent tracks as well. This obviously leads to much slower writes. These HDDs usually deal with that by using large caches and rewriting the shingled tracks during quiet times. However, recent Storj traffic has been constant and fairly heavy. It seems this is causing systems using these SMR-based drives to topple.
Unfortunately the manufacturers are incredibly cagey about this whole thing. No mention of the use of SMR on spec sheets or any official communication. Luckily the website blocksandfiles.com has done some research and written articles about the three major HDD manufacturers and their use of SMR.
I guess for now we all need to be very careful and play a little detective before buying HDDs. And hopefully public pressure will make them change their minds about hiding this vitally important difference between SMR and PMR drives.
Edit: Some manufacturers have now published lists of HDDs currently using SMR. These lists will likely be outdated soon, but it’s a start. They’re promising to be more transparent about the use of the tech as well. Here are links and models.
The list provided by Seagate is unfortunately missing model numbers. It also seems to be limited to consumer drives; Exos is missing.
I added that Exos X series are SMR free, but Exos E series use SMR.
Use ls /dev/disk/by-id and check if the disk model number is affected in the articles linked above. Fortunately, my disks are EFRX, PURZ and EMAZ so I don’t seem to be affected. I’ve done several successful rebuilds as well, confirming this.
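If you have more than a handful of disks, the same check can be scripted. This is only a minimal sketch: the function names are made up for illustration, and the hint list contains just ST4000DM004 because that is the one model this thread identifies as SMR. You would need to extend it yourself from the manufacturers' lists, and a clean result only means "not on my list", not "not SMR".

```python
import os

# Hypothetical and deliberately incomplete: ST4000DM004 is the only model
# named as SMR in this thread. Extend from the manufacturers' own lists.
SMR_MODEL_HINTS = ["ST4000DM004"]


def flag_smr_candidates(names, hints=SMR_MODEL_HINTS):
    """Return {by-id name: [matching hints]} for whole-disk entries only."""
    flagged = {}
    for name in sorted(names):
        # Skip partition links and wwn- aliases, which carry no model string.
        if "-part" in name or name.startswith("wwn-"):
            continue
        matches = [h for h in hints if h.upper() in name.upper()]
        if matches:
            flagged[name] = matches
    return flagged


def list_disk_ids(by_id_dir="/dev/disk/by-id"):
    """List the entries that `ls /dev/disk/by-id` would show (empty if unavailable)."""
    try:
        return os.listdir(by_id_dir)
    except OSError:
        return []


if __name__ == "__main__":
    for name, hits in flag_smr_candidates(list_disk_ids()).items():
        print(f"{name}: matches {', '.join(hits)}")
```

The match is a plain substring test against the by-id name, so it works with partial model strings as well as full ones.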
I linked the relevant articles in my post. WD seems to have used it in some NAS disks, but you can tell from the makeup of their model numbers, which hopefully also applies to some future models.
For Seagate it applies to a desktop model and some Barracuda models. Seagate Exos E series also uses SMR.
Toshiba used it in desktop drives as well.
The articles list some model numbers, but I don’t know if those lists are complete. And I doubt they’ll update the articles when new HDDs are released. Luckily some googling can help out. But I’m intentionally not listing specific models because an incomplete list could be worse than no list at all. Just make sure yourself that any model you want to buy doesn’t use SMR.
Currently, Western Digital’s WD Red 2TB-6TB drives are device-managed SMR (DMSMR). WD Red 8TB-14TB drives are CMR-based.
That’s interesting. Bigger drives are normal, but smaller ones are SMR, kind of the opposite of what I would expect. I guess I won’t be buying Red drives anymore.
Yes, it seems to be a cost-saving measure more than an actual density problem. I believe all manufacturers have HDDs up to 16TB without SMR. So it’s not needed to get to such sizes.
What makes it worse is that the SMR versions of these drives aren’t even cheaper for the consumer buying them. It seems they’re pocketing the difference instead.
Looks like some SMR drives can make use of TRIM to help keep the drive ready for writing; see “TRIM Command Support for WD External Drives”. I wonder if the OS will automatically use TRIM on a non-SSD drive, though?
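One way to at least see what the kernel thinks, on Linux, is to read the block queue attributes in sysfs. The sketch below (function names are my own) reports, for each block device, whether it is flagged as rotational and whether it advertises discard support via a non-zero discard_max_bytes. It only shows that the drive accepts TRIM; whether anything actually issues it (a periodic fstrim run or a discard mount option, for example) is a separate question this does not answer.

```python
from pathlib import Path


def read_attr(path):
    """Read a sysfs attribute as stripped text, or None if it is unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def discard_report(sysfs_block="/sys/block"):
    """Map device name -> rotational flag and whether discard is advertised."""
    report = {}
    root = Path(sysfs_block)
    if not root.is_dir():
        return report
    for dev in sorted(root.iterdir()):
        queue = dev / "queue"
        rotational = read_attr(queue / "rotational")
        discard_max = read_attr(queue / "discard_max_bytes")
        if rotational is None or discard_max is None:
            continue  # not a device with a request queue we can inspect
        report[dev.name] = {
            "rotational": rotational == "1",
            "discard_supported": discard_max.isdigit() and int(discard_max) > 0,
        }
    return report


if __name__ == "__main__":
    for name, info in discard_report().items():
        if info["rotational"] and info["discard_supported"]:
            print(f"{name}: spinning disk that accepts TRIM/discard")
```

A spinning disk that shows up here is worth a closer look, but treat it as a hint rather than proof of SMR.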
I have a WD 12TB USB drive and you can actually hear it when it does its SMR housekeeping.
For a few seconds it makes a noise, several times a day.
No mention of SMR in the box or description at all.
This has been blowing up over the past few days. After blocksandfiles.com posted their articles, a few large publications picked the story up and got the snowball rolling. Today one of my podcasts discussed it as well.
So, it’s good to see it getting some exposure, as that’s the only way manufacturers might change their minds about the secrecy. And it will definitely cause reviewers to always include this info.
While testing the compression ratio I noticed that two drives of my Storj array are much slower than the rest. It turns out that they are SMR (ST4000DM004). I bought them a while ago and used them on Storj v2, so this has been happening for a while now.
At the time I bought the drives, I was looking for cheap drives, but decided to buy Barracuda (desktop) on the assumption that they would not be SMR (Archive drives were a bit cheaper).
However, it turns out that in a raidz2 array with four normal drives and two SMR drives, plus L2ARC and ZIL, the SMR drives behave acceptably, at least with the current node traffic. I probably would not gain much by replacing them with two fast drives, at least while they still work. However, if I need to expand the array I’m going to buy normal drives.
Hello,
what is your “current node traffic”?
I created a node 3 weeks ago with a Raspberry Pi and an ST4000DM004. A few days ago the node crashed; it had been receiving more than 200GB/day for 3 days, and I’m almost sure the source of the problem is the drive not responding while “reorganizing” the data.
I found documentation for the ST4000DM004 (and other variants from 2 to 8 TB) which says “Rated workload | Average annualized workload rating: <55 TB/year. …”
55 TB / 365 days ≈ 0.150 TB/day ≈ 150 GB/day
I guess those disks can be used if the traffic is not too high, but beyond some limit they stop responding for too long and the system disconnects them :s
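For anyone who wants to plug in their own numbers, the arithmetic above can be sketched like this. The function names are hypothetical, and it assumes 1 TB = 1000 GB and that all node traffic counts against the rating. Keep in mind the workload rating is a manufacturer guideline, not a documented point at which the drive stalls, so this is a rough comparison only.

```python
DAYS_PER_YEAR = 365


def daily_budget_gb(annual_rating_tb):
    """Convert an annualized workload rating in TB/year to GB/day (1 TB = 1000 GB)."""
    return annual_rating_tb * 1000 / DAYS_PER_YEAR


def load_factor(observed_gb_per_day, annual_rating_tb):
    """How many times the rated daily budget the observed traffic represents."""
    return observed_gb_per_day / daily_budget_gb(annual_rating_tb)


if __name__ == "__main__":
    # 55 TB/year is the rating quoted above; 200 GB/day is the observed ingress.
    print(f"Rated budget: {daily_budget_gb(55):.1f} GB/day")
    print(f"200 GB/day is {load_factor(200, 55):.2f}x the rating")
```

With the figures from this post, that works out to about 150.7 GB/day, so 200 GB/day is roughly a third over the rating.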