PSA: Beware of HDD manufacturers submarining SMR technology in HDDs without any public mention

This is really bad rep for WD indeed. I hope it won’t have too much effect on the HDD market (other than the effect of manufacturers learning how not to treat customers, of course).

And yes, it’s Synology’s best bet if they don’t want to be flooded with support requests once users fill up their SMR drives or have to rebuild an array.

3 Likes

I’ve also noticed that most HDDs (at least NAS ones) listed on Amazon now explicitly say CMR or SMR directly in the product title. You can search for “CMR NAS HDD” and get really useful results.

2 Likes

They probably started getting a lot of refund requests.

Some good news from WD. CMR disks will now be branded as Red Plus. So we can now just buy Red Plus or Red Pro and be sure we get CMR drives.

5 Likes

Of course they’ll charge a premium for the privilege… :roll_eyes:

1 Like

Exactly what I was about to say. The days when we could buy an external USB enclosure and shuck it for a nice WD Red/Helium CMR drive are numbered…

Update after my node switched to version 1.6.4: the SMR disk still stalls eventually, but only after 3-4 hours of ingress, which is an improvement.

But I still have to stop it if I don’t want it to eat all my RAM up, crash completely or whatnot…

So… putting used serials in RAM was definitely a great idea, but unfortunately this enhancement is not enough for some SMR drives :sweat:

This said, I’m sure it is still going to put less pressure on all disks, which is a good thing :relieved:

3 Likes

In the end, I played with --storage2.max-concurrent-requests, and I had to go down to 4 for the disk to stay responsive with no sign of stalling. It’s been receiving ingress for almost 20 hours now (at a pace of approximately 1.85GB/hour), it still has decent response times, and the system’s load average is stable (between 1 and 2).
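For reference, this is roughly what that looks like in the node’s config.yaml (the file location depends on your setup, and 4 is simply the value that happened to work for my disk, not a general recommendation):

```yaml
# config.yaml of the storage node
# maximum number of uploads/downloads handled at the same time
# (0 means unlimited; 4 is just the value that kept my SMR disk responsive)
storage2.max-concurrent-requests: 4
```

If I remember correctly, docker users can alternatively append --storage2.max-concurrent-requests=4 after the image name in their docker run command.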

So… for me, after several weeks of tests, tinkering and article reading about SMR disks, that’s the only option I found to make sure my SMR node does not crash: limiting its number of concurrent requests to 4. Which is a sadly low number, especially as the disk can perform way, way better than that when it is not reorganizing its own data. But once it has no more CMR cache space available and starts writing to SMR sectors, it can only handle so many (or should I say so few…) Storj requests per second.

And although I really get that it’s not ideal, especially from a customer’s perspective, StorjLabs will have to decide whether it is acceptable for SMR nodes to reject massive numbers of ingress requests so they stay alive.

As already mentioned somewhere in the forum (maybe even in this thread, I don’t remember), I really think an improvement would be for the node to automatically adjust this setting depending on the current responsiveness of the disk: it really feels like the node should start rejecting requests when the storage device can’t keep up, rather than relying on a fixed setting like --storage2.max-concurrent-requests.

Easy said, hard to do probably :slight_smile:
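To illustrate what I mean, here is a rough and purely hypothetical sketch in Go (it has nothing to do with the actual storagenode code, and all names and thresholds are made up): the limit shrinks whenever recent writes get slow and grows back once the disk catches up.

```go
// Package limiter is a hypothetical sketch of an adaptive request limit.
// This is NOT how the storagenode works today; it only illustrates the idea.
package limiter

import (
	"sync"
	"time"
)

// AdaptiveLimiter lowers the allowed number of concurrent requests when
// recent disk writes are slow, and raises it again when the disk keeps up.
type AdaptiveLimiter struct {
	mu     sync.Mutex
	limit  int           // current maximum of concurrent requests
	active int           // requests currently being served
	min    int           // never drop below this
	max    int           // never grow beyond this
	slowAt time.Duration // write latency considered "too slow"
}

// NewAdaptiveLimiter starts with a generous limit and adapts from there.
func NewAdaptiveLimiter(min, max int, slowAt time.Duration) *AdaptiveLimiter {
	return &AdaptiveLimiter{limit: max, min: min, max: max, slowAt: slowAt}
}

// TryAcquire returns false if the node should reject this request.
func (l *AdaptiveLimiter) TryAcquire() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.active >= l.limit {
		return false
	}
	l.active++
	return true
}

// Release records how long the disk write took and adapts the limit.
func (l *AdaptiveLimiter) Release(writeLatency time.Duration) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.active--
	switch {
	case writeLatency > l.slowAt && l.limit > l.min:
		l.limit-- // disk is struggling: shed load
	case writeLatency < l.slowAt/2 && l.limit < l.max:
		l.limit++ // disk has headroom: allow more
	}
}
```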

Anyways, I can finally sleep without the fear of my node crashing at any moment now! Pheww :blush:

6 Likes

The idea behind this being that it could cause some uploads to the network to fail, from a customer’s point of view.

However, surely the Storj network has been designed to reroute/retry/other-magic things so customer requests fail as rarely as possible… ? At worst they should be delayed I guess? :neutral_face:

I haven’t seen those errors posted in a while, likely because most people stopped using this setting. But the upload actually fails with an error. At least at the time there was no automatic retry, and zombie segments could also be a problem for that, so it might not be possible to immediately retry. For what it’s worth, if your node is the only one, or one of very few, to reject the piece, the upload will be fine. If too many nodes reject it, though, it’ll fail.

This might be of interest in this thread as well:

Got a bit confused.
As I understood it, SMR is NOT recommended for RAID arrays.

But what about using a single external USB drive that is SMR? Will it also cause problems?

SMR causes problems with any sustained write load. For normal external USB drive use, SMR is great. Even for a game disk, for example, it’s perfectly fine: reads aren’t any slower, and writes are either small or incidental on such HDDs. SMR is a great tech for many use cases, but a sustained load will never give the HDD time to do the internal reshuffling required to settle the CMR cache onto the shingled areas.

Unfortunately, Storj can be considered a sustained load for these purposes. Additionally, the db files get many small writes. As a result the HDD doesn’t get the time to write the CMR cache to the SMR areas and will at some point pretty much stall. Therefore it’s not a good fit for Storj as a single-disk solution either.

There are some things you can do to mitigate the issue somewhat, though. Adding more nodes on other disks can spread the load, and some SNOs have seen that fix the problem. You can also move the db files to another HDD, though that comes with its own risks and should only be done as a last resort.
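If I remember correctly, recent versions expose a storage2.database-dir option for that last one; in config.yaml it would look roughly like this (the path is only an example, and you have to move the existing .db files to that folder yourself before restarting the node):

```yaml
# config.yaml – example only, double-check the option name for your version
# keep the node's SQLite databases on a different (non-SMR) disk
storage2.database-dir: /mnt/other-disk/storagenode-dbs
```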

1 Like

This means there are no CMR drives that are 2.5" and larger than 2TB? :expressionless:

That I don’t know, but it wouldn’t surprise me. Obviously density is going to be needed on those smaller disks. I wouldn’t recommend using them for Storj anyway.

I just came across this list. Looks like Seagate eventually started listing their CMR/SMR models. I’ll add the link to the top post as well.

1 Like

This currently seems to be the case, yes.
/EDIT: the Toshiba MQ03ABB300 (3TB/2.5in/5400rpm) is listed as CMR!

Little update (I’m the guy with the Exos 5E8 SMR drive): there is 1TB left on my node and I still can’t see any strange behaviour; ingress is coming in as normal and the drive doesn’t seem to be busier than usual.

I did however order a 12TB replacement today, gonna sell the SMR one just to be safe (I bought it used anyways).

2 Likes

Is this the only node you’re running? If so, then I guess the Exos models are a bit more resilient to higher loads. Which would make sense.

I don’t know if you should do that. Once it’s full, you’re never looking at large sustained write loads again. Might as well keep it running, though I would make sure you leave a fairly healthy margin of free space. Instead, I’d suggest just running an additional node on the new 12TB.

2 Likes

Nope, but it’s the only node on that machine.

I actually wanted to refrain from doing so until it is officially supported via a convenient interface. Currently the way to go would be using Vadim’s toolbox, right?

Also, said node is in a pretty small computer case, so mounting a 2nd HDD might require a bit of fiddling around. Aaaaand the motherboard actually has to supply power for the SATA devices; I’m not yet sure that setup can handle 2 rather hungry drives.

If you run multiple nodes in the same IP subnet, then you spread the load across them. This is likely why your SMR drive had less of a problem keeping up with the load.

On Windows that’s the easiest way to do it right now. I don’t expect an official interface for this to be available soon. You can also opt for docker, but on Windows that is probably not the best way to go.

You can also use the HDD in an external enclosure (with its own power supply) if internal is not an option.
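And for completeness, if someone does go the docker route, a second node is essentially a second container with its own identity, storage location and external ports; very roughly along these lines (all values are placeholders, so check the official docs for the exact command):

```sh
# Second node: own identity, own storage folder, different external ports.
# Placeholders only – adapt to your own setup and the official documentation.
docker run -d --restart unless-stopped --name storagenode2 \
  -p 28968:28967 -p 127.0.0.1:14003:14002 \
  -e WALLET="0xYOUR_WALLET_ADDRESS" \
  -e EMAIL="you@example.com" \
  -e ADDRESS="your.external.address:28968" \
  -e STORAGE="11TB" \
  --mount type=bind,source=/path/to/identity2,destination=/app/identity \
  --mount type=bind,source=/mnt/new-disk/storagenode2,destination=/app/config \
  storjlabs/storagenode:latest
```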

1 Like