Yeah, it’s definitely not a myth. I wanted to respond to that before, but held off because I kind of know myself and didn’t feel like my response would be very kind if I jumped on it in the moment.
Calling it a myth on a forum that is littered with people complaining about nodes grinding to a halt on SMR HDDs is quite a bold statement after all. SMR drives aren’t all created equal, and just because one hasn’t caused an issue (yet) doesn’t mean there aren’t plenty of problematic ones out there. That said, I think almost all of them will eventually hit a wall. There are two factors that can help or hurt: the size of the CMR cache and the free space on the HDD. The CMR cache gives the drive a place to land new writes temporarily, without having to rewrite entire sections of shingled tracks, but under constant load it’s just a buffer that will eventually fill up. And if there is tons of free space on the HDD, this isn’t really an issue yet either, since the drive can likely find adjacent tracks that are all free and just write the data there directly.
The big problem is that these HDDs are designed for intermittent use, and they perform really well in that scenario. The reason is that when there is no load from the OS, the drive does internal maintenance to optimize the way data is stored: flushing the CMR cache to SMR areas and rewriting shingled tracks to consolidate free space. Storj, however, never lets the HDD fully relax. As long as either CMR cache or free adjacent tracks are still available, you’ll see performance curves like the tests posted earlier; they actually still perform quite ok. But when both of those run out, it’s like hitting a wall: both write and read performance plummet to the KB/s level, with multi-second stalls. At that point writes just stack up in memory, and if it gets bad enough, transfers start failing on reads as well.
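If you want to see where your own drive stands, a crude way is to hammer it with sustained random overwrites and watch the throughput. Below is a minimal Python sketch of my own (not a Storj tool); the path and file size are placeholders, and you should point it at scratch space on the SMR drive, not at your node’s data. Random overwrites are the worst case for SMR, because once the CMR cache is full, each one can force the drive to rewrite a whole shingled band. On a drive past that point you’d expect the printed rate to fall off a cliff rather than degrade gradually.

```python
import os, random, time

FILE = "/mnt/smr-scratch/probe.bin"   # placeholder path, adjust to the SMR drive
SIZE = 50 * 1024**3                   # 50 GiB test file, adjust to taste
BLOCK = 4096

# Pre-allocate the (sparse) test file, then overwrite random 4 KiB blocks
# in it forever. Stop with Ctrl-C and delete the file afterwards.
with open(FILE, "wb") as f:
    f.truncate(SIZE)

fd = os.open(FILE, os.O_RDWR)
done, t0 = 0, time.monotonic()
while True:
    os.lseek(fd, random.randrange(SIZE // BLOCK) * BLOCK, os.SEEK_SET)
    os.write(fd, os.urandom(BLOCK))   # incompressible data
    os.fsync(fd)                      # force it past the page cache
    done += BLOCK
    if time.monotonic() - t0 >= 5:    # print a throughput sample every ~5 s
        rate = done / (time.monotonic() - t0)
        print(f"{rate / 1024:10.1f} KiB/s", flush=True)
        done, t0 = 0, time.monotonic()
```

With fsync on every block this will look slow even on a healthy drive; what matters is the shape, i.e. whether the rate stays steady or collapses into KB/s territory with multi-second gaps between samples.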
It’s annoying… because you won’t know up front that the HDD is going to be a problem… and when it does become a problem, it’s going to be really hard to migrate all that data off of it.
What might help, though, is making sure the DBs are moved to a different HDD (or SSD). And if you do hit that wall, lower the node’s capacity to below what is already stored, to give the SMR HDD room to breathe and catch up on all that stacked-up internal maintenance. You can then still keep your node online, since writes are very limited at that point (see the config sketch below).
Since lowering the capacity requires a restart, it would be best to also disable the file walker at that point, to prevent a large read spike on restart while the HDD is already stalled.
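For reference, this is roughly what that looks like in the node’s config.yaml. I’m writing the option names from memory, so double-check them against your own config file and the current docs before relying on them; the path and size values are placeholders:

```yaml
# Keep the SQLite databases off the SMR drive (point at a CMR HDD or SSD):
storage2.database-dir: "/mnt/ssd/storagenode-dbs"

# Lower the allocation to below what's already stored, so the node stops
# accepting new uploads and the drive can catch up on internal maintenance:
storage.allocated-disk-space: 3.0 TB

# Skip the used-space scan on startup to avoid a huge read spike while
# the drive is already stalled:
storage2.piece-scan-on-startup: false
```

After editing, restart the node and keep an eye on the logs; once the drive has had time to catch up, you can raise the allocation again.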
@IsThisOn I hope your SMR drive won’t run into these issues, but with only 3TB filled right now, it’s simply too soon to tell. Should you run into them later, I hope this post provides some useful info to work around it.