Hello community!
I thought I'd ask you guys this question, as I'm not sure whether I'm imagining an issue or it's an actual one. But a bit of background first:
I've had my storagenode running on a 2x2TB CMR RAID0 array, with 3.4TB configured for Storj in Docker on my QNAP NAS. This was smooth sailing for more than a year, until I decided it was time to upgrade those two HDD slots and moved the Storj node to my separate 4x4TB SMR RAID1 array, with 10TB configured in Docker for the storj2 node (there is nothing else on this pool anyway). I used rsync to copy the existing data over, which took some time but got done.

Everything was up and OK for a few months, and it was filling up steadily until about 9TB, when, a bit over a month ago, I suddenly found 5+ TB of it in the trash, which was eventually deleted permanently. Since then the used capacity has been hovering around the usual 3.4TB, which was the max capacity I had before on my 4TB pool, and coincidentally the max capacity of a single disk in the current RAID array. The dashboard and config for storj2 still show 10TB.

Am I missing something? I didn't do anything in particular to affect the node, other than a few NAS restarts after a couple of firmware upgrades since then. I've been monitoring this for about 5 weeks now and it is still flat at 3.4TB, a number which concerns me given the history of the node. Can anyone tell me whether I should give it more time, or is there an issue to be taken care of?
Edit: as pointed out, yes, it is a 4x4TB RAID5; my mistake.
Welcome back,
4x4 would be 16TB, and RAID1 Mirrored 8TB …
Anyway, totally normal. You may have noticed there was an influx of test data over the last three months, which should … be deleted by now. So yes, you can expect your nodes to be back where they were, and to grow slowly again.
It was likely plugged up for the last month attempting to delete the previous months' TTL (time-to-live) data, which had a 30-day expiry. There were ongoing problems with nodes purging that data, mostly rectified now.
Hope that helps.
2 cents
4x4TB RAID1 is 4TB of total space. Where do you get 10TB from? Did you mean RAID5?
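For reference, a minimal sketch of the usable-capacity arithmetic behind the question (classic RAID levels only, ignoring filesystem overhead; the helper function is just for illustration):

```python
# Rough usable-capacity math for an array of identical drives
# (classic RAID levels; real NAS implementations may differ slightly).

def usable_tb(drives: int, size_tb: float, level: str) -> float:
    """Approximate usable capacity in TB for a given RAID level."""
    if level == "raid0":        # striping, no redundancy
        return drives * size_tb
    if level == "raid1":        # all drives hold the same mirror copy
        return size_tb
    if level == "raid10":       # striped mirror pairs
        return drives // 2 * size_tb
    if level == "raid5":        # one drive's worth of parity
        return (drives - 1) * size_tb
    raise ValueError(f"unknown RAID level: {level}")

for level in ("raid0", "raid1", "raid10", "raid5"):
    print(level, usable_tb(4, 4.0, level), "TB")
# raid0 16.0 TB, raid1 4.0 TB, raid10 8.0 TB, raid5 12.0 TB
# Only RAID5 (12 TB raw) leaves room for a 10 TB Storj allocation.
```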
Use of SMR drives is a mistake.
This was covered extensively on the forum. There is nothing to do.
Yes, get rid of SMR drives to prevent issues in the future.
Right now you don't have any issues, as long as the status is Online in the dashboard.
Thank you both for your quick responses! I wasn't aware of the test data over the past months and hadn't kept up with it; I think I searched the forum with unrelated terms.
Post corrected to indicate it is indeed RAID5.
Understood that SMR drives suck (I've come across a few posts here suggesting they only become problematic when there is little free space?), but they were purchased a long time ago, when they first came out, just to hold media for my Plex server. Now I've moved the media to 16TB CMRs and don't know what else to do with these SMR drives (they don't have any second-hand value either, given the unpopularity around them). I guess I'll just wait until the node fills up and things slow down, then rsync the data back to my CMR pool.
Perhaps it would be better to use these SMR drives separately and run a separate storagenode on each, with its own generated unique identity and its own disk. In that case they would spread the load more evenly, making them not so bad. RAID5 wouldn't survive the load; your node would likely crash often with this error:
see
Also, I remember a user reporting their SMR drive did better when they limited concurrent ingress to just a few (like 8) connections at a time.
I second that.
SMR disks are the worst of both worlds: like SSDs, they are read-modify-write devices, but unlike SSDs, they also still have HDD mechanical latency.
So, when they are mostly empty, they work not much slower than regular HDDs: there is plenty of space to write data to without needing to read-modify-write. When you fill up enough space, or when the drive runs out of empty zones, not only does every write become much slower, it also becomes prone to data loss in the most horrible way: a power loss while writing new data can corrupt existing data written a long time ago, such as the file allocation bitmap, and then all your data goes poof.
There are some precautions in place, like capacitive and mechanical power reserve so the drive can complete a write even if power is lost, but how much would you bet on that, knowing that the only reason SMR exists is cost cutting?
When a RAID member fails, the array is rebuilt. This involves writing the entire disk, sequentially, and usually takes a couple of days. With SMR drives it can take weeks or months. Don't use them in any redundant array arrangement.
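As a rough back-of-the-envelope illustration (the throughput figures below are assumptions, not measurements, and a real rebuild also competes with live node traffic, so expect worse):

```python
# Rebuild time = member capacity / sustained write throughput.
# Throughput values are rough assumptions for illustration only.

def rebuild_hours(disk_tb: float, mb_per_s: float) -> float:
    """Hours to write disk_tb terabytes at a sustained mb_per_s rate."""
    return disk_tb * 1e6 / mb_per_s / 3600

print(f"CMR @ ~150 MB/s: {rebuild_hours(4, 150):.0f} h")  # ~7 h, best case
print(f"SMR @ ~20 MB/s:  {rebuild_hours(4, 20):.0f} h")   # ~2.3 days
print(f"SMR @ ~5 MB/s:   {rebuild_hours(4, 5):.0f} h")    # ~9 days once the drive's CMR cache is exhausted
```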
I can't think of any application where use of SMR is acceptable, outside of media storage, where you take a month to write huge files sequentially and then only read them. Storj is the opposite use case in every respect.
My advice: throw them away. I would not even try to sell them to people who should have known better; I would feel awful selling them a turd.
Thanks Alexey, I have considered running multiple nodes on the same physical server, but it didn't quite pan out; I recall following the Storj docs article you had shared. Not sure where I went wrong, but I think I will try again rather than waiting for the only node to die. I'll start an rsync of the existing data back to the CMR pool (4x16TB in RAID1), which has plenty of space for the time being.
What happened? Did you get some error?
Yeah, 5 years ago I bought them (WD Reds) just to store media (honestly, I had no idea about the SMR debacle until after I bought them). They were somewhat cheaper, and back then CMR drives didn't carry a separate WD label like they do now. I'll see what I can do with them; I'd hate to throw them away. Perhaps I'll run separate nodes on them and milk whatever I can until they eventually die on me some day.