Can you explain a little more about what kind of storage you are using? Model of HDD(s), unRAID setup, is there a cache for that unRAID array?
Your previous message seems to suggest you do use a cache for the data array, but then it would make no sense for there to be minimal activity on it while a node is running.
I have an array of 9 disks:
2 parity disks: Seagate IronWolf NAS 4TB
Disk 1: Seagate Barracuda 4TB, xfs (personal use)
Disk 2: Seagate Barracuda 4TB, xfs (personal use)
Disk 3: Seagate Barracuda 4TB, xfs (Storj storage use)
Disk 4: Seagate Barracuda 3TB, xfs (Storj storage use)
Cache: Kingston 240GB, btrfs (docker.img use)
and 2 unassigned devices
I don't use the cache for Storj data.
The only things on the cache disk now are docker.img and libvirt.img.
Well, write speeds on unRAID without a cache aren't exactly known to be great. You're also sharing the parity disks among all HDDs, so load on the other HDDs will impact it as well.
I'm still a little confused by you calling the SSD a cache while not actually using it as one.
Do you happen to know whether those barracudas are SMR drives? What’s the exact model?
I don't think the speed of the drive is the issue, though. Try copying 4000 files of only 2MB each and see how much stress that puts on a hard drive without a cache.
It was 50 MB/s at first, and then it dropped to 18 MB/s.
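For reference, something like this rough Python sketch would reproduce that kind of small-file copy test (the paths, file count, and sizes are just placeholders, adjust them to your own mounts):

```python
# Rough sketch of a small-file copy test: write 4000 x 2 MB files,
# then copy them to the array disk and report average throughput.
# SOURCE and TARGET are placeholder paths, not from the original setup.
import os
import shutil
import time

SOURCE = "/tmp/copytest_src"        # fast scratch location (placeholder)
TARGET = "/mnt/disk3/copytest_dst"  # array disk under test (placeholder)
FILE_COUNT = 4000
FILE_SIZE = 2 * 1024 * 1024         # 2 MB per file

os.makedirs(SOURCE, exist_ok=True)
os.makedirs(TARGET, exist_ok=True)

# Generate the test files once.
for i in range(FILE_COUNT):
    with open(os.path.join(SOURCE, f"file_{i:04d}.bin"), "wb") as f:
        f.write(os.urandom(FILE_SIZE))

# Copy them to the target and measure overall throughput.
start = time.time()
for name in os.listdir(SOURCE):
    shutil.copy2(os.path.join(SOURCE, name), os.path.join(TARGET, name))
elapsed = time.time() - start

total_mb = FILE_COUNT * FILE_SIZE / (1024 * 1024)
print(f"Copied {total_mb:.0f} MB in {elapsed:.1f}s "
      f"({total_mb / elapsed:.1f} MB/s average)")
```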
But I also tried yesterday to use the cache disk with Storj… no difference. CPU at 100%, wa at 98-100%.
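If it helps, here's a minimal sketch (it assumes the psutil package is installed) for logging how much of that CPU load is actually iowait rather than real work:

```python
# Minimal sketch: sample CPU usage once per second and show how much
# of it is iowait (the "wa" column in top). Requires the psutil package.
import psutil

for _ in range(30):  # sample for 30 seconds
    t = psutil.cpu_times_percent(interval=1)
    # On Linux, 'iowait' is time the CPU spent waiting on disk I/O.
    print(f"user={t.user:5.1f}%  system={t.system:5.1f}%  "
          f"iowait={t.iowait:5.1f}%  idle={t.idle:5.1f}%")
```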
Is it possible that the docker.img is a little corrupt?
When this problem started, I had different hardware: an i7-2600 with 16GB RAM.
I switched hardware 2 days ago to an ML250 Gen5 server, and the problem followed.
The i7 machine is now up and running as a new server without any problems.
It's going to take almost a year to figure that out (it will take a long time to get 6TB onto that node).
This problem happened after a Storj node update, a malformed database, and an unRAID upgrade from 6.8.2 to 6.8.3.
This happened late on March 27, and my node went down. Before that my node worked perfectly: no high CPU usage, no wa.
I got the node up and running again on April 2, and right after the container started, the CPU and wa immediately went up.
So you think the issue was caused by the unRAID upgrade and the Storj node update? Possible, but I think there's more to it with the way unRAID runs. I hope it works better now, but I still have a feeling it could happen again with this node. I wouldn't use unRAID for anything serious, but that's just me. Have you ever tried Proxmox?
Your issues also started right at the moment the current heavy upload load testing started. I highly doubt it has anything to do with the new version and it probably has a lot more to do with high amounts of uploads.
So… Here’s the bad news.
The ST4000DM004 uses SMR technology, and during these recent stress tests we've seen many SNOs run into issues with SMR HDDs. Even though modern SMR disks use large caches to overcome the shortcomings, there is simply no way an HDD like that can deal competently with the kind of continuous random write behavior storage nodes generate. I think you may be able to overcome this with an SSD cache, but I'm not entirely sure that won't just postpone the bottleneck.
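To give a feel for the difference, here's a rough, hypothetical sketch (the target path and sizes are placeholders) that writes the same amount of data once as one big sequential file and once as lots of small synced files, which is much closer to what a storagenode does; on an SMR drive the second number tends to collapse once the on-disk cache fills up:

```python
# Rough comparison sketch: write the same amount of data once sequentially
# and once as many small files (closer to storagenode write behavior).
# TARGET, TOTAL_MB, and SMALL_FILE_KB are placeholders, not measured values.
import os
import time

TARGET = "/mnt/disk3/smr_test"   # placeholder, point at the drive to test
TOTAL_MB = 2048                  # total data written per test
SMALL_FILE_KB = 256              # small-file size, roughly piece-sized

os.makedirs(TARGET, exist_ok=True)
data_small = os.urandom(SMALL_FILE_KB * 1024)

def throughput(seconds):
    return TOTAL_MB / seconds

# 1) One large sequential write.
start = time.time()
with open(os.path.join(TARGET, "sequential.bin"), "wb") as f:
    for _ in range(TOTAL_MB):
        f.write(os.urandom(1024 * 1024))
    f.flush()
    os.fsync(f.fileno())
print(f"sequential:  {throughput(time.time() - start):.1f} MB/s")

# 2) The same volume as many small files, each synced to disk.
count = TOTAL_MB * 1024 // SMALL_FILE_KB
start = time.time()
for i in range(count):
    with open(os.path.join(TARGET, f"piece_{i:05d}.bin"), "wb") as f:
        f.write(data_small)
        f.flush()
        os.fsync(f.fileno())
print(f"small files: {throughput(time.time() - start):.1f} MB/s")
```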
I really don't like that HDD manufacturers are now silently sneaking this tech into some of their HDDs. Be very careful to avoid them if you're buying new drives. It's definitely not worth the tiny cost savings.