Can you explain a little more about what kind of storage you are using? Model of HDD(s), unRAID setup, is there a cache for that unRAID array?
Your previous message seems to suggest you do use a cache for the data array, but then it would make no sense for there to be minimal activity on it while a node is running.
I have an array of 9 disks:
2 parity disks: Seagate IronWolf NAS 4TB
Disk 1: Seagate Barracuda 4TB, xfs (personal use)
Disk 2: Seagate Barracuda 4TB, xfs (personal use)
Disk 3: Seagate Barracuda 4TB, xfs (Storj storage use)
Disk 4: Seagate Barracuda 3TB, xfs (Storj storage use)
Cache: Kingston 240GB, btrfs (docker.img use)
and 2 unassigned devices
I don't use the cache for Storj data.
The only things on the cache disk now are docker.img and libvirt.img.
Well, write speeds on unRAID without a cache aren't exactly known to be great. You're also sharing the parity disks among all HDDs, so load on the other HDDs will impact it as well.
I'm still a little confused by you calling the SSD a cache while not actually using it as one.
Do you happen to know whether those barracudas are SMR drives? What’s the exact model?
I don't think the speed of the drive is the issue, though. Try copying 4000 files of only 2MB each and see how much stress that puts on a hard drive without a cache.
It was 50 MB/s at first, and then it dropped to 18 MB/s.
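For reference, something like this rough Python sketch would reproduce that kind of small-file copy test (the paths, file count, and sizes are just placeholders, adjust them to your own mounts):

```python
# Rough sketch of a small-file copy test: write 4000 x 2 MB files,
# then copy them to the array disk and report average throughput.
# SOURCE and TARGET are placeholder paths, not from the original setup.
import os
import shutil
import time

SOURCE = "/tmp/copytest_src"        # fast scratch location (placeholder)
TARGET = "/mnt/disk3/copytest_dst"  # array disk under test (placeholder)
FILE_COUNT = 4000
FILE_SIZE = 2 * 1024 * 1024         # 2 MB per file

os.makedirs(SOURCE, exist_ok=True)
os.makedirs(TARGET, exist_ok=True)

# Generate the test files once.
for i in range(FILE_COUNT):
    with open(os.path.join(SOURCE, f"file_{i:04d}.bin"), "wb") as f:
        f.write(os.urandom(FILE_SIZE))

# Copy them to the target and measure overall throughput.
start = time.time()
for name in os.listdir(SOURCE):
    shutil.copy2(os.path.join(SOURCE, name), os.path.join(TARGET, name))
elapsed = time.time() - start

total_mb = FILE_COUNT * FILE_SIZE / (1024 * 1024)
print(f"Copied {total_mb:.0f} MB in {elapsed:.1f}s "
      f"({total_mb / elapsed:.1f} MB/s average)")
```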
But I also tried yesterday to use the cache disk with Storj… no difference. CPU at 100%, wa at 98-100%.
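If it helps, here's a minimal sketch (it assumes the psutil package is installed) for logging how much of that CPU load is actually iowait rather than real work:

```python
# Minimal sketch: sample CPU usage once per second and show how much
# of it is iowait (the "wa" column in top). Requires the psutil package.
import psutil

for _ in range(30):  # sample for 30 seconds
    t = psutil.cpu_times_percent(interval=1)
    # On Linux, 'iowait' is time the CPU spent waiting on disk I/O.
    print(f"user={t.user:5.1f}%  system={t.system:5.1f}%  "
          f"iowait={t.iowait:5.1f}%  idle={t.idle:5.1f}%")
```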
Is it possible that the docker.img is a little corrupt?
When this problem started, I had different hardware: an i7-2600 with 16GB RAM.
I switched hardware 2 days ago to an ML250 Gen5 server, and the problem followed.
The i7 machine is now up and running as a new server without any problems.
It's going to take almost a year to figure that out (it will take a long time to get 6TB onto that node).
This problem happened after a Storj node update, a malformed database, and an unRAID upgrade from 6.8.2 to 6.8.3.
This happened late on March 27, and my node went down. Before that my node worked perfectly: no high CPU usage, no wa.
I got the node up and running again on April 2, and right after the container started, the CPU and wa immediately went up.
So you think the issue was caused by the unRAID upgrade and the Storj node update? Possible, but I think there's more to it with the way unRAID runs. I hope it works better now, but I still have a feeling it could happen again with this node. I wouldn't use unRAID for anything serious, but that's just me. Have you ever tried Proxmox?
Your issues also started right at the moment the current heavy upload load testing started. I highly doubt it has anything to do with the new version and it probably has a lot more to do with high amounts of uploads.
So… Here’s the bad news.
The ST4000DM004 uses SMR technology, and during these recent stress tests we've seen many SNOs run into issues with SMR HDDs. Even though modern SMR disks use large caches to overcome the shortcomings, there is simply no way an HDD like that can deal competently with the kind of continuous random write behavior storage nodes generate. I think you may be able to overcome this with an SSD cache, but I'm not entirely sure that won't just postpone the bottleneck.
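To give a feel for the difference, here's a rough, hypothetical sketch (the target path and sizes are placeholders) that writes the same amount of data once as one big sequential file and once as lots of small synced files, which is much closer to what a storagenode does; on an SMR drive the second number tends to collapse once the on-disk cache fills up:

```python
# Rough comparison sketch: write the same amount of data once sequentially
# and once as many small files (closer to storagenode write behavior).
# TARGET, TOTAL_MB, and SMALL_FILE_KB are placeholders, not measured values.
import os
import time

TARGET = "/mnt/disk3/smr_test"   # placeholder, point at the drive to test
TOTAL_MB = 2048                  # total data written per test
SMALL_FILE_KB = 256              # small-file size, roughly piece-sized

os.makedirs(TARGET, exist_ok=True)
data_small = os.urandom(SMALL_FILE_KB * 1024)

def throughput(seconds):
    return TOTAL_MB / seconds

# 1) One large sequential write.
start = time.time()
with open(os.path.join(TARGET, "sequential.bin"), "wb") as f:
    for _ in range(TOTAL_MB):
        f.write(os.urandom(1024 * 1024))
    f.flush()
    os.fsync(f.fileno())
print(f"sequential:  {throughput(time.time() - start):.1f} MB/s")

# 2) The same volume as many small files, each synced to disk.
count = TOTAL_MB * 1024 // SMALL_FILE_KB
start = time.time()
for i in range(count):
    with open(os.path.join(TARGET, f"piece_{i:05d}.bin"), "wb") as f:
        f.write(data_small)
        f.flush()
        os.fsync(f.fileno())
print(f"small files: {throughput(time.time() - start):.1f} MB/s")
```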
I really don't like that HDD manufacturers are now silently sneaking this tech into some of their HDDs. Be very careful to avoid them if you're buying new drives. It's definitely not worth the tiny cost savings.