Why storagenodes use max CPU and take up tons of memory, and how to fix it

SGC · August 12, 2020, 10:47am

The storagenode gets data from the network and stores it… if the disk access time is too high, the excess data has to be held somewhere, so the ingress data is kept in memory until the storagenode gets time to write it to disk.

These are some examples of what that looks like…
Keep in mind that, depending on the graph model, not every method will show the CPU utilization as 100%. This isn’t real CPU utilization anyway, it’s IOwait: the CPU sitting idle while it waits for a drive to finish an I/O request before it can continue, which can slow a system to a crawl.

It’s clear that my storagenode memory utilization is running away. It will keep consuming more and more memory until either something goes wrong or the available HDD bandwidth and/or IO once again exceeds what the storagenode requires.
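For anyone who wants to check IOwait without a graphing tool, here is a minimal sketch for Linux. It assumes the standard `/proc/stat` layout, where the first line is the aggregate `cpu` line and the fifth counter is iowait. The function names are made up for this illustration and are not part of the storagenode software.

```python
import time


def parse_cpu_line(line):
    """Return the numeric counters from the aggregate 'cpu' line of /proc/stat."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in parts[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Sum user..steal (first 8 counters); guest time is already counted in user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    # Counter order: user, nice, system, idle, iowait, ... so iowait is index 4.
    return 100.0 * deltas[4] / total


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Take two samples 'interval' seconds apart and return the iowait percentage."""
    with open(path) as stat_file:
        first = parse_cpu_line(stat_file.readline())
    time.sleep(interval)
    with open(path) as stat_file:
        second = parse_cpu_line(stat_file.readline())
    return iowait_percent(first, second)
```

Calling `sample_iowait(5.0)` on the host gives the average over five seconds. A value that stays high while the node is receiving data points at the disk rather than at the storagenode itself.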

How to solve the IOwait…

There are a few options… the easy one is to shut down anything you don’t need that is using bandwidth or IO on the storagenode HDD…

Another option could be to move your databases to another drive… not sure how much performance that actually gives, though…

My preferred solution, when a software fix isn’t an option because of hardware limitations, is adding more storagenodes… since data is evenly distributed across the nodes on a network,
getting an extra HDD and adding a second node will decrease the load on the first node by 50%.

There should never be more than one node on one hdd!
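The load-sharing arithmetic can be written out as a small sketch. It assumes, as the post does, that ingress is split evenly between the nodes on the same network and that each node sits on its own HDD. The function names are only illustrative.

```python
def per_node_share(node_count):
    """Fraction of the total ingress each node handles under an even split."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return 1.0 / node_count


def load_reduction_when_adding_node(current_nodes):
    """Relative drop in per-node load when one more node (on its own HDD) is added."""
    before = per_node_share(current_nodes)
    after = per_node_share(current_nodes + 1)
    return 1.0 - after / before
```

Going from one node to two gives a reduction of 0.5, the 50% mentioned above. Each further node helps less: going from two to three only takes off about a third.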

After the issue has been dealt with… in my case it was a local VM I’ve had some trouble with… it seems it may interfere with netdata, as I think it started just as I restarted netdata…
Both are running some advanced semi-DIY scripts, so it’s possible some of their code is similar or whatever…

I digress…
The VMs were all killed, and I also seemed to have lost access to most of them, maybe because of the IOwait thing or because I set a significant swap allowance on them…

CPU utilization is back at 5-10% or so…

The RAM usage is now erratic, but the curve has flattened and RAM usage is slowly dropping… will give it a day and see if it has gone back to normal…
A node restart might be required to purge the excess cache…

And 5 hours later the memory usage is starting to get closer to normal,
even if still slightly elevated…

anon68609175 · August 12, 2020, 1:15pm

What HDD do you use? Maybe your HDD is SMR, not CMR?

Vadim · August 12, 2020, 1:24pm

I don’t have this problem on the Windows GUI at all. I have 5-7 nodes on one Windows PC and they work fine, no problem.
Each storagenode takes 1-3% of CPU and 30-50 MB of RAM.

Odmin · August 12, 2020, 1:34pm

I see the root cause: your storage is too slow.
The writes accumulate in memory while waiting for disk time.

The iowait is too high. Check your disk system, maybe other VMs are making it too busy…

BrightSilence · August 12, 2020, 2:13pm

Might be worth mentioning that this only works if you use different HDDs for different nodes.

You’ve basically implemented @SGC’s solution in the most extreme form by having a massive number of nodes, each on its own HDD, on the same IP. So I doubt you will ever run into this limitation.

SGC · August 12, 2020, 3:55pm

You are exactly spot on. I did, however, also write in the initial post that it was a VM acting up.

I think my paravirtualization is mixing stuff from programs running on the host with programs running on the VM; both programs use some very elaborate Java programming.

Next step, I guess, is to try disabling paravirtualization for the VM in question.

I haven’t had this problem much myself, but it’s a common question on the forum, so I figured I would do a bit of show and tell while I was at it…

SGC · August 12, 2020, 4:02pm

Yeah, having this many nodes is pretty amazing stuff.

I bet one could run SMR drives like that with no problem, because each disk added brings a full disk’s worth of IO.
Ofc one would run into the whole thing of errors creeping in more often, eventually…

Even my 11-disk ZFS raid would not be able to keep up with the raw HDD IO of a 4 HDD / 4 node setup.

TheMightyGreek · August 13, 2020, 3:10pm

You think SMR drives are more susceptible to errors? Because I was thinking about setting up a few SMR nodes on a Raspberry Pi I have lying around.
I’ll see if I can find a USB SMR drive that has 3+ years of warranty, because with more drives they fill up more slowly, which increases the ROI time…

SGC · August 13, 2020, 8:51pm

USB has a bit of a bad track record… you can start out on USB, but after 6 months to 1 year you will want to migrate to at least SATA, else you should expect the node to crash hard eventually…

Ofc there might be a hardware factor in the USB issues… but often it has to do with USB HDDs not being meant to run 24/7, and with USB being made to be disconnected and reconnected… so it will do that quite easily… while SATA will basically choke the entire system to keep contact during high latency or whatever…

USB just isn’t super well suited to running a storagenode… but it will do fine in a pinch, and in some cases I’m sure it runs fine for years… and with enough nodes, like 2-3, I’m sure even SMR and USB running 24/7 isn’t an issue, because the load is shared and thus there is less work and less heat…

You will want 7200 rpm, and if you are buying drives and don’t get a considerable saving on SMR, I would recommend going CMR…

CMR drives are like 10-20% more expensive and in some cases more than 20 times faster for some loads… also, if you buy enterprise drives, they are a lot better quality and the warranty is 5 years,

often for a total of not much more than 20-30% over what a 5400 rpm SMR drive costs…

So SMR… well, maybe if you can get some epic deals… but else… run away screaming…

They can be made to work, but should be avoided for Storj if possible.

Oh right, your question… no, I don’t think SMR has higher odds of errors… but you can work drives to death by giving them too demanding a workload… and for some workloads an SMR drive will start to stall out at only 5% of what a CMR drive can handle.

TheMightyGreek · August 14, 2020, 6:51am

I’ll take that into consideration. If I go down that road it’ll be with minimal investment, and I’ll make sure I have multiple nodes to spread out the load.
Here in Switzerland I can get my hands on an 8TB Seagate Centre Backup Plus for 147$ compared to 244$ for an 8TB Seagate IronWolf.
However, I managed to find a good deal on the IronWolf the first time I bought it and it only cost me 180$, so I’ll keep my eyes open for discounts. I think that’s the best way to get good hard drives for a decent price.

naxbc · August 14, 2020, 6:55am

+1, same here but just 1 node per PC.

SGC · August 14, 2020, 7:08am

Well, I just have one node… eventually I expect the RAM usage will have to go up, since it’s required to manage so much data…

There is a reason that SATA controllers are limited in the size of disks they support these days… and it’s called memory… the less and the cheaper memory the manufacturer of the controller uses, the less capacity they end up with…

I’m up to 14 TB on one node; I would expect it to start using RAM by now…

@Vadim also keep in mind that the specified requirement per storagenode is 1 GB of memory, so you should expect that some code might at some point make use of the memory that you are supposed to have allocated to each node… and if unlucky, they might all require it at the same time…

So I wouldn’t assume that one can run storagenodes at 50 MB each…

These last few days… I’ve not had much luck keeping my memory usage down… not sure why, but I also still have a slight bit of latency on my storage pool… nothing of note, though.

I might have some old scripts that didn’t get closed right running in the background, eating performance and causing iowait, or the heat is getting to my setup.

Vadim · August 14, 2020, 8:06am

One of my nodes has a 2 GB orders DB, and when it starts it sometimes uses 2 GB+ of RAM. The node itself is a full 5 TB HDD. The orders DB is already vacuumed; before that it was 2.5 GB. So one node can easily need 2 GB+.

SGC · August 14, 2020, 8:31am

My orders.db is 717 MB for just about 14 TB.

But I dunno how that works, or why a 5 TB node would have a larger orders.db than mine… ofc my node is fairly new, only going into its 6th month now.

I have never performed any operations on any of my databases.

Vadim · August 14, 2020, 8:39am

How many blob files do you have? It could be that you have a lot of big files and I have a lot of small files, in which case I would have a bigger DB.
@Alexey have you seen a 2 GB orders DB at all?

SGC · August 14, 2020, 8:47am

… WHY WOULD YOU ASK ME THAT!!! you terrible person

Counting files now… I bet this will take a while… last time I checked it was about 1 million files per 2 TB,
so I would expect about 7 million files,
but dunno, it will be interesting to see how quickly my system can count them…

Using this command, not sure if there are better ones… ofc you are on Windows… so

find . -type f | wc -l

1 hour later
I knew this was going to take a while, but this is ridiculous. Maybe not the best command for this… should have used ls and just let it include the folders in the count…
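For comparison, here is a hedged sketch of the same count done with `os.scandir`, which also runs on Windows. It makes no claim to be faster than `find`: on a node with millions of blobs the time is mostly spent reading directory metadata from the disk, whichever tool asks for it. The function name is only illustrative.

```python
import os


def count_files(root):
    """Count regular files under root, without following symlinks."""
    total = 0
    pending = [root]  # directories still to be scanned
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += 1
    return total


# Example: count_files(".") when run from inside the blobs folder.
```

Counting this way adds IO load of its own while it runs, so it is best done at a quiet moment.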

SGC · August 14, 2020, 10:01am

@Vadim
finally done

Exactly 8,829,982 files in the blobs folder for a 13.52 TB node.
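Those two numbers are enough to estimate the average blob size and the file density, which is what the big-files versus small-files question comes down to. A small sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes):

```python
def average_file_size_mb(total_tb, file_count):
    """Average file size in MB, using decimal units (1 TB = 1e12 bytes)."""
    return total_tb * 1e12 / file_count / 1e6


def files_per_tb(total_tb, file_count):
    """Number of files stored per TB of used space."""
    return file_count / total_tb


file_count = 8_829_982   # count reported above
used_tb = 13.52          # node size reported above

print(f"average file size: {average_file_size_mb(used_tb, file_count):.2f} MB")
print(f"files per TB: {files_per_tb(used_tb, file_count):,.0f}")
```

That works out to roughly 1.5 MB per file and about 650,000 files per TB, somewhat denser than the earlier estimate of 1 million files per 2 TB.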

kevink · August 14, 2020, 3:30pm

vacuum will make a difference too.

Vadim · August 14, 2020, 4:01pm

I vacuumed it; it went from 2.5 GB to 2 GB.
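For reference, a minimal sketch of what a vacuum does, using Python’s built-in `sqlite3` module. It assumes the storagenode is stopped and the database file has been backed up first. The function name is made up for this example.

```python
import os
import sqlite3
from typing import Tuple


def vacuum_database(path: str) -> Tuple[int, int]:
    """Run VACUUM on a SQLite file and return (size_before, size_after) in bytes."""
    before = os.path.getsize(path)
    # isolation_level=None puts the connection in autocommit mode;
    # VACUUM cannot run inside an open transaction.
    connection = sqlite3.connect(path, isolation_level=None)
    try:
        connection.execute("VACUUM")
    finally:
        connection.close()
    return before, os.path.getsize(path)
```

VACUUM rebuilds the file and releases pages left behind by deleted rows, so it only shrinks a database that has free pages. While it runs it needs extra free disk space, up to about the size of the database.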

Alexey · August 14, 2020, 8:48pm

Windows docker node (2 TB)

127275008 orders.db
334049280 used_serial.db

RAM usage:

CONTAINER ID        NAME                 CPU %               MEM USAGE / LIMIT     MEM %               NET I/O           BLOCK I/O           PIDS
cc91a9006ec6        storagenode          0.00%               37.07MiB / 7.786GiB   0.46%               1.92GB / 56.1GB   2.58MB / 0B         15

Windows GUI node (7 TB)

 628645888 orders.db
  85725184 used_serial.db

RAM usage

get-process storagenode* | Group-Object -Property ProcessName | Format-Table Name, @{n='Mem (MB)';e={'{0:N0}' -f (($_.Group|Measure-Object WorkingSet -Sum).Sum / 1MB)};a='right'} -AutoSize


Name                Mem (MB)
----                --------
storagenode               46
storagenode-updater       11