How does the storagenode Docker app manage memory?

brizio71 · May 27, 2020, 10:22am

I have an Ubuntu 18.04 VM with 4 vCPUs (Core i7), 8GB of dedicated RAM and a dedicated 10TB HDD, where I’m running the Docker storagenode. I would like to understand how memory is managed, because the storagenode crashes 3-4 times per day with this error:

Can I fix this crash with some kind of configuration?

thx

Toyoo · May 27, 2020, 12:02pm

In Linux OOM doesn’t necessarily kill the process which is at fault, as this is difficult to establish. It kills the process that happens to require more memory at that specific instant when no free memory is there, even if the process is in general not memory-intensive.
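To see which process the OOM killer actually picked, the kernel log can be filtered for its kill records. This is a minimal sketch: the helper name `oom_victims` is made up for illustration, and it assumes the kernel logs a line containing `Killed process <pid> (<name>)`, which can differ between kernel versions.

```shell
# Hypothetical helper: print "PID NAME" for every OOM kill record on stdin.
# Assumes the kernel log contains "Killed process <pid> (<name>)";
# the exact wording may vary between kernel versions.
oom_victims() {
  sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Usage on a live system (may need root):
#   dmesg | oom_victims
#   journalctl -k | oom_victims
```

If storagenode shows up here every time, it was at least the process that got killed, though that alone does not prove it caused the memory pressure.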

I have never seen storagenode take more than a few hundred megabytes of memory, so I’d risk a guess that the storage node is just a victim of some other process. Try figuring out which specific process eats memory on your machine. top/htop might be helpful.
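Besides top/htop, a small filter over `ps` output gives a quick ranking by resident memory. This is only a sketch: `rank_by_rss` is a made-up name, and it expects input in the `PID RSS COMMAND` layout produced by `ps -eo pid,rss,comm`, with RSS in kilobytes.

```shell
# Hypothetical helper: rank processes by resident memory (RSS), largest first.
# Reads "PID RSS_KB COMMAND" lines on stdin and prints "RSS_MB<TAB>PID<TAB>COMMAND".
# The optional first argument is the number of rows to show (default 10).
rank_by_rss() {
  LC_ALL=C awk '$1 ~ /^[0-9]+$/ { printf "%.1f MB\t%s\t%s\n", $2 / 1024, $1, $3 }' |
    LC_ALL=C sort -k1,1nr |
    head -n "${1:-10}"
}

# Usage on a live system:
#   ps -eo pid,rss,comm | rank_by_rss 5
```

The header line is skipped because its first field is not numeric. Command names that contain spaces are cut at the first space.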

brizio71 · May 27, 2020, 2:00pm

Right now the memory is almost full. As you can see, about 93% of memory is used by storagenode:

ps aux | awk '{print $6/1024 " MB\t\t" $11}' | sort -n

ps aux | awk '{print $2, $4, $11}' | sort -k2r | head

kevink · May 27, 2020, 3:22pm

Is it connected by SATA, not SMB/NFS?

brizio71 · May 27, 2020, 4:03pm

Yes, a dedicated 10TB HDD.

donald.m.motsinger · May 27, 2020, 4:26pm

There is seriously something wrong with your setup. I run 5 nodes on an old i5 2500K with 24GB RAM and my processes look like this:

# ps aux | awk '{print $2, $4, $11}' | sort -k2r |grep storagenode
5738 0.2 ./storagenode
7220 0.2 ./storagenode
7751 0.1 ./storagenode
8134 0.1 ./storagenode
8522 0.1 ./storagenode
16591 0.0 ./storagenode
16666 0.0 ./storagenode
16743 0.0 ./storagenode
16821 0.0 ./storagenode
16904 0.0 ./storagenode

May I ask why you’re running it on a VM? If the host is Linux, just run it directly there.

brizio71 · May 27, 2020, 8:44pm

I’m running a VM on a QNAP NAS because running Docker directly on it doesn’t work. I think that’s because I have multiple NAT layers, and the node is always shown as offline.

Toyoo · May 28, 2020, 1:23pm

This indeed does look unusual. I’d try consulting Storj developers…

brizio71 · May 29, 2020, 7:01am

Looking at my container log, I found these warning messages:

level=warning msg="Your kernel does not support swap memory limit"
level=warning msg="Your kernel does not support cgroup rt period"
level=warning msg="Your kernel does not support cgroup rt runtime"

I’m running Docker on an Ubuntu 18.04 machine.
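For context, the "swap memory limit" warning usually means swap accounting is not enabled on the kernel command line. It is commonly harmless and may be unrelated to the crashes. The sketch below only checks for the flag; `has_swap_accounting` is a made-up helper name, and the GRUB lines in the comments are the commonly documented way to enable the flag on Ubuntu, shown here as an assumption about this setup rather than a confirmed fix.

```shell
# Hypothetical helper: succeed if a kernel command line enables swap accounting.
has_swap_accounting() {
  case " $1 " in
    *" swapaccount=1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on a live system:
#   has_swap_accounting "$(cat /proc/cmdline)" && echo "swap accounting enabled"
#
# To enable it on Ubuntu (assumption: GRUB is the bootloader), set in /etc/default/grub:
#   GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
# then run `sudo update-grub` and reboot.
```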

donald.m.motsinger · May 29, 2020, 8:05am

What kernel are you using?

uname -a

brizio71 · May 29, 2020, 8:35am

Linux storagenode 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

donald.m.motsinger · May 29, 2020, 9:27am

You could update the kernel following these instructions

https://wiki.ubuntu.com/Kernel/LTSEnablementStack

But I’m not sure it will solve the problem, as I run the same kernel on my Ubuntu 16.04 server.

brizio71 · May 29, 2020, 10:34am

I have updated the kernel:

Linux storagenode 5.3.0-53-generic #47~18.04.1-Ubuntu SMP Thu May 7 13:10:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

but instead of solving the problem, two more warnings were added:

level=warning msg="Your kernel does not support swap memory limit"
level=warning msg="Your kernel does not support cgroup rt period"
level=warning msg="Your kernel does not support cgroup rt runtime"
level=warning msg="Your kernel does not support cgroup blkio weight"
level=warning msg="Your kernel does not support cgroup blkio weight_device"

brizio71 · June 8, 2020, 3:20pm

Hi,

I have moved my entire node to the Windows CLI without Docker and the problem is still the same. On Docker the node restarts automatically, but on Windows the service stops and you need to restart it manually!

It takes about 10 minutes to fill the memory:

Could there be a problem with my db files?

How can I investigate this strange behavior?

BrightSilence · June 9, 2020, 7:39am

I’ve never seen it use so much memory in such a short time. What HDD model are you using?

brizio71 · June 9, 2020, 7:43am

I have tried many setups, from RAID0 on the QNAP connected via iSCSI to RAID0 connected directly to the VM. Then I moved to a single HDD, and now I have tried the Windows storagenode CLI with a directly connected HDD.

I ran all these tests using rsync to move all my data. I have about 7TB used.

BrightSilence · June 9, 2020, 7:45am

Yes, but all with the same HDD, right? I’m wondering if it’s an SMR model, which could contribute to write delays, which could in turn lead to high memory use. Although it doesn’t exactly explain the numbers you are seeing, it would be nice to rule it out as a possibility if it doesn’t apply.

brizio71 · June 9, 2020, 7:47am

No, I have also tried different HDDs. I checked my models and they are all WD Red and WD Red Pro, with no SMR.
My disk is now close to full, so I will start a new node and see if things change.

BrightSilence · June 9, 2020, 7:58am

Ok, I guess it’ll take some effort to nail this one down. Perhaps this topic can help you find the root cause of the memory issue.

Alexey · June 9, 2020, 9:33pm

Yes, it could be a problem with databases.
Please check them all:
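One generic way to check SQLite database files is `PRAGMA integrity_check`, which prints `ok` for a healthy file. The sketch below is not necessarily the exact procedure meant here: `check_dbs` is a made-up helper, the directory passed to it is a placeholder for wherever the node keeps its `.db` files, and it assumes the `sqlite3` command-line tool is installed. It is safest to run it with the node stopped.

```shell
# Hypothetical helper: run SQLite's integrity check on every *.db file in a directory.
# Prints "<file>: <result>"; a healthy database reports "ok".
check_dbs() {
  for db in "$1"/*.db; do
    [ -e "$db" ] || continue   # no matching files in the directory
    printf '%s: %s\n' "$db" "$(sqlite3 "$db" 'PRAGMA integrity_check;' 2>&1)"
  done
}

# Usage (the path is a placeholder, not taken from this thread):
#   check_dbs /path/to/node/storage
```

Any result other than `ok` points at the file that needs a closer look.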