OOM killer every 4 hours

Logged into my node this evening and found that the uptime was only ~20 minutes. After checking my NMS I found memory usage spiking and dipping on the VM that hosts the storj container every 4 hours like clockwork. This just started happening recently:

The only things on this VM are the storagenode and watchtower. Here are the current docker stats:

[xxxxxx@storj ~]$ docker stats --no-stream
CONTAINER ID   NAME          CPU %    MEM USAGE / LIMIT     MEM %    NET I/O          BLOCK I/O        PIDS
626cb4dca93a   watchtower    0.00%    1.656MiB / 3.683GiB   0.04%    1.39kB / 0B      8.29GB / 15MB    9
3aadb5a51a0b   storagenode   58.48%   468.3MiB / 3.683GiB   12.42%   185MB / 51.1MB   87.5MB / 377MB   138

Doesn’t look like any crazy usage of my disk/network:

Storage Node Dashboard ( Node Version: v0.20.1 )

======================

ID [redacted]
Last Contact 2s ago
Uptime 56m14s

               Available        Used     Egress     Ingress
 Bandwidth       50.0 TB     26.3 GB     9.0 GB     17.3 GB (since Sep 1)
      Disk        9.0 TB     13.5 GB

Bootstrap bootstrap.storj.io:8888
Internal 127.0.0.1:7778
External xxxxx.org:28967

Neighborhood Size 172

Anyone else seeing extreme memory spikes causing OOM killer to kick in and reap my container?

Hi @mrdotkom

Welcome to the forum!
Could you please share which OS you are running and how much RAM you have?

Thanks for the quick reply,

CentOS 7 with 4 GB of RAM. It’s not slowly using up RAM over the day, though; I’m seeing a major spike every so often that brings down the node.
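Before tuning anything, it may help to confirm from the kernel log which process is being reaped and how often. This is a minimal sketch, assuming the kernel logs each kill as a line containing "Killed process <pid> (<name>)"; oom_victims is a hypothetical helper name, not part of Docker or Storj.

```shell
#!/bin/sh
# oom_victims: read kernel log text on stdin and count OOM kills per process name.
# Assumption: each kill is logged as "... Killed process <pid> (<name>) ...".
oom_victims() {
    # Pull out the process name in parentheses, then count occurrences per name.
    sed -n 's/.*Killed process [0-9][0-9]* (\([^)]*\)).*/\1/p' |
        awk '{ n[$0]++ } END { for (p in n) print n[p], p }' |
        sort -rn
}

# Typical use on the VM (needs permission to read the kernel log):
#   dmesg | oom_victims
# Docker also exposes an OOMKilled flag in the container state:
#   docker inspect --format '{{.State.OOMKilled}}' storagenode
```

Each output line is a count followed by a process name. The OOMKilled flag may be reset once the container has been restarted, so the kernel log is probably the more reliable record for a crash that happened hours ago.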

Thanks for your information!

Can I also ask which CPU you are using and what storage system backs your storagenode?

I have a quick workaround for you: add --oom-kill-disable to your docker run command and recreate the storagenode container. This disables the OOM killer for that container.

It’s a virtual machine, but the host has 2x Xeon X5670s. Storage is a FreeNAS box with a 10 gigabit link to the vHost. Disks are spinning rust. Looking at the monitoring, there does appear to be a correlation between the disk I/O and the OOM kills. I checked, and bandwidth also spiked during that period, so it looks like a lot of traffic is flowing into/out of the node…

I don’t really want to disable the OOM killer. Maybe I’ll bump memory up to 8 GB and see if that’s enough to satiate storj when someone’s doing a lot of work.

Quick question though: when my node gets OOM killed, am I being penalized somehow by the network? Will satellites be less likely to connect with me?

I think the root cause is slowness on your storage side. You can easily check it with netdata: FreeNAS has a built-in netdata service, so just enable it and you will see more details.


About your quick question: yes, it can cause problems, because killing the storage node can easily corrupt the database, and you can also lose file pieces.
So I suggest finding the bottleneck ASAP. You can also add more RAM, but if your storage is slow, or is shared with other VMs that produce high I/O, that will not help in your case.

Alternatively, you could limit the RAM usage of the container by adding -m 3500M to the run command.
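Docker reads the suffix on -m as a binary unit, so 3500M means 3500 MiB. To check that the limit actually took effect, the value can be compared with the byte count Docker reports for the container. A rough sketch; size_to_bytes is a hypothetical helper written for this illustration, not a Docker command.

```shell
#!/bin/sh
# size_to_bytes: convert a Docker-style memory size (e.g. 3500M, 8g) to bytes.
# Docker treats the b/k/m/g suffixes on -m as binary units (1m = 1048576 bytes).
size_to_bytes() {
    num=${1%[bBkKmMgG]}      # strip one trailing unit letter, if present
    unit=${1#"$num"}         # whatever was stripped (may be empty)
    case "$num" in
        ''|*[!0-9]*) echo "not a size: $1" >&2; return 1 ;;
    esac
    case "$unit" in
        ''|b|B) echo "$num" ;;
        k|K)    echo $(( num * 1024 )) ;;
        m|M)    echo $(( num * 1024 * 1024 )) ;;
        g|G)    echo $(( num * 1024 * 1024 * 1024 )) ;;
    esac
}

# After recreating the container with -m 3500M, these two values should match:
#   size_to_bytes 3500M
#   docker inspect --format '{{.HostConfig.Memory}}' storagenode
```

A value of 0 from docker inspect would mean no memory limit is set on the container. Note also that the Docker documentation advises disabling the OOM killer only on containers that have a -m limit, so the two flags discussed in this thread are usually considered together.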

That does not sound promising. What is the model of that spinning rust?

Are you telling me every node operator should be running SSDs? The drives are 7200 RPM 3 TB Seagates in ZFS2

Wasn’t responding to the spinning part as much as the rust part. To be honest I’m a little surprised the disks have a hard time keeping up. Did you happen to raise the concurrency setting?

Edit: just to be clear, I’m not suggesting SSDs. It’d be wasted money for the most part.

Nah, all settings ought to be the defaults. I actually neglected to turn my node back on yesterday after upping RAM and CPU to 8 GB / 8 cores, so I’ll have to wait and see if it helps.

Netdata seems to show nothing crazy at the moment, but I’ll keep an eye on it.

Thanks guys. Will report back when I have more data collected

It was still getting OOM killed even with 8 GB / 8 vCPUs, so I’m adding the -m 3500M (not G! lol) flag and will see how it goes.

FWIW, netdata wasn’t showing any high CPU usage. I might try exporting the storagenode container’s logs to a log analytics tool to see if I can find out what’s up.
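Short of a full log analytics tool, a quick first pass could be to count log lines per minute and look for a burst just before each crash. This is only a sketch: it assumes each storagenode log line starts with an ISO 8601 timestamp such as 2019-09-05T12:34:56.789Z, and lines_per_minute is a hypothetical helper name.

```shell
#!/bin/sh
# lines_per_minute: read log text on stdin and print "<minute> <line count>".
# Assumption: lines start with an ISO 8601 timestamp, so the first 16
# characters (YYYY-MM-DDTHH:MM) identify the minute. Other lines are skipped.
lines_per_minute() {
    awk '
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/ { count[substr($0, 1, 16)]++ }
        END { for (m in count) print m, count[m] }
    ' | sort
}

# Typical use (docker logs replays the container stderr on stderr, hence 2>&1):
#   docker logs --since 4h storagenode 2>&1 | lines_per_minute
```

A minute with far more lines than its neighbours, shortly before the uptime counter resets, would be a reasonable place to start reading the raw log.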

Hi!
What about iowait time on your node?

Whoops, yeah good catch

Added -m flag to limit memory usage to 3500M and still crashing every few hours…

As for iowait, here are 100 iterations of iostat at a 1-second interval. TL;DR: less than 0.3%, usually 0


[xxxxxxx@storj ~]$ iostat 1 100 | grep iowait -A1
avg-cpu: %user %nice %system %iowait %steal %idle
18.33 0.00 1.74 0.25 0.00 79.68

avg-cpu: %user %nice %system %iowait %steal %idle
11.13 0.00 0.77 0.00 0.00 88.11

avg-cpu: %user %nice %system %iowait %steal %idle
14.66 0.00 1.64 0.00 0.00 83.69

avg-cpu: %user %nice %system %iowait %steal %idle
7.78 0.00 0.64 0.13 0.00 91.45

avg-cpu: %user %nice %system %iowait %steal %idle
10.80 0.00 1.76 0.00 0.00 87.44

avg-cpu: %user %nice %system %iowait %steal %idle
13.66 0.00 2.73 0.00 0.00 83.60

avg-cpu: %user %nice %system %iowait %steal %idle
6.74 0.00 0.64 0.00 0.00 92.62

avg-cpu: %user %nice %system %iowait %steal %idle
12.52 0.00 2.25 0.00 0.00 85.23

avg-cpu: %user %nice %system %iowait %steal %idle
12.81 0.00 3.33 0.00 0.00 83.87

avg-cpu: %user %nice %system %iowait %steal %idle
8.04 0.00 0.77 0.00 0.00 91.20

avg-cpu: %user %nice %system %iowait %steal %idle
12.45 0.00 1.02 0.00 0.00 86.53

avg-cpu: %user %nice %system %iowait %steal %idle
10.84 0.00 0.64 0.00 0.00 88.52

avg-cpu: %user %nice %system %iowait %steal %idle
8.35 0.00 1.01 0.00 0.00 90.63

avg-cpu: %user %nice %system %iowait %steal %idle
12.59 0.00 2.49 0.00 0.00 84.91

avg-cpu: %user %nice %system %iowait %steal %idle
12.63 0.00 0.77 0.13 0.00 86.48

avg-cpu: %user %nice %system %iowait %steal %idle
7.77 0.00 0.64 0.00 0.00 91.59

avg-cpu: %user %nice %system %iowait %steal %idle
12.67 0.00 1.27 0.00 0.00 86.06

avg-cpu: %user %nice %system %iowait %steal %idle
10.74 0.00 0.51 0.00 0.00 88.75

avg-cpu: %user %nice %system %iowait %steal %idle
9.13 0.00 1.01 0.00 0.00 89.86

avg-cpu: %user %nice %system %iowait %steal %idle
12.55 0.00 1.39 0.00 0.00 86.06

avg-cpu: %user %nice %system %iowait %steal %idle
10.01 0.00 1.01 0.00 0.00 88.97

avg-cpu: %user %nice %system %iowait %steal %idle
11.18 0.00 4.13 0.12 0.00 84.57

avg-cpu: %user %nice %system %iowait %steal %idle
11.51 0.00 2.85 0.00 0.00 85.64

avg-cpu: %user %nice %system %iowait %steal %idle
9.67 0.00 2.73 0.00 0.00 87.61

avg-cpu: %user %nice %system %iowait %steal %idle
11.79 0.00 1.88 0.00 0.00 86.32

avg-cpu: %user %nice %system %iowait %steal %idle
10.08 0.00 1.51 0.00 0.00 88.41

avg-cpu: %user %nice %system %iowait %steal %idle
10.66 0.00 2.01 0.00 0.00 87.33

avg-cpu: %user %nice %system %iowait %steal %idle
12.34 0.00 0.89 0.00 0.00 86.77

avg-cpu: %user %nice %system %iowait %steal %idle
9.34 0.00 0.51 0.00 0.00 90.15

avg-cpu: %user %nice %system %iowait %steal %idle
9.82 0.00 3.68 0.00 0.00 86.50

avg-cpu: %user %nice %system %iowait %steal %idle
15.16 0.00 3.60 0.00 0.00 81.24

avg-cpu: %user %nice %system %iowait %steal %idle
8.85 0.00 4.30 0.00 0.00 86.86

avg-cpu: %user %nice %system %iowait %steal %idle
9.55 0.00 0.76 0.00 0.00 89.68

avg-cpu: %user %nice %system %iowait %steal %idle
14.07 0.00 1.39 0.00 0.00 84.54

avg-cpu: %user %nice %system %iowait %steal %idle
7.56 0.00 1.64 0.00 0.00 90.81

avg-cpu: %user %nice %system %iowait %steal %idle
10.97 0.00 0.64 0.26 0.00 88.14

avg-cpu: %user %nice %system %iowait %steal %idle
13.47 0.00 2.74 0.00 0.00 83.79

avg-cpu: %user %nice %system %iowait %steal %idle
7.04 0.00 2.96 0.00 0.00 90.00

avg-cpu: %user %nice %system %iowait %steal %idle
12.09 0.00 3.12 0.00 0.00 84.79

avg-cpu: %user %nice %system %iowait %steal %idle
12.25 0.00 1.52 0.00 0.00 86.24

avg-cpu: %user %nice %system %iowait %steal %idle
5.47 0.00 2.36 0.12 0.00 92.04

avg-cpu: %user %nice %system %iowait %steal %idle
12.58 0.00 3.58 0.00 0.00 83.85

avg-cpu: %user %nice %system %iowait %steal %idle
15.88 0.00 2.38 0.00 0.00 81.75

avg-cpu: %user %nice %system %iowait %steal %idle
5.06 0.00 1.01 0.00 0.00 93.93

avg-cpu: %user %nice %system %iowait %steal %idle
11.11 0.00 1.26 0.00 0.00 87.63

avg-cpu: %user %nice %system %iowait %steal %idle
16.79 0.00 0.89 0.00 0.00 82.32

avg-cpu: %user %nice %system %iowait %steal %idle
4.95 0.00 0.76 0.00 0.00 94.29

avg-cpu: %user %nice %system %iowait %steal %idle
9.99 0.00 3.45 0.00 0.00 86.56

avg-cpu: %user %nice %system %iowait %steal %idle
26.39 0.00 10.73 0.00 0.00 62.88

avg-cpu: %user %nice %system %iowait %steal %idle
5.46 0.00 1.02 0.00 0.00 93.53

avg-cpu: %user %nice %system %iowait %steal %idle
12.52 0.00 2.50 0.00 0.00 84.98

avg-cpu: %user %nice %system %iowait %steal %idle
17.53 0.00 1.77 0.13 0.00 80.58

avg-cpu: %user %nice %system %iowait %steal %idle
4.47 0.00 0.51 0.00 0.00 95.02

avg-cpu: %user %nice %system %iowait %steal %idle
10.94 0.00 1.02 0.00 0.00 88.04

avg-cpu: %user %nice %system %iowait %steal %idle
16.81 0.00 1.90 0.00 0.00 81.29

avg-cpu: %user %nice %system %iowait %steal %idle
5.40 0.00 1.76 0.00 0.00 92.85

avg-cpu: %user %nice %system %iowait %steal %idle
10.48 0.00 1.52 0.00 0.00 88.01

avg-cpu: %user %nice %system %iowait %steal %idle
15.76 0.00 1.77 0.00 0.00 82.47

avg-cpu: %user %nice %system %iowait %steal %idle
5.36 0.00 0.51 0.13 0.00 94.01

avg-cpu: %user %nice %system %iowait %steal %idle
10.57 0.00 1.89 0.00 0.00 87.55

avg-cpu: %user %nice %system %iowait %steal %idle
16.56 0.00 3.34 0.12 0.00 79.98

avg-cpu: %user %nice %system %iowait %steal %idle
7.02 0.00 5.83 0.00 0.00 87.14

avg-cpu: %user %nice %system %iowait %steal %idle
10.68 0.00 2.86 0.00 0.00 86.46

avg-cpu: %user %nice %system %iowait %steal %idle
16.03 0.00 4.16 0.00 0.00 79.80

avg-cpu: %user %nice %system %iowait %steal %idle
6.39 0.00 0.38 0.00 0.00 93.23

avg-cpu: %user %nice %system %iowait %steal %idle
11.08 0.00 1.15 0.00 0.00 87.77

avg-cpu: %user %nice %system %iowait %steal %idle
14.27 0.00 0.89 0.00 0.00 84.84

avg-cpu: %user %nice %system %iowait %steal %idle
8.32 0.00 3.92 0.00 0.00 87.76

avg-cpu: %user %nice %system %iowait %steal %idle
10.92 0.00 1.88 0.00 0.00 87.20

avg-cpu: %user %nice %system %iowait %steal %idle
12.86 0.00 1.39 0.00 0.00 85.75

avg-cpu: %user %nice %system %iowait %steal %idle
7.50 0.00 0.89 0.00 0.00 91.61

avg-cpu: %user %nice %system %iowait %steal %idle
12.13 0.00 0.77 0.00 0.00 87.10

avg-cpu: %user %nice %system %iowait %steal %idle
12.05 0.00 0.50 0.00 0.00 87.45

avg-cpu: %user %nice %system %iowait %steal %idle
7.62 0.00 0.50 0.00 0.00 91.89

avg-cpu: %user %nice %system %iowait %steal %idle
11.92 0.00 0.88 0.00 0.00 87.20

avg-cpu: %user %nice %system %iowait %steal %idle
12.16 0.00 0.75 0.00 0.00 87.09

avg-cpu: %user %nice %system %iowait %steal %idle
7.53 0.00 0.50 0.00 0.00 91.97

avg-cpu: %user %nice %system %iowait %steal %idle
10.16 0.00 0.63 0.00 0.00 89.21

avg-cpu: %user %nice %system %iowait %steal %idle
10.23 0.00 1.14 0.00 0.00 88.64

avg-cpu: %user %nice %system %iowait %steal %idle
6.31 0.00 1.77 0.00 0.00 91.93

avg-cpu: %user %nice %system %iowait %steal %idle
9.46 0.00 1.64 0.00 0.00 88.90

avg-cpu: %user %nice %system %iowait %steal %idle
11.41 0.00 3.93 0.00 0.00 84.66

avg-cpu: %user %nice %system %iowait %steal %idle
6.84 0.00 4.03 0.00 0.00 89.13

avg-cpu: %user %nice %system %iowait %steal %idle
8.94 0.00 3.67 0.00 0.00 87.39

avg-cpu: %user %nice %system %iowait %steal %idle
6.70 0.00 1.52 0.00 0.00 91.78

avg-cpu: %user %nice %system %iowait %steal %idle
8.92 0.00 1.02 0.13 0.00 89.94

avg-cpu: %user %nice %system %iowait %steal %idle
11.42 0.00 1.02 0.00 0.00 87.56

avg-cpu: %user %nice %system %iowait %steal %idle
9.30 0.00 1.63 0.00 0.00 89.07

avg-cpu: %user %nice %system %iowait %steal %idle
12.26 0.00 1.39 0.00 0.00 86.35

avg-cpu: %user %nice %system %iowait %steal %idle
9.95 0.00 0.51 0.00 0.00 89.54

avg-cpu: %user %nice %system %iowait %steal %idle
8.53 0.00 1.25 0.13 0.00 90.09

avg-cpu: %user %nice %system %iowait %steal %idle
12.41 0.00 1.27 0.00 0.00 86.33

avg-cpu: %user %nice %system %iowait %steal %idle
9.94 0.00 1.51 0.00 0.00 88.55

avg-cpu: %user %nice %system %iowait %steal %idle
8.69 0.00 0.88 0.00 0.00 90.43

avg-cpu: %user %nice %system %iowait %steal %idle
12.59 0.00 3.21 0.00 0.00 84.20

avg-cpu: %user %nice %system %iowait %steal %idle
8.72 0.00 2.49 0.00 0.00 88.79

avg-cpu: %user %nice %system %iowait %steal %idle
10.81 0.00 1.15 0.00 0.00 88.04

avg-cpu: %user %nice %system %iowait %steal %idle
11.66 0.00 1.01 0.00 0.00 87.33

avg-cpu: %user %nice %system %iowait %steal %idle
8.31 0.00 2.85 0.00 0.00 88.83

avg-cpu: %user %nice %system %iowait %steal %idle
11.13 0.00 0.51 0.00 0.00 88.36
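A dump like the one above is easier to compare between runs when it is reduced to a few numbers. The sketch below assumes the sysstat layout shown in this post, with an avg-cpu: header line followed by one line of values where %iowait is the fourth value; summarize_iowait is a hypothetical helper name.

```shell
#!/bin/sh
# summarize_iowait: read iostat CPU reports on stdin and summarize %iowait.
# Assumption: each "avg-cpu:" header is followed by one line of values,
# and %iowait is the 4th value on that line.
summarize_iowait() {
    awk '
        /^avg-cpu:/ { want = 1; next }   # the next line holds the values
        want {
            want = 0
            v = $4 + 0                   # %iowait as a number
            n++; sum += v
            if (v > max) max = v
            if (v > 0) nonzero++
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "samples=%d nonzero=%d max=%.2f mean=%.3f\n", n, nonzero, max, sum / n
        }
    '
}

# Typical use:
#   iostat 1 100 | summarize_iowait
```

Keep in mind that the first report iostat prints covers the time since boot rather than the last second, so it is worth reading that sample separately. Since the crashes come hours apart, a short capture taken during a quiet period may also miss the spike entirely; running the same summary around the time of a kill would be more telling.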