OOM Killer invoked due to slow disk I/O

Hello everybody,

I’ve seen various topics on the same issue (such as OOM killer every 4 hours), but none presented a viable solution for my problem.

The Out-Of-Memory (OOM) killer gets triggered quite often (sometimes after a few hours, sometimes after 2 days) by the enormous memory usage of my Docker storagenode container. RAM usage builds up exponentially on my machine, up to the point where the OOM killer starts killing processes seemingly at random. A little inspection revealed that storagenode creates, over time, more than 3 thousand threads, the vast majority of them in IOWAIT state. I fear that the disk is too slow, and that storagenode is trying to handle more data than the disk can keep up with.
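For anyone who wants to repeat that inspection, here is a minimal Python sketch that counts the threads of a process by state. It assumes a Linux host with /proc mounted; the function names are made up for illustration. Threads blocked on I/O normally show up in state "D" (uninterruptible sleep).

```python
from collections import Counter
from pathlib import Path


def thread_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/task/<tid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so the state is taken as the first field after the LAST ')'.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_thread_states(pid: int) -> Counter:
    """Count the threads of process `pid` by scheduler state (R, S, D, ...)."""
    states = Counter()
    for stat_file in Path(f"/proc/{pid}/task").glob("*/stat"):
        try:
            states[thread_state(stat_file.read_text())] += 1
        except (OSError, IndexError):
            # The thread exited while we were scanning; skip it.
            continue
    return states
```

Calling count_thread_states() with the host PID of storagenode and printing the result gives one count per state; a large and growing "D" count would support the slow-disk theory.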

My setup is:

  • RASPBERRY PI 4B w/ 4GB RAM
  • TOSHIBA HDTB420EK3AA, Canvio Basics, USB 3.0, 2 TB

From Docker (after 15 min uptime):
➜ ~ docker stats storagenode --no-stream
CONTAINER ID   NAME          CPU %    MEM USAGE / LIMIT     MEM %    NET I/O          BLOCK I/O       PIDS
bc12588c8d95   storagenode   13.21%   506.2MiB / 1.953GiB   25.31%   640MB / 27.2MB   748MB / 956MB   1213

Some stats from the same time: [image]

And a few logs from the OOM killer:

Jun 23 20:43:32 raspberrypi kernel: storagenode invoked oom-killer: gfp_mask=0x7080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null), order=1, oom_score_adj=1000
Jun 23 20:43:32 raspberrypi kernel: storagenode cpuset=bc12588c8d95a128c6a36e52f1776e08050daf87c6a7f904a72bc1e41a68801b mems_allowed=0
Jun 23 20:43:32 raspberrypi kernel: CPU: 1 PID: 14121 Comm: storagenode Tainted: G         C        4.19.75-v7l+ #1270
Jun 23 20:43:32 raspberrypi kernel: Hardware name: BCM2835
Jun 23 20:43:32 raspberrypi kernel: [<c0212d10>] (unwind_backtrace) from [<c020d530>] (show_stack+0x20/0x24)
Jun 23 20:43:32 raspberrypi kernel: [<c020d530>] (show_stack) from [<c097fb20>] (dump_stack+0xd4/0x118)
Jun 23 20:43:32 raspberrypi kernel: [<c097fb20>] (dump_stack) from [<c033f6e4>] (dump_header+0x80/0x250)
Jun 23 20:43:32 raspberrypi kernel: [<c033f6e4>] (dump_header) from [<c033ea5c>] (oom_kill_process+0x358/0x3a8)
Jun 23 20:43:32 raspberrypi kernel: [<c033ea5c>] (oom_kill_process) from [<c033f38c>] (out_of_memory+0x134/0x36c)
Jun 23 20:43:32 raspberrypi kernel: [<c033f38c>] (out_of_memory) from [<c0345750>] (__alloc_pages_nodemask+0xfc0/0x1180)
Jun 23 20:43:32 raspberrypi kernel: [<c0345750>] (__alloc_pages_nodemask) from [<c021f6b0>] (copy_process.part.5+0x1f4/0x1ad4)
Jun 23 20:43:32 raspberrypi kernel: [<c021f6b0>] (copy_process.part.5) from [<c0221158>] (_do_fork+0xd8/0x438)
Jun 23 20:43:32 raspberrypi kernel: [<c0221158>] (_do_fork) from [<c02215dc>] (sys_clone+0x34/0x3c)
Jun 23 20:43:32 raspberrypi kernel: [<c02215dc>] (sys_clone) from [<c0201198>] (__sys_trace_return+0x0/0x10)
Jun 23 20:43:32 raspberrypi kernel: Exception stack(0xd11a1fa8 to 0xd11a1ff0)
Jun 23 20:43:32 raspberrypi kernel: 1fa0:                   01456770 0090f2fc 007d0f00 72470f50 72470f88 72470fe4
Jun 23 20:43:32 raspberrypi kernel: 1fc0: 01456770 0090f2fc 72470f58 00000078 7244e000 828bec30 828bed14 00020f6c
Jun 23 20:43:32 raspberrypi kernel: 1fe0: 80000000 828bebc8 0090f718 00913bac
Jun 23 20:43:32 raspberrypi kernel: Mem-Info:
Jun 23 20:43:32 raspberrypi kernel: active_anon:299558 inactive_anon:6911 isolated_anon:0
                                     active_file:54391 inactive_file:161472 isolated_file:32
                                     unevictable:4 dirty:121 writeback:10542 unstable:0
                                     slab_reclaimable:13187 slab_unreclaimable:37702
                                     mapped:40463 shmem:7207 pagetables:2325 bounce:0
                                     free:357504 free_pcp:180 free_cma:55696
Jun 23 20:43:32 raspberrypi kernel: Node 0 active_anon:1198232kB inactive_anon:27644kB active_file:217564kB inactive_file:645888kB unevictable:16kB isolated(anon):0kB isolated(file):128kB mapped:161852kB
Jun 23 20:43:32 raspberrypi kernel: DMA free:238588kB min:16384kB low:20480kB high:24576kB active_anon:0kB inactive_anon:0kB active_file:16688kB inactive_file:16680kB unevictable:0kB writepending:33168kB
Jun 23 20:43:32 raspberrypi kernel: lowmem_reserve[]: 0 0 3188 3188
Jun 23 20:43:32 raspberrypi kernel: HighMem free:1191428kB min:512kB low:18696kB high:36880kB active_anon:1198232kB inactive_anon:27644kB active_file:200804kB inactive_file:629192kB unevictable:16kB writ
Jun 23 20:43:32 raspberrypi kernel: lowmem_reserve[]: 0 0 0 0
Jun 23 20:43:32 raspberrypi kernel: DMA: 126*4kB (UEC) 523*8kB (UEC) 691*16kB (UEC) 7*32kB (UC) 10*64kB (UC) 4*128kB (C) 3*256kB (C) 5*512kB (C) 3*1024kB (C) 1*2048kB (C) 52*4096kB (C) = 238560kB
Jun 23 20:43:32 raspberrypi kernel: HighMem: 85*4kB (U) 40*8kB (U) 13*16kB (UM) 123*32kB (UM) 145*64kB (UM) 100*128kB (UM) 225*256kB (UM) 198*512kB (M) 208*1024kB (M) 13*2048kB (M) 187*4096kB (UM) = 1191
Jun 23 20:43:32 raspberrypi kernel: 113776 total pagecache pages
Jun 23 20:43:32 raspberrypi kernel: 0 pages in swap cache
Jun 23 20:43:32 raspberrypi kernel: Swap cache stats: add 0, delete 0, find 0/0
Jun 23 20:43:32 raspberrypi kernel: Free swap  = 102396kB
Jun 23 20:43:32 raspberrypi kernel: Total swap = 102396kB
Jun 23 20:43:32 raspberrypi kernel: 1012736 pages RAM
Jun 23 20:43:32 raspberrypi kernel: 816128 pages HighMem/MovableOnly
Jun 23 20:43:32 raspberrypi kernel: 12790 pages reserved
Jun 23 20:43:32 raspberrypi kernel: 65536 pages cma reserved
Jun 23 20:43:32 raspberrypi kernel: Tasks state (memory values in pages):
Jun 23 20:43:32 raspberrypi kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jun 23 20:43:32 raspberrypi kernel: [    112]     0   112     9420     2337   159744        0             0 systemd-journal
Jun 23 20:43:32 raspberrypi kernel: [    313]   100   313     5593      718    53248        0             0 systemd-timesyn
Jun 23 20:43:32 raspberrypi kernel: [    356]     0   356     6378      758    53248        0             0 rsyslogd
Jun 23 20:43:32 raspberrypi kernel: [    357]   104   357     1669      904    40960        0          -900 dbus-daemon
Jun 23 20:43:32 raspberrypi kernel: [    364] 65534   364     1080      538    32768        0             0 thd
Jun 23 20:43:32 raspberrypi kernel: [    366]     0   366     1993      588    36864        0             0 cron
Jun 23 20:43:32 raspberrypi kernel: [    367]   108   367     1476      634    40960        0             0 avahi-daemon
Jun 23 20:43:32 raspberrypi kernel: [    369]     0   369      923      175    40960        0             0 alsactl
Jun 23 20:43:32 raspberrypi kernel: [    375]     0   375     3276     1465    53248        0             0 systemd-logind
Jun 23 20:43:32 raspberrypi kernel: [    381]     0   381     2675     1017    45056        0             0 wpa_supplicant
Jun 23 20:43:32 raspberrypi kernel: [    396]     0   396     6914       20    40960        0             0 rngd
Jun 23 20:43:32 raspberrypi kernel: [    412]   108   412     1443       63    36864        0             0 avahi-daemon
Jun 23 20:43:32 raspberrypi kernel: [    427]     0   427      841      602    28672        0             0 dhcpcd
Jun 23 20:43:32 raspberrypi kernel: [    455]     0   455     7995     2385    86016        0             0 nmbd
Jun 23 20:43:32 raspberrypi kernel: [    457]     0   457     9680     3925    94208        0             0 unattended-upgr
Jun 23 20:43:32 raspberrypi kernel: [    459]     0   459    12813     3801    86016        0             0 fail2ban-server
Jun 23 20:43:32 raspberrypi kernel: [    462]     0   462   246168     8721   204800        0             0 containerd
Jun 23 20:43:32 raspberrypi kernel: [    465]     0   465   262214    18536   319488        0          -500 dockerd
Jun 23 20:43:32 raspberrypi kernel: [    467]     0   467     1077      326    28672        0             0 agetty
Jun 23 20:43:32 raspberrypi kernel: [    474]     0   474     2740      509    40960        0             0 wpa_supplicant
Jun 23 20:43:32 raspberrypi kernel: [    480]     0   480     2671     1430    45056        0         -1000 sshd
Jun 23 20:43:32 raspberrypi kernel: [    557]     0   557      535       32    28672        0             0 hciattach
Jun 23 20:43:32 raspberrypi kernel: [    563]     0   563     2452     1090    45056        0             0 bluetoothd
Jun 23 20:43:32 raspberrypi kernel: [    565]     0   565     6676      964    45056        0             0 bluealsa
Jun 23 20:43:32 raspberrypi kernel: [    815]     0   815    12086     4080   122880        0             0 smbd
Jun 23 20:43:32 raspberrypi kernel: [    875]     0   875    11355     1267   106496        0             0 smbd-notifyd
Jun 23 20:43:32 raspberrypi kernel: [    876]     0   876    11353     1016   106496        0             0 cleanupd
Jun 23 20:43:32 raspberrypi kernel: [    883]     0   883    12086     1430   110592        0             0 lpqd
Jun 23 20:43:32 raspberrypi kernel: [   1067]     0  1067   215699      823    90112        0          -500 docker-proxy
Jun 23 20:43:32 raspberrypi kernel: [   1081]     0  1081   213218      820    86016        0          -500 docker-proxy
Jun 23 20:43:32 raspberrypi kernel: [   1094]     0  1094   215523      843    98304        0          -500 docker-proxy
Jun 23 20:43:32 raspberrypi kernel: [   1123]     0  1123   199792     1512    40960        0          -999 containerd-shim
Jun 23 20:43:32 raspberrypi kernel: [   1180]     0  1180       49        1    24576        0             0 s6-svscan
Jun 23 20:43:32 raspberrypi kernel: [   1535]     0  1535       50        1    24576        0             0 s6-supervise
Jun 23 20:43:32 raspberrypi kernel: [   1761]     0  1761       50        1    24576        0             0 s6-supervise
Jun 23 20:43:32 raspberrypi kernel: [   1762]     0  1762       50        1    24576        0             0 s6-supervise
Jun 23 20:43:32 raspberrypi kernel: [   1765]     0  1765      427      143    24576        0             0 bash
Jun 23 20:43:32 raspberrypi kernel: [   1766]     0  1766      317        1    20480        0             0 crond
Jun 23 20:43:32 raspberrypi kernel: [   1776]  1001  1776     1152      506    40960        0             0 transmission-da
Jun 23 20:43:32 raspberrypi kernel: [   6169]     0  6169    11425     1992    81920        0             0 polkitd
Jun 23 20:43:32 raspberrypi kernel: [   8639]     0  8639   199808     1576    49152        0          -999 containerd-shim
Jun 23 20:43:32 raspberrypi kernel: [   8653]     0  8653     4585      969    45056        0         -1000 systemd-udevd
Jun 23 20:43:32 raspberrypi kernel: [   8687]     0  8687   201381     2442    61440        0             0 watchtower
Jun 23 20:43:32 raspberrypi kernel: [   8762]     0  8762   215763      804    86016        0          -500 docker-proxy
Jun 23 20:43:32 raspberrypi kernel: [   8858]     0  8858   199792     1464    45056        0          -999 containerd-shim
Jun 23 20:43:32 raspberrypi kernel: [   8906]     0  8906    60832     6382   159744        0             0 apache2
Jun 23 20:43:32 raspberrypi kernel: [   9186]    33  9186    60856     1993   135168        0             0 apache2
Jun 23 20:43:32 raspberrypi kernel: [   9187]    33  9187    60856     1993   135168        0             0 apache2
Jun 23 20:43:32 raspberrypi kernel: [   9188]    33  9188    60856     1993   135168        0             0 apache2
Jun 23 20:43:32 raspberrypi kernel: [   9189]    33  9189    60856     1993   135168        0             0 apache2
Jun 23 20:43:32 raspberrypi kernel: [  10080]    33 10080    60856     1994   135168        0             0 apache2
Jun 23 20:43:32 raspberrypi kernel: [  10082]    33 10082    60856     1994   135168        0             0 apache2
Jun 23 20:43:32 raspberrypi kernel: [  10083]    33 10083    60856     2010   135168        0             0 apache2
Jun 23 20:43:32 raspberrypi kernel: [  10267]    33 10267    60856     1994   135168        0             0 apache2
Jun 23 20:43:32 raspberrypi kernel: [  10268]    33 10268    60856     1994   135168        0             0 apache2
Jun 23 20:43:32 raspberrypi kernel: [  10307]   999 10307    75991    17671   290816        0          1000 netdata
Jun 23 20:43:32 raspberrypi kernel: [  10309]   999 10309     4154      722    45056        0          1000 netdata
Jun 23 20:43:32 raspberrypi kernel: [  10524]   999 10524    13378     4444    77824        0          1000 python
Jun 23 20:43:32 raspberrypi kernel: [  10525]   999 10525     2389     1372    45056        0          1000 apps.plugin
Jun 23 20:43:32 raspberrypi kernel: [  10526]   999 10526   203390     4337    81920        0          1000 go.d.plugin
Jun 23 20:43:32 raspberrypi kernel: [  10762]    33 10762    60856     1994   135168        0             0 apache2
Jun 23 20:43:32 raspberrypi kernel: [   6447]     0  6447   213218      849    77824        0          -500 docker-proxy
Jun 23 20:43:32 raspberrypi kernel: [   6460]     0  6460   213218      826    90112        0          -500 docker-proxy
Jun 23 20:43:32 raspberrypi kernel: [   6468]     0  6468   199792     1551    40960        0          -999 containerd-shim
Jun 23 20:43:32 raspberrypi kernel: [   6485]     0  6485   629564   316160  4706304        0          1000 storagenode
Jun 23 20:43:32 raspberrypi kernel: [  12652]   999 12652      832      622    36864        0          1000 bash
Jun 23 20:43:32 raspberrypi kernel: Out of memory: Kill process 6485 (storagenode) score 1308 or sacrifice child
Jun 23 20:43:32 raspberrypi kernel: Killed process 6485 (storagenode) total-vm:2518256kB, anon-rss:1247116kB, file-rss:17524kB, shmem-rss:0kB
Jun 23 20:43:32 raspberrypi kernel: oom_reaper: reaped process 6485 (storagenode), now anon-rss:0kB, file-rss:376kB, shmem-rss:0kB
Jun 23 20:44:36 raspberrypi kernel: kthreadd invoked oom-killer: gfp_mask=0x7080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null), order=1, oom_score_adj=0
Jun 23 20:44:36 raspberrypi kernel: kthreadd cpuset=/ mems_allowed=0
Jun 23 20:44:36 raspberrypi kernel: CPU: 0 PID: 2 Comm: kthreadd Tainted: G         C        4.19.75-v7l+ #1270
Jun 23 20:44:36 raspberrypi kernel: Hardware name: BCM2835
Jun 23 20:44:36 raspberrypi kernel: [<c0212d10>] (unwind_backtrace) from [<c020d530>] (show_stack+0x20/0x24)
Jun 23 20:44:36 raspberrypi kernel: [<c020d530>] (show_stack) from [<c097fb20>] (dump_stack+0xd4/0x118)
Jun 23 20:44:36 raspberrypi kernel: [<c097fb20>] (dump_stack) from [<c033f6e4>] (dump_header+0x80/0x250)
Jun 23 20:44:36 raspberrypi kernel: [<c033f6e4>] (dump_header) from [<c033ea5c>] (oom_kill_process+0x358/0x3a8)
Jun 23 20:44:36 raspberrypi kernel: [<c033ea5c>] (oom_kill_process) from [<c033f38c>] (out_of_memory+0x134/0x36c)
Jun 23 20:44:36 raspberrypi kernel: [<c033f38c>] (out_of_memory) from [<c0345750>] (__alloc_pages_nodemask+0xfc0/0x1180)
Jun 23 20:44:36 raspberrypi kernel: [<c0345750>] (__alloc_pages_nodemask) from [<c021f6b0>] (copy_process.part.5+0x1f4/0x1ad4)
Jun 23 20:44:36 raspberrypi kernel: [<c021f6b0>] (copy_process.part.5) from [<c0221158>] (_do_fork+0xd8/0x438)
Jun 23 20:44:36 raspberrypi kernel: [<c0221158>] (_do_fork) from [<c0221528>] (kernel_thread+0x40/0x48)
Jun 23 20:44:36 raspberrypi kernel: [<c0221528>] (kernel_thread) from [<c02453a0>] (kthreadd+0x1f8/0x280)
Jun 23 20:44:36 raspberrypi kernel: [<c02453a0>] (kthreadd) from [<c02010ac>] (ret_from_fork+0x14/0x28)
Jun 23 20:44:36 raspberrypi kernel: Exception stack(0xef909fb0 to 0xef909ff8)
Jun 23 20:44:36 raspberrypi kernel: 9fa0:                                     00000000 00000000 00000000 00000000
Jun 23 20:44:36 raspberrypi kernel: 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Jun 23 20:44:36 raspberrypi kernel: 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Jun 23 20:44:36 raspberrypi kernel: Mem-Info:
Jun 23 20:44:36 raspberrypi kernel: active_anon:47402 inactive_anon:6911 isolated_anon:0
                                     active_file:53450 inactive_file:51515 isolated_file:37
                                     unevictable:4 dirty:8 writeback:8199 unstable:0
                                     slab_reclaimable:13191 slab_unreclaimable:37777
                                     mapped:36107 shmem:7207 pagetables:2325 bounce:0
                                     free:720348 free_pcp:303 free_cma:55677
Jun 23 20:44:36 raspberrypi kernel: Node 0 active_anon:189608kB inactive_anon:27644kB active_file:213800kB inactive_file:206060kB unevictable:16kB isolated(anon):0kB isolated(file):148kB mapped:144428kB
Jun 23 20:44:36 raspberrypi kernel: DMA free:238704kB min:16384kB low:20480kB high:24576kB active_anon:0kB inactive_anon:0kB active_file:16372kB inactive_file:16424kB unevictable:0kB writepending:32368kB
Jun 23 20:44:36 raspberrypi kernel: lowmem_reserve[]: 0 0 3188 3188
Jun 23 20:44:36 raspberrypi kernel: HighMem free:2642688kB min:512kB low:18696kB high:36880kB active_anon:189608kB inactive_anon:27644kB active_file:197284kB inactive_file:189824kB unevictable:16kB write
Jun 23 20:44:36 raspberrypi kernel: lowmem_reserve[]: 0 0 0 0
Jun 23 20:44:36 raspberrypi kernel: DMA: 254*4kB (UEC) 467*8kB (UC) 724*16kB (UEC) 9*32kB (UC) 2*64kB (C) 4*128kB (C) 3*256kB (C) 5*512kB (C) 3*1024kB (C) 1*2048kB (C) 52*4096kB (C) = 238704kB
Jun 23 20:44:36 raspberrypi kernel: HighMem: 11420*4kB (UM) 9422*8kB (UM) 8504*16kB (UM) 7807*32kB (UM) 5901*64kB (UM) 2853*128kB (UM) 1109*256kB (UM) 202*512kB (M) 208*1024kB (M) 13*2048kB (M) 187*4096k
Jun 23 20:44:36 raspberrypi kernel: 107687 total pagecache pages
Jun 23 20:44:36 raspberrypi kernel: 0 pages in swap cache
Jun 23 20:44:36 raspberrypi kernel: Swap cache stats: add 0, delete 0, find 0/0
Jun 23 20:44:36 raspberrypi kernel: Free swap  = 102396kB
Jun 23 20:44:36 raspberrypi kernel: Total swap = 102396kB
Jun 23 20:44:36 raspberrypi kernel: 1012736 pages RAM
Jun 23 20:44:36 raspberrypi kernel: 816128 pages HighMem/MovableOnly
Jun 23 20:44:36 raspberrypi kernel: 12790 pages reserved
Jun 23 20:44:36 raspberrypi kernel: 65536 pages cma reserved
Jun 23 20:44:36 raspberrypi kernel: Tasks state (memory values in pages):
Jun 23 20:44:36 raspberrypi kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jun 23 20:44:36 raspberrypi kernel: [    112]     0   112     9420     2337   159744        0             0 systemd-journal
Jun 23 20:44:36 raspberrypi kernel: [    313]   100   313     5593      718    53248        0             0 systemd-timesyn
Jun 23 20:44:36 raspberrypi kernel: [    356]     0   356     6378      758    53248        0             0 rsyslogd
Jun 23 20:44:36 raspberrypi kernel: [    357]   104   357     1669      904    40960        0          -900 dbus-daemon
Jun 23 20:44:36 raspberrypi kernel: [    364] 65534   364     1080      538    32768        0             0 thd
Jun 23 20:44:36 raspberrypi kernel: [    366]     0   366     1993      588    36864        0             0 cron
Jun 23 20:44:36 raspberrypi kernel: [    367]   108   367     1476      634    40960        0             0 avahi-daemon
Jun 23 20:44:36 raspberrypi kernel: [    369]     0   369      923      175    40960        0             0 alsactl
Jun 23 20:44:36 raspberrypi kernel: [    375]     0   375     3276     1465    53248        0             0 systemd-logind
Jun 23 20:44:36 raspberrypi kernel: [    381]     0   381     2675     1017    45056        0             0 wpa_supplicant
Jun 23 20:44:36 raspberrypi kernel: [    396]     0   396     6914       20    40960        0             0 rngd
Jun 23 20:44:36 raspberrypi kernel: [    412]   108   412     1443       63    36864        0             0 avahi-daemon
Jun 23 20:44:36 raspberrypi kernel: [    427]     0   427      841      602    28672        0             0 dhcpcd
Jun 23 20:44:36 raspberrypi kernel: [    455]     0   455     7995     2385    86016        0             0 nmbd
Jun 23 20:44:36 raspberrypi kernel: [    457]     0   457     9680     3925    94208        0             0 unattended-upgr
Jun 23 20:44:36 raspberrypi kernel: [    459]     0   459    12813     3801    86016        0             0 fail2ban-server
Jun 23 20:44:36 raspberrypi kernel: [    462]     0   462   246168     8721   204800        0             0 containerd
Jun 23 20:44:36 raspberrypi kernel: [    465]     0   465   262214    18536   319488        0          -500 dockerd
Jun 23 20:44:36 raspberrypi kernel: [    467]     0   467     1077      326    28672        0             0 agetty
Jun 23 20:44:36 raspberrypi kernel: [    474]     0   474     2740      509    40960        0             0 wpa_supplicant
Jun 23 20:44:36 raspberrypi kernel: [    480]     0   480     2671     1430    45056        0         -1000 sshd
Jun 23 20:44:36 raspberrypi kernel: [    557]     0   557      535       32    28672        0             0 hciattach
Jun 23 20:44:36 raspberrypi kernel: [    563]     0   563     2452     1090    45056        0             0 bluetoothd
Jun 23 20:44:36 raspberrypi kernel: [    565]     0   565     6676      964    45056        0             0 bluealsa
Jun 23 20:44:36 raspberrypi kernel: [    815]     0   815    12086     4080   122880        0             0 smbd
Jun 23 20:44:36 raspberrypi kernel: [    875]     0   875    11355     1267   106496        0             0 smbd-notifyd
Jun 23 20:44:36 raspberrypi kernel: [    876]     0   876    11353     1016   106496        0             0 cleanupd
Jun 23 20:44:36 raspberrypi kernel: [    883]     0   883    12086     1430   110592        0             0 lpqd
Jun 23 20:44:36 raspberrypi kernel: [   1067]     0  1067   215699      823    90112        0          -500 docker-proxy
Jun 23 20:44:36 raspberrypi kernel: [   1081]     0  1081   213218      820    86016        0          -500 docker-proxy
Jun 23 20:44:36 raspberrypi kernel: [   1094]     0  1094   215523      843    98304        0          -500 docker-proxy
Jun 23 20:44:36 raspberrypi kernel: [   1123]     0  1123   199792     1512    40960        0          -999 containerd-shim
Jun 23 20:44:36 raspberrypi kernel: [   1180]     0  1180       49        1    24576        0             0 s6-svscan
Jun 23 20:44:36 raspberrypi kernel: [   1535]     0  1535       50        1    24576        0             0 s6-supervise
Jun 23 20:44:36 raspberrypi kernel: [   1761]     0  1761       50        1    24576        0             0 s6-supervise
Jun 23 20:44:36 raspberrypi kernel: [   1762]     0  1762       50        1    24576        0             0 s6-supervise
Jun 23 20:44:36 raspberrypi kernel: [   1765]     0  1765      427      143    24576        0             0 bash
Jun 23 20:44:36 raspberrypi kernel: [   1766]     0  1766      317        1    20480        0             0 crond
Jun 23 20:44:36 raspberrypi kernel: [   1776]  1001  1776     1152      506    40960        0             0 transmission-da
Jun 23 20:44:36 raspberrypi kernel: [   6169]     0  6169    11425     1992    81920        0             0 polkitd
Jun 23 20:44:36 raspberrypi kernel: [   8639]     0  8639   199808     1576    49152        0          -999 containerd-shim
Jun 23 20:44:36 raspberrypi kernel: [   8653]     0  8653     4585      969    45056        0         -1000 systemd-udevd
Jun 23 20:44:36 raspberrypi kernel: [   8687]     0  8687   201381     2442    61440        0             0 watchtower
Jun 23 20:44:36 raspberrypi kernel: [   8762]     0  8762   215763      804    86016        0          -500 docker-proxy
Jun 23 20:44:36 raspberrypi kernel: [   8858]     0  8858   199792     1464    45056        0          -999 containerd-shim
Jun 23 20:44:36 raspberrypi kernel: [   8906]     0  8906    60832     6382   159744        0             0 apache2
Jun 23 20:44:36 raspberrypi kernel: [   9186]    33  9186    60856     1993   135168        0             0 apache2
Jun 23 20:44:36 raspberrypi kernel: [   9187]    33  9187    60856     1993   135168        0             0 apache2
Jun 23 20:44:36 raspberrypi kernel: [   9188]    33  9188    60856     1993   135168        0             0 apache2
Jun 23 20:44:36 raspberrypi kernel: [   9189]    33  9189    60856     1993   135168        0             0 apache2
Jun 23 20:44:36 raspberrypi kernel: [  10080]    33 10080    60856     1994   135168        0             0 apache2
Jun 23 20:44:36 raspberrypi kernel: [  10082]    33 10082    60856     1994   135168        0             0 apache2
Jun 23 20:44:36 raspberrypi kernel: [  10083]    33 10083    60856     2010   135168        0             0 apache2
Jun 23 20:44:36 raspberrypi kernel: [  10267]    33 10267    60856     1994   135168        0             0 apache2
Jun 23 20:44:36 raspberrypi kernel: [  10268]    33 10268    60856     1994   135168        0             0 apache2
Jun 23 20:44:36 raspberrypi kernel: [  10307]   999 10307    75991    17671   290816        0          1000 netdata
Jun 23 20:44:36 raspberrypi kernel: [  10309]   999 10309     4154      722    45056        0          1000 netdata
Jun 23 20:44:36 raspberrypi kernel: [  10524]   999 10524    13378     4444    77824        0          1000 python
Jun 23 20:44:36 raspberrypi kernel: [  10525]   999 10525     2389     1372    45056        0          1000 apps.plugin
Jun 23 20:44:36 raspberrypi kernel: [  10526]   999 10526   203390     4337    81920        0          1000 go.d.plugin
Jun 23 20:44:36 raspberrypi kernel: [  10762]    33 10762    60856     1994   135168        0             0 apache2
Jun 23 20:44:36 raspberrypi kernel: [   6447]     0  6447   213218      849    77824        0          -500 docker-proxy
Jun 23 20:44:36 raspberrypi kernel: [   6460]     0  6460   213218      826    90112        0          -500 docker-proxy
Jun 23 20:44:36 raspberrypi kernel: [   6468]     0  6468   199792     1551    40960        0          -999 containerd-shim
Jun 23 20:44:36 raspberrypi kernel: [   6565]     0  6485   629564       94  4706304        0          1000 storagenode
Jun 23 20:44:36 raspberrypi kernel: [  12652]   999 12652      832      622    36864        0          1000 bash
Jun 23 20:44:36 raspberrypi kernel: Out of memory: Kill process 10307 (netdata) score 1016 or sacrifice child
Jun 23 20:44:36 raspberrypi kernel: Killed process 10524 (python) total-vm:53512kB, anon-rss:11644kB, file-rss:6132kB, shmem-rss:0kB
Jun 23 20:44:36 raspberrypi kernel: oom_reaper: reaped process 10524 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Jun 23 20:44:36 raspberrypi kernel: bash invoked oom-killer: gfp_mask=0x7080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null), order=1, oom_score_adj=1000
Jun 23 20:44:36 raspberrypi kernel: bash cpuset=/ mems_allowed=0
Jun 23 20:44:36 raspberrypi kernel: CPU: 2 PID: 12652 Comm: bash Tainted: G         C        4.19.75-v7l+ #1270
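
To make the numbers in these dumps easier to read: the task table reports memory in pages, while the "Killed process" line reports kB. The small Python sketch below converts between the two and cross-checks the storagenode entry. It assumes 4 kB pages (the usual size on this kind of kernel, but an assumption nonetheless), and the helper names are only illustrative.

```python
import re

PAGE_SIZE_KB = 4  # assumption: 4 kB pages


def pages_to_mib(pages: int, page_size_kb: int = PAGE_SIZE_KB) -> float:
    """Convert a page count from the OOM task table to MiB."""
    return pages * page_size_kb / 1024


def killed_rss_kb(line: str) -> int:
    """Sum anon-rss, file-rss and shmem-rss (in kB) from a 'Killed process' line."""
    return sum(int(kb) for kb in re.findall(r"(?:anon|file|shmem)-rss:(\d+)kB", line))


killed = ("Killed process 6485 (storagenode) total-vm:2518256kB, "
          "anon-rss:1247116kB, file-rss:17524kB, shmem-rss:0kB")

# rss column for storagenode in the first task table: 316160 pages
print(pages_to_mib(316160))          # 1235.0
print(killed_rss_kb(killed) / 1024)  # 1235.0
```

Both figures agree: storagenode was holding roughly 1.2 GiB of resident memory when it was killed.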

Using sudo lsusb -v, I confirmed that the Toshiba is connected to a USB 3 port. The filesystem is BTRFS, mounted with noatime and autodefrag.

It seems that storagenode ignores the RAM limit on the container and just keeps accumulating memory until the OOM killer starts killing everything. In fact, given the I/O latency, the OOM killer seems unable to kill storagenode right away. I've also tried to set up a cron job that restarts the storagenode container if it uses more than 1.5 GB of RAM, but to no avail.

I can't keep running my node this way, and I'm forced to consider upgrading the HDD. However… I thought that STORJ was meant to be usable with ANY setup, and to reuse old parts. This memory leak problem (caused, I think, by uncontrolled thread proliferation) is really a showstopper for some setups. Can't we think of a way to just pause the node until memory goes back under control?
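For reference, a cron-driven check of the kind described above could look roughly like the Python sketch below. It is only an illustration: the original poster reports that this approach did not help, and the function names are hypothetical. It relies on the docker stats --format option with the .MemUsage placeholder and on docker restart.

```python
import subprocess

# Multipliers for the unit suffixes that docker stats may print.
UNITS = {"B": 1, "kB": 1000, "KiB": 1024, "MB": 1000**2, "MiB": 1024**2,
         "GB": 1000**3, "GiB": 1024**3}


def parse_mem_usage(mem_usage: str) -> float:
    """Return the 'used' half of a value like '506.2MiB / 1.953GiB' in bytes."""
    used = mem_usage.split("/")[0].strip()
    # Try the longest suffixes first so 'MiB' is not mistaken for 'B'.
    for unit in sorted(UNITS, key=len, reverse=True):
        if used.endswith(unit):
            return float(used[: -len(unit)]) * UNITS[unit]
    raise ValueError(f"unrecognised memory value: {used!r}")


def restart_if_over(container: str, limit_bytes: float) -> bool:
    """Restart `container` if its current memory usage exceeds `limit_bytes`."""
    out = subprocess.run(
        ["docker", "stats", container, "--no-stream", "--format", "{{.MemUsage}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    if parse_mem_usage(out) > limit_bytes:
        subprocess.run(["docker", "restart", container], check=True)
        return True
    return False
```

A cron entry would then call restart_if_over("storagenode", 1.5 * 1024**3) every minute or so. Note that if the container is stuck in I/O wait, the restart itself may take a long time.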

Thank you in advance.

Hi @belegur

Welcome to our forum!

Please make sure that your Docker installation is configured to support memory limits.
You can check your OS log for:
WARNING: Your kernel does not support swap limit capabilities. Limitation discarded.

If you see this warning, please follow the Docker post-installation steps:

  1. Log into the Ubuntu or Debian host as a user with sudo privileges.
  2. Edit the /etc/default/grub file. Add or edit the GRUB_CMDLINE_LINUX line to add the following two key-value pairs:
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
     Save and close the file.
  3. Update GRUB.
$ sudo update-grub
     If your GRUB configuration file has incorrect syntax, an error occurs. In this case, repeat steps 2 and 3.

The changes take effect when the system is rebooted.
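
After the reboot, one way to verify that the two flags actually reached the kernel is to look at /proc/cmdline. The Python sketch below does this; the function name is made up for illustration, and it only checks for the two key-value pairs quoted above.

```python
REQUIRED_FLAGS = ("cgroup_enable=memory", "swapaccount=1")


def missing_cgroup_flags(cmdline: str) -> list:
    """Return the required flags that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [flag for flag in REQUIRED_FLAGS if flag not in present]


def check_running_kernel() -> list:
    """Check the command line of the currently running kernel (Linux only)."""
    with open("/proc/cmdline") as f:
        return missing_cgroup_flags(f.read())
```

An empty list from check_running_kernel() means both flags are active; anything else lists what is still missing.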

I can also recommend turning off the swap file, if one exists on your system:
run swapoff -a and comment out the swap entry in /etc/fstab.

Another (not recommended) option is to add --oom-kill-disable to your docker run ... command.

The memory issue is likely because of your IOwait…
Because your HDD cannot keep up with the random IO, writes start to be held in memory and slowly accumulate… or at least that's what I would assume happens…

It's not a memory leak problem… it's most likely a storage media latency problem…

Thanks!

I enabled the cgroups, but without the swap limitation. I only have 100MB of swap anyway, I might as well turn it off. I’ll let you know if this changes anything!

That’s what I fear it is. But, is this really an unsolvable problem? Is there no way to tell the process… to wait, until the writes and reads are done?

I can also recommend changing from BTRFS to EXT4 ASAP; it can prevent many issues that you would otherwise have in the future…

Make another node on another drive and the incoming load should be distributed between them, so each gets roughly 50%.

Then I would suspect the IOwait would drop… at least enough that the drive might have a chance to keep up. Also, don't let VMs swap if at all possible, and only swap to SSDs, obviously.

Swapping is really bad: it tells you that you most likely don't have enough RAM.
Sure, it might be needed for some applications, but you really don't want it, because it will slow everything down to a crawl. Of course, if RAM is too expensive, or you rarely do things that need that much RAM, then swap can be fine.

My storagenode is using 200 MB of RAM, even after running for 11 days now.
I'm running Docker on Debian, the node is 10 TB+ and going on 5 months, and my Docker install might be 5 months old too.
I can't remember if I updated it or not, but I see no signs of a memory leak. Oh, and I'm running ZFS here.

Are you sure an RPi is enough to run BTRFS well? At times my 16 threads @ 2100 MHz can be 50-60% utilized when doing scrubs and such in ZFS…

I found 2 reviews that mention this HDD being SMR.

Since it's SMR, set up a nice bit of SSD cache for it… maybe 16 GB,
something which BTRFS should be perfectly capable of.

And maybe add a second node… which should halve the pressure on the SMR drive…

Please check your databases in any case:

I was having an issue with my rock64 (similar to pi4) having huge IOwaits under heavy load. What seemed to happen is some operation would hit the HDD (garbage collection perhaps), and then the requests would start piling up. When there was a heavy load on the network, this backlog would eventually overwhelm the system and OOM killer would kick in. I found that setting the max concurrent requests to 40 alleviated this problem. You would need to add this line to your config.yaml file (or un-comment if it’s already there) then restart the node:

storage2.max-concurrent-requests: 40

With this setting I am no longer having any OOM problems, since the node stops accepting requests when it gets overwhelmed. That doesn't happen too often, and only for short periods. With a setting of 40 I was getting an acceptance rate of 92%. I recently upped the limit to 50, and have an acceptance rate of 96%. I think this setting is a good trade-off for low-powered nodes, as a stop-gap measure in case your HDD is getting thrashed. Keep in mind that the value of 40 might not be optimal for you, and you should do some testing to figure out where the sweet spot is.
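
To illustrate the idea behind such a limit (this is not the storagenode implementation, just a minimal Python sketch with made-up names): new requests are rejected outright while the configured number is already in flight, so a slow disk can no longer cause an unbounded backlog. In this sketch a limit of 0 is treated as "no limit".

```python
import threading


class RequestLimiter:
    """Reject new requests once `max_concurrent` are already in flight."""

    def __init__(self, max_concurrent: int):
        self.max_concurrent = max_concurrent  # 0 means unlimited in this sketch
        self.in_flight = 0
        self.accepted = 0
        self.rejected = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Claim a slot for a new request; return False if the limit is reached."""
        with self._lock:
            if self.max_concurrent > 0 and self.in_flight >= self.max_concurrent:
                self.rejected += 1
                return False
            self.in_flight += 1
            self.accepted += 1
            return True

    def release(self) -> None:
        """Free the slot once a request has finished, successfully or not."""
        with self._lock:
            self.in_flight -= 1

    def acceptance_rate(self) -> float:
        """Fraction of requests accepted so far (1.0 if none were seen)."""
        total = self.accepted + self.rejected
        return self.accepted / total if total else 1.0
```

A handler would call try_acquire() before touching the disk, refuse the request if it returns False, and call release() in a finally block. The acceptance rate quoted above corresponds to accepted / (accepted + rejected).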

I ran on 20 for a few months and that worked fine… it barely ever rejected anything anyway, but it still helped my system run much better on average. Sometimes satellites just decide to drown a node with an unreasonable number of requests.

To speed everything up, I moved the DBs to the SD card, but unfortunately this only made the situation worse! Good advice though: cleaning up the DBs made their size drop as well.

Thanks for the tip, I didn’t know that having more nodes could help. Will check on that, and thanks for the feedback on your setup

This turned out to be a game changer. I set it to 40, then 70, and RAM usage is now kept at bay!

I've now increased it a little more, to see how far I can go. Is there a recommended maximum? I thought the default was 7, but apparently (at least in my default setup) it was 0 (unlimited).

Will continue to monitor the situation, but it seems that this is the trick!

Thank you so much for your insight!

It's a bit of a stop-gap measure though… it seems to solve some issues, but creates others, like the "db locked" errors…

If 70 makes it run, then great… it seems very high, and in that range it really shouldn't do much… but I don't know. I tested it a lot, and it seems to have an effect, even though some claim it doesn't…

I ended up running at 20. I think 15 was the point where my system started to reject requests often, and at 20, after it had all settled, it would almost never reject anything… but it made a big difference until I got my hardware issues solved…