Weird node behaviour

I don’t know if my node has a serious problem right now, but this is what is happening.
I am using a Raspberry Pi, and yesterday I saw the green light on the Ethernet port blinking unusually fast, so I tried opening the dashboard from the local network.
That didn’t work: the page took forever to open (I waited more than a minute), so I connected over SSH and restarted the device. After 3-4 minutes the node was back online and everything was OK. After some hours I checked again and it looked like it had crashed. The green light on the Pi was on but not blinking, along with the red light. At that point I couldn’t connect over SSH, and the device had probably shut down. I cut the power, waited for some time and started it again. Now I see that bandwidth usage increased dramatically yesterday, and while the online score is still good, the Audit score on us2.storj.io:7777 went down to 78%.

What does all of this mean? What should I do?

Same for me on my Raspberry!
My audit scores were at 100% and they just dropped this morning. The dashboard is now so slow that it’s nearly useless… Something linked to the node is hammering my broadband, because downloads on the Raspberry are slow now, whereas they are not when I stop the node… When I check the log it says I have 1 successful audit and 0 errors, so I don’t understand… I even got kicked out of one of the satellites because of it…
It’d be great if there was a solution for this soon, before we all get kicked out because of wrongly based audits…

Hello @Matth92110 ,
Welcome to the forum!

@YourHelper1, @Matth92110 the problem could be related to high activity on your nodes exhausting the RAM.
For the Raspberry Pi 3 we use the RAM limit parameter --memory=800m (because the 3rd generation has only 1GB of RAM). Perhaps you need to specify it too. If you have a GUI on your OS, the RAM limit should be even lower: the GUI consumes a lot of RAM and is usually not needed. You can do everything from the CLI, so it’s better to avoid installing a GUI on these small computers.
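
To confirm that the limit was really applied to the running container, something like the sketch below may help. It assumes the container is named storagenode, as elsewhere in this thread; the function name is made up for the example. docker inspect reports the limit in bytes, with 0 meaning no limit.

```shell
#!/usr/bin/env bash
# Print the memory limit of a container in MiB, or "unlimited" if none is set.
# Assumption: the container name is passed as the first argument.
container_memory_limit_mib() {
    local bytes
    # HostConfig.Memory holds the --memory value in bytes (0 = no limit).
    bytes="$(docker inspect --format '{{.HostConfig.Memory}}' "$1")" || return 1
    if [ "$bytes" -eq 0 ]; then
        echo "unlimited"
    else
        echo "$(( bytes / 1024 / 1024 )) MiB"
    fi
}

# Example (not run automatically):
#   container_memory_limit_mib storagenode
```

If this prints "unlimited" after you added --memory=800m, the container was probably not recreated with the new parameter.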

This could be related to lost or corrupted pieces or to timeouts. I would recommend using the scripts from this article to figure out the reason: https://support.storj.io/hc/en-us/articles/360042257912-Suspension-mode
Also, keep in mind that if you did not redirect the logs, they are deleted together with the container (when you call docker rm storagenode, or on an automatic update), and you will not see errors that occurred before the deletion.
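
If the logs are not redirected, one way to avoid losing them is to dump them to a file before removing the container. This is only a sketch: the function name and the output path are made up for the example, and the container is assumed to be called storagenode.

```shell
#!/usr/bin/env bash
# Save a container's logs to a file so they survive "docker rm".
# Arguments: container name, output file (both chosen by you).
save_container_logs() {
    local container="$1" outfile="$2"
    # docker logs replays both stdout and stderr of the container,
    # so redirect both streams into the file.
    docker logs "$container" > "$outfile" 2>&1
}

# Example (not run automatically):
#   save_container_logs storagenode "$HOME/storagenode-$(date +%F).log"
#   docker stop -t 300 storagenode && docker rm storagenode
```

The saved file can then be searched for errors even after the container has been recreated.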

Hey Alexey!
I added the line to the parameters. I always use the CLI, and it has been working perfectly since April.
Even with the memory cap added, my node still behaves weirdly (it uses a lot of bandwidth and keeps restarting from time to time). What I don’t understand is that it was working perfectly for months without it…
Indeed, I always use rm storagenode when restarting, so I wouldn’t have a complete view of my logs…
I think there is something more to this than just the memory cap, because I never had a problem with it before.

It could be that your disk is dying or has become too slow. If that happens, the storagenode will try to keep data in memory, and when the RAM is exhausted, the OS will kill the container.

How is your disk connected? Is it SMR?

Hello @Alexey, thanks for the fast reply!
I have specified this parameter to use less RAM, but I just checked with htop and I am only using 667MB of the 3.65GB of RAM…

My disk is connected through a self-powered USB hub connected to the Raspberry Pi.
I also use the Raspberry Pi as a media center (on an SSD separate from the HDD used for the node) and could benefit from it for months (I started in April), but since Sunday I can’t use it properly at all.
I won’t be able to let the node affect the Pi’s performance for too long, so if there is no fix I will shut it off pretty soon, even though the project was really interesting to me. If I don’t stop it I’ll probably get kicked out anyway, seeing the audit score dropping fast now…

The symptoms you are seeing could also be signs of a dying SD card. I would check dmesg for issues related to the boot drive.

You should really look into booting off something other than an SD card in the future, because SD cards cannot handle that amount of reads and writes 24/7.

@baker @deathlessdd dmesg shows:

[10755.169442] INFO: task storagenode:3304 blocked for more than 120 seconds.
[10755.169458] Tainted: G C E 5.11.0-1016-raspi #17-Ubuntu
[10755.169474] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10755.169487] task:storagenode state:D stack: 0 pid: 3304 ppid: 3199 flags:0x00000800
[10755.169510] Call trace:
[10755.169519] __switch_to+0xb8/0xe4
[10755.169538] __schedule+0x2bc/0x7dc
[10755.169554] schedule+0x7c/0x110
[10755.169570] schedule_preempt_disabled+0x30/0x4c
[10755.169586] __mutex_lock.constprop.0+0x188/0x570
[10755.169604] __mutex_lock_slowpath+0x1c/0x30
[10755.169620] mutex_lock+0x54/0x60
[10755.169637] lock_rename+0x3c/0xdc
[10755.169652] do_renameat2+0x228/0x480
[10755.169670] __arm64_sys_renameat+0x64/0x80
[10755.169687] el0_svc_common.constprop.0+0x88/0x220
[10755.169707] do_el0_svc+0x30/0xa0
[10755.169726] el0_svc+0x20/0x30
[10755.169740] el0_sync_handler+0x1a4/0x1b0
[10755.169755] el0_sync+0x184/0x1c0

Do you think I should try another SD card?
Would it be better to boot from USB?

I am no expert on linux, but this article indicates that it is often related to I/O issues:
Some explanation to some errors and warnings - Helpful.
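
To get a feel for how often the kernel reports hung tasks and which processes are affected, the dmesg output can be summarized. The sketch below only relies on the "blocked for more than N seconds" wording visible in the trace above; the function name is made up for the example.

```shell
#!/usr/bin/env bash
# Read kernel log lines on stdin and print "process:pid count" for every
# process that the kernel reported as blocked (hung task warning).
hung_tasks() {
    grep -oE 'task [^ ]+ blocked for more than [0-9]+ seconds' \
        | awk '{ print $2 }' \
        | sort | uniq -c \
        | awk '{ print $2, $1 }'
}

# Example (not run automatically):
#   sudo dmesg | hung_tasks
```

Many reports for storagenode in state D (uninterruptible sleep, as in the trace above) would be consistent with the process waiting on slow storage, though this alone does not say which device is slow.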

Is your disk SMR? See PSA: Beware of HDD manufacturers submarining SMR technology in HDD's without any public mention

If not, what is in your storagenode’s logs (see “How do I check my logs?” in the Storj Docs) before it hung or stopped?

This doesn’t sound good. Please check your logs for errors related to GET_AUDIT and/or GET_REPAIR: https://support.storj.io/hc/en-us/articles/360042257912-Suspension-mode
You can shut down the node to avoid being disqualified until we figure out why it’s failing audits.

@baker I/O issues… OK… so what can I do?

@CutieePie I don’t know why you deleted this, but thanks for your time. I am going to answer it anyway :wink:

user@pi:~$ free
              total        used        free      shared  buff/cache   available
Mem:        3829664     2983860      242324       12128      603480      657192
Swap:       1048572      372992      675580
user@pi:~$ lsusb -t
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=dwc2/1p, 480M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 4: Dev 3, If 0, Class=Human Interface Device, Driver=usbhid, 12M
        |__ Port 4: Dev 3, If 1, Class=Human Interface Device, Driver=usbhid, 12M
user@pi:~$ cat /boot/firmware/cmdline.txt
dwc_otg.lpm_enable=0 console=tty1 root=LABEL=writable rootfstype=ext4 elevator=deadline rootwait fixrtc quiet splash
user@pi:~$ sudo dmesg |grep -i uas
[    0.808345] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    2.968327] usbcore: registered new interface driver uas
user@pi:~$ sudo dmesg |grep -i mq-deadline
[    0.874524] io scheduler mq-deadline registered
user@pi:~$

And finally:

I have an SD card from which I boot Ubuntu, but I use an external 4TB disk to store all the Storj data.

@Alexey if you provide a solution, please tell me a way that will keep my data and won’t throw away the 5 months that I have already been hosting the node…

Do you see audits failing too?
If yes - search for failed audits/repair in your logs: Suspension mode – Storj

Also, please, search for OOM:

journalctl | grep -i oom | tail

Those are the last logs:

2021-08-24T19:05:27.164Z	ERROR	piecestore	failed to add bandwidth usage	{"error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:711\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:430\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:209\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:102\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:60\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:95\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2021-08-24T19:05:27.165Z	INFO	piecestore	uploaded	{"Piece ID": "I4WZRY7JIKXAOF7644KLU26EWWU6LTZXWWBVJNONANZSIG4MCK2A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 36608}
2021-08-24T19:05:27.552Z	INFO	piecestore	upload started	{"Piece ID": "7KJXGGFU6OZ4Y3J46YDS2FXRMIIMBKHQ27GOB2P3OHGB64EGLNKQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 3160304843264}
2021-08-24T19:05:28.216Z	INFO	piecestore	uploaded	{"Piece ID": "ZD7O3XREU5VKR76OPOJMPYSUJHJVSABTKVR5V6VZUIN763SQQBAA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT_REPAIR", "Size": 2816}
2021-08-24T19:05:29.308Z	INFO	piecestore	upload started	{"Piece ID": "R3VX6JGFXIM7IANUJOKKOOF4U2Z3FOHCUG5TVVS7KTBKDNPH6HTQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 3160304747776}
2021-08-24T19:05:29.461Z	INFO	piecestore	upload started	{"Piece ID": "ZHY35R4IAP7FQRIXQODKXIGO52KUTHOEOYWLHQQ7NAXXG2LOQUNA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "Available Space": 3160304747776}
2021-08-24T19:05:31.135Z	INFO	piecestore	uploaded	{"Piece ID": "424YMJHHYJJ4KUAIXEG33VDEPNANK4AEFVMUQKBMVG3THMJ4PTQQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 36864}
2021-08-24T19:05:32.345Z	ERROR	piecestore	failed to add bandwidth usage	{"error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:711\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:430\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:209\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:102\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:60\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:95\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2021-08-24T19:05:32.345Z	INFO	piecestore	uploaded	{"Piece ID": "G4TY7563MRZHY4DX5ZNOM7CDGY667FD5Z634IFSR3CPP6EWGB4BA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 17920}
2021-08-24T19:05:33.000Z	INFO	piecestore	upload started	{"Piece ID": "5FVLYFY452W2TU3UM2FVP4E2S5LQEBPE7XE5R24HDJDCID7ZHKZQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 3160304746752}
2021-08-24T19:05:33.003Z	INFO	piecestore	upload started	{"Piece ID": "SXPCJZBNDXUNCKUT7ZMGUNBPAMEBX3L3LSBDWJN2BCCQZR6NI4EA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 3160304711680}
2021-08-24T19:05:34.706Z	INFO	piecestore	upload started	{"Piece ID": "KXQSJRYEZRIRJPJQYSR4DV4CXVTD324C7HYOG4M24V6DTNDA2BUA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 3160304664832}
2021-08-24T19:05:34.896Z	INFO	piecestore	upload started	{"Piece ID": "ADFCZFIC7DNNLDFNQCPLOP2CNVUPYSL7OVNS7PMNRW34MCLFFK6A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 3160304651520}
2021-08-24T19:05:36.582Z	INFO	piecestore	upload started	{"Piece ID": "TGPXQEPBEQOBKAH6U3GSE3D7L4B7WANQSWOUVSJ6GNK4CSVLJIFA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 3160304651520}
2021-08-24T19:05:37.105Z	INFO	piecestore	uploaded	{"Piece ID": "OQXEC3LG4CALVPM633Q3JETDE2VAM32G6GINBQAZAVYMACPQFVSA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 36096}

I see some errors like this one.
Is this a failed audit?
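
For what it is worth, the ERROR lines in the excerpt above come from the upload path (the stack trace goes through (*Endpoint).Upload) and report a locked bandwidth database, so by themselves they are not audit failures. A rough way to count both kinds of problem separately is sketched below. It assumes that audit requests are logged with the action GET_AUDIT and that failures contain the word "failed"; the function name is made up for the example.

```shell
#!/usr/bin/env bash
# Read storagenode log lines on stdin and print two counters:
#   failed_audits - lines that mention GET_AUDIT and "failed" (assumed wording)
#   db_locked     - lines that report "database is locked"
summarize_node_log() {
    awk '
        /GET_AUDIT/ && /failed/ { audit_failed++ }
        /database is locked/    { db_locked++ }
        END {
            printf "failed_audits=%d db_locked=%d\n", audit_failed, db_locked
        }'
}

# Example (not run automatically):
#   docker logs storagenode 2>&1 | summarize_node_log
```

A high db_locked count with failed_audits=0 would point at slow storage rather than at missing pieces, but only for the period the logs still cover.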

Also, here is the result of your command:

@pi:~$ journalctl | grep -i oom | tail 
Ιουλ 30 14:52:08 pi containerd[795]: time="2021-07-30T14:52:08.106994943+03:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc DefaultRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} UntrustedWorkloadRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} Runtimes:map[runc:{Type:io.containerd.runc.v2 Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[BinaryName: CriuImagePath: CriuPath: CriuWorkPath: IoGid:0 IoUid:0 NoNewKeyring:false NoPivotRoot:false Root: ShimCgroup: SystemdCgroup:false] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginConfTemplate:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:k8s.gcr.io/pause:3.5 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd/io.containerd.grpc.v1.cri}"
Ιουλ 30 15:55:01 pi containerd[807]: time="2021-07-30T15:55:01.908350903+03:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc DefaultRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} UntrustedWorkloadRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} Runtimes:map[runc:{Type:io.containerd.runc.v2 Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[BinaryName: CriuImagePath: CriuPath: CriuWorkPath: IoGid:0 IoUid:0 NoNewKeyring:false NoPivotRoot:false Root: ShimCgroup: SystemdCgroup:false] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginConfTemplate:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:k8s.gcr.io/pause:3.5 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd/io.containerd.grpc.v1.cri}"
Αυγ 06 01:26:20 pi containerd[792]: time="2021-08-06T01:26:20.510632282+03:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc DefaultRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} UntrustedWorkloadRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} Runtimes:map[runc:{Type:io.containerd.runc.v2 Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[BinaryName: CriuImagePath: CriuPath: CriuWorkPath: IoGid:0 IoUid:0 NoNewKeyring:false NoPivotRoot:false Root: ShimCgroup: SystemdCgroup:false] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginConfTemplate:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:k8s.gcr.io/pause:3.5 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd/io.containerd.grpc.v1.cri}"
Αυγ 14 16:22:40 pi containerd[806]: time="2021-08-14T16:22:40.602477180+03:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc DefaultRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} UntrustedWorkloadRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} Runtimes:map[runc:{Type:io.containerd.runc.v2 Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[BinaryName: CriuImagePath: CriuPath: CriuWorkPath: IoGid:0 IoUid:0 NoNewKeyring:false NoPivotRoot:false Root: ShimCgroup: SystemdCgroup:false] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginConfTemplate:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:k8s.gcr.io/pause:3.5 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd/io.containerd.grpc.v1.cri}"
Αυγ 14 16:33:12 pi containerd[812]: time="2021-08-14T16:33:12.196503338+03:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc DefaultRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} UntrustedWorkloadRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} Runtimes:map[runc:{Type:io.containerd.runc.v2 Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[BinaryName: CriuImagePath: CriuPath: CriuWorkPath: IoGid:0 IoUid:0 NoNewKeyring:false NoPivotRoot:false Root: ShimCgroup: SystemdCgroup:false] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginConfTemplate:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:k8s.gcr.io/pause:3.5 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd/io.containerd.grpc.v1.cri}"
Αυγ 15 11:46:03 pi containerd[797]: time="2021-08-15T11:46:03.964065541+03:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc DefaultRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} UntrustedWorkloadRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} Runtimes:map[runc:{Type:io.containerd.runc.v2 Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[BinaryName: CriuImagePath: CriuPath: CriuWorkPath: IoGid:0 IoUid:0 NoNewKeyring:false NoPivotRoot:false Root: ShimCgroup: SystemdCgroup:false] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginConfTemplate:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:k8s.gcr.io/pause:3.5 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd/io.containerd.grpc.v1.cri}"
Αυγ 20 18:12:00 pi containerd[856]: time="2021-08-20T18:12:00.322401089+03:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc DefaultRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} UntrustedWorkloadRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} Runtimes:map[runc:{Type:io.containerd.runc.v2 Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[BinaryName: CriuImagePath: CriuPath: CriuWorkPath: IoGid:0 IoUid:0 NoNewKeyring:false NoPivotRoot:false Root: ShimCgroup: SystemdCgroup:false] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginConfTemplate:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:k8s.gcr.io/pause:3.5 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd/io.containerd.grpc.v1.cri}"
Αυγ 21 19:22:04 pi containerd[793]: time="2021-08-21T19:22:04.456840623+03:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc DefaultRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} UntrustedWorkloadRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} Runtimes:map[runc:{Type:io.containerd.runc.v2 Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[BinaryName: CriuImagePath: CriuPath: CriuWorkPath: IoGid:0 IoUid:0 NoNewKeyring:false NoPivotRoot:false Root: ShimCgroup: SystemdCgroup:false] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginConfTemplate:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:k8s.gcr.io/pause:3.5 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd/io.containerd.grpc.v1.cri}"
Αυγ 22 02:18:23 pi containerd[816]: time="2021-08-22T02:18:23.288161545+03:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc DefaultRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} UntrustedWorkloadRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} Runtimes:map[runc:{Type:io.containerd.runc.v2 Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[BinaryName: CriuImagePath: CriuPath: CriuWorkPath: IoGid:0 IoUid:0 NoNewKeyring:false NoPivotRoot:false Root: ShimCgroup: SystemdCgroup:false] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginConfTemplate:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:k8s.gcr.io/pause:3.5 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd/io.containerd.grpc.v1.cri}"
Αυγ 24 14:53:04 pi containerd[814]: time="2021-08-24T14:53:04.511129750+03:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc DefaultRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} UntrustedWorkloadRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} Runtimes:map[runc:{Type:io.containerd.runc.v2 Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[BinaryName: CriuImagePath: CriuPath: CriuWorkPath: IoGid:0 IoUid:0 NoNewKeyring:false NoPivotRoot:false Root: ShimCgroup: SystemdCgroup:false] PrivilegedWithoutHostDevices:false BaseRuntimeSpec:}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginConfTemplate:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:k8s.gcr.io/pause:3.5 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd/io.containerd.grpc.v1.cri}"
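
Note that none of the journalctl lines above is an actual out-of-memory kill: they are containerd start-up messages that match only because the config dump contains the field name RestrictOOMScoreAdj. A narrower filter, sketched below, looks for the phrases the kernel typically logs when the OOM killer runs ("invoked oom-killer", "Out of memory: Killed process"). The function name is made up for this example, and the exact kernel wording can vary between versions, so treat the pattern as an assumption.

```shell
#!/usr/bin/env bash
# Keep only lines on stdin that look like real OOM-killer activity,
# ignoring lines that merely contain the letters "oom".
oom_events() {
    grep -iE 'invoked oom-killer|out of memory: kill'
}

# Example (not run automatically):
#   journalctl | oom_events | tail
```

No output from this filter means the journal holds no record of the OOM killer firing, which fits with the htop reading mentioned earlier in the thread.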

This suggests that the problem is definitely the disk: it cannot handle the load.

Is your node deployed to k8s? If so, which storage driver did you use? Is it on a PVC, a nodePath or something else?

I am using a Raspberry Pi 4B with a Toshiba Canvio Basic 2.5" 4TB USB 3.0 HDD (black) connected to a USB 3.0 port. The disk light is actually blinking really fast, so yeah, maybe it is a disk problem. I just loaded the dashboard on the local network and it seems like my node restarted by itself…

STATUS: Online
UPTIME: 28m
LAST CONTACT: 3m ago
VERSION: v1.36.1
PERIOD: August

Satellite                           Suspension   Audit     Online
us2.storj.io:7777                   100 %        85.74 %   94.15 %
saltlake.tardigrade.io:7777         100 %        95 %      93.89 %
ap1.storj.io:7777                   100 %        100 %     100 %
us1.storj.io:7777                   100 %        90.25 %   93.73 %
eu1.storj.io:7777                   100 %        90.25 %   93.33 %
europe-north-1.tardigrade.io:7777   100 %        100 %     100 %

This is my dashboard…
It takes longer than ever to load, but at least I was able to see it.
What I did yesterday was add a 32GB flash drive as swap to my system, which seems to help a bit, judging by htop.

But what can I do about the hard disk?

It seems to be an SMR disk.

It would be better to avoid SMR. If that is not possible, then try to move the databases to a faster disk:

If you do not have a faster disk, there is another option: add a new node with a separate disk, a newly generated identity, a new authorization token and different external ports (the right-hand part of the port mapping should not be changed). This spreads the same load across two nodes instead of one.

Adding a new node is not possible :slight_smile:

Can you suggest another disk type?
What should I look for in its specs?

I never thought the disk would be a problem, and the only thing I looked at was the connection, so when I saw USB 3.1 I thought it would be fast enough :frowning:

@CutieePie I tried storage2.max-concurrent-requests: 5 and rebooted the device, but now my node isn’t even starting. Do you mean I have to start all over again if I want to do this?

Guys, @Alexey @CutieePie, is there any way of saving things without throwing away the 5 months that my node has already been running?
Please focus on THIS
THIS IS THE QUESTION

You don’t need to start over. If you can check your node logs, it should give an indication as to why the node isn’t starting. It could just be a typo in the config file.
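
A quick sanity check of the edited line might look like the sketch below. It prints every line of the config file that mentions the key (including commented-out ones) and succeeds only if exactly one active line has the form "key: value", starting in the first column with a space after the colon. The function name and the config path in the example are placeholders; this is a rough check, not a YAML validator.

```shell
#!/usr/bin/env bash
# Arguments: path to config.yaml, setting name.
# Prints matching lines with line numbers; exit status 0 means exactly one
# well-formed, uncommented "key: value" line was found.
check_setting() {
    local file="$1" key="$2"
    awk -v key="$key" '
        # Literal comparison, so dots in the key are not regex wildcards.
        index($0, key ": ") == 1 && length($0) > length(key) + 2 { good++ }
        index($0, key) > 0 { printf "%d: %s\n", NR, $0 }
        END { exit (good == 1 ? 0 : 1) }
    ' "$file"
}

# Example (not run automatically; the path is a placeholder):
#   check_setting /path/to/config.yaml storage2.max-concurrent-requests \
#       || echo "setting missing, duplicated or malformed"
#   docker logs --tail 20 storagenode
```

The last lines of the container log usually name the exact config error if the file cannot be parsed.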

There should be a way to get your node running reliably again. Is your USB drive connected directly to the pi, or through a hub? I run my nodes on a rock64 (similar to the pi4) and after I encountered some problems I found my 4TB USB 3.0 disk wasn’t actually able to get enough power from the board itself. After I added an externally powered USB hub in between the drive and my board, I stopped having random corruption issues.
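
Power or cabling problems of this kind sometimes leave traces in the kernel log, such as under-voltage warnings or USB devices being reset or disconnected. The filter below is a sketch that looks for a few typical message fragments; the exact wording differs between kernels and boards, so the pattern is an assumption, and the function name is made up for the example.

```shell
#!/usr/bin/env bash
# Keep only kernel log lines on stdin that hint at power or USB link trouble.
usb_power_events() {
    grep -iE 'under-?voltage|reset .*usb device|usb disconnect'
}

# Example (not run automatically):
#   sudo dmesg | usb_power_events
```

An empty result does not prove the power supply is fine, but repeated resets of the disk's USB device would be a strong hint to try a powered hub or a different cable.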