0.15.3 arm keeps restarting

Well, the node crashed again and is now back in a boot loop. Yes, it did appear to be receiving data. The remaining output before the crash, from the last timestamp:
https://pastebin.com/raw/XVzd6wLj

Now it’s back to saying:
2019-07-22T18:03:10.410Z DEBUG kademlia:endpoint Successfully connected with 92.255.183.53:4002

and then restarting every minute

Hard to say if this is progress or not. We’ll have to wait for a dev to take a look I think.
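If it helps to quantify the loop in the meantime, Docker itself can report the restart count and the last start time. A quick sketch, assuming the container is named storagenode2 (as it turns out to be later in the thread):

docker inspect -f 'restarts={{.RestartCount}} started={{.State.StartedAt}}' storagenode2
docker logs --tail 20 storagenode2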

@baker @tyakimov This looks like your node is not using your signed identity.
I am running a ping test now to see whether:
a) I can find your node in the network, and
b) it has a signed identity.

Please check the number of files in your identity folder (it needs to be 6; 2 of them have a Unix timestamp in their name).
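As a rough sketch of that check, assuming the identity/ directory that is mounted into the container, and taking the expected certificate counts from the Storj identity docs (so treat them as an assumption):

ls identity/ | wc -l                    # should print 6
grep -c BEGIN identity/ca.cert          # 2 = signed, 1 = unsigned
grep -c BEGIN identity/identity.cert    # 3 = signed, 2 = unsigned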

Also, I see you have changed settings in the config file. For reference, please post it in its entirety somewhere.

It appears I have six files in my identities folder:
drwxrwxrwx 2 root root 4096 May 19 04:19 .
drwxr-xr-x 7 root root 4096 Jul 22 20:28 ..
-rw-r--r-- 1 root root 542 May 19 04:19 ca.1558228738.cert
-rw-r--r-- 1 root root 1076 May 19 04:19 ca.cert
-rw-r--r-- 1 root root 241 May 19 04:18 ca.key
-rw-r--r-- 1 root root 1084 May 19 04:19 identity.1558228738.cert
-rw-r--r-- 1 root root 1618 May 19 04:19 identity.cert
-rw-r--r-- 1 root root 241 May 19 04:18 identity.key

Also, I haven’t done any updates to the config.yml, besides the suggested storage2.max-concurrent-requests: 7 and setting an external log file, so that I can see last logged lines before restart:
root@raspberrypi:/mnt/storagenode_appConfig# cat config.yaml
# path to the certificate chain for this identity
identity.cert-path: "identity/identity.cert"

# path to the private key for this identity
identity.key-path: "identity/identity.key"

# the public address of the Kademlia node, useful for nodes behind NAT
kademlia.external-address: ""

# operator email address
kademlia.operator.email: ""

# operator wallet address
kademlia.operator.wallet: ""

# the minimum log level to log
log.level: debug

# public address to listen on
server.address: ":28967"

# log all GRPC traffic to zap logger
server.debug-log-traffic: debug

# private address to listen on
server.private-address: "127.0.0.1:7778"

# total allocated bandwidth in bytes
storage.allocated-bandwidth: 2.0 TB

# total allocated disk space in bytes
storage.allocated-disk-space: 1.0 TB



storage2.max-concurrent-requests: 7
log.output: "/app/config/node.log"
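With log.output pointing at a file, I assume docker logs won't show much anymore; to watch the last lines before a restart I just tail the mounted file on the host, e.g.:

tail -n 50 -f /mnt/storagenode_appConfig/node.log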

I just saw that you named your storage node storagenode2 in the docker run command. Are you running two nodes on that Pi?

No, I'm not running two nodes.

Okay, good. Did you set a memory limit on the Docker container, to ensure it isn't forcefully killed and doesn't freeze the OS?
From what I can see above, the node seems to behave now. In your own interest, you should reduce the log level back to info and remove the log file output.
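That is, only the two logging keys would change back; something like this in config.yaml, with everything else staying as you posted it (and the output should then go back to docker logs):

log.level: info
# log.output removed or commented out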

Nope, I’ve already posted the container startup command. The bootloop is there with or without hard-limits on the container’s memory usage. OS does not freeze, docker should never allocate this much memory to a single container.

On 14.x I also used to run a netdata container, but I've since stopped it because of all this.

The container keeps on restarting even though 700 MB of memory is free the whole time.

That shows something different, though:

2019-07-22T12:10:01.737Z INFO Got a signal from the OS: "terminated"

I’ve no statements about OOM kills being performed by the docker daemon in my dmesg kernel logging.
The container is restarting on its own whilst using 50MB of memory.
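For completeness, Docker also records whether the container was OOM-killed and how it exited, independent of what dmesg shows; a quick check (container name assumed to be storagenode2):

docker inspect -f 'oom-killed={{.State.OOMKilled}} exit-code={{.State.ExitCode}}' storagenode2
docker stats --no-stream storagenode2    # current memory usage and limit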

Could you check the node log for fatal errors and the system journal for containerd shim events?

grep -B 3 -i fatal /mnt/storagenode_appConfig/node.log

sudo journalctl -r | grep shim

Oh nice, I actually had the shim reaped in the journal. I added --memory=650 to the container start command and reduced storage2.max-concurrent-requests from 7 to 3. This actually has the container still running from the first go.
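For anyone hitting the same thing later, the working combination looks roughly like this. I'm assuming the memory flag was really given as 650m, since Docker's --memory wants a unit suffix or a byte count, and the rest of the run command stays as before:

docker run -d --memory=650m \
    ... existing flags, ports, mounts and image as before ...

# and in config.yaml:
storage2.max-concurrent-requests: 3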

Amazing. I'm getting audits. Thanks!
