Hi all!
This morning, my uptimerobot was complaining a lot, as most of my nodes (but not all of them) stopped working overnight on my Raspberry Pi 4B (Raspbian 64-bit).
Here is what I checked on one of them:
Latest log entries before the incident (all seems in order):
2022-10-17T01:46:44.155Z INFO piecestore download started {"Process": "storagenode", "Piece ID": "GGEEMEBDN2PD2OQFAPD3PZHB6OOIAVNVZJPRJ3BMEJCEMSN5FPBQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET"}
2022-10-17T01:46:44.352Z INFO piecestore downloaded {"Process": "storagenode", "Piece ID": "GGEEMEBDN2PD2OQFAPD3PZHB6OOIAVNVZJPRJ3BMEJCEMSN5FPBQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET"}
2022-10-17T01:46:45.571Z INFO piecestore download started {"Process": "storagenode", "Piece ID": "GGEEMEBDN2PD2OQFAPD3PZHB6OOIAVNVZJPRJ3BMEJCEMSN5FPBQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET"}
2022-10-17T01:46:45.711Z INFO piecestore downloaded {"Process": "storagenode", "Piece ID": "GGEEMEBDN2PD2OQFAPD3PZHB6OOIAVNVZJPRJ3BMEJCEMSN5FPBQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET"}
2022-10-17T01:46:46.269Z INFO piecestore download started {"Process": "storagenode", "Piece ID": "4CG44XNPVQ4JPAFC7JBS2F2UZXGU3GWKYSV2UU3RNO44OWFJJWKA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET"}
2022-10-17T01:46:47.002Z INFO piecestore downloaded {"Process": "storagenode", "Piece ID": "4CG44XNPVQ4JPAFC7JBS2F2UZXGU3GWKYSV2UU3RNO44OWFJJWKA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET"}
2022-10-17T01:46:47.595Z INFO piecestore upload started {"Process": "storagenode", "Piece ID": "I2QS2I4GMIHUA6JHPES6NDYRFWO62T6UMIFGPJ7MRPPGGRPNE35A", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "Available Space": 1094477440256}
2022-10-17T01:46:49.811Z INFO piecestore download started {"Process": "storagenode", "Piece ID": "4CG44XNPVQ4JPAFC7JBS2F2UZXGU3GWKYSV2UU3RNO44OWFJJWKA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET"}
2022-10-17T01:46:50.644Z INFO piecestore uploaded {"Process": "storagenode", "Piece ID": "FG2ME4KBNHQR7GIMQVNBAO2F46CHYGUB6Q3BE7PEII5L7FAHJT6Q", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 73728}
2022-10-17T01:46:50.708Z INFO piecestore uploaded {"Process": "storagenode", "Piece ID": "I2QS2I4GMIHUA6JHPES6NDYRFWO62T6UMIFGPJ7MRPPGGRPNE35A", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "Size": 2174464}
Latest docker log entries before the incident (all seems in order):
2022-10-17T01:22:50.389Z INFO Downloading versions. {"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}
2022-10-17T01:22:50.866Z INFO Current binary version {"Process": "storagenode-updater", "Service": "storagenode", "Version": "v1.64.1"}
2022-10-17T01:22:50.866Z INFO New version is being rolled out but hasn't made it to this node yet {"Process": "storagenode-updater", "Service": "storagenode"}
2022-10-17T01:22:50.927Z INFO Current binary version {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.64.1"}
2022-10-17T01:22:50.928Z INFO New version is being rolled out but hasn't made it to this node yet {"Process": "storagenode-updater", "Service": "storagenode-updater"}
2022-10-17T01:37:50.388Z INFO Downloading versions. {"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}
2022-10-17T01:37:50.901Z INFO Current binary version {"Process": "storagenode-updater", "Service": "storagenode", "Version": "v1.64.1"}
2022-10-17T01:37:50.901Z INFO New version is being rolled out but hasn't made it to this node yet {"Process": "storagenode-updater", "Service": "storagenode"}
2022-10-17T01:37:50.993Z INFO Current binary version {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.64.1"}
2022-10-17T01:37:50.993Z INFO New version is being rolled out but hasn't made it to this node yet {"Process": "storagenode-updater", "Service": "storagenode-updater"}
I see no entries when I run grep -i "killed" /var/log/syslog or grep -i "killed" /var/log/messages
I also noticed that my Raspberry Pi rebooted, as top shows it's been up for 3:46, so that's weird…
So maybe the system rebooted abruptly because of a power cut. I have no way to be sure, but it's very likely because my ISP's box rebooted around the same time too.
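To narrow down when the reboot happened, the uptime can be turned into a boot time and compared with the last node log entry. Here is a minimal Python sketch of that idea. It assumes the node log has been saved to a file (the path given on the command line is hypothetical) and that each line starts with a timestamp in the same format as the entries above. The function names are made up for illustration.

```python
import sys
from datetime import datetime, timedelta, timezone


def parse_log_time(line):
    """Parse the leading timestamp of a log line, e.g. 2022-10-17T01:46:50.708Z."""
    stamp = line.split()[0]
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def boot_time(now, uptime_seconds):
    """Estimate when the system booted from the current time and the uptime."""
    return now - timedelta(seconds=uptime_seconds)


def outage_window(last_line, now, uptime_seconds):
    """Return (last log time, boot time, gap between the two)."""
    last_seen = parse_log_time(last_line)
    booted = boot_time(now, uptime_seconds)
    return last_seen, booted, booted - last_seen


if __name__ == "__main__":
    # Assumption: sys.argv[1] is a file holding the log lines written
    # BEFORE the incident (the last non-empty line is the one used).
    with open(sys.argv[1]) as log_file:
        lines = [line for line in log_file if line.strip()]
    # On Linux, the first field of /proc/uptime is the seconds since boot.
    with open("/proc/uptime") as uptime_file:
        uptime_seconds = float(uptime_file.read().split()[0])
    last_seen, booted, gap = outage_window(
        lines[-1], datetime.now(timezone.utc), uptime_seconds
    )
    print("last log entry:", last_seen.isoformat())
    print("estimated boot:", booted.isoformat())
    print("gap:", gap)
```

A gap of only a few minutes between the last log entry and the boot time would be consistent with an abrupt reboot, although it does not prove a power cut on its own.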
So my best guess right now is a power cut, and that some nodes did not get restarted automatically even though they're all configured with --restart unless-stopped, which is suspicious…
Anyway, I restarted all of them, and everything seems back to normal for now. Is there anything else I could check to better understand why docker did not restart some of my nodes?
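One thing that might be worth checking is what docker itself recorded for each container. The sketch below is only an illustration: it runs docker inspect on the container names given on the command line (use whatever your nodes are called) and prints the restart policy, exit code and start/finish times from the JSON output. The summarize helper is a made-up name. As far as I understand the Docker documentation, unless-stopped does not bring back a container that was already stopped when the daemon went down, so the policy and last state are worth comparing between the nodes that came back and those that did not.

```python
import json
import subprocess
import sys


def summarize(inspect_output):
    """Extract restart-related fields from the JSON printed by `docker inspect`."""
    summaries = []
    for container in json.loads(inspect_output):
        state = container.get("State", {})
        policy = container.get("HostConfig", {}).get("RestartPolicy", {})
        summaries.append({
            "name": container.get("Name", "").lstrip("/"),
            "restart_policy": policy.get("Name"),
            "status": state.get("Status"),
            "exit_code": state.get("ExitCode"),
            "oom_killed": state.get("OOMKilled"),
            "started_at": state.get("StartedAt"),
            "finished_at": state.get("FinishedAt"),
            "restart_count": container.get("RestartCount"),
        })
    return summaries


if __name__ == "__main__":
    # Assumption: container names are passed as arguments,
    # and the docker CLI is available to the current user.
    names = sys.argv[1:]
    if not names:
        sys.exit("usage: pass one or more container names")
    result = subprocess.run(
        ["docker", "inspect", *names],
        capture_output=True, text=True, check=True,
    )
    for summary in summarize(result.stdout):
        print(summary)
```

Since I already restarted everything, the state shown now reflects the manual restart, so this is more useful if it is run right after the next incident, before touching the containers.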
Cheers
PS: I'll take this opportunity to remind newcomers that they should use a service like uptimerobot (or similar) to detect such incidents quickly; that's really a must-have ^^