My Storage Node was shut down for 47 hours... Help with diagnosis

Hi,

My Storage Node (Raspberry Pi) is hosted at my mother’s house (and I don’t live there).
On Saturday (July 4th), the server shut down, and my mother can’t tell me whether she did something wrong or not (maybe she powered off the outlet? She doesn’t know).

Anyway, I just saw today that the server is up again (I can’t reach my mother, and I think that when I do reach her, she will tell me that she doesn’t know :/).

I would appreciate your help with the following:

  • How can I tell whether this was due to a power outage? Maybe from the logs, but I really don’t know where to look…
  • If it wasn’t due to a “physical” power outage, where and how should I investigate to find the real cause?
  • Should I delete my node now and set it up again to avoid being disqualified once DQ is activated? My node is quite recent, so I shouldn’t lose a lot of money (less than $2 held amount).
  • I tried to check the audits with the following command (storagenode5 is my storage node name; sorry, I took a screenshot because the website modifies the actual command):

but I received the error: “Invalid numeric literal at line 1, column 10”. How can I fix the command?

Thank you so much for your help!

I am not sure how to fix your command, but you could try something like docker logs storagenode5 2>&1 | grep -i "audit" (the 2>&1 is there because docker logs may write the node's output to stderr, which a plain pipe would not pass to grep). There is a larger list of useful commands related to audits here: https://support.storj.io/hc/en-us/articles/360029233952-Some-statistics-from-logs

If your node has been shut down, I think it is unlikely that your problem is related to audits. It is more likely that there is some sort of configuration or migration issue on startup. The best way to figure it out would be to check the last few log entries (docker logs storagenode5).
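As a rough illustration of how to find the moment the node went quiet, the Python sketch below scans a saved log file for the longest silence between two consecutive timestamped lines. It is not an official Storj tool. It assumes each log line starts with a UTC timestamp such as 2020-07-04T13:22:55.532Z, and the function names and the file name node.log are made up for this example.

```python
import sys
from datetime import datetime

# Assumed timestamp layout at the start of each line, e.g. 2020-07-04T13:22:55.532Z
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    parts = line.split(None, 1)
    if not parts:
        return None
    try:
        return datetime.strptime(parts[0], TS_FORMAT)
    except ValueError:
        return None


def largest_gap(lines):
    """Return (gap, last_before, first_after) for the longest silence, or None."""
    best = None
    previous = None
    for line in lines:
        current = parse_timestamp(line)
        if current is None:
            continue  # skip lines that do not start with a timestamp
        if previous is not None:
            gap = current - previous
            if best is None or gap > best[0]:
                best = (gap, previous, current)
        previous = current
    return best


# Only runs when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        result = largest_gap(log_file)
    if result is None:
        print("Fewer than two timestamped lines found.")
    else:
        gap, before, after = result
        print(f"Longest silence: {gap} (from {before} to {after})")
```

One way to use it, assuming the script is saved as gaps.py: run docker logs storagenode5 > node.log 2>&1 and then python3 gaps.py node.log. The lines just before the reported gap are the ones worth reading for errors. If they show only normal traffic, an abrupt loss of power becomes more plausible.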

If you have not gotten failing audits and nothing is irreparably broken on your node, you should not need to worry about creating a new node. We have not implemented penalties for downtime yet, so fortunately you should be able to fix whatever the issue is and put the same node back online with no issues.

Thank you.
Actually, the node had been running for several weeks before crashing, and I had not changed anything in the configuration. That’s why I think it’s probably just a power outage.
I expected to find something in the audits because if data is corrupted or something like that, the audits should show some errors. Am I wrong?

[EDIT] by the way, I forgot to say that the whole Pi was unreachable, not only the container (node). That’s why I was thinking about a power outage.
So, my real concern would be to make sure that my node is OK and no data is corrupted. Thanks

Maybe it’s not really meaningful (as you said, @moby), but I managed to get the audit statistics (my previous command had the wrong quotes):

"118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"
{
  "totalCount": 224,
  "successCount": 223,
  "alpha": 19.99979530012361,
  "beta": 0,
  "unknownAlpha": 19.965939999479403,
  "unknownBeta": 0.0338655356380322,
  "score": 1,
  "unknownScore": 0.9983067067537947
}
"1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"
{
  "totalCount": 1542,
  "successCount": 1541,
  "alpha": 19.99999999999995,
  "beta": 0,
  "unknownAlpha": 19.998910469221133,
  "unknownBeta": 0.0010895307788482875,
  "score": 1,
  "unknownScore": 0.9999455234610576
}
"121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"
{
  "totalCount": 2717,
  "successCount": 2712,
  "alpha": 19.99999999999995,
  "beta": 0,
  "unknownAlpha": 19.999999993823906,
  "unknownBeta": 6.17608140886572e-09,
  "score": 1,
  "unknownScore": 0.999999999691196
}
"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"
{
  "totalCount": 2947,
  "successCount": 2946,
  "alpha": 19.99999999999995,
  "beta": 0,
  "unknownAlpha": 19.99999999999995,
  "unknownBeta": 2.105247970877969e-41,
  "score": 1,
  "unknownScore": 1
}
"12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"
{
  "totalCount": 2299,
  "successCount": 2295,
  "alpha": 19.99999999999995,
  "beta": 0,
  "unknownAlpha": 19.99999996236524,
  "unknownBeta": 3.7634744885646556e-08,
  "score": 1,
  "unknownScore": 0.9999999981182628
}
"12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"
{
  "totalCount": 9961,
  "successCount": 9951,
  "alpha": 19.99999999999995,
  "beta": 0,
  "unknownAlpha": 19.99999999999995,
  "unknownBeta": 9.777971107264765e-25,
  "score": 1,
  "unknownScore": 1
}

Do you see anything wrong or something I should be worried about?
Thank you again for your help

An audit score of 1 is perfect. You do not need to worry about disqualification until it gets to 0.6. Your unknownScore is almost perfect too, and similarly you would not need to worry about being suspended until it gets to 0.6. So there is no harm in keeping this node around.
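For reference, the numbers in the output above are consistent with each score being alpha / (alpha + beta). The short Python sketch below recomputes a score from those two fields and compares it against the 0.6 threshold mentioned above. The function names are made up for this example, and the formula is inferred from the posted values rather than taken from the node's source code.

```python
THRESHOLD = 0.6  # disqualification/suspension threshold quoted in this thread


def reputation_score(alpha, beta):
    """Score implied by the posted statistics: alpha / (alpha + beta)."""
    total = alpha + beta
    if total <= 0:
        raise ValueError("alpha + beta must be positive")
    return alpha / total


def is_at_risk(alpha, beta, threshold=THRESHOLD):
    """True when the recomputed score has fallen below the threshold."""
    return reputation_score(alpha, beta) < threshold


# Values copied from the first satellite in the output above.
unknown_score = reputation_score(19.965939999479403, 0.0338655356380322)
print(f"unknownScore recomputed: {unknown_score:.6f}")
print("at risk:", is_at_risk(19.965939999479403, 0.0338655356380322))
```

With beta at 0, as in every "alpha"/"beta" pair posted, the audit score is exactly 1 whatever the value of alpha.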

As I mentioned previously, the best way to figure out what caused the node to go offline would be to look at the logs around that time to see if anything went wrong.

Do you have your node restarting automatically, or do you need to manually start it if it shuts down? If you are not running it automatically when the Pi boots up, then a power outage could explain it.
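To check this, docker inspect storagenode5 prints the container's configuration as JSON, and the restart policy is stored under HostConfig.RestartPolicy.Name. The Python sketch below reads that field from the JSON text. The helper names are made up for this example, and treating only "always" and "unless-stopped" as policies that bring the container back after a reboot is a simplification.

```python
import json


def restart_policy(inspect_json):
    """Extract the restart policy name from `docker inspect <container>` output."""
    data = json.loads(inspect_json)
    # docker inspect returns a JSON array with one entry per inspected object.
    container = data[0] if isinstance(data, list) else data
    return container["HostConfig"]["RestartPolicy"]["Name"]


def should_return_after_reboot(policy):
    """Simplified check: these policies restart the container when Docker starts."""
    return policy in ("always", "unless-stopped")


# Trimmed-down stand-in for real `docker inspect` output (illustrative only).
sample = '[{"HostConfig": {"RestartPolicy": {"Name": "unless-stopped", "MaximumRetryCount": 0}}}]'
policy = restart_policy(sample)
print(policy, should_return_after_reboot(policy))
```

Even with a suitable policy, the container only comes back if the Docker service itself starts at boot, and with "unless-stopped" a container that was stopped manually before the outage stays stopped.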

Thank you.
Below are the results of the docker logs command (I filtered it to show only the last entries before the crash; sorry, I don’t know how to format it correctly…):

2020-07-04T13:22:55.532Z INFO piecestore upload started {"Piece ID": "YMWETG5CWD5GWJC7P4ADFG2AE3KUD5WBXC4LJXKXZLFWWABHI3UQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "Available Space": 135878523008}
2020-07-04T13:22:55.778Z INFO piecestore upload started {"Piece ID": "IIEQ6363C6O3MBWDPUMLTVFKEKZHHH276B4KX3HK2DEGPKYF72TA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT", "Available Space": 135878521728}
2020-07-04T13:22:56.114Z INFO piecestore uploaded {"Piece ID": "A4VJJMF6GHASXLLBSIZYEH54R6I3VH7QGHR7PXLYPGY2DNXHGZ4A", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}
2020-07-04T13:22:56.898Z INFO piecestore upload started {"Piece ID": "EANZFCENE2YOAFFOALPBIC25QF25NVUGU3VEEUBOWUX5LJNHV4GA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT", "Available Space": 135877475968}
2020-07-04T13:22:57.430Z INFO piecestore uploaded {"Piece ID": "3MXAY5SZQXILRF74PE5TQUOXD4G72GW5QSFSSJVWHTUSQSMONNAA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}
2020-07-04T13:22:57.434Z INFO piecestore uploaded {"Piece ID": "QXBK43SZS5NZWZ7T5KTRQU3QMLBOH6Y2AJT73NOPVKX6BFIGBZHA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}
2020-07-04T13:22:57.435Z INFO piecestore uploaded {"Piece ID": "YMWETG5CWD5GWJC7P4ADFG2AE3KUD5WBXC4LJXKXZLFWWABHI3UQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT"}
2020-07-04T13:22:57.436Z INFO piecestore upload canceled {"Piece ID": "VYLK2O4ICL6AMIIGY6AM6UIY2RUQDLJ7VUH64XVEWBQOXEKR23CQ", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}
2020-07-04T13:22:58.781Z INFO piecestore downloaded {"Piece ID": "HDBNBWYQBGFOXVXDQYQGQH4JX2LUVBSAD3KHH3XBFDAOL5ZF2SSQ", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "GET"}
2020-07-04T13:22:58.787Z INFO piecestore upload started {"Piece ID": "L6I6HZGT57MY4I3C4GGLOWPK4O7LL4TYF2E3LS3BV3SRLSMEPUIA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT", "Available Space": 135875156096}
2020-07-04T13:22:59.087Z INFO piecestore upload canceled {"Piece ID": "IIEQ6363C6O3MBWDPUMLTVFKEKZHHH276B4KX3HK2DEGPKYF72TA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}
2020-07-04T13:23:03.207Z INFO piecestore upload started {"Piece ID": "GJHJM7HX2WYEUD4G7YUWVUPESJSXWXZ7FQNZQJLAU7KNBCXYKOCQ", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT", "Available Space": 135872836224}
2020-07-04T13:23:03.820Z INFO piecestore uploaded {"Piece ID": "EANZFCENE2YOAFFOALPBIC25QF25NVUGU3VEEUBOWUX5LJNHV4GA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}
2020-07-04T13:23:04.568Z INFO piecestore upload canceled {"Piece ID": "L6I6HZGT57MY4I3C4GGLOWPK4O7LL4TYF2E3LS3BV3SRLSMEPUIA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}
2020-07-04T13:23:05.996Z INFO piecestore upload started {"Piece ID": "P2HEQQI4EY5IAD4RH6EHZMXBJHWZQ2UAKPFKKDKV2ZDN2M6KIYNA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT", "Available Space": 135868196480}
2020-07-04T13:23:06.082Z INFO piecestore upload canceled {"Piece ID": "GJHJM7HX2WYEUD4G7YUWVUPESJSXWXZ7FQNZQJLAU7KNBCXYKOCQ", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}
2020-07-04T13:23:07.025Z INFO piecestore upload started {"Piece ID": "ULHO2YQXVDVFKMX7J66GBRJPYTCR2SX7MSJ4MIEOFYDGYAVLKLRQ", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT", "Available Space": 135867150720}
2020-07-04T13:23:07.037Z INFO piecestore upload started {"Piece ID": "TZIHEP7LCEWYWJODJYOHMBABT5BPIDENNEVT7VKLBJZP4BRVDYBA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT", "Available Space": 135867150720}
2020-07-04T13:23:07.689Z INFO piecestore upload canceled {"Piece ID": "Y7LHCBVFVJXGFLBC6IQ3LDME5Y2B3RMN6W4EJUW7EAHQWLNJUOPQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT_REPAIR"}
2020-07-04T13:23:08.375Z INFO piecestore upload canceled {"Piece ID": "P2HEQQI4EY5IAD4RH6EHZMXBJHWZQ2UAKPFKKDKV2ZDN2M6KIYNA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}
2020-07-04T13:23:09.259Z INFO piecestore upload started {"Piece ID": "BIIDW4K7IO6DNMIHHER722P37ARBZAZBKKVTJCE77EM5BNAUGATQ", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT", "Available Space": 135860191104}
2020-07-04T13:23:09.675Z INFO piecestore uploaded {"Piece ID": "TZIHEP7LCEWYWJODJYOHMBABT5BPIDENNEVT7VKLBJZP4BRVDYBA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}
2020-07-04T13:23:10.646Z INFO piecestore upload canceled {"Piece ID": "OUZGIN4V543APWUGCQCQ3URIHP5RBKRLJKXE663SU22EZVGPZUKQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT_REPAIR"}
2020-07-04T13:23:10.691Z INFO piecestore uploaded {"Piece ID": "ULHO2YQXVDVFKMX7J66GBRJPYTCR2SX7MSJ4MIEOFYDGYAVLKLRQ", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}
2020-07-04T13:23:12.400Z INFO piecestore uploaded {"Piece ID": "OIEYHHZNQLAZ44UIKNKFBUK4L7U2DILLCNAY35YHCCEYZLVC7PAA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}
2020-07-04T13:23:13.498Z INFO piecestore upload started {"Piece ID": "NKJHOTAZYUUBOEPNQYEGQTYCRVE6FKGZZZ6FTJNC4FRF3UUBPUVA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT", "Available Space": 135853231488}
2020-07-04T13:23:14.072Z INFO piecestore uploaded {"Piece ID": "BIIDW4K7IO6DNMIHHER722P37ARBZAZBKKVTJCE77EM5BNAUGATQ", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "PUT"}

Yes, the container is configured so it will restart automatically.

He was asking whether the node container starts up automatically after the Pi finishes booting.

From what I understand, you are referring to the node being restarted when its container fails while the OS is still running fine. That, of course, won’t help during a power outage :wink:

Sorry if I wasn’t clear. I meant that the node container starts up automatically after the Pi finishes booting.


Nothing looks wrong in the logs you sent, so I think the most likely explanation is that your Pi lost power and for some reason the node did not restart afterwards. Are you able to run it now?

Thank you @moby.
Yes, I am able to run it now. Actually, my mom had just cut the power to the whole living room, so the Pi only started up again this afternoon…

Anyway, thank you for your help. I am reassured: whatever the cause, it seems OK now.
