Help with node that goes offline

mart · May 6, 2024, 9:20pm

hello

I had been running a node without issues for months, and today it stopped on its own.

I can restart it from Windows → Services and it will run for an hour or a couple of hours, then stop again…

It’s pretty annoying. Any idea what’s causing this? Is it a software-only problem?

Could it be related to the physical network, like a bad plug or cable? Let me explain:

Previously I had two nodes on two separate computers. The one in another room was perfectly fine, but the second one, on a different computer connected to the same plug as today’s node, had the same issue.

It’s really frustrating… if someone can help me, I’d be grateful.

Thanks in advance

snorkel · May 6, 2024, 9:30pm

Without more info, you can’t expect anyone to recommend anything.
First, check the logs and see if there are any errors. Second, come back with the last 20 lines and details about your setup, etc.

Alexey · May 7, 2024, 4:15am

Hello @mart,
Welcome to the forum!

Please search for FATAL errors in your logs, or post the last 20 lines of the log after the crash here.
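If the log file is too big to open comfortably in an editor, a minimal Python sketch like the one below can pull out just the last 20 lines. The helper names `tail_lines` and `tail_file` are hypothetical (not part of any Storj tooling), and the log path is whatever location your node writes its log to; nothing here assumes a specific default path.

```python
import sys
from collections import deque


def tail_lines(lines, n=20):
    """Return the last n lines of any iterable of text lines, newline-stripped."""
    # deque(maxlen=n) discards older lines as new ones arrive, so memory stays small
    # even for a multi-gigabyte log.
    return [line.rstrip("\r\n") for line in deque(lines, maxlen=n)]


def tail_file(path, n=20):
    """Stream a log file from disk and return its last n lines."""
    # errors="replace" keeps a stray bad byte from aborting the whole read.
    with open(path, encoding="utf-8", errors="replace") as f:
        return tail_lines(f, n)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is a placeholder): python tail_log.py "C:\path\to\storagenode.log"
    for line in tail_file(sys.argv[1]):
        print(line)
```

Splitting the file reading from the tailing logic keeps `tail_lines` usable on any line source, not just files.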

mart · May 7, 2024, 5:59pm

Hello again,

When I search for FATAL on the days the node stopped, I only find this type of error, for example four of them on May 5th. I found nothing today.

2024-05-05T21:44:48+02:00 FATAL Unrecoverable error {"error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
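To see on which days the node hit a FATAL error, and how many times, a small Python sketch can count FATAL lines per date. It assumes each log line starts with an ISO-8601 timestamp followed by whitespace and the level, as in the excerpt above; `count_fatal_by_day` is a hypothetical helper, and other log formats would need a different pattern.

```python
import re
import sys
from collections import Counter

# Assumed line shape, taken from the excerpt above:
# "2024-05-05T21:44:48+02:00 FATAL Unrecoverable error {...}" (space or tab separated).
FATAL_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})T\S+\s+FATAL\b")


def count_fatal_by_day(lines):
    """Count FATAL entries per calendar date, using the date as written in the timestamp."""
    counts = Counter()
    for line in lines:
        m = FATAL_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is a placeholder): python fatal_per_day.py "C:\path\to\storagenode.log"
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for day, n in sorted(count_fatal_by_day(f).items()):
            print(day, n)
```

The date comes from the timestamp text itself, so it reflects the node's local offset (`+02:00` here) rather than UTC.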

snorkel · May 7, 2024, 8:14pm

You must increase the writability timeout in the config to 2 minutes or more and restart the node. Your system is too slow and writes take too much time. You should improve your setup. This usually happens with USB-connected drives, SMR drives, and maybe 5400 RPM drives.
Changing the setting only solves the restart issue, not the cause. You are losing many races, and the databases can get stuck if the drive is too slow.
Search for the thread “Fatal error on my node”.
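To check whether the drive itself is the bottleneck, you can time roughly what a writability check involves: create a small file in the storage directory, force it to disk, and delete it. This Python sketch is only an approximation under that assumption; the storagenode's real check lives in its monitor service and may work differently. `time_write_check` is a hypothetical helper and the directory path is a placeholder.

```python
import os
import sys
import tempfile
import time


def time_write_check(directory, payload=b"write-test"):
    """Create, fsync, and delete a small file in `directory`; return elapsed seconds."""
    start = time.monotonic()
    # mkstemp creates a uniquely named file inside `directory` and returns an open fd.
    fd, path = tempfile.mkstemp(dir=directory, prefix="write-check-")
    try:
        os.write(fd, payload)
        # fsync forces the data onto the physical disk, which is where slow drives stall.
        os.fsync(fd)
    finally:
        os.close(fd)
        os.remove(path)
    return time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is a placeholder): python write_check.py "D:\path\to\storage"
    timings = [time_write_check(sys.argv[1]) for _ in range(10)]
    print(f"max {max(timings):.3f}s, avg {sum(timings) / len(timings):.3f}s")
```

Running it while the node is busy is more telling than on an idle disk, since the timeouts happen under load. Timings of several seconds for a tiny file point at the drive or its connection rather than the network.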

Alexey · May 10, 2024, 1:02pm

mart · May 11, 2024, 12:35pm

Thank you, guys, I’ll take a look!