Windows Node keeps crashing

Walter1 · July 27, 2023, 7:41am

Hello everyone,

For the last few Storj versions, my Windows node keeps crashing. It runs for about a day, then crashes, and I have to restart it to get another day out of it. I checked the logs and they looked fine: only INFO messages for successful uploads and downloads.

Can you tell what may be the reason for the constant crashes?

Thanks and kind regards,

Vadim · July 27, 2023, 7:44am

Can you show the logs? They are in Program Files\Storj\Storagenode.
Search for FATAL errors.

Walter1 · July 27, 2023, 7:44am

I deleted the logs just now. I’m still wondering why they keep growing without any limit. I’ll have to wait for the next crash, then.

Vadim · July 27, 2023, 7:46am

So when it crashes, check the logs; the last lines should contain the errors.
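A minimal Python sketch of that check, assuming the log is a plain text file. The function name is made up for illustration, and the log path is whatever file sits in the folder mentioned above (the thread does not name it).

```python
from collections import deque


def last_fatal_lines(log_path, keyword="FATAL", limit=5):
    """Return up to `limit` of the most recent lines that contain `keyword`."""
    # A bounded deque keeps memory use small even for very large logs.
    recent = deque(maxlen=limit)
    with open(log_path, "r", encoding="utf-8", errors="replace") as log:
        for line in log:
            if keyword in line:
                recent.append(line.rstrip("\n"))
    return list(recent)
```

Calling `last_fatal_lines(path_to_log)` after a crash should give the lines worth posting here.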

Alexey · July 27, 2023, 7:53am

I guess it’s likely the same issue as discussed in

And the solution would be:

1. Stop the node.
2. Check the disk for errors and fix them.
3. Run defragmentation, and make sure that automated defragmentation is enabled for this drive.
4. Check for timeouts after a while.
5. If the node stops with a timeout again, increase the related timeout and restart.

If the timeout has to be too large (5 minutes or more), it’s time to check what’s wrong with that drive or your setup; it may have hidden issues that should be addressed.

daki82 · July 27, 2023, 11:19am

Please set the node service in Windows to restart x minutes after a failure, e.g. x = 20.
Configure UptimeRobot to check every x-3 minutes.

That way you don’t need to do it manually.

Also follow and read Alexey’s link.

It’s most likely the timeout error.

It could also be the databases.

Post the version you are running and a screenshot of the audit, suspension and online percentages.
Also post the log with the FATAL line.

For a less experienced SNO, it may be worth setting the log level to error or fatal.

Walter1 · July 27, 2023, 11:50am

Thanks for your answer. Do you mean this setting:

# the minimum log level to log
log.level: info

and change it to “error” or “fatal”?

Thanks and kind regards,

daki82 · July 27, 2023, 11:58am

Yes.

Error = much less logging, but still a lot for me.

Fatal = no logs until it crashes. I’m personally fine with that.

The other option is using the logrotate script, if that is within your skill level.
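The logrotate script itself is not shown in this thread. As a rough stand-in, here is a minimal Python sketch of size-based rotation. The function name and the 100 MB threshold are assumptions, and on Windows the rename will most likely fail while the node service still holds the log open, so stop the service first.

```python
import os
import time


def rotate_if_large(log_path, max_bytes=100 * 1024 * 1024):
    """Rename the log to a timestamped file once it reaches `max_bytes`.

    Returns the archive path, or None if nothing was rotated.
    """
    if not os.path.exists(log_path) or os.path.getsize(log_path) < max_bytes:
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archived = f"{log_path}.{stamp}"
    # Assumption: the node is stopped, so no other process has the file open.
    os.replace(log_path, archived)
    return archived
```

Old archives still have to be deleted by hand or by a separate step; this sketch only keeps the active log small.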

Walter1 · July 28, 2023, 7:16am

The node crashed again, with this error message:

FATAL Unrecoverable error {"error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:169\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:161\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
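To read the timeout out of a line like this one, a small Python sketch can help. It assumes the duration is printed in Go’s style (`1m0s`), as in the message above; both function names are made up for illustration.

```python
import re

# Seconds per unit for the Go-style duration suffixes handled here.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_go_duration(text):
    """Convert a duration such as '1m0s' or '1m30s' into seconds."""
    # 'ms' is listed before 'm' so that milliseconds are not read as minutes.
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|h|m|s)", text)
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in parts)


def timeout_from_log_line(line):
    """Return the writability-check timeout in seconds, or None if absent."""
    match = re.search(r"timed out after (\S+) while verifying", line)
    return parse_go_duration(match.group(1)) if match else None
```

A result of 300 seconds or more corresponds to the 5-minute mark Alexey mentioned earlier, at which point the drive or setup deserves a closer look.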

I’m now checking the disk with the command:

chkdsk E: /F

For me it had to be written in capital letters; otherwise it couldn’t find the drive.

Edit: How often should the defragmentation run? Is once a month enough?

Alexey · July 28, 2023, 7:28am

The default schedule is fine. I do not remember what Windows uses by default; I never changed it.

Walter1 · July 28, 2023, 7:41am

It is weekly and I didn’t change it.

Alexey · July 28, 2023, 7:52am

Then perhaps your node has collected enough data to slow down your disk (assuming the disk itself has no issues), so you could try increasing the timeout for the writability check.

Walter1 · July 28, 2023, 7:55am

Can you tell me exactly which line it is in config.yaml?

Alexey · July 28, 2023, 8:05am

You need to add a new parameter, if it doesn’t exist

Save the config and restart the node.
Use Notepad++ to edit the config file, and do not forget to save it.

daki82 · July 28, 2023, 8:51am

Since it will take ages to defrag anyway, I would do it once for each new TB that fills up,
maybe every 2-3 months.

Walter1 · July 28, 2023, 9:01am

OK, done. I pasted it above the other storage2 entries.

I’m curious now whether it runs stably. The defragmentation is also still running, but that shouldn’t be a problem in parallel, right?

daki82 · July 28, 2023, 9:12am

On a healthy node, no problem.
On a slow node you can set it to full, so fewer writes disturb the defrag.

At the moment my node, with databases on SSD, fragments at about 0.2% per 50 GB of disk space used, or roughly every 2 days.
So a defrag every 40-60 days is OK, I think. Maybe even just 3 times a year.

On my 7.8 TB node with databases on the disk, a defrag will run for a whole week.

Walter1 · July 28, 2023, 9:14am

I also have about 8 TB, with the databases on it. That’s why I turned the node back on before the defragmentation was complete. ^^

arrogantrabbit · July 28, 2023, 7:16pm

Why are we discussing defragmentation here? Disable it. The vast majority of files are smaller than the sector size.

Alexey · July 29, 2023, 7:08am

This is bad practice, because regular defragmentation runs are shorter than a single run every several months, especially on the storage location, where pieces move very often (if it still has free space).

Right, it has a low priority.

On SSD, fragmentation has almost no impact on latency, since there are no mechanical movements.

I strongly disagree. Unlike ext4, NTFS suffers a serious latency impact from fragmentation.
So, please do not disable it for the storage location.

To support my opinion I can invite @Vadim to confirm that defragmentation has fixed an issue with disk timeouts for his setup.