Windows Node keeps crashing

Hello everyone,

For the last several Storj versions, my Windows node keeps crashing. It runs for about a day, then crashes, and I have to restart it, after which it lasts another day. I checked the logs and they looked fine: just INFO messages for successful uploads and downloads.

Can you tell me what might be causing the constant crashes?

Thanks and kind regards,

Can you show the logs? They are in Program Files\Storj\Storagenode.
Search for FATAL errors.


I just deleted the logs. I'm still curious why they keep growing without any limit. I'll have to wait until the next crash.

So when it crashes, check the logs; the last lines should contain the errors.
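For anyone who prefers not to scroll through a large log by hand, here is a minimal Python sketch of that check. It assumes the log is plain text with the level (INFO, ERROR, FATAL) as a separate whitespace-delimited field on each line; the function name `tail_with_fatal` is made up for illustration and is not part of any Storj tooling.

```python
from collections import deque


def tail_with_fatal(lines, tail=20):
    """Return (fatal_lines, last_lines) for an iterable of log lines.

    Assumption: the log level appears as its own whitespace-separated
    field, e.g. "<timestamp> FATAL Unrecoverable error {...}".
    """
    fatal = []
    last = deque(maxlen=tail)  # keeps only the most recent `tail` lines
    for line in lines:
        line = line.rstrip("\n")
        last.append(line)
        if "FATAL" in line.split():
            fatal.append(line)
    return fatal, list(last)
```

To use it, open the log file from the node's installation folder (the exact file name depends on your setup) with `open(path, encoding="utf-8", errors="replace")` and pass the file object as `lines`. The file is read line by line, so a very large log does not have to fit in memory.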


I guess it’s likely the same issue as discussed in

And the solution would be:

  1. Stop the node
  2. Check the disk for errors and fix them
  3. Run defragmentation, make sure that the automated defragmentation is enabled for this drive
  4. Check for timeouts after a while
  5. If the node stops with a timeout again, increase the related timeout and restart

If the timeout has to be very large (5 minutes or more), it's time to check what's wrong with that drive or your setup; it may have hidden issues that should be addressed.


Please set the node service in Windows to restart x minutes after a failure, e.g. 20.
Configure UptimeRobot to check every x-3 minutes.

That way you don't need to do it manually.

Also follow and read Alexey's link.

Most likely the timeout error.

Could be databases.

Post the version you are running and a screenshot of the audit, suspension and online percentages.
Also post the log with the FATAL line.

For a less experienced SNO, it may help to set the log level to error or fatal.

Thanks for your answer. Do you mean this setting:

# the minimum log level to log
log.level: info

Should I change it to “error” or to “fatal”?

Thanks and kind regards,

Yes.

Error = much less logging, but still a lot for me.

Fatal = no logs until it crashes. I'm personally fine with that.

The other option is to use the logrotate script, if that is within your skill level.

This is the error message, and the node crashed again:

FATAL Unrecoverable error {"error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:169\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:161\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
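The part after "Unrecoverable error" is a JSON object, so the timeout and the failing check can be pulled out programmatically. The following is a rough Python sketch, assuming the line in the actual log file uses plain ASCII double quotes (the forum may render them as curly quotes) and that the message keeps the wording "timed out after ... while verifying ...". The helper name `parse_fatal` is hypothetical.

```python
import json
import re

# Matches e.g. "timed out after 1m0s while verifying writability ..."
TIMEOUT_RE = re.compile(r"timed out after (\S+) while verifying (\w+)")


def parse_fatal(line):
    """Return the error text, timeout, and check name from a FATAL line.

    Returns None when the line carries no JSON payload.
    """
    start = line.find("{")
    if start == -1:
        return None
    payload = json.loads(line[start:].strip())
    error = payload.get("error", "")
    match = TIMEOUT_RE.search(error)
    return {
        "error": error,
        "timeout": match.group(1) if match else None,
        "check": match.group(2) if match else None,
    }
```

For the message above, this would report a timeout of `1m0s` for the `writability` check, which tells you which timeout setting is the relevant one.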

I'm now checking the disk with the command:

chkdsk E: /F

It must be written in capital letters otherwise it can’t find it. :wink:

Edit: How often should the defragmentation run? Is once a month enough?

The default schedule is fine. I don't remember what Windows uses by default; I never changed it.

It is weekly and I didn’t change it.


Then perhaps your node has collected enough data to slow down your disk (assuming the disk has no other issues), so in your case you may try increasing the timeout for the writability check.
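To get a feeling for how slow the disk really is, you can time a small write yourself. The Python sketch below only approximates what the error message describes (writing to the storage directory and waiting for it to finish); it is not the node's actual check, and the function name `time_write_check` is made up for illustration.

```python
import os
import time
import uuid


def time_write_check(directory, payload=b"write test"):
    """Write, flush, and delete a small file in `directory`.

    Returns the elapsed time in seconds. This is a rough stand-in for
    a writability check, not Storj's implementation.
    """
    path = os.path.join(directory, f"write-test-{uuid.uuid4().hex}")
    start = time.monotonic()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # force the data to the disk, not just the cache
    os.remove(path)
    return time.monotonic() - start
```

Run it against the storage directory while the node is busy, for example `print(time_write_check(r"E:\\"))`, where the drive letter is taken from the chkdsk command above. If a tiny write regularly takes many seconds, the disk is the bottleneck and raising the timeout only hides the problem.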


Can you tell me exactly which line it is in config.yaml?

You need to add a new parameter, if it doesn’t exist

Save the config and restart the node.
Use the Notepad++ editor to edit the config file, and do not forget to save it.


Since it will take ages to defrag anyway, I would do it once for every new TB that fills up.
Maybe every 2-3 months.

OK, I did it and pasted it above the other storage2 entries.

I'm now curious whether it will run stably. The defragmentation is also still running, but that shouldn't be a problem in parallel, right?

On a healthy node, no problem.
On a slow node you can set the node to full, so there are fewer disturbing writes during the defrag.

At the moment my node, with databases on an SSD, fragments at 0.2% per 50 GB of disk space used, or every 2 days.
So defragmenting every 40-60 days is OK, I think. Maybe even 3 times a year.

On my 7.8 TB node with databases on the disk, a defrag will run for a whole week.

I also have ~8 TB with the database on it. That's why I turned it on before the defragmentation is complete. ^^

Why are we discussing defragmentation here? Disable it. The vast majority of files are smaller than the sector size.

This is bad practice, because regular defragmentation runs will be shorter than a single run every several months, especially on the storage location, where pieces move very often (if it still has free space).

right, it has a low priority.

On an SSD, fragmentation has almost no impact on latency, since there are no mechanical movements.

I strongly disagree. Unlike ext4, NTFS suffers a serious latency impact from fragmentation.
So, please do not disable it for the storage location.

To support my opinion I can invite @Vadim to confirm that defragmentation has fixed an issue with disk timeouts for his setup.
