All nodes crash at the same time

Hey, so I set up a few nodes on a new PC and I see this happening every day on all 3 nodes. 2 of the disks are used and 1 is brand new, and all of them crash at the same time,
all of them with the same fatal error:

2024-09-04T19:57:56+03:00 ERROR failure during run {error: piecestore monitor: timed out after 1m0s while verifying readability of storage directory, errorVerbose: piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:153\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:140\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78}
2024-09-04T19:57:56+03:00 FATAL Unrecoverable error {error: piecestore monitor: timed out after 1m0s while verifying readability of storage directory, errorVerbose: piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:153\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:140\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78}
I know Alexey will say the disks are too slow, but there is nothing else running on them; they sit at almost 0% all day after they finish the filewalkers.
This is Windows, btw.

Well, it says it timed out waiting for a response back from the drive. So the drive didn’t return the data within one minute. Perhaps your drive is going to sleep at that time due to some power setting? Or your drive is indeed slow.
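
To see whether a drive really takes that long to answer, you can time a small read yourself. This is only a rough sketch and not the storagenode's actual check: the function names and the idea of reading the first 4 KiB of any one file are my own assumptions, and the 60 second default just mirrors the `1m0s` from the log.

```python
import concurrent.futures
import os
import time


def read_probe(directory: str) -> float:
    """Read a few bytes from one file in `directory`; return the seconds taken."""
    start = time.monotonic()
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                with open(entry.path, "rb") as fh:
                    fh.read(4096)  # a small read is enough to wake the disk
                break
    return time.monotonic() - start


def probe_with_timeout(directory: str, timeout_s: float = 60.0):
    """Return elapsed seconds, or None if the probe took longer than `timeout_s`."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(read_probe, directory)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return None
    finally:
        # Do not block on a stuck read; the worker thread is left to finish on its own.
        pool.shutdown(wait=False)
```

Calling `probe_with_timeout` on each node's storage directory every minute or so and logging the result should show whether all three disks stall at the same moment or only one of them.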

Something in your PC (assuming the disks are internal) is a bottleneck. You could use Process Monitor (from MS) to check the I/O and CPU/RAM usage. That might give you some clues.

But it happens on all nodes at the same time, not with like 10 minutes' difference: same minute, same second.
Sleep settings are set to never sleep, too.
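
One way to confirm the "same minute, same second" claim is to pull the FATAL timestamps out of each node's log and compare them. A small sketch, assuming each log line starts with an ISO timestamp followed by the level, like the lines quoted above; the function names and the 5 second window are made up for illustration.

```python
from datetime import datetime, timedelta


def fatal_times(lines):
    """Return the timestamps of all FATAL lines in one node's log."""
    times = []
    for line in lines:
        parts = line.split(None, 2)  # timestamp, level, rest
        if len(parts) >= 2 and parts[1] == "FATAL":
            times.append(datetime.fromisoformat(parts[0]))
    return times


def simultaneous_crashes(logs_by_node, window=timedelta(seconds=5)):
    """Return crash times of the first node that every other node matches within `window`."""
    per_node = [fatal_times(lines) for lines in logs_by_node.values()]
    if not per_node:
        return []
    first, rest = per_node[0], per_node[1:]
    return [
        t for t in first
        if all(any(abs(t - u) <= window for u in other) for other in rest)
    ]
```

If every crash shows up in the result, the cause is more likely something the nodes share (OS, controller, power) than three individual disks.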

There are only two devices that can freeze/seize up Windows: the GPU driver and the HDD. In case of the latter, you may not necessarily notice that your entire disk subsystem is frozen. As long as there is RAM the kernel will continue to service requests and the system will appear fine, but upon any disk access it will freeze until the bus flushes/clears/reconnects.

#1. Check your OS drive for errors;
#2. Check your three remaining drives, any one drive can freeze a system.
#3. Check whether any of those are dynamic .vhdx, and whether you have previously expanded any volumes on them.

2 cents

I did have a few bluescreens before, that might be it, and my taskbar doesn't work right now as well…

Blue screen: it would be nice to know the specific error. Nevertheless, run DISM, then an SFC scan.

It was Kernel-Power, I think. Might be RAM, might be PSU, but I only got 1 bluescreen, I think.

Open Task Manager, then restart the explorer.exe task.

Do DISM online repair first, then SFC /scannow after that, on your OS drive.

GL

Ran chkdsk and it didn't find anything.

With scannow it did find corrupt files, let's see if that works. Thank you.

If you don’t do the DISM repair first, then even though SFC finds errors it may not solve your problems, unless you go back and do the DISM online repair and then SFC again. DISM repairs the repository that SFC uses to do its repairs.

But sounds good enough for now… :)
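
The order matters, so here it is spelled out as a small script. Just a sketch: `DISM /Online /Cleanup-Image /RestoreHealth` and `sfc /scannow` are the standard Windows commands, but the wrapper function and the choice to stop when a step returns a non-zero exit code are my own assumptions. It has to be run from an elevated (administrator) prompt.

```python
import subprocess

# DISM first (repairs the component store), then SFC (repairs system files from it).
REPAIR_STEPS = [
    ["DISM", "/Online", "/Cleanup-Image", "/RestoreHealth"],
    ["sfc", "/scannow"],
]


def run_repairs(run=subprocess.run):
    """Run the repair steps in order; return the names of the steps that were started."""
    started = []
    for cmd in REPAIR_STEPS:
        result = run(cmd)
        started.append(cmd[0])
        if result.returncode != 0:
            # Assumption: do not continue to SFC if DISM reported a failure.
            break
    return started
```

Passing a different `run` callable makes it possible to dry-run the sequence without touching the system.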

The DISM check didn't find any errors.

You may find more information here,

My tip is higher timeouts and defragmentation of the MFT.
Temporarily(?) set the log level to fatal.

Oh, and maybe update the BIOS and the network and mainboard drivers.

My steps would be:

  1. CHKDSK
  2. Make sure the drive has enough power / disable APM
  3. Disable AAM or set it to performance mode, if the disk supports it
  4. Change to badger cache.
  5. Increase timeout to 5m0s or something

To be sure: are all nodes running on the same disk?

You see, it's a different problem here: all nodes crash at the same time with the same error. This had to be something else, and in my case sfc /scannow helped and the nodes haven't crashed in 2 days.

sfc /scannow helped while chkdsk didn't find anything wrong, and I do have my PC on power saver. My CPU goes down to 1.5 GHz sometimes; that might be too slow.

Yeah, so I think it might be underpowered. So how are they powered? How many disks / how many watts?
And are all nodes running on different drives?

Don’t think so, have about 35 nodes on one CPU (N100) idling 20% of the time.