Fatal Error on my Node / Timeout after 1 min

Stubbsey · January 31, 2024, 6:45pm

Maybe, but how do I know when the file walker is scheduled to run?

daki82 · January 31, 2024, 6:52pm

The question is, why is the disk slow?

It could be fragmentation, a big MFT, or many other causes, including a bad disk.

How to work around it and increase the timeouts: it’s in the other post.

Alexey · February 1, 2024, 4:03am

1. Stop the node.
2. Check and fix errors on the disk.
3. Run a defragmentation for this disk.
4. Re-enable automatic defragmentation if you disabled it (it’s enabled by default).
5. Start the node.

If the problem would occur again

Slowmotion · February 8, 2024, 2:34pm

Hi there.

I’m running a node on a Windows 10 PC. Since last week the storage node service keeps stopping. I can restart it, but an hour later it stops again. These are the last lines in storagenode.log:

2024-02-08T14:55:45+01:00 INFO piecestore upload canceled {"Piece ID": "4ZZJQZSJO6M2OJK4OGSK6ZX5EYLKSD6VTIVFCDYZQARUYUBN3VAQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 65536, "Remote Address": "5.161.176.200:54238"}

2024-02-08T14:55:45+01:00 INFO piecestore upload canceled (race lost or node shutdown) {"Piece ID": "JLAT7KP4EW6SKWGCE3MXPDFH2VHKYARBJKJWCN56QQ46G4HQDTKQ"}

2024-02-08T14:55:45+01:00 INFO piecestore upload canceled (race lost or node shutdown) {"Piece ID": "E2WPLGMSHGQZMITMPGC2VI4X3BF4UEAFUDR2BESREAVN3OTSWJZA"}

2024-02-08T14:55:45+01:00 FATAL Unrecoverable error {"error": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:152\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:141\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}

Any thoughts?

Thank you.

Roberto · February 8, 2024, 2:57pm

aseegy · April 22, 2024, 6:54am

Hello!

I am running version v1.101.3 on Windows 11.

The node is relatively new; it has been running for a few months.

The 4TB Western Digital HDD is external and connected to the PC via USB. I have checked the HDD with the diagnostic tool Victoria and it found no issues.

2024-04-21T00:38:01+02:00	ERROR	failure during run	{"error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-04-21T00:38:01+02:00	FATAL	Unrecoverable error	{"error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

2024-04-22T07:34:17+02:00	ERROR	failure during run	{"error": "database is locked"}
2024-04-22T07:34:17+02:00	FATAL	Unrecoverable error	{"error": "database is locked"}

Any ideas?

Roxor · April 22, 2024, 12:26pm

That looks like the drive is going read-only, causing Storj to throw those first errors (or something is causing writes to take over a minute). I’d expect to see entries in your Windows Event Logs around the same time.

aseegy · April 22, 2024, 8:27pm

It seems to be working now, at least for the whole day without further issues. Before, it was happening rather often, multiple times a day. I did not do anything special besides restarting Windows.

Let’s see how it goes.

snorkel · April 22, 2024, 8:44pm

Old problem:
https://forum.storj.io/t/fatal-error-on-my-node/22052

Alexey · April 23, 2024, 7:58am

and the same solution:

Hjallisharkimo · May 1, 2024, 8:57am

Hey! I’m having a problem with one of my nodes at the moment. It updated to the newest version, 1.101.3, and after that it won’t start, saying this:

2024-05-01T11:51:16+03:00	INFO	Current binary version	{Service: storagenode-updater, Version: v1.101.3}
2024-05-01T11:51:16+03:00	INFO	New version is being rolled out but hasn't made it to this node yet	{Service: storagenode-updater}

Any ideas?

Knowledge · May 1, 2024, 3:18pm

Does it provide any errors when starting? Can you search your logs for the word fatal and see what comes up?
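One way to do that search, as a minimal Python sketch: it scans a log file and collects every line containing "fatal" (case-insensitive). The function name `find_fatal_lines` is hypothetical and not part of any Storj tool, and the log path is whatever your node is configured to write to.

```python
def find_fatal_lines(log_path, needle="fatal"):
    """Return (line_number, line) pairs for log lines containing the needle.

    The match is case-insensitive, so it catches the FATAL level as well as
    the word "fatal" inside a message.
    """
    matches = []
    wanted = needle.lower()
    # errors="replace" keeps the scan going even if the log has odd bytes.
    with open(log_path, "r", encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if wanted in line.lower():
                matches.append((number, line.rstrip("\n")))
    return matches
```

Calling `find_fatal_lines("storagenode.log")` (the file name here is only an example) returns the matching lines with their line numbers, which makes it easier to see what was logged just before each crash.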

Alexey · May 2, 2024, 6:37am

Hello @Hjallisharkimo,
Welcome to the forum!

You showed logs from the storagenode-updater, not from storagenode.
But I guess you are on the Windows GUI and have duplicated keys in your config.yaml, and your node is stopping, am I right?
If so, you may check this:

This one

Is not actually a problem. You need to wait until the release is available for your NodeID.

hewicker · May 17, 2024, 12:31pm

I have a similar problem on my node:

2024-05-17T12:06:03Z	ERROR	services	unexpected shutdown of a runner	{"Process": "storagenode", "name": "piecestore:monitor", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-05-17T12:06:03Z	INFO	lazyfilewalker.used-space-filewalker	subprocess exited with status	{"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "status": -1, "error": "signal: killed"}
2024-05-17T12:06:03Z	ERROR	pieces	failed to lazywalk space used by satellite	{"Process": "storagenode", "error": "lazyfilewalker: signal: killed", "errorVerbose": "lazyfilewalker: signal: killed\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:85\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:130\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:704\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-05-17T12:06:03Z	INFO	lazyfilewalker.used-space-filewalker	starting subprocess	{"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-05-17T12:06:03Z	ERROR	lazyfilewalker.used-space-filewalker	failed to start subprocess	{"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "error": "context canceled"}
2024-05-17T12:06:03Z	ERROR	pieces	failed to lazywalk space used by satellite	{"Process": "storagenode", "error": "lazyfilewalker: context canceled", "errorVerbose": "lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:73\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:130\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:704\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-05-17T12:06:18Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "retain"}
2024-05-17T12:06:18Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "piecestore:cache"}
2024-05-17T12:06:18Z	WARN	servers	service takes long to shutdown	{"Process": "storagenode", "name": "server"}
2024-05-17T12:06:18Z	INFO	services	slow shutdown	{"Process": "storagenode", "stack": "goroutine 1078 [running]:

After that there is a very long stack trace; I can provide a full log if that helps. I am running a Docker node on my Synology RS1221+.

nerdatwork · May 17, 2024, 12:59pm

Your node hit a timeout while checking writability. Try creating a file with some content on the drive to check that writes complete in a reasonable time.
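A minimal Python sketch of such a manual check: it writes a temporary file into a directory on the drive, forces it to disk, reads it back, and reports how long each step took. This is not the node's own monitor check; the function name `check_write_read`, the 1 MiB payload size, and the temporary file name are all assumptions made for illustration.

```python
import os
import time
import uuid


def check_write_read(directory, size_bytes=1024 * 1024):
    """Write a temporary file into `directory`, read it back, and delete it.

    Returns (write_seconds, read_seconds). Raises OSError if the write or
    read fails, or if the data read back differs from what was written.
    """
    path = os.path.join(directory, f"write-test-{uuid.uuid4().hex}.tmp")
    payload = os.urandom(size_bytes)
    try:
        start = time.monotonic()
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the disk
        write_seconds = time.monotonic() - start

        start = time.monotonic()
        with open(path, "rb") as f:
            data = f.read()
        read_seconds = time.monotonic() - start

        if data != payload:
            raise OSError("data read back does not match what was written")
    finally:
        # Clean up the test file even if a step above failed.
        if os.path.exists(path):
            os.remove(path)
    return write_seconds, read_seconds
```

If the write takes many seconds, or raises an error, that points at the disk or its connection rather than at the node software. Note that the read-back will often be served from the operating system's cache, so the write timing is the more telling number.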

Shahzada · June 8, 2024, 3:23pm

I’ve been having a weird problem for the last 2 days.
The Storj PC stays on because it also runs my Pi-hole VM; there are no issues with that, and the network seems to be fine.
Storj went offline 2 days ago during the night; I woke up, checked it, and started it again.
I rebooted the machine yesterday as it had gone offline; it was offline again last night, and earlier today it went offline once more.

my network itself is on as I have been online the whole day. this

I don’t think it’s a resource issue. Any suggestions? I can provide the logs so someone can advise.

It shows suspension at 95% as well.

nerdatwork · June 8, 2024, 3:52pm

The scores will fix themselves over time. Check your logs when it went down to get a better picture of what happened. You could try searching for fatal errors in the log.

Shahzada · June 8, 2024, 4:09pm

Toyoo · June 8, 2024, 4:18pm

Have you tried another editor?

Shahzada · June 8, 2024, 5:03pm

nothing