Fatal Error on my Node / Timeout after 1 min

Hello everyone,
a week ago I started getting several errors on my main node. Maybe someone can help me solve the problem.

The node runs on Windows 10.

Thanks for your help.

2023-03-27T05:11:25.897+0200 INFO piecestore download started {Piece ID: 7WMQ364BMF672RLAQWDD76IQ6EORNXMYXIL5PGN46I4VUT4P4JXA, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: GET, Offset: 110336, Size: 469504, Remote Address: 184.104.224.98:48702}
2023-03-27T05:11:25.916+0200 ERROR services unexpected shutdown of a runner {name: piecestore:monitor, error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory, errorVerbose: piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:150\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:146\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75}
2023-03-27T05:11:26.264+0200 ERROR piecestore:cache error getting current used space: {error: context canceled; context canceled, errorVerbose: group:\n— context canceled\n— context canceled}
2023-03-27T05:11:26.345+0200 ERROR piecestore error sending hash and order limit {error: context canceled}
2023-03-27T05:11:26.345+0200 INFO piecedeleter delete piece sent to trash {Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Piece ID: EW35SDZ2TKHDRKCGDXZ6J26X4ZHOIR4SSTO2P5H6E5MOUII2TQHQ}
2023-03-27T05:11:26.350+0200 INFO piecestore download canceled {Piece ID: 2IKYB4WBKJ7OXIEHFD5DK5N5HP5JX2WSOF4RLHB5UZVQQDSMEQQA, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: GET_REPAIR, Offset: 0, Size: 0, Remote Address: 5.161.50.62:45688}
2023-03-27T05:11:26.588+0200 ERROR piecedeleter could not send delete piece to trash {Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Piece ID: 4QI26GFAIG6XY6IVCFDP32IENDVWVPMRTXXB6WTUZU6XL6PWBAMA, error: pieces error: pieceexpirationdb: context canceled, errorVerbose: pieces error: pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:387\n\tstorj.io/storj/storagenode/pieces.(*Deleter).deleteOrTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Deleter).work:135\n\tstorj.io/storj/storagenode/pieces.(*Deleter).Run.func1:72\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75}
2023-03-27T05:11:26.694+0200 ERROR piecedeleter could not send delete piece to trash {Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Piece ID: S2B3BICIMGEVACH5UN2SDFHFXKP776DHFTCE5VGCGJDAJ3ST2RAA, error: pieces error: pieceexpirationdb: context canceled, errorVerbose: pieces error: pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:387\n\tstorj.io/storj/storagenode/pieces.(*Deleter).deleteOrTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Deleter).work:135\n\tstorj.io/storj/storagenode/pieces.(*Deleter).Run.func1:72\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75}
2023-03-27T05:11:27.297+0200 ERROR gracefulexit:chore error retrieving satellites. {error: satellitesdb: context canceled, errorVerbose: satellitesdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits.func1:152\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:164\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:58\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75}

2023-03-27T05:11:31.912+0200 INFO piecestore upload canceled {“Piece ID”: “ISMNJLZS6L4OR5T2KXJCGOKCQICLLTPQNGMDIZDR66NRN764NN2A”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “Size”: 0, “Remote Address”: “5.161.46.76:27606”}
2023-03-27T05:11:32.058+0200 INFO piecestore uploaded {“Piece ID”: “AOLI2WHQXTTZAQ6LHMR4HGSC6KBIZZ6TMDRPOFFDPTBEXFZHXDMQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “Size”: 6144, “Remote Address”: “5.161.96.123:43944”}
2023-03-27T05:11:33.474+0200 FATAL Unrecoverable error {“error”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory”, “errorVerbose”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:150\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:146\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-03-27T05:15:23.861+0200 INFO Configuration loaded {“Location”: “C:\Program Files\Storj\Storage Node\config.yaml”}
2023-03-27T05:15:23.893+0200 INFO Anonymized tracing enabled

I have the timeout error as well, but not the graceful exit error either. So far no solution; after a restart of the service it runs fine again for some hours. Disk check done, no errors. Do you have 100% audit too?

I thought the big log file might be the cause, but it obviously isn't. Nothing changed except that I set the usable space from 9 to 10 TB; the HDD has 12 TB (10.9 TiB) with 5.5 TB full.

Funny that it also started a week ago on my node…

I don't know what you mean.

My log file was too big, so I deleted it and created a new one.
I have 15 nodes and this is the only one with the problem. I created a script which restarts the complete VM of this node when the error appears.
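
For illustration, a watchdog along these lines could look like the following (a simplified sketch, not the exact script; it assumes it runs inside the VM as a scheduled task and that the service is named storagenode):

# Hypothetical watchdog sketch - run inside the VM as a scheduled task every few minutes.
# Assumes the service is registered under the name "storagenode".
# If the service is no longer running (e.g. after the FATAL writability error),
# reboot the guest, which effectively restarts the whole VM.
$svc = Get-Service -Name storagenode -ErrorAction SilentlyContinue
if ($svc -and $svc.Status -ne 'Running') {
    Restart-Computer -Force
}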

I saw that my HDD had only 1.65 TB in use, but suddenly about 3 TB is in use on the HDD.

Hello @Hogan1337,
Welcome to the forum!

This error means that your disk has issues: the storagenode cannot write to it.
I suggest stopping the storagenode service, either from the Services applet or from an elevated PowerShell:

Stop-Service storagenode

Then check the cable and power supply of your disk. If it's an external USB disk, it must have an external power supply.
After that, run the following in PowerShell or a Command Prompt as an Administrator (assuming that your disk has the letter D:):

chkdsk /f d:\

You may need to run it several times until all errors are fixed.
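
If you prefer to script the repeated runs, a minimal sketch could be the following (it assumes the data drive is D: and the storagenode service is already stopped; /x forces a dismount so the volume can be repaired):

# Assumes drive D: and that the storagenode service is stopped.
# Re-run chkdsk until it reports a clean pass (exit code 0 means no errors were found).
do {
    chkdsk d: /f /x
} while ($LASTEXITCODE -ne 0)
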
After that, you may try to start the storagenode service again, either from the Services applet or from an elevated PowerShell:

Start-Service storagenode

Then you likely have the same issue. Please follow the steps above too.

Thanks for your help

The command is chkdsk D: /f

I ran it 4 times and no errors were found.

Is it possible to regain the data stored on the node?

As far as I understand, it's not gone yet. If no errors are detected, then it must be the cable or power.
Once you have checked them (unplug and plug them back in while the storagenode is not running), you may start your node again and see whether it helped or not.
If it did not help, you probably need to replace the cable.
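
To see whether the error comes back after the restart, you may watch the node's log live, for example like this (a sketch; the path below is an assumption based on the default Windows installation and may differ on your node):

# Tail the storagenode log and surface fatal / writability errors as they appear.
# The log path is assumed from the default install location; adjust if your node logs elsewhere.
Get-Content "C:\Program Files\Storj\Storage Node\storagenode.log" -Wait -Tail 20 |
    Select-String -Pattern "FATAL|writability"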

I don't think it is the cable, but I will replace it when I'm back home.

Just so you understand my setup:
One Windows server with a Ryzen 1700, 5 internal HDDs and one M.2 SSD. The SSD holds the Windows system with VMware Workstation on it, and the VM store is also on the M.2 SSD. VMware Workstation runs 5 VMs, and each VM has one HDD mounted as the E: partition, which is the data store for that node. The problem is on my first node.
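
Since cable and power are suspected, one way to check the drives' reliability counters from the Windows host is the built-in Storage module (a sketch; run in an elevated PowerShell):

# List physical disks with their reliability counters (read/write errors, wear, temperature).
Get-PhysicalDisk |
    Get-StorageReliabilityCounter |
    Select-Object DeviceId, Temperature, ReadErrorsTotal, WriteErrorsTotal, Wear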

Hi,
I have had the same problem for a week now. I use an external USB drive, have already checked all the cables, and did the PowerShell steps.

Hmm, I think I got the update to version v1.75.2 at about the same time the node started having the problem, but I'm not sure. I have several nodes with this version and they don't have any problem.

2023-03-28T11:22:33.698+0200 ERROR piecestore upload failed {Piece ID: GRDZ3WCP2HHTQ56CJWJAMJOQHS4EKYUTLI6ES4RLRJBNQWJGUHJA, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT, error: manager closed: read tcp 10.8.0.2:28976->184.104.224.99:48256: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen., errorVerbose: manager closed: read tcp 10.8.0.2:28976->184.104.224.99:48256: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223, Size: 65536, Remote Address: 184.104.224.99:48256}
2023-03-28T11:22:33.699+0200 ERROR piecestore upload failed {Piece ID: XSJFRYN64SM6JJ46XO3EKZNUG5YY3E6ENWWZTVC7N6YHZGLSETPA, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, error: manager closed: read tcp 10.8.0.2:28976->216.66.40.82:30144: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen., errorVerbose: manager closed: read tcp 10.8.0.2:28976->216.66.40.82:30144: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223, Size: 65536, Remote Address: 216.66.40.82:30144}
2023-03-28T11:22:33.700+0200 ERROR piecestore upload failed {Piece ID: LDC4LSWV52HGEWL6PCNYCBNIXFLKPJA6HEH3HVEO24SKP3SDNCSA, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT, error: manager closed: read tcp 10.8.0.2:28976->72.52.83.202:8036: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen., errorVerbose: manager closed: read tcp 10.8.0.2:28976->72.52.83.202:8036: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223, Size: 65536, Remote Address: 72.52.83.202:8036}
2023-03-28T11:22:33.700+0200 INFO piecestore upload canceled {Piece ID: GA6SS2JFHHIORGHDHMP46ASQ4FQMVGRQAZUDRKUHZFYERIPZ4JSA, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT, Size: 65536, Remote Address: 5.75.252.132:39876}
2023-03-28T11:22:33.847+0200 ERROR piecestore upload failed {Piece ID: NFUBUQJAGHMK6Y2XEYPIE6PXQ66VOBKJYFTSLAH7IJ7Z5NLFLEUQ, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT, error: manager closed: read tcp 10.8.0.2:28976->5.75.227.177:60432: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen., errorVerbose: manager closed: read tcp 10.8.0.2:28976->5.75.227.177:60432: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223, Size: 65536, Remote Address: 5.75.227.177:60432}
2023-03-28T11:22:33.905+0200 INFO piecestore upload canceled {Piece ID: D5CQF2EC2H2IBYT2UKACTIYXQJLX5ZBLLTBUJUT55VFTB3VGPIAQ, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT, Size: 0, Remote Address: 184.104.224.99:15816}
2023-03-28T11:22:33.905+0200 ERROR piecestore upload failed {Piece ID: GXEYM24LSR4PG7AV2UT64D4OSHRP22IP7JTXUNL7ZZE6GFPZIKNA, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT, error: manager closed: read tcp 10.8.0.2:28976->116.202.111.169:15510: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen., errorVerbose: manager closed: read tcp 10.8.0.2:28976->116.202.111.169:15510: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223, Size: 65536, Remote Address: 116.202.111.169:15510}

2023-03-28T13:25:02.622+0200 FATAL Unrecoverable error {“error”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory”, “errorVerbose”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:150\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:146\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}

Since my setup is completely different, it has to be something else… but my node is slowly going down and there is no cure in sight. Automatically restarting the service after 1 minute is now configured.
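
For reference, one way to configure such an automatic restart is via the service recovery options, e.g. from an elevated prompt (a sketch; the reset interval and restart delay are example values):

# Example values: restart the storagenode service 60 seconds after each failure,
# and reset the failure counter after one day (86400 seconds).
sc.exe failure storagenode reset= 86400 actions= restart/60000/restart/60000/restart/60000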

Are you certain the drive isn’t full? Not just what the dashboard reports, but have you checked the drive itself?

Of course, there is enough free space. But I noticed that the values from the dashboard do not exactly match those on the hard drive.

None of the drives are full or misbehaving. No screeching; usage just stops when the runner times out.
I can use them normally for data, with no BSOD and no errors from Windows. The drive is exclusively for Storj at the moment.

I tested my Gigabyte Brix last summer with high CPU load and RAM tests; environment and temperatures were normal. It's just the runner that kills the background service.

Also, the Windows drive diagnosis returns no errors.
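
For completeness, the System event log can also be checked for disk and NTFS errors that never surface as a BSOD (a sketch; adjust -MaxEvents as needed):

# Show recent System-log events from the disk and NTFS providers (I/O errors, bad blocks, dismounts).
Get-WinEvent -FilterHashtable @{ LogName = 'System'; ProviderName = 'disk', 'Ntfs' } -MaxEvents 50 |
    Format-Table TimeCreated, Id, ProviderName, Message -AutoSize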

That's normal because of the different calculation (base 1000 vs. base 1024). Also, the node should be allocated at most 90% of the formatted capacity.
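
As a rough worked example (assuming a drive sold as 12 TB, as mentioned above):

# A "12 TB" drive holds 12 * 10^12 bytes; Windows displays capacity in binary units.
$bytes = 12e12
$tib   = $bytes / [math]::Pow(1024, 4)   # ~10.9, which Windows labels as "TB"
$alloc = 0.9 * $bytes / 1e12             # ~10.8 TB recommended maximum allocation
"Windows shows about {0:N1} TB; allocate at most roughly {1:N1} TB" -f $tib, $alloc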

It’s throwing an error when it tries to write to the drive.

Can you try manually writing something to the drive? Maybe just move a document over to it. Does it work?
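
For a quick check that mimics the node's writability probe, you could time a small write into the storage folder (a sketch; the E:\storagenode path is an assumption based on the setup described above):

# E:\storagenode is an assumed path - adjust to your node's actual storage directory.
# Write and delete a small test file and measure how long it takes.
# If this stalls for anywhere near a minute, it matches the monitor's 1m0s timeout.
Measure-Command {
    Set-Content -Path 'E:\storagenode\write-test.txt' -Value 'test'
    Remove-Item  -Path 'E:\storagenode\write-test.txt'
}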

Yes, I moved a file of a few MB while Storj was running: it worked fine and smoothly.

Also, automatically restarting the service after 1 minute keeps it working for hours.

I still get a suspension on 2 satellites.

Is it possible that the incorrectly closed connections from Noise cause the runner to time out?

Possibly, but at the moment I don’t have a lot of corresponding data from other node operators to determine if the issue is widespread or not. This error has been seen in the past, and it is usually because drives are full or have errors. Your other nodes are not having this issue, correct?