Fatal Error on my Node / Timeout after 1 min

Hello everyone,
a week ago I started getting several errors on my main node. Maybe someone can help me solve the problem.

The node runs on Windows 10.

Thanks for your help.

2023-03-27T05:11:25.897+0200 INFO piecestore download started {Piece ID: 7WMQ364BMF672RLAQWDD76IQ6EORNXMYXIL5PGN46I4VUT4P4JXA, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: GET, Offset: 110336, Size: 469504, Remote Address: 184.104.224.98:48702}
2023-03-27T05:11:25.916+0200 ERROR services unexpected shutdown of a runner {name: piecestore:monitor, error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory, errorVerbose: piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:150\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:146\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75}
2023-03-27T05:11:26.264+0200 ERROR piecestore:cache error getting current used space: {error: context canceled; context canceled, errorVerbose: group:\n— context canceled\n— context canceled}
2023-03-27T05:11:26.345+0200 ERROR piecestore error sending hash and order limit {error: context canceled}
2023-03-27T05:11:26.345+0200 INFO piecedeleter delete piece sent to trash {Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Piece ID: EW35SDZ2TKHDRKCGDXZ6J26X4ZHOIR4SSTO2P5H6E5MOUII2TQHQ}
2023-03-27T05:11:26.350+0200 INFO piecestore download canceled {Piece ID: 2IKYB4WBKJ7OXIEHFD5DK5N5HP5JX2WSOF4RLHB5UZVQQDSMEQQA, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: GET_REPAIR, Offset: 0, Size: 0, Remote Address: 5.161.50.62:45688}
2023-03-27T05:11:26.588+0200 ERROR piecedeleter could not send delete piece to trash {Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Piece ID: 4QI26GFAIG6XY6IVCFDP32IENDVWVPMRTXXB6WTUZU6XL6PWBAMA, error: pieces error: pieceexpirationdb: context canceled, errorVerbose: pieces error: pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:387\n\tstorj.io/storj/storagenode/pieces.(*Deleter).deleteOrTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Deleter).work:135\n\tstorj.io/storj/storagenode/pieces.(*Deleter).Run.func1:72\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75}
2023-03-27T05:11:26.694+0200 ERROR piecedeleter could not send delete piece to trash {Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Piece ID: S2B3BICIMGEVACH5UN2SDFHFXKP776DHFTCE5VGCGJDAJ3ST2RAA, error: pieces error: pieceexpirationdb: context canceled, errorVerbose: pieces error: pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:387\n\tstorj.io/storj/storagenode/pieces.(*Deleter).deleteOrTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Deleter).work:135\n\tstorj.io/storj/storagenode/pieces.(*Deleter).Run.func1:72\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75}
2023-03-27T05:11:27.297+0200 ERROR gracefulexit:chore error retrieving satellites. {error: satellitesdb: context canceled, errorVerbose: satellitesdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits.func1:152\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:164\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:58\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75}

2023-03-27T05:11:31.912+0200 INFO piecestore upload canceled {“Piece ID”: “ISMNJLZS6L4OR5T2KXJCGOKCQICLLTPQNGMDIZDR66NRN764NN2A”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “Size”: 0, “Remote Address”: “5.161.46.76:27606”}
2023-03-27T05:11:32.058+0200 INFO piecestore uploaded {“Piece ID”: “AOLI2WHQXTTZAQ6LHMR4HGSC6KBIZZ6TMDRPOFFDPTBEXFZHXDMQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “PUT”, “Size”: 6144, “Remote Address”: “5.161.96.123:43944”}
2023-03-27T05:11:33.474+0200 FATAL Unrecoverable error {“error”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory”, “errorVerbose”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:150\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:146\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-03-27T05:15:23.861+0200 INFO Configuration loaded {“Location”: “C:\Program Files\Storj\Storage Node\config.yaml”}
2023-03-27T05:15:23.893+0200 INFO Anonymized tracing enabled

I have the timeout error as well, but not the graceful exit error either. So far no solution; after a restart of the service it runs fine again for some hours. Disk check done, no errors. Do you have 100% audit too?

I thought the big log file might be the cause, but it obviously isn't. Nothing changed except that I set the usable space from 9 to 10 TB; the HDD has 12 TB (10.9 TiB) with 5.5 TB full.

Funny that it also started a week ago on my node…

I don't know what you mean.

My log file was too big, so I deleted it and created a new one.
I have 15 nodes and this is the only one with the problem. I created a script which restarts the complete VM of this node when the error appears.
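
For illustration, a watchdog along these lines could look like the following (a simplified sketch, not the exact script; it assumes it runs inside the VM as a scheduled task and that the service is named storagenode):

# Hypothetical watchdog sketch - run inside the VM as a scheduled task every few minutes.
# Assumes the service is registered under the name "storagenode".
# If the service is no longer running (e.g. after the FATAL writability error),
# reboot the guest, which effectively restarts the whole VM.
$svc = Get-Service -Name storagenode -ErrorAction SilentlyContinue
if ($svc -and $svc.Status -ne 'Running') {
    Restart-Computer -Force
}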

I saw that my HDD had only 1.65 TB in use, but suddenly about 3 TB is in use on the HDD.

Hello @Hogan1337,
Welcome to the forum!

This error means that your disk has issues: the storagenode cannot write to it.
I suggest stopping the storagenode service, either from the Services applet or from an elevated PowerShell:

Stop-Service storagenode

Then check the cable and power supply of your disk. If it's an external USB disk, it must have an external power supply.
After that, run the following in PowerShell or a Command Prompt as an Administrator (assuming that your disk has the letter D:):

chkdsk /f d:\

You may need to run it several times until all errors are fixed.
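
If you prefer to script the repeated runs, a minimal sketch could be the following (it assumes the data drive is D: and the storagenode service is already stopped; /x forces a dismount so the volume can be repaired):

# Assumes drive D: and that the storagenode service is stopped.
# Re-run chkdsk until it reports a clean pass (exit code 0 means no errors were found).
do {
    chkdsk d: /f /x
} while ($LASTEXITCODE -ne 0)
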
After that, you may try to start the storagenode service again, either from the Services applet or from an elevated PowerShell:

Start-Service storagenode

Then you likely have the same issue. Please follow the steps above too.

Thanks for your help

The command is chkdsk D: /f

I ran it 4 times and no errors were found.

Is it possible to regain the data stored on the node?

As far as I understand, it's not gone yet. If no errors are detected, then it must be the cable or power.
Once you have checked them (unplug and plug them back in while the storagenode is not running), you may start your node again and see whether it helped or not.
If it did not help, you probably need to replace the cable.
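
To see whether the error comes back after the restart, you may watch the node's log live, for example like this (a sketch; the path below is an assumption based on the default Windows installation and may differ on your node):

# Tail the storagenode log and surface fatal / writability errors as they appear.
# The log path is assumed from the default install location; adjust if your node logs elsewhere.
Get-Content "C:\Program Files\Storj\Storage Node\storagenode.log" -Wait -Tail 20 |
    Select-String -Pattern "FATAL|writability"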

I don't think it is the cable, but I will replace it when I'm back home.

Just so you understand my setup:
One Windows server with a Ryzen 1700, 5 internal HDDs and one M.2 SSD. The SSD holds the Windows system with VMware Workstation on it, and the VM store is also on the M.2 SSD. VMware Workstation runs 5 VMs, and each VM has one HDD mounted as the E: partition, which is the data store for that node. The problem is on my first node.
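
Since cable and power are suspected, one way to check the drives' reliability counters from the Windows host is the built-in Storage module (a sketch; run in an elevated PowerShell):

# List physical disks with their reliability counters (read/write errors, wear, temperature).
Get-PhysicalDisk |
    Get-StorageReliabilityCounter |
    Select-Object DeviceId, Temperature, ReadErrorsTotal, WriteErrorsTotal, Wear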

Hi,
I have had the same problem for a week now. I use an external USB drive, have already checked all the cables, and did the PowerShell steps.

Hmm, I think I got the update to version v1.75.2 at about the same time the node started having the problem, but I'm not sure. I have several nodes with this version and they don't have any problem.

2023-03-28T11:22:33.698+0200 ERROR piecestore upload failed {Piece ID: GRDZ3WCP2HHTQ56CJWJAMJOQHS4EKYUTLI6ES4RLRJBNQWJGUHJA, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT, error: manager closed: read tcp 10.8.0.2:28976->184.104.224.99:48256: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen., errorVerbose: manager closed: read tcp 10.8.0.2:28976->184.104.224.99:48256: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223, Size: 65536, Remote Address: 184.104.224.99:48256}
2023-03-28T11:22:33.699+0200 ERROR piecestore upload failed {Piece ID: XSJFRYN64SM6JJ46XO3EKZNUG5YY3E6ENWWZTVC7N6YHZGLSETPA, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, error: manager closed: read tcp 10.8.0.2:28976->216.66.40.82:30144: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen., errorVerbose: manager closed: read tcp 10.8.0.2:28976->216.66.40.82:30144: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223, Size: 65536, Remote Address: 216.66.40.82:30144}
2023-03-28T11:22:33.700+0200 ERROR piecestore upload failed {Piece ID: LDC4LSWV52HGEWL6PCNYCBNIXFLKPJA6HEH3HVEO24SKP3SDNCSA, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT, error: manager closed: read tcp 10.8.0.2:28976->72.52.83.202:8036: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen., errorVerbose: manager closed: read tcp 10.8.0.2:28976->72.52.83.202:8036: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223, Size: 65536, Remote Address: 72.52.83.202:8036}
2023-03-28T11:22:33.700+0200 INFO piecestore upload canceled {Piece ID: GA6SS2JFHHIORGHDHMP46ASQ4FQMVGRQAZUDRKUHZFYERIPZ4JSA, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT, Size: 65536, Remote Address: 5.75.252.132:39876}
2023-03-28T11:22:33.847+0200 ERROR piecestore upload failed {Piece ID: NFUBUQJAGHMK6Y2XEYPIE6PXQ66VOBKJYFTSLAH7IJ7Z5NLFLEUQ, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT, error: manager closed: read tcp 10.8.0.2:28976->5.75.227.177:60432: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen., errorVerbose: manager closed: read tcp 10.8.0.2:28976->5.75.227.177:60432: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223, Size: 65536, Remote Address: 5.75.227.177:60432}
2023-03-28T11:22:33.905+0200 INFO piecestore upload canceled {Piece ID: D5CQF2EC2H2IBYT2UKACTIYXQJLX5ZBLLTBUJUT55VFTB3VGPIAQ, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT, Size: 0, Remote Address: 184.104.224.99:15816}
2023-03-28T11:22:33.905+0200 ERROR piecestore upload failed {Piece ID: GXEYM24LSR4PG7AV2UT64D4OSHRP22IP7JTXUNL7ZZE6GFPZIKNA, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT, error: manager closed: read tcp 10.8.0.2:28976->116.202.111.169:15510: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen., errorVerbose: manager closed: read tcp 10.8.0.2:28976->116.202.111.169:15510: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223, Size: 65536, Remote Address: 116.202.111.169:15510}

2023-03-28T13:25:02.622+0200 FATAL Unrecoverable error {“error”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory”, “errorVerbose”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:150\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:146\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}

Since my setup is completely different, it has to be something else… but my node is slowly going down and there is no cure in sight. Automatically restarting the service after 1 minute is now configured.
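
For reference, one way to configure such an automatic restart is via the service recovery options, e.g. from an elevated prompt (a sketch; the reset interval and restart delay are example values):

# Example values: restart the storagenode service 60 seconds after each failure,
# and reset the failure counter after one day (86400 seconds).
sc.exe failure storagenode reset= 86400 actions= restart/60000/restart/60000/restart/60000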

Are you certain the drive isn’t full? Not just what the dashboard reports, but have you checked the drive itself?

Of course, there is enough free space. But I noticed that the values from the dashboard do not exactly match those on the hard drive.

None of the drives are full or misbehaving. No screeching; usage just stops when the runner times out.
I can use them normally for data, with no BSOD and no errors from Windows. The drive is exclusively for Storj at the moment.

I tested my Gigabyte Brix last summer with high CPU load and RAM tests; environment and temperatures were normal. It's just the runner that kills the background service.

Also, the Windows drive diagnosis returns no errors.
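
For completeness, the System event log can also be checked for disk and NTFS errors that never surface as a BSOD (a sketch; adjust -MaxEvents as needed):

# Show recent System-log events from the disk and NTFS providers (I/O errors, bad blocks, dismounts).
Get-WinEvent -FilterHashtable @{ LogName = 'System'; ProviderName = 'disk', 'Ntfs' } -MaxEvents 50 |
    Format-Table TimeCreated, Id, ProviderName, Message -AutoSize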

That's normal because of the different calculation (base 1000 vs. base 1024). Also, the node should be allocated at most 90% of the formatted capacity.
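
As a rough worked example (assuming a drive sold as 12 TB, as mentioned above):

# A "12 TB" drive holds 12 * 10^12 bytes; Windows displays capacity in binary units.
$bytes = 12e12
$tib   = $bytes / [math]::Pow(1024, 4)   # ~10.9, which Windows labels as "TB"
$alloc = 0.9 * $bytes / 1e12             # ~10.8 TB recommended maximum allocation
"Windows shows about {0:N1} TB; allocate at most roughly {1:N1} TB" -f $tib, $alloc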

It’s throwing an error when it tries to write to the drive.

Can you try manually writing something to the drive? Maybe just move a document over to it. Does it work?
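
For a quick check that mimics the node's writability probe, you could time a small write into the storage folder (a sketch; the E:\storagenode path is an assumption based on the setup described above):

# E:\storagenode is an assumed path - adjust to your node's actual storage directory.
# Write and delete a small test file and measure how long it takes.
# If this stalls for anywhere near a minute, it matches the monitor's 1m0s timeout.
Measure-Command {
    Set-Content -Path 'E:\storagenode\write-test.txt' -Value 'test'
    Remove-Item  -Path 'E:\storagenode\write-test.txt'
}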

Yes, I moved a file of a few MB while Storj was running: it worked fine and smoothly.

Also, automatically restarting the service after 1 minute keeps it working for hours.

I still get a suspension on 2 satellites.

Is it possible that the incorrectly closed connections from Noise cause the runner to time out?

Possibly, but at the moment I don’t have a lot of corresponding data from other node operators to determine if the issue is widespread or not. This error has been seen in the past, and it is usually because drives are full or have errors. Your other nodes are not having this issue, correct?