Logfile bloating, node stopping (cache timeout)

Help, my node stops with no errors logged; the logfile I recently deleted was 24 GB.
Is the Storj team aware of a problem in 1.75.2 (not being able to write the logfile properly)?
Maybe the error started a month or two ago unnoticed.

Node: 8 GB RAM, 256 GB SSD (system drive with Storj), 12 TB WD drive at about 6 TB full

166 MB of log in 5 days. I'm not a programmer, but can I reduce the log level to error?
I'm struggling to keep the node alive; it goes down in the middle of the night. UptimeRobot is running, but it did not wake me up. I'm getting emails from Storj for the first time, despite previous short downtimes last year.

I just start the service again and it runs fine for half a day.

You need to add this to the docker command:


--log.level=error
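
Roughly like this (only a sketch; your existing ports, mounts and environment variables stay as they are, they are just left out here):

docker run -d --name storagenode \
    <your existing ports, mounts and environment variables> \
    storjlabs/storagenode:latest \
    --log.level=error

The important part is that the flag comes after the image name, so it is passed on to the storagenode process itself.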

As I'm not using docker: what? how? where? All I know is config.yaml.

Hello @daki82,
Welcome to the forum!

You may add this option to the config.yaml file.

log.level: error

Please use an editor such as Notepad++, and do not forget to explicitly save the file (Notepad++ will not prompt you to save it on close). After that, restart the storagenode service either from the Services applet or from an elevated PowerShell:

Restart-Service storagenode
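
If you want to double-check that the change was picked up, something like this should work from an elevated PowerShell (the path below is the default install location; adjust it if yours differs):

# confirm the option is present in the config file
Select-String -Path "C:\Program Files\Storj\Storage Node\config.yaml" -Pattern "log.level"
# after the restart, the service should report Running
Get-Service storagenode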

Thanks, I will apply it soon and check if the logfile still bloats.

You may also set up logrotate for Windows to do it for you: LogRotateWin / Wiki / LogRotate
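
A minimal config for it could look something like this (the path, size and retention below are only placeholders for illustration):

# rotate the storagenode log once it exceeds 100 MB, keep 5 old copies
"C:\Program Files\Storj\Storage Node\storagenode.log" {
    size 100M
    rotate 5
    copytruncate
    missingok
}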

Since I'm no programmer and would need instructions to do so, it's not an option for me.
Maybe I'll google it later.
Also, I think it's worth fixing when a bloated log is able to kill nodes.

It's not a bug, so there is nothing to fix. It could be a feature request, however. Since you can use alternative methods like logrotate or a PowerShell script (a sketch follows below), this feature is unlikely to be implemented unless the Community sends a pull request.
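
For example, a rough sketch of such a rotation script, which could be run on a schedule from Task Scheduler (the path and size threshold are placeholders; adjust them to your install):

# Rotate the storagenode log when it grows beyond the threshold (run elevated).
$log = "C:\Program Files\Storj\Storage Node\storagenode.log"   # placeholder path
$maxBytes = 1GB                                                # placeholder threshold
if ((Test-Path $log) -and ((Get-Item $log).Length -gt $maxBytes)) {
    Stop-Service storagenode
    $stamp = Get-Date -Format "yyyyMMdd-HHmmss"
    Move-Item $log "$log.$stamp"        # keep the old log under a timestamped name
    Start-Service storagenode           # the node writes to a fresh logfile after the restart
    # optionally remove archives older than 30 days
    Get-ChildItem "$log.*" | Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-30) } | Remove-Item
}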

OK, logs aside, I was able to capture a log when the service stopped unexpectedly:

2023-03-26T11:18:59.411+0200 ERROR piecestore download failed {“Piece ID”: “ZDUMYU72HBZBTNQ3O2KDIE7NXZJDQ6XNMCXB2KHH7J5UGZL3W33Q”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Action”: “GET”, “Offset”: 1795840, “Size”: 311296, “Remote Address”: “188.34.176.110:24216”, “error”: “manager closed: read tcp 192.168.2.252:28967->188.34.176.110:24216: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.”, “errorVerbose”: “manager closed: read tcp 192.168.2.252:28967->188.34.176.110:24216: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223”}
2023-03-26T11:20:01.904+0200 ERROR services unexpected shutdown of a runner {“name”: “piecestore:monitor”, “error”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory”, “errorVerbose”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:150\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:146\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-03-26T11:21:10.240+0200 ERROR piecestore:cache error getting current used space: {“error”: “context canceled; context canceled”, “errorVerbose”: “group:\n— context canceled\n— context canceled”}
2023-03-26T11:21:10.249+0200 ERROR piecestore error sending hash and order limit {“error”: “context canceled”}
2023-03-26T11:21:10.528+0200 ERROR piecedeleter could not send delete piece to trash {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “SSOXF5VWE24GDHM7P6UEPSMEPSJ4S4DRD4PP6F7UVV26GT3JV6TA”, “error”: “pieces error: pieceexpirationdb: context canceled”, “errorVerbose”: “pieces error: pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:387\n\tstorj.io/storj/storagenode/pieces.(*Deleter).deleteOrTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Deleter).work:135\n\tstorj.io/storj/storagenode/pieces.(*Deleter).Run.func1:72\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-03-26T11:21:10.598+0200 ERROR piecedeleter could not send delete piece to trash {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “3SHNLHW7UQSSJSZLPPYEHFGZVGAYCEHJ2ACRIJRR4EZA7KS2BZAA”, “error”: “pieces error: pieceexpirationdb: context canceled”, “errorVerbose”: “pieces error: pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:387\n\tstorj.io/storj/storagenode/pieces.(*Deleter).deleteOrTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Deleter).work:135\n\tstorj.io/storj/storagenode/pieces.(*Deleter).Run.func1:72\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-03-26T11:21:10.727+0200 ERROR piecedeleter could not send delete piece to trash {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “Z7DFZ6NCRVPZP5C6NDG6D5WXWCNVCD6LELSBHASNM7N5WHKKOTFQ”, “error”: “pieces error: pieceexpirationdb: context canceled”, “errorVerbose”: “pieces error: pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:387\n\tstorj.io/storj/storagenode/pieces.(*Deleter).deleteOrTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Deleter).work:135\n\tstorj.io/storj/storagenode/pieces.(*Deleter).Run.func1:72\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-03-26T11:21:10.760+0200 ERROR piecedeleter could not send delete piece to trash {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “HCC2IFKRUNDMHMDIUI6LMDYEQWVX65NIOBXWEVSDFWKGTUPCZIFA”, “error”: “pieces error: pieceexpirationdb: context canceled”, “errorVerbose”: “pieces error: pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:387\n\tstorj.io/storj/storagenode/pieces.(*Deleter).deleteOrTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Deleter).work:135\n\tstorj.io/storj/storagenode/pieces.(*Deleter).Run.func1:72\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-03-26T11:21:10.909+0200 FATAL Unrecoverable error {“error”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory”, “errorVerbose”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:150\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:146\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-03-26T11:47:36.794+0200 ERROR piecestore failed to add bandwidth usage {“error”: “bandwidthdb: database is locked”, “errorVerbose”: “bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:843\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:530\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2023-03-26T11:47:36.929+0200 ERROR piecestore upload failed {“Piece ID”: “2JNBYQLWZEXVLN32KFAP5GU4JGDNOS4IAF7R6SYPMGR2O543HPPA”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Action”: “PUT”, “error”: “manager closed: read tcp 192.168.2.252:28967->149.6.140.106:20426: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.”, “errorVerbose”: “manager closed: read tcp 192.168.2.252:28967->149.6.140.106:20426: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223”, “Size”: 0, “Remote Address”: “149.6.140.106:20426”}
2023-03-26T11:47:37.413+0200 ERROR piecestore failed to add bandwidth usage {“error”: “bandwidthdb: database is locked”, “errorVerbose”: “bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:843\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:530\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2023-03-26T11:47:38.155+0200 ERROR piecestore failed to add bandwidth usage {“error”: “bandwidthdb: database is locked”, “errorVerbose”: “bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:843\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func7:795\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:806\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2023-03-26T11:47:38.158+0200 ERROR piecestore download failed {“Piece ID”: “WFZCXRWD6PNDP754RYZVQ3SI7WPUSB6CEET27DCRFJWVB3IE64AQ”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Action”: “GET”, “Offset”: 147968, “Size”: 250880, “Remote Address”: “5.75.241.34:57940”, “error”: “manager closed: read tcp 192.168.2.252:28967->5.75.241.34:57940: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.”, “errorVerbose”: “manager closed: read tcp 192.168.2.252:28967->5.75.241.34:57940: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223”}

But I can't find the problem. Help?

Meaning either:

  • your disk is disconnecting from time to time (is it an external USB drive? If so, it must have an external power supply)
  • the filesystem is corrupted
  • the disk is dying (a quick health check is sketched after the steps below)

I recommend:

  1. Stop the node either from the Services applet or from an elevated PowerShell:
Stop-Service storagenode
  2. Check the cable and power supply for your disk
  3. Run as an Administrator (assuming your disk is D:\):
chkdsk /f D:\

Perhaps you will need to run the check disk command several times, until no more errors are shown.
After that you may try to start the storagenode service again, either from the Services applet or from an elevated PowerShell:

Start-Service storagenode
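
To get a quick idea of whether the disk itself is failing, something like this from an elevated PowerShell may also help (the friendly-name filter is just an example; pick your WD drive from the first command's output):

# overview of all physical disks and their reported health
Get-PhysicalDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus
# error counters and temperature for the data drive
Get-PhysicalDisk | Where-Object FriendlyName -like "WDC*" |
    Get-StorageReliabilityCounter |
    Select-Object ReadErrorsTotal, WriteErrorsTotal, Temperature, Wear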

It was only the slow drive subsystem. I defragmented, increased the timeouts, set the log level to FATAL, and put the DBs on a USB flash drive; it has been running fine since.
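
For anyone finding this later: the options involved should be roughly the following in config.yaml (option names as I understand them, and the values and path are only examples; double-check against the comments in your own config file):

# allow the writability/readability checks more time than the default 1m0s
storage2.monitor.verify-dir-writable-timeout: 2m0s
storage2.monitor.verify-dir-readable-timeout: 2m0s
# move the databases off the slow data drive (example path)
storage2.database-dir: E:\storagenode-dbs
# log only fatal errors
log.level: fatal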
