Logfile bloating, node stopping (cache timeout)

Help, my node stops with no errors logged; the logfile I recently deleted was 24 GB.
Is the Storj team aware of a problem in 1.75.2 (not being able to write the logfile properly)?
Maybe the error started a month or two ago unnoticed.

Node: 8 GB RAM, 256 GB SSD (system drive with Storj), 12 TB WD drive at about 6 TB full

166 MB of log in 5 days. I'm not a programmer, but can I reduce the log level to error?
I'm struggling to keep the node alive; it goes down in the middle of the night. UptimeRobot is running, but it did not wake me up. I'm getting emails from Storj for the first time, despite previous short downtimes last year.

I just start the service again and it runs fine for half a day.

You need to add this to the docker command:


--log.level=error
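
Roughly like this (only a sketch; your existing ports, mounts and environment variables stay as they are, they are just left out here):

docker run -d --name storagenode \
    <your existing ports, mounts and environment variables> \
    storjlabs/storagenode:latest \
    --log.level=error

The important part is that the flag comes after the image name, so it is passed on to the storagenode process itself.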

As I'm not using docker: what? how? where? All I know is config.yaml.

Hello @daki82,
Welcome to the forum!

You may add this option to the config.yaml file.

log.level: error

Please use an editor such as Notepad++, and do not forget to explicitly save the file (Notepad++ will not prompt you to save it on close). After that, restart the storagenode service either from the Services applet or from an elevated PowerShell:

Restart-Service storagenode
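
If you want to double-check that the change was picked up, something like this should work from an elevated PowerShell (the path below is the default install location; adjust it if yours differs):

# confirm the option is present in the config file
Select-String -Path "C:\Program Files\Storj\Storage Node\config.yaml" -Pattern "log.level"
# after the restart, the service should report Running
Get-Service storagenode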

Thanks, I will apply it soon and check if the logfile still bloats.

You may also set up logrotate for Windows to do it for you: LogRotateWin / Wiki / LogRotate
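
A minimal config for it could look something like this (the path, size and retention below are only placeholders for illustration):

# rotate the storagenode log once it exceeds 100 MB, keep 5 old copies
"C:\Program Files\Storj\Storage Node\storagenode.log" {
    size 100M
    rotate 5
    copytruncate
    missingok
}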

Since I'm no programmer and would need instructions to do so, it's not an option for me.
Maybe I'll google it later.
Also, I think it's worth fixing when a bloated log is able to kill nodes.

It's not a bug, so there is nothing to fix. It could be a feature request, however. Since you can use alternative methods like logrotate or a PowerShell script (a sketch follows below), this feature is unlikely to be implemented unless the Community sends a pull request.
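
For example, a rough sketch of such a rotation script, which could be run on a schedule from Task Scheduler (the path and size threshold are placeholders; adjust them to your install):

# Rotate the storagenode log when it grows beyond the threshold (run elevated).
$log = "C:\Program Files\Storj\Storage Node\storagenode.log"   # placeholder path
$maxBytes = 1GB                                                # placeholder threshold
if ((Test-Path $log) -and ((Get-Item $log).Length -gt $maxBytes)) {
    Stop-Service storagenode
    $stamp = Get-Date -Format "yyyyMMdd-HHmmss"
    Move-Item $log "$log.$stamp"        # keep the old log under a timestamped name
    Start-Service storagenode           # the node writes to a fresh logfile after the restart
    # optionally remove archives older than 30 days
    Get-ChildItem "$log.*" | Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-30) } | Remove-Item
}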

OK, logs aside, I was able to capture a log when the service stopped unexpectedly:

2023-03-26T11:18:59.411+0200 ERROR piecestore download failed {“Piece ID”: “ZDUMYU72HBZBTNQ3O2KDIE7NXZJDQ6XNMCXB2KHH7J5UGZL3W33Q”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Action”: “GET”, “Offset”: 1795840, “Size”: 311296, “Remote Address”: “188.34.176.110:24216”, “error”: “manager closed: read tcp 192.168.2.252:28967->188.34.176.110:24216: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.”, “errorVerbose”: “manager closed: read tcp 192.168.2.252:28967->188.34.176.110:24216: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223”}
2023-03-26T11:20:01.904+0200 ERROR services unexpected shutdown of a runner {“name”: “piecestore:monitor”, “error”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory”, “errorVerbose”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:150\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:146\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-03-26T11:21:10.240+0200 ERROR piecestore:cache error getting current used space: {“error”: “context canceled; context canceled”, “errorVerbose”: “group:\n— context canceled\n— context canceled”}
2023-03-26T11:21:10.249+0200 ERROR piecestore error sending hash and order limit {“error”: “context canceled”}
2023-03-26T11:21:10.528+0200 ERROR piecedeleter could not send delete piece to trash {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “SSOXF5VWE24GDHM7P6UEPSMEPSJ4S4DRD4PP6F7UVV26GT3JV6TA”, “error”: “pieces error: pieceexpirationdb: context canceled”, “errorVerbose”: “pieces error: pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:387\n\tstorj.io/storj/storagenode/pieces.(*Deleter).deleteOrTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Deleter).work:135\n\tstorj.io/storj/storagenode/pieces.(*Deleter).Run.func1:72\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-03-26T11:21:10.598+0200 ERROR piecedeleter could not send delete piece to trash {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “3SHNLHW7UQSSJSZLPPYEHFGZVGAYCEHJ2ACRIJRR4EZA7KS2BZAA”, “error”: “pieces error: pieceexpirationdb: context canceled”, “errorVerbose”: “pieces error: pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:387\n\tstorj.io/storj/storagenode/pieces.(*Deleter).deleteOrTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Deleter).work:135\n\tstorj.io/storj/storagenode/pieces.(*Deleter).Run.func1:72\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-03-26T11:21:10.727+0200 ERROR piecedeleter could not send delete piece to trash {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “Z7DFZ6NCRVPZP5C6NDG6D5WXWCNVCD6LELSBHASNM7N5WHKKOTFQ”, “error”: “pieces error: pieceexpirationdb: context canceled”, “errorVerbose”: “pieces error: pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:387\n\tstorj.io/storj/storagenode/pieces.(*Deleter).deleteOrTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Deleter).work:135\n\tstorj.io/storj/storagenode/pieces.(*Deleter).Run.func1:72\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-03-26T11:21:10.760+0200 ERROR piecedeleter could not send delete piece to trash {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “HCC2IFKRUNDMHMDIUI6LMDYEQWVX65NIOBXWEVSDFWKGTUPCZIFA”, “error”: “pieces error: pieceexpirationdb: context canceled”, “errorVerbose”: “pieces error: pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:387\n\tstorj.io/storj/storagenode/pieces.(*Deleter).deleteOrTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Deleter).work:135\n\tstorj.io/storj/storagenode/pieces.(*Deleter).Run.func1:72\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-03-26T11:21:10.909+0200 FATAL Unrecoverable error {“error”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory”, “errorVerbose”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:150\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:146\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-03-26T11:47:36.794+0200 ERROR piecestore failed to add bandwidth usage {“error”: “bandwidthdb: database is locked”, “errorVerbose”: “bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:843\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:530\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2023-03-26T11:47:36.929+0200 ERROR piecestore upload failed {“Piece ID”: “2JNBYQLWZEXVLN32KFAP5GU4JGDNOS4IAF7R6SYPMGR2O543HPPA”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Action”: “PUT”, “error”: “manager closed: read tcp 192.168.2.252:28967->149.6.140.106:20426: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.”, “errorVerbose”: “manager closed: read tcp 192.168.2.252:28967->149.6.140.106:20426: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223”, “Size”: 0, “Remote Address”: “149.6.140.106:20426”}
2023-03-26T11:47:37.413+0200 ERROR piecestore failed to add bandwidth usage {“error”: “bandwidthdb: database is locked”, “errorVerbose”: “bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:843\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:530\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2023-03-26T11:47:38.155+0200 ERROR piecestore failed to add bandwidth usage {“error”: “bandwidthdb: database is locked”, “errorVerbose”: “bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:843\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func7:795\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:806\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35”}
2023-03-26T11:47:38.158+0200 ERROR piecestore download failed {“Piece ID”: “WFZCXRWD6PNDP754RYZVQ3SI7WPUSB6CEET27DCRFJWVB3IE64AQ”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Action”: “GET”, “Offset”: 147968, “Size”: 250880, “Remote Address”: “5.75.241.34:57940”, “error”: “manager closed: read tcp 192.168.2.252:28967->5.75.241.34:57940: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.”, “errorVerbose”: “manager closed: read tcp 192.168.2.252:28967->5.75.241.34:57940: wsarecv: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.\n\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:183\n\tgithub.com/jtolio/noiseconn.(*Conn).Read:143\n\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:96\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:223”}

But I can't find the problem. Help?

Meaning either:

  • your disk is disconnecting from time to time (is it an external USB drive? If so, it must have an external power supply)
  • the filesystem is corrupted
  • the disk is dying (a quick health check is sketched after the steps below)

I recommend:

  1. Stop the node either from the Services applet or from an elevated PowerShell:
Stop-Service storagenode
  2. Check the cable and power supply for your disk
  3. Run as an Administrator (assuming your disk is D:\):
chkdsk /f D:\

Perhaps you will need to run the check disk command several times, until no more errors are shown.
After that you may try to start the storagenode service again, either from the Services applet or from an elevated PowerShell:

Start-Service storagenode
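
To get a quick idea of whether the disk itself is failing, something like this from an elevated PowerShell may also help (the friendly-name filter is just an example; pick your WD drive from the first command's output):

# overview of all physical disks and their reported health
Get-PhysicalDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus
# error counters and temperature for the data drive
Get-PhysicalDisk | Where-Object FriendlyName -like "WDC*" |
    Get-StorageReliabilityCounter |
    Select-Object ReadErrorsTotal, WriteErrorsTotal, Temperature, Wear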

It was only the slow drive subsystem. I defragmented, increased the timeouts, set the log level to FATAL, and put the DBs on a USB flash drive; it has been running fine since.
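
For anyone finding this later: the options involved should be roughly the following in config.yaml (option names as I understand them, and the values and path are only examples; double-check against the comments in your own config file):

# allow the writability/readability checks more time than the default 1m0s
storage2.monitor.verify-dir-writable-timeout: 2m0s
storage2.monitor.verify-dir-readable-timeout: 2m0s
# move the databases off the slow data drive (example path)
storage2.database-dir: E:\storagenode-dbs
# log only fatal errors
log.level: fatal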
