The strange thing is that in the config file the writability timeout is 5m, while the error happens after just 1m. How is that possible?
filestore.write-buffer-size: 128.0 KiB was the culprit; it is set to 4 MiB on the node that runs without dropouts. I've gone through everything again and that's the only parameter that was different; only the order of the parameters differs slightly. Now both nodes are running great with 4 MiB.
Now the only question is why the setting was different. I hadn't changed or entered any settings before the error occurred. So why was one node installed with 4 MiB and the other with 128 KiB?
Guess the standard is 128 KiB.
Since some pieces are 2-3 MB, maybe this caused the slowness.
But why is one node at 128 KiB and the other at 4 MiB, although I hadn't changed anything in either of them (both version 1.75.2.0)? That would mean something was changed by your update or version, wasn't it? And one node did not get this change.
Was the # still in front or not?
No, it was active on both nodes, so without the #.
The only other difference was where the setting appeared: once right at the beginning, and once in the middle of the config file.
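For reference, the parameter under discussion looks like this in config.yaml (a sketch; 4 MiB is the value the node without dropouts had, and 128.0 KiB is what the failing node had):

```yaml
# in-memory write buffer for incoming pieces; a larger buffer means
# fewer flushes to a slow disk while a piece is uploading
filestore.write-buffer-size: 4.0 MiB
```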
That is more like an interval, not a timeout. Please check carefully. These timeout parameters were added only recently, so they are likely missing from your config.yaml.
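The interval and the timeout are easy to confuse. As a sketch, in recent storagenode releases the two settings look roughly like this in config.yaml (the exact parameter names here are my recollection, not quoted from this thread, so verify them against your own file):

```yaml
# how often the writability check runs:
storage2.monitor.verify-dir-writable-interval: 1m0s
# how long a single check may take before the node exits with a FATAL
# error; this is the one to raise if the disk is slow:
storage2.monitor.verify-dir-writable-timeout: 1m30s
```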
However, if you set the writeable-check timeout parameter to 5 minutes but still receive writability timeout errors after 1 minute, it means you either didn't save the file and/or didn't restart the node after the change.
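A minimal way to double-check that the edit actually landed on disk before restarting. The sketch below uses a throwaway file purely for illustration; point `grep` at your real config.yaml, and the parameter name is an assumption from recent releases. Afterwards, restart the node with whatever your setup uses (systemctl, `docker restart`, or the Windows service manager).

```shell
# Throwaway config for illustration; substitute your real config.yaml path.
cfg="$(mktemp)"
printf 'storage2.monitor.verify-dir-writable-timeout: 5m0s\n' > "$cfg"

# An active setting prints without a leading '#'; if nothing prints,
# the edit was not saved (or the line is still commented out).
grep -n 'verify-dir-writable-timeout' "$cfg"
```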
Perhaps the default value has changed recently.
Added to my summary post, thank you!
What's the device, OS, and filesystem, how is the disk connected, and is it an SMR drive?
Well, I am pretty sure I did restart. Anyway, I rolled back to 1.74.2 and have no problems anymore, but I wonder what to do for the next version. I don't think my node is picking up my config file correctly.
This is equivalent to setting the timeout to a generously large value like a month or two: the problem isn't gone, it's just hidden. Before these checkers had a timeout at all, they would simply hang forever instead of crashing when a writeability or readability check ran into a problem.
My node suddenly goes offline every other day.
The only trace in the log is this. Can anyone advise on a solution?
2023-04-18T21:36:22.082+0200 FATAL Unrecoverable error {"error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:150\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:146\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
See the previous replies in this topic; the solution is there. You have to increase the timeout setting in 30s increments until you get no restarts.
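In config.yaml terms, that stepping would look something like this (the parameter name is my assumption from recent releases; verify it against your file, and restart the node between changes):

```yaml
# start from the 1m0s default and raise in 30s steps until the
# restarts stop, e.g. 1m30s -> 2m0s -> 2m30s:
storage2.monitor.verify-dir-writable-timeout: 1m30s
```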
Hoping for help here, as our node v1.77.0-rc has started behaving very strangely, shutting down after a while. However, I managed to work around it by enabling restart-on-failure for the Storj V3 Node service.
Obviously not good, as we lose suspension & audit score.
Previously had v1.76.2 with the same error.
Attaching here are the last lines of the log.
Thanks for any tips/help getting around this.
Could it be an idea to downgrade?
2023-04-19T18:17:41.627+0200 INFO piecedeleter delete piece sent to trash {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "CZ37MKFF3IN2XONXYUPTXX6OWE75YZB7MTBV7XNI2V256CLQONQ3"}
2023-04-19T18:17:41.651+0200 INFO piecestore upload started {"Piece ID": "ETY2GG6XT3TUWOAISXK5HPFQK7TGX5X3IUOGP5DBJNZIO3XTXBAQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 7570757907247, "Remote Address": "5.161.149.40:53872"}
2023-04-19T18:17:41.749+0200 INFO piecestore uploaded {"Piece ID": "ETY2GG6XT3TUWOAISXK5HPFQK7TGX5X3IUOGP5DBJNZIO3XTXBAQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 6025, "Remote Address": "5.161.149.40:53872"}
2023-04-19T18:17:42.844+0200 INFO piecedeleter delete piece sent to trash {"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "5FG3AFF43HUGKCLS4DJD5ACSVJXDUSWLLPUKHMY3HNNTV5FRZELQ"}
2023-04-19T18:17:43.683+0200 INFO piecedeleter delete piece sent to trash {"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "ETY2GG6XT3TUWOAISXK5HPFQK7TGX5X3IUOGP5DBJNZIO3XTXBAQ"}
2023-04-19T18:17:44.050+0200 INFO piecestore upload started {"Piece ID": "5732ZUWXHQU2UNRFSSPTDJJZIE3FS524U33GUXV2RONC33QGBIBA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 207664098640, "Remote Address": "5.161.184.111:37590"}
2023-04-19T18:17:44.078+0200 INFO piecestore download started {"Piece ID": "52KZP5A2RBS5PLVNT7A5VSAPOKB5L4QHW3V3HIIRA3IJQQ36SGRA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 8960, "Remote Address": "5.161.111.220:61416"}
2023-04-19T18:17:44.203+0200 INFO piecestore uploaded {"Piece ID": "5732ZUWXHQU2UNRFSSPTDJJZIE3FS524U33GUXV2RONC33QGBIBA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 376, "Remote Address": "5.161.184.111:37590"}
2023-04-19T18:17:44.337+0200 INFO piecestore downloaded {"Piece ID": "52KZP5A2RBS5PLVNT7A5VSAPOKB5L4QHW3V3HIIRA3IJQQ36SGRA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 8960, "Remote Address": "5.161.111.220:61416"}
2023-04-19T18:17:44.403+0200 INFO piecestore download started {"Piece ID": "4NVKR57GLMNIN5ENWOZRYL6TTKYTFAAWMSTTSJN7ZURFQJRXOY7A", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET", "Offset": 221696, "Size": 2560, "Remote Address": "167.235.66.196:26890"}
2023-04-19T18:17:44.602+0200 INFO piecestore downloaded {"Piece ID": "4NVKR57GLMNIN5ENWOZRYL6TTKYTFAAWMSTTSJN7ZURFQJRXOY7A", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET", "Offset": 221696, "Size": 2560, "Remote Address": "167.235.66.196:26890"}
2023-04-19T18:17:44.758+0200 INFO piecestore download started {"Piece ID": "LHXKHEZJVWJAQTZASXYCXOCJXUQFE62AV6OPPN43MFNTE7KZKIQA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_REPAIR", "Offset": 0, "Size": 6400, "Remote Address": "5.161.217.169:33024"}
2023-04-19T18:17:44.904+0200 INFO piecestore downloaded {"Piece ID": "LHXKHEZJVWJAQTZASXYCXOCJXUQFE62AV6OPPN43MFNTE7KZKIQA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_REPAIR", "Offset": 0, "Size": 6400, "Remote Address": "5.161.217.169:33024"}
2023-04-19T18:17:45.085+0200 INFO piecestore upload started {"Piece ID": "RAULCGNFQJ45S7EF4QLIZ3KIOVD47TWVGMKJMI26FBS35UDU55TQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 27987257987257988798798787987987987987987987999888885555, "Remote Address": "5.161.184.111:37590"}
2023-04-19T18:17:45.145+0200 INFO piecestore uploaded {"Piece ID": "RAULCGNFQJ45S7EF4QLIZ3KIOVD47TWVGMKJMI26FBS35UDU55TQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 248, "Remote Address": "5.161.184.111:37590"}
2023-04-19T18:17:45.309+0200 INFO piecestore uploaded {"Piece ID": "6Y6VC2ABX7H4AERMR5CQCZLJALWQYSG2VNJF3MLQ2OUI7JCC5YWA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 2, "Remote Address": "38.88.241.42:13676"}
2023-04-19T18:17:45.313+0200 INFO piecestore upload started {"Piece ID": "NESEHTMN2UP65AONWAS2SKTQNG7CYD24SAFQPEZYLFT4TOOALX
Hi @Sharkey
This log is not helpful. Please post the entries from just before the crash; they should show an ERROR or FATAL state.
At a guess, it sounds like your issue may be this: Fatal Error on my Node
Hi @Regular! Super thanks for the quick response.
Here is the log text before it went offline and started up again.
Have also checked storage space; there is about 9 TB available.
Will run a chkdsk on the drive.
2023-04-19T16:10:57.843+0200 ERROR piecestore upload failed {"Piece ID": "4BKRV4W4R2PW2FAF5JOEUHJUWMWRYXXWN7TKH2FP3MPAATOAOEBA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:504\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35", "Size": 6400, "Remote Address": "5.161.146.178:17330"}
2023-04-19T16:10:58.094+0200 FATAL Unrecoverable error {"error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:163\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:155\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
Yes, this is the same error. Check the options in the thread I linked: Fatal Error on my Node. Your node's storage is running slowly and timed out after 1 minute while a write check was being performed. The disk should be fine in terms of data integrity; it's just slow to respond.
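One rough way to see that slowness is to time small synchronous writes on the storage disk. A sketch (it writes to a temp directory here so it runs anywhere; point `of=` at a file on the node's storage mount to get a meaningful number, and note `oflag=dsync` assumes GNU dd on Linux):

```shell
# 100 x 4 KiB synchronous writes; GNU dd reports the elapsed time and
# throughput on its last output line. If this takes seconds rather than
# a fraction of a second on the storage mount, timeouts are no surprise.
target="$(mktemp -d)/writecheck"
dd if=/dev/zero of="$target" bs=4k count=100 oflag=dsync 2>&1 | tail -n 1
```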
Well, this wrecked my node. I had so many offline periods because of this, and after restart number 10 or so, this one comes up:
For those not fluent in Norwegian:
Cannot start the Storj V3 service on the local computer.
Error 1067: The process terminated unexpectedly.
Even though my English is so-so, maybe you have advice?
It also happened before I changed the timeout. Though I don't understand why my node should suddenly start doing this with 3 TB of data onboard and no hiccups in a long while.
What about my config file not being picked up by the node? I set it to 5 minutes, but it kept crashing after a 1-minute delay, not 5.
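One way to check that is to confirm which config file the node actually loaded: at startup the storagenode logs a "Configuration loaded" line with the file's location. A sketch with a throwaway log so it is self-contained; grep your real node log instead (the example path `/app/config/config.yaml` is just an illustration):

```shell
# Throwaway log line for illustration; grep your real storagenode log.
log="$(mktemp)"
printf 'INFO\tConfiguration loaded\t{"Location": "/app/config/config.yaml"}\n' > "$log"

# If the Location shown is not the file you edited, your changes can never apply.
grep 'Configuration loaded' "$log"
```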
Perhaps you broke config.yaml. What are the last 20 lines in your logs?