My node is on Windows 10 GUI. The data used to be stored on my C: drive. I recently connected two 8 TB HDDs with SATA III 6.0 Gbps data cables, merged them into one big E: drive, and moved all the Storj-related folders to the E: drive.
You may still have a # at the start of that config.yaml line, which comments it out and nullifies any effect. Additionally, increase the writable-interval as well; you wouldn't want the software stomping over itself endlessly.
For example, try this:
# storage2.monitor.verify-dir-writable-interval: 2m0s
storage2.monitor.verify-dir-writable-interval: 5m0s
# storage2.monitor.verify-dir-writable-timeout: 1m0s
storage2.monitor.verify-dir-writable-timeout: 2m30s
And by extension, consider doing likewise with your readable-interval/timeout lines, since your new setup is failing under load.
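For reference, the matching readable-check lines would look something like this (a sketch; double-check the exact option names and defaults against your own config.yaml):
storage2.monitor.verify-dir-readable-interval: 3m0s
storage2.monitor.verify-dir-readable-timeout: 2m30s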
You could have 12 Gbps controllers and SMR drives would still suck. If that's the case, consider splitting those drives again and running 2 nodes, as recommended.
Some further alternatives, if they're CMR: try some caching software, larger filesystem clusters, and defragmentation (it shouldn't be necessary since you just made this drive, but put it on weekly autopilot at least). Ensure Windows Search and Microsoft antivirus have an exclusion listed for that drive. Limit concurrent retain/GC/filewalkers in the config; there are lots of posts here about various tweaks, etc.
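For the antivirus part, a minimal sketch using the built-in Defender cmdlets from an elevated PowerShell (the E: path matches the drive layout above; adjust to yours):
# Exclude the Storj data drive from Microsoft Defender real-time scanning
Add-MpPreference -ExclusionPath "E:\"
# Optionally also exclude the storagenode process itself
Add-MpPreference -ExclusionProcess "storagenode.exe"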
Ciao
Thanks Ciao! I will give it a try!
Bad idea. If 1 drive dies, all data will be gone. It's better to stick with 1 node per disk. Search for "toolbox" in this forum on how to do it on Windows.
Please undo this while you can. This is not only dangerous but also has a performance impact.
Split them back into separate disks and run a separate node with its own unique identity on each of them. Do not clone the identity; you need to generate a new one for the second node.
See
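For the new identity, the command is roughly as below (a sketch; "storagenode2" is just an example name for the second service, and the new identity still has to be authorized and signed separately):
identity.exe create storagenode2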
Hi,
After defragmenting the HDD and updating the timeout settings, Storj is still crashing.
PS C:\Windows\system32> Get-Content f:\storagenode.log | sls fatal | select -last 5
2024-06-11T13:24:13+01:00 FATAL Unrecoverable error {"error": "satellitesdb: context canceled", "errorVerbose": "satellitesdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).SetAddressAndStatus:56\n\tstorj.io/storj/storagenode/trust.(*Pool).Refresh:251\n\tstorj.io/storj/storagenode.(*Peer).Run:955\n\tmain.cmdRun:123\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tstorj.io/common/process.ExecWithCustomConfig:72\n\tstorj.io/common/process.Exec:62\n\tmain.(*service).Execute.func1:107\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-11T16:03:12+01:00 INFO piecestore upload started {"Piece ID": "BAWFLGS6RJ2B5YKSBL2HQU6OAFATAL3BENJDFMPDEQSKZIR5MGIQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Remote Address": "109.61.92.78:33924", "Available Space": 5429872898913}
2024-06-11T16:03:12+01:00 INFO piecestore uploaded {"Piece ID": "BAWFLGS6RJ2B5YKSBL2HQU6OAFATAL3BENJDFMPDEQSKZIR5MGIQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Remote Address": "109.61.92.78:33924", "Size": 249856}
2024-06-12T02:33:01+01:00 FATAL Unrecoverable error {"error": "piecestore monitor: timed out after 3m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 3m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-12T11:39:17+01:00 FATAL Unrecoverable error {"error": "piecestore monitor: timed out after 3m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 3m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
Meanwhile, I'm using robocopy to copy all of the data onto another 8 TB drive. Fingers crossed it will be moved over by tomorrow morning, and I will try again on the new drive. Let's see.
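For anyone curious, the robocopy call is along these lines (a sketch with example paths; check the switches with robocopy /? before relying on them):
robocopy E:\storagenode F:\storagenode /MIR /MT:8 /R:2 /W:5 /NFL /NDL
/MIR mirrors the whole tree, /MT copies multithreaded, and /R plus /W keep retries short so a busy file doesn't stall the copy.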
Then you need to increase the writability check timeout even more.
A bit of a different question, but with the same Storj version.
The node works well, with ingress/egress around 200-400 GB, but it is restarting regularly during the day. I have never seen such behavior before.
Grepping the last logs gives these errors: something canceled the filewalker.
2024-06-17T16:20:52Z ERROR piecestore:cache error getting current used space: {"Process": "storagenode", "error": "filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled", "errorVerbose": "group:\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-17T16:23:21Z ERROR failure during run {"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-17T16:23:30Z ERROR piecestore download failed {"Process": "storagenode", "Piece ID": "WPO2DEIQTYR5SJ6RGIUJETUVFIWLVQWITR4KZ5HJQJULDGRXCTOA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 512, "Remote Address": "109.61.92.83:49100", "error": "untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled", "errorVerbose": "untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimitSignature:140\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:62\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:621\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:302\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:167\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:109\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:157\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
I have just this log
2024-06-18T18:11:59Z INFO piecestore upload canceled {"Process": "storagenode", "Piece ID": "5KTGEV4TY5KZI4GJD7Y3GJTWRBAKV5CRLNW2UKRAB3HPJKNVKK7A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "109.61.92.74:42930", "Size": 65536}
2024-06-18T18:11:59Z INFO piecestore upload canceled {"Process": "storagenode", "Piece ID": "7UY6FLTSC2RPLEIAOCXNG5F6YHDHMYC6WXMG7YK2LOGQVWYMDJVA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.43:37074", "Size": 65536}
2024-06-18T18:11:59Z INFO piecestore upload canceled {"Process": "storagenode", "Piece ID": "46DSVVCEDVIR576DURZ2PVKLTD52U5DKQHI6QQEDZQBISRW5ZWVA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "109.61.92.79:53044", "Size": 65536}
2024-06-18T18:11:59Z INFO piecestore upload canceled {"Process": "storagenode", "Piece ID": "2G7FYLY63A7273YHKHA2F7PFMOQ355XWIBAEDTIU3Z2ZM75EX3SQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.205.235:46420", "Size": 65536}
2024-06-18T18:11:59Z INFO piecestore upload canceled {"Process": "storagenode", "Piece ID": "MN3RYJMPCQADSWVSIJGPUSO4MPZIENPCZBQJKSDNGCP65JBDZBBQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.213.34:58706", "Size": 65536}
2024-06-18T18:11:59Z INFO piecestore upload canceled {"Process": "storagenode", "Piece ID": "WS3IZPZINSRRE7PKE32KUFHLBVPVG7F5QNMBY2MKSBXBPKKFTIUQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.37:48358", "Size": 65536}
2024-06-18T18:11:59Z INFO piecestore upload canceled {"Process": "storagenode", "Piece ID": "URASIDTCYKNGL4ZHURODTSICJ3W3Q2W5VDSHVGWMGAA444DVO7HA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT_REPAIR", "Remote Address": "199.102.71.26:47048", "Size": 0}
2024-06-18T18:11:59Z INFO piecestore upload canceled {"Process": "storagenode", "Piece ID": "COLFALI6CGZLCO77AELLQ7ZDGK7ZLERU2YUYFKQAX5YD42XYK46Q", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.43:46158", "Size": 65536}
2024-06-18T18:11:59Z INFO piecestore upload canceled {"Process": "storagenode", "Piece ID": "POPHW34ASS5BOD6B5ZAA6GUDIW53OKIIEWEF4RIPJDKIPB7YUCUQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "109.61.92.65:42484", "Size": 65536}
2024-06-18T18:11:59Z INFO piecestore upload canceled (race lost or node shutdown) {"Process": "storagenode", "Piece ID": "Z5B47A7QNL43ACS62VR5QMO7BGZQ5QSTXIZSJHHVNSY7TUTCZBCQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.45:36108"}
2024-06-18T18:11:59Z INFO piecestore upload canceled (race lost or node shutdown) {"Process": "storagenode", "Piece ID": "CAAND2EWIHMXO7NNEZ5NOR62Z6WYGOREDANH76XAQGOZHQZEP7LQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.205.229:44362"}
2024-06-18T18:11:59Z INFO piecestore upload canceled {"Process": "storagenode", "Piece ID": "27XWB3SHCTCIXUW6SLFUQRAMIX7SOZVZEQUYIXR7JN57R3RZEAWA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "109.61.92.82:33520", "Size": 65536}
2024-06-18T18:13:48Z ERROR failure during run {"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
Error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory
2024-06-18 18:13:48,648 INFO exited: storagenode (exit status 1; not expected)
2024-06-18 18:13:49,652 INFO spawned: 'storagenode' with pid 479
2024-06-18 18:13:49,753 WARN received SIGQUIT indicating exit request
2024-06-18 18:13:49,754 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2024-06-18T18:13:49Z INFO Got a signal from the OS: "terminated" {"Process": "storagenode-updater"}
2024-06-18 18:13:49,757 INFO stopped: storagenode-updater (exit status 0)
2024-06-18 18:13:49,761 INFO stopped: storagenode (terminated by SIGTERM)
2024-06-18 18:13:49,762 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)
Nothing more… it's restarting this way roughly every hour, I think.
You can increase that in your config.yaml or via the docker run command. Try 2m.
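For example (a sketch; the command-line flag mirrors the config.yaml key, so verify the spelling against your own config file). In config.yaml:
storage2.monitor.verify-dir-writable-timeout: 2m0s
or, for Docker, appended after the image name in your existing docker run command:
docker run <your existing options> storjlabs/storagenode:latest --storage2.monitor.verify-dir-writable-timeout=2m0s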
I have the same thing from time to time on one of my servers. All nodes just turn off at once; it happens once or twice a day. Sometimes only part of the nodes turn off, with the same error:
Error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory
Part of the nodes are on the motherboard's SATA ports, part on a SAS3108 card, and part on an ASMedia 10-port SATA board.
I do not find any other errors on the PC, like a PCIe glitch at some point for some time.
CPU load is 30%; 13 GB of 24 GB RAM is occupied.
I checked all disks for errors; no errors found.
Out of interest, has this been happening for a long time or only since the stress testing began?
And are your databases on the node HDD or have you placed them on an SSD?
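(For anyone following along: the databases can be pointed at an SSD with the storage2.database-dir option in config.yaml; the path below is just an example, and the node must be stopped while the *.db files are moved there.)
storage2.database-dir: D:\storagenode-databases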
Only since the stress testing began; the databases are on the OS SSD. I added a 2 TB NVMe cache yesterday with PrimoCache for node reading and writing, and there have been no problems since.
Yeah, these databases really are being pushed hard…
As @jammerdan mentioned many times, perhaps there needs to be a rethink for the object-tracking…
I have this…
In the log I see
2024-06-19T12:34:34Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: KGRFFTBTIE7BFWX73NNEOTPMFKRE5AYZ6P34LL65M3A7UDPKBBKQ, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.219.39:35718}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: FTFCJQX4GHHYEQC5XMOVV75VQCJSIMN4XGHAO4KX43WY5YHYOBEQ, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.205.233:53368}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: ABRDR22QP3B2LV5AILTPGDF67G3NBYBJPWFAFNYYGWB2XFGYTQWA, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.205.225:37792}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: 5TDNZYRSHTQDMDS7RQFIY2P5WBINHDSDCFVCRBXKCQ3BLUIVKPLA, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.205.239:59700}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: MXPETRBGWDEKO2GXDM7BD763N7AMG5YXR53WZVBASSYEAGBF7YKA, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, Remote Address: 121.127.47.26:37034}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: E6LCR35M2TY4L3MNPPIA3TB2RTND6ZDY7CWX6MYPIFNAYGYV4ZAQ, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.205.235:44154}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: UFZVO2R6BIBSRXTIDTVUYM2RAY2MFYKZUDMNLIB3HPPIARSGYHDQ, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, Remote Address: 121.127.47.25:48894}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: UWKJ4CAYIX7EJTZOE2CPP7IIP7VMC2S3E64SWBS4OMQZJ37NRZJA, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 109.61.92.84:44132}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: OW2F4UC7ZXADDRPMZ6ZINVH6DTEZLHFKJNMQQHNNNJ5OZIU3LRFQ, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.201.213:50256}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: Q5JBMWVPGZM22GYFRJLEHYBHZG5PZCPOATZ5KMTP3XHDVBQ6V2MQ, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, Remote Address: 79.127.226.99:57122}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: ZMFFRSKRW27GRZOP3YOLQDRXSEG7RBTDCKPUV6UUUZDVJZYJEX4Q, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 109.61.92.68:39176}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: RHNQFWQQ3ZMCOABODGVFZLUH7FQA7QS7F3KGZESUSWIX65UZZOBA, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.205.230:55856}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: G4TINSMWOYQALGUOWHY32TN6EOSF2VC377FE5JQDATJ5GO4F7K5Q, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, Remote Address: 79.127.219.43:39786}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: BAWBVTQ5II7R2A5JGP23T5M6N3FFOHM66NUO737Q4DZEX6HDMINA, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.201.210:34550}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: 2LQRIIF5ETRN2Y6BEWNRZFFLYZFPNZWHZVEN45QJQ25P43G7JO7A, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 109.61.92.75:51348}
and this…
2024-06-19T12:35:41Z ERROR failure during run {"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
Error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory
My node restarts every few minutes.
Unfortunately this is not an option for every node or SNO.
This whole concept no longer works reliably under the stress. First of all, I believe all these history stats need to be separated out. That reduces the database size and the data loss in case of corruption.
And I still believe that if databases are required, they need to run on the fastest possible storage, which is RAM, so that the IOs happen there and not on the disk. I don't care how this is achieved, but that is as fast as it can get.
And after that, back to the design table to work out something completely new. Anything else is just a band-aid.
Edit: If they can't work something out, maybe they need to write their own database. I believe the Google founders did that back when none of the existing databases met their needs for the mass of data they wanted to process.
I must say that I also have the same problem on many nodes. No matter what I did, nothing helped. All databases are located on SSD. The developers need to do something about this, because node suspension scores are falling.
Check for disk errors, and also check that Windows is not using the HDDs for page files. I discovered that this was the case for me; it overloads the SATA controller with large amounts of data.
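A couple of example commands for both checks, run from an elevated PowerShell (the drive letter is just an example):
# Read-only scan of the data disk for filesystem errors
Repair-Volume -DriveLetter E -Scan
# Show which drives currently host a page file
Get-CimInstance Win32_PageFileUsage | Select-Object Name, CurrentUsage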
In fact, all relevant (payment) history is available from the satellites. When deleting the databases, you only lose those stats from the deleted satellites.
The only necessary database is piece_expiration.db. This one obviously cannot be in RAM.