Fatal Error on my Node

My node is on Windows 10 with the GUI install. The data used to be stored on my C: drive. I recently connected two 8 TB HDDs with SATA III 6.0 Gbps data cables, merged them into one big E: drive, and moved all the Storj-related folders to the E: drive.

You may still have a # at the start of that config.yaml line, which comments it out and nulls any effect. Additionally, increase the writable interval and timeout - you wouldn't want the software stomping over itself endlessly.
For example, try this:
# storage2.monitor.verify-dir-writable-interval: 2m0s
storage2.monitor.verify-dir-writable-interval: 5m0s
# storage2.monitor.verify-dir-writable-timeout: 1m0s
storage2.monitor.verify-dir-writable-timeout: 2m30s

And by extension, consider doing the same with your readable-interval/timeout lines, since your new configuration is failing under load.
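For instance, a sketch of the matching readable-check lines (the commented values are placeholders - keep whatever defaults your own config.yaml shows):

# storage2.monitor.verify-dir-readable-interval: 1m0s
storage2.monitor.verify-dir-readable-interval: 3m0s
# storage2.monitor.verify-dir-readable-timeout: 1m0s
storage2.monitor.verify-dir-readable-timeout: 2m30s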

You could have 12 Gbps controllers and SMR drives would still suck. If that's the case, consider splitting those drives again and running 2 nodes, as recommended.

Some further alternatives, if they're CMR: try some caching software, larger FS clusters, defragment (it shouldn't be necessary since you just made this drive, but put it on weekly autopilot at least), and make sure Windows Search and Microsoft antivirus have an exclusion listed for that drive. Limit concurrent retain/GC/filewalker work in the config (a sketch below); there are lots of posts here about various tweaks, etc.
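A sketch of the kind of config.yaml settings meant here (the option names are from the storagenode config; the values are only illustrative starting points):

# run the used-space and garbage-collection filewalkers as low-priority subprocesses
pieces.enable-lazy-filewalker: true
# limit how many retain (garbage collection) jobs run at the same time
retain.concurrency: 1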

Ciao

Thanks Ciao! I will give it a try!

Bad idea. If one drive dies, all the data will be gone. It's better to stick with one node per disk. Search for "toolbox" in this forum for how to do that on Windows.


Please undo this while you can. This is not only dangerous but also has a performance impact.
Split them into separate disks and run a separate node with its own unique identity on each of them. Do not clone the identity; you need to generate a new one for the second node.
See
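For illustration, creating and authorizing a second identity with the identity CLI could look like this (the name storagenode2 and the email/token are placeholders, not values from this thread):

identity.exe create storagenode2
identity.exe authorize storagenode2 your@email.com:your-auth-token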

Hi,
After defragging the HDD and updating the time, Storj is still crashing.

PS C:\Windows\system32> Get-Content f:\storagenode.log | sls fatal | select -last 5

2024-06-11T13:24:13+01:00 FATAL Unrecoverable error {"error": "satellitesdb: context canceled", "errorVerbose": "satellitesdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).SetAddressAndStatus:56\n\tstorj.io/storj/storagenode/trust.(*Pool).Refresh:251\n\tstorj.io/storj/storagenode.(*Peer).Run:955\n\tmain.cmdRun:123\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tstorj.io/common/process.ExecWithCustomConfig:72\n\tstorj.io/common/process.Exec:62\n\tmain.(*service).Execute.func1:107\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-11T16:03:12+01:00 INFO piecestore upload started {"Piece ID": "BAWFLGS6RJ2B5YKSBL2HQU6OAFATAL3BENJDFMPDEQSKZIR5MGIQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Remote Address": "109.61.92.78:33924", "Available Space": 5429872898913}
2024-06-11T16:03:12+01:00 INFO piecestore uploaded {"Piece ID": "BAWFLGS6RJ2B5YKSBL2HQU6OAFATAL3BENJDFMPDEQSKZIR5MGIQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Remote Address": "109.61.92.78:33924", "Size": 249856}
2024-06-12T02:33:01+01:00 FATAL Unrecoverable error {"error": "piecestore monitor: timed out after 3m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 3m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-12T11:39:17+01:00 FATAL Unrecoverable error {"error": "piecestore monitor: timed out after 3m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 3m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

Meanwhile I'm using robocopy to copy all of the data to another 8 TB drive. Fingers crossed it will be moved over by tomorrow morning, and then I'll try the new drive. Let's see.
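For anyone following along, a robocopy invocation for this kind of move might look like the sketch below (drive letters and the log path are examples, not taken from the post above):

robocopy E:\storagenode F:\storagenode /MIR /DCOPY:T /R:1 /W:1 /MT:8 /LOG:C:\robocopy-storj.log

/MIR mirrors the whole tree, /R:1 /W:1 keeps retries short, and running it a second time right before switching the node over picks up anything that changed in the meantime.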


Then you need to increase the writability check timeout even more.
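For example, if 3m0s still times out, something like this in config.yaml raises it further (the value is arbitrary; pick one your disk can realistically meet):

storage2.monitor.verify-dir-writable-timeout: 5m0s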

A slightly different question, but with the same Storj version.

The node works well, ingress/egress around 200-400 GB, but it restarts regularly during the day. I have never seen such behavior before.

Grepping the last logs gives these errors … something canceled the filewalker:

2024-06-17T16:20:52Z	ERROR	piecestore:cache	error getting current used space: 	{"Process": "storagenode", "error": "filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled", "errorVerbose": "group:\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-17T16:23:21Z	ERROR	failure during run	{"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-17T16:23:30Z	ERROR	piecestore	download failed	{"Process": "storagenode", "Piece ID": "WPO2DEIQTYR5SJ6RGIUJETUVFIWLVQWITR4KZ5HJQJULDGRXCTOA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 512, "Remote Address": "109.61.92.83:49100", "error": "untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled", "errorVerbose": "untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimitSignature:140\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:62\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:621\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:302\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:167\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:109\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:157\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}

I have just this in the log:

2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "5KTGEV4TY5KZI4GJD7Y3GJTWRBAKV5CRLNW2UKRAB3HPJKNVKK7A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "109.61.92.74:42930", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "7UY6FLTSC2RPLEIAOCXNG5F6YHDHMYC6WXMG7YK2LOGQVWYMDJVA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.43:37074", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "46DSVVCEDVIR576DURZ2PVKLTD52U5DKQHI6QQEDZQBISRW5ZWVA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "109.61.92.79:53044", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "2G7FYLY63A7273YHKHA2F7PFMOQ355XWIBAEDTIU3Z2ZM75EX3SQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.205.235:46420", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "MN3RYJMPCQADSWVSIJGPUSO4MPZIENPCZBQJKSDNGCP65JBDZBBQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.213.34:58706", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "WS3IZPZINSRRE7PKE32KUFHLBVPVG7F5QNMBY2MKSBXBPKKFTIUQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.37:48358", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "URASIDTCYKNGL4ZHURODTSICJ3W3Q2W5VDSHVGWMGAA444DVO7HA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT_REPAIR", "Remote Address": "199.102.71.26:47048", "Size": 0}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "COLFALI6CGZLCO77AELLQ7ZDGK7ZLERU2YUYFKQAX5YD42XYK46Q", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.43:46158", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "POPHW34ASS5BOD6B5ZAA6GUDIW53OKIIEWEF4RIPJDKIPB7YUCUQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "109.61.92.65:42484", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled (race lost or node shutdown)	{"Process": "storagenode", "Piece ID": "Z5B47A7QNL43ACS62VR5QMO7BGZQ5QSTXIZSJHHVNSY7TUTCZBCQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.45:36108"}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled (race lost or node shutdown)	{"Process": "storagenode", "Piece ID": "CAAND2EWIHMXO7NNEZ5NOR62Z6WYGOREDANH76XAQGOZHQZEP7LQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.205.229:44362"}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "27XWB3SHCTCIXUW6SLFUQRAMIX7SOZVZEQUYIXR7JN57R3RZEAWA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "109.61.92.82:33520", "Size": 65536}
2024-06-18T18:13:48Z	ERROR	failure during run	{"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
Error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory
2024-06-18 18:13:48,648 INFO exited: storagenode (exit status 1; not expected)
2024-06-18 18:13:49,652 INFO spawned: 'storagenode' with pid 479
2024-06-18 18:13:49,753 WARN received SIGQUIT indicating exit request
2024-06-18 18:13:49,754 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2024-06-18T18:13:49Z	INFO	Got a signal from the OS: "terminated"	{"Process": "storagenode-updater"}
2024-06-18 18:13:49,757 INFO stopped: storagenode-updater (exit status 0)
2024-06-18 18:13:49,761 INFO stopped: storagenode (terminated by SIGTERM)
2024-06-18 18:13:49,762 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

Nothing more … it restarts like this roughly every hour, I think.

You can increase that in your config.yaml or via the docker run command. Try 2m.
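A sketch of both variants - it is the same option, just set in different places (the 2m0s value follows the suggestion above, and the docker command is abbreviated):

In config.yaml:
storage2.monitor.verify-dir-writable-timeout: 2m0s

Or appended after the image name in docker run:
docker run ... storjlabs/storagenode:latest --storage2.monitor.verify-dir-writable-timeout=2m0s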


I have the same thing from time to time on one of my servers. All the nodes just turn off at once; it happens once or twice a day. Sometimes only part of the nodes turn off, with the same error:
Error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory

Some of the nodes are on the motherboard SATA ports, some on a SAS3108 card, and some on an ASMedia 10-port SATA board.

I can't find any other errors on the PC, like a PCIe glitch at some point.
CPU is at about 30% load; 13 GB of 24 GB RAM occupied.

I checked all disks for errors - no errors.

Out of interest, has this been happening for a long time or only since the stress testing began?
And are your databases on the node HDD or have you placed them on an SSD?
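For reference, relocating the databases is a single config option; a minimal sketch, with an example path that is not from this thread:

storage2.database-dir: D:\storj-databases

The node has to be stopped and the existing .db files copied to that folder before restarting.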

Only since the stress tests began; the databases are on the OS SSD. I added a 2 TB NVMe cache yesterday with PrimoCache for node reads and writes, and there have been no problems since.

Yeah, these databases really are being pushed hard…
As @jammerdan mentioned many times, perhaps there needs to be a rethink of the object tracking…


I have this…

In the log I see:

2024-06-19T12:34:34Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: KGRFFTBTIE7BFWX73NNEOTPMFKRE5AYZ6P34LL65M3A7UDPKBBKQ, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.219.39:35718}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: FTFCJQX4GHHYEQC5XMOVV75VQCJSIMN4XGHAO4KX43WY5YHYOBEQ, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.205.233:53368}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: ABRDR22QP3B2LV5AILTPGDF67G3NBYBJPWFAFNYYGWB2XFGYTQWA, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.205.225:37792}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: 5TDNZYRSHTQDMDS7RQFIY2P5WBINHDSDCFVCRBXKCQ3BLUIVKPLA, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.205.239:59700}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: MXPETRBGWDEKO2GXDM7BD763N7AMG5YXR53WZVBASSYEAGBF7YKA, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, Remote Address: 121.127.47.26:37034}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: E6LCR35M2TY4L3MNPPIA3TB2RTND6ZDY7CWX6MYPIFNAYGYV4ZAQ, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.205.235:44154}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: UFZVO2R6BIBSRXTIDTVUYM2RAY2MFYKZUDMNLIB3HPPIARSGYHDQ, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, Remote Address: 121.127.47.25:48894}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: UWKJ4CAYIX7EJTZOE2CPP7IIP7VMC2S3E64SWBS4OMQZJ37NRZJA, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 109.61.92.84:44132}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: OW2F4UC7ZXADDRPMZ6ZINVH6DTEZLHFKJNMQQHNNNJ5OZIU3LRFQ, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.201.213:50256}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: Q5JBMWVPGZM22GYFRJLEHYBHZG5PZCPOATZ5KMTP3XHDVBQ6V2MQ, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, Remote Address: 79.127.226.99:57122}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: ZMFFRSKRW27GRZOP3YOLQDRXSEG7RBTDCKPUV6UUUZDVJZYJEX4Q, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 109.61.92.68:39176}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: RHNQFWQQ3ZMCOABODGVFZLUH7FQA7QS7F3KGZESUSWIX65UZZOBA, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.205.230:55856}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: G4TINSMWOYQALGUOWHY32TN6EOSF2VC377FE5JQDATJ5GO4F7K5Q, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, Remote Address: 79.127.219.43:39786}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: BAWBVTQ5II7R2A5JGP23T5M6N3FFOHM66NUO737Q4DZEX6HDMINA, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.201.210:34550}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: 2LQRIIF5ETRN2Y6BEWNRZFFLYZFPNZWHZVEN45QJQ25P43G7JO7A, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 109.61.92.75:51348}

and this…

2024-06-19T12:35:41Z ERROR failure during run {"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
Error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory

My node restarts every few minutes.

Unfortunately this is not an option for every node or SNO.

This whole concept no longer works reliably under the stress. First of all, I believe all these history stats need to be separated out; that reduces database size and the data loss in case of corruption.
And I still believe that if databases are required, they need to run on the fastest possible storage, which is RAM, so that the IOs happen there and not on the disk. I don't care how it is achieved, but that is as fast as it can get.
And after that, back to the drawing board to work out something completely new. Anything else is just a band-aid.

Edit: If they can't work something out, maybe they need to write their own database. I believe the Google founders did that back when none of the existing databases met their needs for the mass of data they wanted to process.

I must say that I also have the same problem on many nodes. No matter what I did, nothing helped. All databases are located on SSD. The developers need to do something about this, because node suspension scores are dropping.

Check for disk errors, and also check that Windows is not using the HDDs for page files. I discovered that this was the case on my system; it overloads the SATA controller when large amounts of data are moving.
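A quick PowerShell sketch for both checks (the drive letter is an example):

# list where Windows currently keeps page files
Get-CimInstance Win32_PageFileUsage | Select-Object Name, AllocatedBaseSize, CurrentUsage
# online NTFS scan for errors on the node drive
chkdsk E: /scan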


In fact, all relevant (payment) history is available from the satellites. When deleting the databases, you only lose the stats from deleted satellites.

The only necessary database is piece_expiration.db. This one obviously cannot be in RAM.