Fatal Error on my Node

My node is on Windows 10 with the GUI install. The data used to be stored on my C: drive. I recently connected two 8 TB HDDs with SATA III 6.0 Gbps data cables, merged them into one big E: drive, and moved all the Storj-related folders to the E: drive.

You may still have a # at the start of that config.yaml line, which comments it out and nulls any effect. Additionally, increase the writable interval and timeout - you wouldn't want the software stomping over itself endlessly.
For example, try this:
# storage2.monitor.verify-dir-writable-interval: 2m0s
storage2.monitor.verify-dir-writable-interval: 5m0s
# storage2.monitor.verify-dir-writable-timeout: 1m0s
storage2.monitor.verify-dir-writable-timeout: 2m30s

And by extension, consider doing the same with your readable-interval/timeout lines, since your new configuration is failing under load.
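For instance, a sketch of the matching readable-check lines (the commented values are placeholders - keep whatever defaults your own config.yaml shows):

# storage2.monitor.verify-dir-readable-interval: 1m0s
storage2.monitor.verify-dir-readable-interval: 3m0s
# storage2.monitor.verify-dir-readable-timeout: 1m0s
storage2.monitor.verify-dir-readable-timeout: 2m30s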

You could have 12 Gbps controllers and SMR drives would still suck. If that's the case, consider splitting those drives again and running 2 nodes, as recommended.

Some further alternatives, if they're CMR: try some caching software, larger FS clusters, defragment (it shouldn't be necessary since you just made this drive, but put it on weekly autopilot at least), and make sure Windows Search and Microsoft antivirus have an exclusion listed for that drive. Limit concurrent retain/GC/filewalker work in the config (a sketch below); there are lots of posts here about various tweaks, etc.
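A sketch of the kind of config.yaml settings meant here (the option names are from the storagenode config; the values are only illustrative starting points):

# run the used-space and garbage-collection filewalkers as low-priority subprocesses
pieces.enable-lazy-filewalker: true
# limit how many retain (garbage collection) jobs run at the same time
retain.concurrency: 1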

Ciao

Thanks Ciao! I will give it a try!

Bad idea. If one drive dies, all the data will be gone. It's better to stick with one node per disk. Search for "toolbox" in this forum for how to do that on Windows.


Please undo this while you can. This is not only dangerous but also has a performance impact.
Split them into separate disks and run a separate node with its own unique identity on each of them. Do not clone the identity; you need to generate a new one for the second node.
See
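For illustration, creating and authorizing a second identity with the identity CLI could look like this (the name storagenode2 and the email/token are placeholders, not values from this thread):

identity.exe create storagenode2
identity.exe authorize storagenode2 your@email.com:your-auth-token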

Hi,
After defragging the HDD and updating the time, Storj is still crashing.

PS C:\Windows\system32> Get-Content f:\storagenode.log | sls fatal | select -last 5

2024-06-11T13:24:13+01:00 FATAL Unrecoverable error {"error": "satellitesdb: context canceled", "errorVerbose": "satellitesdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).SetAddressAndStatus:56\n\tstorj.io/storj/storagenode/trust.(*Pool).Refresh:251\n\tstorj.io/storj/storagenode.(*Peer).Run:955\n\tmain.cmdRun:123\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tstorj.io/common/process.ExecWithCustomConfig:72\n\tstorj.io/common/process.Exec:62\n\tmain.(*service).Execute.func1:107\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-11T16:03:12+01:00 INFO piecestore upload started {"Piece ID": "BAWFLGS6RJ2B5YKSBL2HQU6OAFATAL3BENJDFMPDEQSKZIR5MGIQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Remote Address": "109.61.92.78:33924", "Available Space": 5429872898913}
2024-06-11T16:03:12+01:00 INFO piecestore uploaded {"Piece ID": "BAWFLGS6RJ2B5YKSBL2HQU6OAFATAL3BENJDFMPDEQSKZIR5MGIQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Remote Address": "109.61.92.78:33924", "Size": 249856}
2024-06-12T02:33:01+01:00 FATAL Unrecoverable error {"error": "piecestore monitor: timed out after 3m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 3m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-12T11:39:17+01:00 FATAL Unrecoverable error {"error": "piecestore monitor: timed out after 3m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 3m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

Meanwhile I'm using robocopy to copy all of the data to another 8 TB drive. Fingers crossed it will be moved over by tomorrow morning, and then I'll try the new drive. Let's see.
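For anyone following along, a robocopy invocation for this kind of move might look like the sketch below (drive letters and the log path are examples, not taken from the post above):

robocopy E:\storagenode F:\storagenode /MIR /DCOPY:T /R:1 /W:1 /MT:8 /LOG:C:\robocopy-storj.log

/MIR mirrors the whole tree, /R:1 /W:1 keeps retries short, and running it a second time right before switching the node over picks up anything that changed in the meantime.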


Then you need to increase the writability check timeout even more.
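For example, if 3m0s still times out, something like this in config.yaml raises it further (the value is arbitrary; pick one your disk can realistically meet):

storage2.monitor.verify-dir-writable-timeout: 5m0s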

A slightly different question, but with the same Storj version.

The node works well, ingress/egress around 200-400 GB, but it restarts regularly during the day. I have never seen such behavior before.

Grepping the last logs gives these errors … something canceled the filewalker:

2024-06-17T16:20:52Z	ERROR	piecestore:cache	error getting current used space: 	{"Process": "storagenode", "error": "filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled", "errorVerbose": "group:\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:713\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-17T16:23:21Z	ERROR	failure during run	{"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-17T16:23:30Z	ERROR	piecestore	download failed	{"Process": "storagenode", "Piece ID": "WPO2DEIQTYR5SJ6RGIUJETUVFIWLVQWITR4KZ5HJQJULDGRXCTOA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "Offset": 0, "Size": 512, "Remote Address": "109.61.92.83:49100", "error": "untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled", "errorVerbose": "untrusted: unable to get signee: trust: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimitSignature:140\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:62\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:621\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:302\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:167\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:109\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:157\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}

I have just this in the log:

2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "5KTGEV4TY5KZI4GJD7Y3GJTWRBAKV5CRLNW2UKRAB3HPJKNVKK7A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "109.61.92.74:42930", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "7UY6FLTSC2RPLEIAOCXNG5F6YHDHMYC6WXMG7YK2LOGQVWYMDJVA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.43:37074", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "46DSVVCEDVIR576DURZ2PVKLTD52U5DKQHI6QQEDZQBISRW5ZWVA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "109.61.92.79:53044", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "2G7FYLY63A7273YHKHA2F7PFMOQ355XWIBAEDTIU3Z2ZM75EX3SQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.205.235:46420", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "MN3RYJMPCQADSWVSIJGPUSO4MPZIENPCZBQJKSDNGCP65JBDZBBQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.213.34:58706", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "WS3IZPZINSRRE7PKE32KUFHLBVPVG7F5QNMBY2MKSBXBPKKFTIUQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.37:48358", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "URASIDTCYKNGL4ZHURODTSICJ3W3Q2W5VDSHVGWMGAA444DVO7HA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT_REPAIR", "Remote Address": "199.102.71.26:47048", "Size": 0}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "COLFALI6CGZLCO77AELLQ7ZDGK7ZLERU2YUYFKQAX5YD42XYK46Q", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.43:46158", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "POPHW34ASS5BOD6B5ZAA6GUDIW53OKIIEWEF4RIPJDKIPB7YUCUQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "109.61.92.65:42484", "Size": 65536}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled (race lost or node shutdown)	{"Process": "storagenode", "Piece ID": "Z5B47A7QNL43ACS62VR5QMO7BGZQ5QSTXIZSJHHVNSY7TUTCZBCQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.219.45:36108"}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled (race lost or node shutdown)	{"Process": "storagenode", "Piece ID": "CAAND2EWIHMXO7NNEZ5NOR62Z6WYGOREDANH76XAQGOZHQZEP7LQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.205.229:44362"}
2024-06-18T18:11:59Z	INFO	piecestore	upload canceled	{"Process": "storagenode", "Piece ID": "27XWB3SHCTCIXUW6SLFUQRAMIX7SOZVZEQUYIXR7JN57R3RZEAWA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "109.61.92.82:33520", "Size": 65536}
2024-06-18T18:13:48Z	ERROR	failure during run	{"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
Error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory
2024-06-18 18:13:48,648 INFO exited: storagenode (exit status 1; not expected)
2024-06-18 18:13:49,652 INFO spawned: 'storagenode' with pid 479
2024-06-18 18:13:49,753 WARN received SIGQUIT indicating exit request
2024-06-18 18:13:49,754 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2024-06-18T18:13:49Z	INFO	Got a signal from the OS: "terminated"	{"Process": "storagenode-updater"}
2024-06-18 18:13:49,757 INFO stopped: storagenode-updater (exit status 0)
2024-06-18 18:13:49,761 INFO stopped: storagenode (terminated by SIGTERM)
2024-06-18 18:13:49,762 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

Nothing more … it restarts like this roughly every hour, I think.

You can increase that in your config.yaml or via the docker run command. Try 2m.
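A sketch of both variants - it is the same option, just set in different places (the 2m0s value follows the suggestion above, and the docker command is abbreviated):

In config.yaml:
storage2.monitor.verify-dir-writable-timeout: 2m0s

Or appended after the image name in docker run:
docker run ... storjlabs/storagenode:latest --storage2.monitor.verify-dir-writable-timeout=2m0s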


I have the same thing from time to time on one of my servers. All the nodes just turn off at once; it happens once or twice a day. Sometimes only part of the nodes turn off, with the same error:
Error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory

Some of the nodes are on the motherboard SATA ports, some on a SAS3108 card, and some on an ASMedia 10-port SATA board.

I can't find any other errors on the PC, like a PCIe glitch at some point.
CPU is at about 30% load; 13 GB of 24 GB RAM occupied.

I checked all disks for errors - no errors.

Out of interest, has this been happening for a long time or only since the stress testing began?
And are your databases on the node HDD or have you placed them on an SSD?
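For reference, relocating the databases is a single config option; a minimal sketch, with an example path that is not from this thread:

storage2.database-dir: D:\storj-databases

The node has to be stopped and the existing .db files copied to that folder before restarting.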

Only since the stress tests began; the databases are on the OS SSD. I added a 2 TB NVMe cache yesterday with PrimoCache for node reads and writes, and there have been no problems since.

Yeah, these databases really are being pushed hard…
As @jammerdan mentioned many times, perhaps there needs to be a rethink of the object tracking…


I have this…

In the log I see:

2024-06-19T12:34:34Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: KGRFFTBTIE7BFWX73NNEOTPMFKRE5AYZ6P34LL65M3A7UDPKBBKQ, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.219.39:35718}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: FTFCJQX4GHHYEQC5XMOVV75VQCJSIMN4XGHAO4KX43WY5YHYOBEQ, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.205.233:53368}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: ABRDR22QP3B2LV5AILTPGDF67G3NBYBJPWFAFNYYGWB2XFGYTQWA, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.205.225:37792}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: 5TDNZYRSHTQDMDS7RQFIY2P5WBINHDSDCFVCRBXKCQ3BLUIVKPLA, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.205.239:59700}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: MXPETRBGWDEKO2GXDM7BD763N7AMG5YXR53WZVBASSYEAGBF7YKA, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, Remote Address: 121.127.47.26:37034}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: E6LCR35M2TY4L3MNPPIA3TB2RTND6ZDY7CWX6MYPIFNAYGYV4ZAQ, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.205.235:44154}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: UFZVO2R6BIBSRXTIDTVUYM2RAY2MFYKZUDMNLIB3HPPIARSGYHDQ, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, Remote Address: 121.127.47.25:48894}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: UWKJ4CAYIX7EJTZOE2CPP7IIP7VMC2S3E64SWBS4OMQZJ37NRZJA, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 109.61.92.84:44132}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: OW2F4UC7ZXADDRPMZ6ZINVH6DTEZLHFKJNMQQHNNNJ5OZIU3LRFQ, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.201.213:50256}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: Q5JBMWVPGZM22GYFRJLEHYBHZG5PZCPOATZ5KMTP3XHDVBQ6V2MQ, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, Remote Address: 79.127.226.99:57122}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: ZMFFRSKRW27GRZOP3YOLQDRXSEG7RBTDCKPUV6UUUZDVJZYJEX4Q, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 109.61.92.68:39176}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: RHNQFWQQ3ZMCOABODGVFZLUH7FQA7QS7F3KGZESUSWIX65UZZOBA, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.205.230:55856}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: G4TINSMWOYQALGUOWHY32TN6EOSF2VC377FE5JQDATJ5GO4F7K5Q, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, Remote Address: 79.127.219.43:39786}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: BAWBVTQ5II7R2A5JGP23T5M6N3FFOHM66NUO737Q4DZEX6HDMINA, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 79.127.201.210:34550}
2024-06-19T12:34:58Z INFO piecestore upload canceled (race lost or node shutdown) {Process: storagenode, Piece ID: 2LQRIIF5ETRN2Y6BEWNRZFFLYZFPNZWHZVEN45QJQ25P43G7JO7A, Satellite ID: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE, Action: PUT, Remote Address: 109.61.92.75:51348}

and this…

2024-06-19T12:35:41Z ERROR failure during run {"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:178\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:167\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
Error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory

My node restarts every few minutes.

Unfortunately this is not an option for every node or SNO.

This whole concept no longer works reliably under the stress. First of all, I believe all these history stats need to be separated out; that reduces database size and the data loss in case of corruption.
And I still believe that if databases are required, they need to run on the fastest possible storage, which is RAM, so that the IOs happen there and not on the disk. I don't care how it is achieved, but that is as fast as it can get.
And after that, back to the drawing board to work out something completely new. Anything else is just a band-aid.

Edit: If they can't work something out, maybe they need to write their own database. I believe the Google founders did that back when none of the existing databases met their needs for the mass of data they wanted to process.

I must say that I also have the same problem on many nodes. No matter what I did, nothing helped. All databases are located on SSD. The developers need to do something about this, because node suspension scores are dropping.

Check for disk errors, and also check that Windows is not using the HDDs for page files. I discovered that this was the case on my system; it overloads the SATA controller when large amounts of data are moving.
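A quick PowerShell sketch for both checks (the drive letter is an example):

# list where Windows currently keeps page files
Get-CimInstance Win32_PageFileUsage | Select-Object Name, AllocatedBaseSize, CurrentUsage
# online NTFS scan for errors on the node drive
chkdsk E: /scan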


In fact, all relevant (payment) history is available from the satellites. When deleting the databases, you only lose the stats from deleted satellites.

The only necessary database is piece_expiration.db. This one obviously cannot be in RAM.