Node crashing after 20-30 min

wildraven · January 16, 2021, 2:04pm

My node keeps crashing. Its uptime never goes past 1 hour, and I see some errors in the logs. Any ideas what's wrong?

2021-01-16T15:58:21.669+0200 INFO Interrogate request received.
2021-01-16T15:58:29.785+0200 ERROR piecestore failed to add bandwidth usage {"error": "bandwidthdb error: database is locked", "errorVerbose": "bandwidthdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:683\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func6:625\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:646\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:1004\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:29\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2021-01-16T15:58:29.785+0200 INFO piecestore downloaded {"Piece ID": "35B3JMQG4F7FLHDS3TG3KFX3F2HI5SM4YMTEMKV4RM32GLCVPCMA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "GET_REPAIR"}
2021-01-16T15:58:32.667+0200 INFO piecestore downloaded {"Piece ID": "TCGQPVKTM4GQOENLVQOUMQSLIRBQAUSMRUWHJW56ZEEN64FNOUDA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "GET_REPAIR"}
2021-01-16T15:58:32.668+0200 INFO piecestore upload canceled {"Piece ID": "XPZKVAQVUSGXROOV433BNW2I5WVDXAIG6CA5TDIREZMUDNDNHMAQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT", "Size": 163840}
2021-01-16T15:58:32.669+0200 INFO piecestore upload canceled {"Piece ID": "Y56CWX6KUGCX6FP337DQ5QB3BHIEGODRUQ5VOK7FDBQNR35ROFUQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT", "Size": 163840}
2021-01-16T15:58:32.669+0200 INFO piecestore upload canceled {"Piece ID": "JE3QBIQ7UWJDBYTYHLP6THXMDMLVGWUS2XN7LJ76QKL52XAUBSBQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT_REPAIR", "Size": 163840}
2021-01-16T15:58:32.670+0200 INFO piecestore upload canceled {"Piece ID": "5S2KTAOE6FVPK3L5RSABRZCXODPQ4FDLFYNZ5BZ7DW4O5HC6VGKA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "Size": 163840}
2021-01-16T15:58:33.920+0200 FATAL Unrecoverable error {"error": "CreateFile E:\storage\blobs\qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa\7n/4ko4hg7c2imzmlopn5vvnxqzmzy5qxhyi5ecpnvdfeyx6skgsa.sj1: The file or directory is corrupted and unreadable.", "errorVerbose": "CreateFile E:\storage\blobs\qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa\7n/4ko4hg7c2imzmlopn5vvnxqzmzy5qxhyi5ecpnvdfeyx6skgsa.sj1: The file or directory is corrupted and unreadable.\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:787\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:280\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:496\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:661\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:81\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:80\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
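
Reading the FATAL trace bottom-up, the node's space-usage cache (pieces.(*CacheService).Run) walks every piece file under blobs via WalkNamespace, and the walk stops hard on the first file Windows refuses to open. With the node stopped, a similar walk can show how many files are affected. This is a minimal Python sketch, not a Storj tool: find_unreadable_files is a hypothetical helper, and the E:\storage\blobs path is taken from the log. It only opens each file, so it catches errors like the CreateFile failure above but not damage inside a file's contents.

```python
import os


def find_unreadable_files(root):
    """Walk `root`, try to open every file, and collect the ones that fail.

    Returns (number of files scanned, list of (path, error message)).
    Opening without reading mirrors the CreateFile call that failed in
    the log; it does not verify the data inside each file.
    """
    errors = []
    scanned = 0

    def on_walk_error(err):
        # os.walk reports directories it cannot list through this callback.
        errors.append((err.filename, str(err)))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            scanned += 1
            try:
                with open(path, "rb"):
                    pass
            except OSError as err:
                errors.append((path, str(err)))
    return scanned, errors


if __name__ == "__main__":
    # Hypothetical usage: path from the log above; stop the node first.
    count, bad = find_unreadable_files(r"E:\storage\blobs")
    print(f"scanned {count} files, {len(bad)} could not be opened")
    for path, msg in bad:
        print(path, msg)
```

On a 14 TB node this touches millions of files, so expect it to take a while; chkdsk is still the tool that actually repairs the file system.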

deathlessdd · January 16, 2021, 2:07pm

How is your drive connected? It looks like your drive may be dying.

wildraven · January 16, 2021, 2:08pm

It's a new drive, an internal 14 TB SATA.

deathlessdd · January 16, 2021, 2:09pm

Is this a new node or a migrated node? I would also run a disk check to make sure the drive is good.

wildraven · January 16, 2021, 2:11pm

I just migrated it to a new PC because of the same problem on the old PC.

What should I scan? I have no idea.

deathlessdd · January 16, 2021, 2:12pm

Well, it looks like you're running it on Windows, so I would just run chkdsk. But first I would stop the node.

wildraven · January 16, 2021, 2:13pm

The disk is OK, I checked it. But the errors mention something about a database, and I don't know what to check or how.

deathlessdd · January 16, 2021, 2:14pm

You should run the check in cmd like this: chkdsk E: /f /x /r
A quick check won't find this kind of problem.
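
For reference, here is what each switch in that command does, written as a small Python helper. It is only a sketch: chkdsk_command and run_chkdsk are hypothetical names, and the flag descriptions follow Microsoft's chkdsk documentation. It needs an elevated (administrator) prompt, and /x dismounts the volume, so the node must be stopped first.

```python
import subprocess


def chkdsk_command(drive):
    """Build the chkdsk command suggested above for `drive`, e.g. "E:"."""
    return [
        "chkdsk",
        drive,
        "/f",  # fix errors on the disk
        "/x",  # force the volume to dismount first (also implies /f)
        "/r",  # locate bad sectors and recover readable data (also implies /f); slow on big drives
    ]


def run_chkdsk(drive):
    """Run the check and return chkdsk's exit code (Windows only, run as administrator)."""
    return subprocess.run(chkdsk_command(drive), check=False).returncode
```

Because /r scans every sector, a full pass over a 14 TB drive can take many hours.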

The bandwidth error is just the database being locked, which happens so you can't corrupt it.
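
Assuming, as the message suggests, that the node's databases are SQLite files, "database is locked" is SQLite refusing a second writer while another connection holds the write lock. The sketch below reproduces that message with Python's built-in sqlite3 module on a throwaway file; the file and table names are made up for the demo and are not the node's real bandwidth database. timeout=0 makes the second connection give up immediately instead of waiting for the lock.

```python
import os
import sqlite3
import tempfile


def demonstrate_lock():
    """Return the error SQLite raises when a second writer finds the database locked."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")  # hypothetical throwaway file
        # isolation_level=None: we issue BEGIN/COMMIT ourselves.
        # timeout=0: fail at once instead of retrying while the lock is held.
        writer = sqlite3.connect(path, timeout=0, isolation_level=None)
        other = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            writer.execute("CREATE TABLE bandwidth (amount INTEGER)")
            # The first connection takes the write lock and keeps it open.
            writer.execute("BEGIN IMMEDIATE")
            writer.execute("INSERT INTO bandwidth VALUES (1)")
            try:
                # A second writer cannot start while that lock is held.
                other.execute("BEGIN IMMEDIATE")
                message = "no error"
            except sqlite3.OperationalError as err:
                message = str(err)
            writer.execute("COMMIT")
            # Once the lock is released, the second writer gets through.
            other.execute("BEGIN IMMEDIATE")
            other.execute("INSERT INTO bandwidth VALUES (2)")
            other.execute("COMMIT")
            return message
        finally:
            writer.close()
            other.close()
```

The lock itself is protective; it only becomes a problem when writes are blocked for so long that they give up, as in the log line above.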

wildraven · January 16, 2021, 2:16pm

OK, I'm checking again, ETA 40 min…

deathlessdd · January 16, 2021, 2:19pm

Did you have any power outages?

wildraven · January 16, 2021, 2:19pm

Yes, I did…

deathlessdd · January 16, 2021, 2:21pm

For a Windows node I would disable the write cache on that drive, because if you don't have a UPS, power outages can corrupt your data.

wildraven · January 16, 2021, 2:23pm

OK, I will disable it.

wildraven · January 16, 2021, 2:24pm

The ETA for the check is now at 120 min. Something is not right.

deathlessdd · January 16, 2021, 2:28pm

This is normal: it's not a quick scan, and your drive is large.

wildraven · January 16, 2021, 2:28pm

OK, 300 min now…

deathlessdd · January 16, 2021, 2:30pm

It should go through 5 stages.

wildraven · January 16, 2021, 2:39pm

It found some orphan file record segments.
…
It's now on stage 2.

Pac · January 16, 2021, 5:21pm

It feels like the database files might need an integrity check.
No idea how to do that on Windows, though.

wildraven · January 16, 2021, 6:09pm

I have no idea. Does anyone know how to do that?
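
SQLite has a built-in check for this, PRAGMA integrity_check, which returns a single "ok" row when a database is healthy. Below is a minimal Python sketch that runs it on every *.db file in a folder. It assumes the node's databases are SQLite files sitting in E:\storage (the folder that holds blobs in the logs above); check_databases is a hypothetical helper name. Stop the node first, and ideally run it on copies of the files.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def check_databases(folder):
    """Run PRAGMA integrity_check on every *.db file in `folder`.

    Returns {file name: "ok" or the problems SQLite reported}.
    """
    results = {}
    for db_path in sorted(Path(folder).glob("*.db")):
        try:
            # closing() makes sure the connection is closed even on error.
            with closing(sqlite3.connect(str(db_path))) as conn:
                rows = conn.execute("PRAGMA integrity_check").fetchall()
            results[db_path.name] = "\n".join(row[0] for row in rows)
        except sqlite3.DatabaseError as err:
            # e.g. "file is not a database" when even the file header is damaged.
            results[db_path.name] = f"error: {err}"
    return results


if __name__ == "__main__":
    # Assumed location of the node's databases; adjust to your setup.
    for name, status in check_databases(r"E:\storage").items():
        print(f"{name}: {status}")
```

Any file that reports something other than "ok" is a candidate for repair or replacement; the check itself only reads the database and changes nothing.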