Suddenly suspended - no recent changes

For posterity, in another branch of this discussion, we determined the issue is not with the installation or the computer, but with the (external) drive used for Storj data.

I ran fsck on the disk, it ran for about 24 hours and found many errors, but finally completed. Now, the disk mounts properly on boot. Good progress!

Now, I am trying to repair the databases. But I find the instructions unclear in ways that make it difficult to proceed. https://support.storj.io/hc/en-us/articles/360029309111-How-to-fix-a-database-disk-image-is-malformed-

I got through step 4 with no problem:

sudo cp bandwidth.db ~/bandwidth.db.backup

With step 5, several issues, so I am stuck. First:
I think that I am supposed to choose between 5-1 (“Docker”) and 5-2 (“Direct installation”), as opposed to simply following the instructions in sequence as I have been at this point. But I’m not certain, and it is not specified whether starting with Step 5 this is now an “or” rather than a step-by-step approach.

Let’s assume I am correct, I’m supposed to do EITHER 5-1 OR 5-2. How do I choose?

I know that I used docker for my storagenode, but I think what is going on here is, I’m being asked to choose between installing sqlite3 on my server, or else run it from its own docker container. I already have sqlite3 installed. So, do I have a choice here, or am I supposed to simply use docker because my storagenode uses docker? If I am supposed to choose, what is the basis for my choice?

OK, let’s say I go with 5-1. The instructions say:

  • Docker (replace ${PWD} with an absolute path to the databases location, or simple switch the current location to there)

I have already switched my current working directory to the directory that contains the databases. So, I think that means that I do not have to replace ${PWD} with anything? I can just leave it as is? But the instructions seem to be in shorthand, so I’m not sure about this.

I assume that my whole setup is rather fragile until repaired, so I do not want to make an incorrect step and risk damaging it further.

Also, here is an updated dump from the logs, taken after I ran fsck:

pete@pete-mini-linux:~/storj$ docker logs --tail 50 storagenode|grep ERROR
2022-06-30T17:47:02.853Z	ERROR	piecestore	failed to add bandwidth usage	{"Process": "storagenode", "error": "bandwidthdb: database disk image is malformed", "errorVerbose": "bandwidthdb: database disk image is malformed\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:723\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:437\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:220\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-06-30T17:47:03.068Z	ERROR	collector	error during collecting pieces: 	{"Process": "storagenode", "error": "database disk image is malformed"}
2022-06-30T17:47:03.090Z	ERROR	bandwidth	Could not rollup bandwidth usage	{"Process": "storagenode", "error": "bandwidthdb: database disk image is malformed", "errorVerbose": "bandwidthdb: database disk image is malformed\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Rollup:324\n\tstorj.io/storj/storagenode/bandwidth.(*Service).Rollup:53\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/storj/storagenode/bandwidth.(*Service).Run:45\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-06-30T17:47:03.888Z	ERROR	piecestore	download failed	{"Process": "storagenode", "Piece ID": "O2X6LGPN6GZ4XX2YYBBY5LUVPUXN3PJLMJECVRIK3LUYAN3YEX5Q", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:546\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-06-30T17:47:04.765Z	ERROR	piecestore	failed to add bandwidth usage	{"Process": "storagenode", "error": "bandwidthdb: database disk image is malformed", "errorVerbose": "bandwidthdb: database disk image is malformed\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:723\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func6:662\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:686\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-06-30T17:47:08.326Z	ERROR	piecestore	failed to add bandwidth usage	{"Process": "storagenode", "error": "bandwidthdb: database disk image is malformed", "errorVerbose": "bandwidthdb: database disk image is malformed\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:723\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:437\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:220\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-06-30T17:47:10.743Z	ERROR	piecestore	download failed	{"Process": "storagenode", "Piece ID": "T72IM3GHE3AXRQLRUOSXHT4CFLYSN2TBH6H4AVRR46OXTCUPOFKA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:546\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-06-30T17:47:10.805Z	ERROR	piecestore	download failed	{"Process": "storagenode", "Piece ID": "T72IM3GHE3AXRQLRUOSXHT4CFLYSN2TBH6H4AVRR46OXTCUPOFKA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:546\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-06-30T17:47:15.626Z	ERROR	piecestore	failed to add bandwidth usage	{"Process": "storagenode", "error": "bandwidthdb: database disk image is malformed", "errorVerbose": "bandwidthdb: database disk image is malformed\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:723\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:437\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:220\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
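A dump like this is easier to read once the repeated errors are grouped. The following is a minimal Python sketch, not part of any Storj tooling, that tallies ERROR lines by component, message, and the "error" field of the trailing JSON. It assumes the layout seen in the lines above (timestamp, level, component, message, then a JSON object); the function names are hypothetical.

```python
import json
from collections import Counter


def parse_error_line(line):
    """Split one log line into (component, message, fields), or return None."""
    brace = line.find("{")
    if brace == -1:
        return None
    # Everything before the JSON object: timestamp, level, component, message.
    head = line[:brace].split(None, 3)
    if len(head) < 4 or head[1] != "ERROR":
        return None
    try:
        fields = json.loads(line[brace:])
    except json.JSONDecodeError:
        return None
    return head[2], head[3].strip(), fields


def tally_errors(lines):
    """Count ERROR lines grouped by (component, message, error text)."""
    counts = Counter()
    for line in lines:
        parsed = parse_error_line(line)
        if parsed is None:
            continue
        component, message, fields = parsed
        counts[(component, message, fields.get("error", ""))] += 1
    return counts
```

Applied to the nine lines above, this would group them into four "failed to add bandwidth usage" entries, three "download failed" entries, and one each from the collector and the bandwidth rollup, which makes it clearer that bandwidth.db is the main source of noise.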

A friend helped me navigate past that. I ran the db check locally (not in docker) and got the following on bandwidth.db:

*** in database main ***
Page 7391: btreeInitPage() returns error code 11
On tree page 7390 cell 0: Extends off end of page
On tree page 7281 cell 81: invalid page number 3339564
On tree page 7281 cell 80: invalid page number 3339563
On tree page 7281 cell 79: invalid page number 3339562
On tree page 7281 cell 78: invalid page number 3339561
On tree page 7281 cell 77: invalid page number 3339560
On tree page 7281 cell 76: invalid page number 3339558
On tree page 7281 cell 75: invalid page number 3339556
On tree page 7281 cell 74: invalid page number 3339555
On tree page 7281 cell 73: invalid page number 3339554
On tree page 7281 cell 72: invalid page number 3339553
On tree page 7281 cell 71: invalid page number 3339551
On tree page 7281 cell 70: invalid page number 1779060746
On tree page 7281 cell 69: invalid page number 140642376
On tree page 7281 cell 68: invalid page number 1946573318
On tree page 7281 cell 67: invalid page number 74187858
On tree page 7281 cell 66: invalid page number 2114085890
On tree page 7281 cell 65: invalid page number 256511792
On tree page 7281 cell 64: invalid page number 1544372749
On tree page 7281 cell 63: invalid page number 190057274
On tree page 7281 cell 62: invalid page number 857649945
On tree page 7281 cell 61: invalid page number 908931125
On tree page 7281 cell 60: invalid page number 857649177
On tree page 7281 cell 59: invalid page number 908931125
On tree page 7281 cell 57: invalid page number 758134317
On tree page 7281 cell 55: invalid page number 758134317
On tree page 7281 cell 53: invalid page number 758134317
On tree page 7281 cell 51: Extends off end of page
On tree page 7281 cell 33: Extends off end of page
On tree page 7281 cell 32: invalid page number 3339592
On tree page 7281 cell 31: invalid page number 3339590
On tree page 7281 cell 30: invalid page number 3339589
On tree page 7281 cell 29: invalid page number 3339586
On tree page 7281 cell 28: invalid page number 3339584
On tree page 7281 cell 27: invalid page number 3339582
On tree page 7281 cell 26: invalid page number 3339581
On tree page 7281 cell 25: invalid page number 3339580
On tree page 7281 cell 24: invalid page number 3339578
On tree page 7281 cell 23: invalid page number 3339575
On tree page 7281 cell 22: invalid page number 3339574
On tree page 7281 cell 21: invalid page number 3339573
On tree page 7281 cell 20: invalid page number 3339572
On tree page 7281 cell 19: invalid page number 3339571
On tree page 7281 cell 18: invalid page number 3339569
On tree page 7281 cell 17: invalid page number 3339568
On tree page 7281 cell 16: invalid page number 3339567
On tree page 7281 cell 15: invalid page number 3339566
On tree page 7281 cell 14: invalid page number 3339565
On tree page 7281 cell 13: invalid page number 3339605
On tree page 7281 cell 12: invalid page number 3339602
On tree page 7281 cell 11: invalid page number 3339601
On tree page 7281 cell 10: invalid page number 3339598
On tree page 7281 cell 9: invalid page number 3339597
On tree page 7281 cell 8: invalid page number 3339596
On tree page 7281 cell 7: invalid page number 3339595
On tree page 7281 cell 6: invalid page number 3339593
Error: database disk image is malformed
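For reference, the same kind of check can be scripted. This is a small Python sketch using the standard sqlite3 module and SQLite's PRAGMA integrity_check; it opens the file read-only so the check cannot modify it. The function names are hypothetical, and the node should be stopped before pointing this at its databases.

```python
import sqlite3


def integrity_report(db_path):
    """Return the messages produced by PRAGMA integrity_check for db_path."""
    # mode=ro opens the database read-only via a file: URI.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    messages = []
    try:
        for (msg,) in conn.execute("PRAGMA integrity_check;"):
            messages.append(msg)
    except sqlite3.DatabaseError as exc:
        # Badly damaged files can abort the check itself.
        messages.append(f"Error: {exc}")
    finally:
        conn.close()
    return messages


def is_healthy(db_path):
    """A healthy database reports the single row 'ok'."""
    return integrity_report(db_path) == ["ok"]
```

A healthy database returns just "ok"; anything else, like the page errors listed above, means the file needs repair or replacement.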

Just logging progress here.

I got up to step 13 in the “repair database” steps. I ran the command:

sqlite3 storage/bandwidth.db ".read storage/dump_all_notrans.sql"

It has been several hours, and as far as I can tell the command is still running. (No feedback, so I’m not certain.) The database it is replacing was about 29 MB in size.

You can speed up the process if you have enough RAM to hold both the database and its uncompressed text dump in memory; see this example: Used_serial.db malformed - #4 by Alexey

Or, if you do not mind losing the statistics, you can follow these instructions for each corrupted database: https://support.storj.io/hc/en-us/articles/4403032417044-How-to-fix-database-file-is-not-a-database-error

Thank you Alexey! I don’t know how to determine whether I have enough RAM (4 GB RAM, 29 MB bandwidth.db, mostly full 4TB drive) and I don’t know how dire the consequences are if I don’t have enough, so I just stuck with the more basic approach.
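One rough way to answer the "do I have enough RAM" question is to compare the memory Linux reports as available against the size of the database plus its text dump. The sketch below is only an estimate under stated assumptions: it reads the MemAvailable line in the format used by /proc/meminfo, and both dump_factor (how much larger the text dump is than the database) and headroom (how much of the available memory to allow) are guesses, not figures from the Storj documentation.

```python
def parse_mem_available_kib(meminfo_text):
    """Return MemAvailable in KiB from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    return None


def fits_in_ram(db_bytes, available_bytes, dump_factor=3.0, headroom=0.5):
    """Guess whether the database plus its text dump fit comfortably in RAM.

    dump_factor (assumed): size of the text dump relative to the database.
    headroom (assumed): fraction of available memory we are willing to use.
    """
    needed = db_bytes * (1 + dump_factor)
    return needed <= available_bytes * headroom
```

With the 29 MB bandwidth.db mentioned here and the assumed factor, the estimate is about 116 MB, so the in-memory approach would be feasible as long as a few hundred MB are actually free. The size of the 4 TB data drive does not enter into it.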

But, success! I finally got the databases repaired (bandwidth.db and piece_expiration.db were both corrupted, bandwidth.db took about 3-4 hours to repair).

I am now back online! It says I am suspended, but from what I gather from other posts, I believe that within a few days I will start getting and passing audits, which will at some point end the suspension. Glad to be back on track!

Next steps:

  1. I will make a few suggestions about how to improve the documentation – please let me know if there is a preferred way/place to suggest edits.
  2. I’ll be setting up a Dell i3 with internal SATA drives as a home file server, and setting up a Storj drive on that. (Maybe migrating this one, maybe setting up a 2nd node, haven’t decided yet.)

You can either make a PR in the repo GitHub - storj/docs at gitbook-node-sync (to this branch) or submit an issue: Issues · storj/docs · GitHub

Great, thank you.

This morning I see I am disqualified on US2. I’m not worried about that particular disqualification (US2 has been by far the least productive satellite for me, I’ve never earned more than 3¢/month from that.) But, I’m curious why, and whether this will happen with the other satellites.

In the dashboard, several things I have noticed in the last ~18 hours:

  • Some of the numbers (audit numbers, for instance) have gone down since I first recovered the server. I assume this is just a matter of “catching up” but it is curious. And maybe connected to the next…
  • My total disk usage dropped substantially, it was almost full (3.1 TB allocated) and now it is down to 2.42 TB used and 657 GB free. I initially figured this was just old info deleted by DCS users, but…that would probably sit in the “Trash” for 30 days, not be deleted immediately, right? So…does this reflect data that got corrupted by my disk issues? And could lead to further failed audits and DQ?
  • The “uptime” now is listed at 8 hours, but the server has been running for more like 18 hours. Strange. I don’t think the server is capable of rebooting itself even if there was some weird power issue last night (and I’m sure there wasn’t.)

Meanwhile, I’ve had 28 GB of bandwidth used since yesterday, so things do appear to be working well on the whole. Just hoping there isn’t a spreading cancer.

The audit score is not pushed back to the storagenode immediately; it is updated on every check-in (once an hour by default).

Data can be moved to the trash in several cases: customers deleted their data; your node was offline long enough that the repair worker recovered the unavailable data to other nodes, so those pieces were deleted from your node; or the identity has changed and the garbage collector started removing foreign data that does not belong to the new identity.

“Uptime” is simply how long your node has been running since the last restart. So perhaps your node was restarted; you can search the logs for the reason. Maybe your disk disconnects from time to time, which would lower the suspension and audit scores, and your node could be disqualified on other satellites as well.

Ah, I really do need to figure out if that disk is actively causing problems then.

Here’s a dump of some recent logs, does this help narrow it down @Alexey ?

pete@pete-mini-linux:~/storj$ docker logs --tail 200 storagenode |grep ERROR
2022-07-05T16:07:08.486Z        ERROR   piecestore      download failed {"Process": "storagenode", "Piece ID": "RIV2CHQXDS2BNIS6SHQLF3TDRWIRGZJVNEKNAPOFZDFMNOLFL2SA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:546\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-07-05T16:07:39.978Z        ERROR   piecestore      download failed {"Process": "storagenode", "Piece ID": "PEQRZ5337L3T4MYPDFBEVHM64JKLEAHRXLBPMKMQG5DMW3YAMYMQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_REPAIR", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:546\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-07-05T16:07:40.257Z        ERROR   piecestore      download failed {"Process": "storagenode", "Piece ID": "PEQRZ5337L3T4MYPDFBEVHM64JKLEAHRXLBPMKMQG5DMW3YAMYMQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_REPAIR", "error": "used serial already exists in store", "errorVerbose": "used serial already exists in store\n\tstorj.io/storj/storagenode/piecestore/usedserials.insertSerial:263\n\tstorj.io/storj/storagenode/piecestore/usedserials.(*Table).Add:117\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:498\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-07-05T16:07:40.317Z        ERROR   piecedeleter    could not send delete piece to trash    {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "UFJXL3GOU226G32WE6DQVD36BQOGDZLEHERDAN4GHK5XNLK5MJ6A", "error": "pieces error: v0pieceinfodb: sql: no rows in result set", "errorVerbose": "pieces error: v0pieceinfodb: sql: no rows in result set\n\tstorj.io/storj/storagenode/storagenodedb.(*v0PieceInfoDB).Get:131\n\tstorj.io/storj/storagenode/pieces.(*Store).MigrateV0ToV1:404\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:348\n\tstorj.io/storj/storagenode/pieces.(*Deleter).deleteOrTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Deleter).work:135\n\tstorj.io/storj/storagenode/pieces.(*Deleter).Run.func1:72\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-07-05T16:07:48.568Z        ERROR   piecestore      download failed {"Process": "storagenode", "Piece ID": "DLMFBSOGRGVWLEGWABYJW7VQ3VUVIWLGYNNLYFTXQ7QGJJ22ESHA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:546\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-07-05T16:07:55.168Z        ERROR   piecestore      download failed {"Process": "storagenode", "Piece ID": "FM5Y5JBITW377NR3UG5E2I7JDFSKNQZMM6FZNL3EYWMQKFWFOGYA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:546\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-07-05T16:08:03.050Z        ERROR   piecestore      download failed {"Process": "storagenode", "Piece ID": "MSLDTRMZRXDIB3X4EUGMLLFUCDVTQZSTCWYPICI4UNDINUJL42LQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:546\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-07-05T16:08:26.042Z        ERROR   piecestore      download failed {"Process": "storagenode", "Piece ID": "RIV2CHQXDS2BNIS6SHQLF3TDRWIRGZJVNEKNAPOFZDFMNOLFL2SA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:546\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-07-05T16:08:33.200Z        ERROR   piecestore      download failed {"Process": "storagenode", "Piece ID": "AVAKB6FSLLDDQ2DQLVUZC2JD3VD63NJTS5U7XWIBO5CM5U3AD4ZQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_REPAIR", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:546\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-07-05T16:08:33.441Z        ERROR   piecestore      download failed {"Process": "storagenode", "Piece ID": "AVAKB6FSLLDDQ2DQLVUZC2JD3VD63NJTS5U7XWIBO5CM5U3AD4ZQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_REPAIR", "error": "used serial already exists in store", "errorVerbose": "used serial already exists in store\n\tstorj.io/storj/storagenode/piecestore/usedserials.insertSerial:263\n\tstorj.io/storj/storagenode/piecestore/usedserials.(*Table).Add:117\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:498\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-07-05T16:08:58.343Z        ERROR   piecedeleter    could not send delete piece to trash    {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "YVUZJZQO75SUFWRVOJGMIHLWOYPUDV7YUMBZTOZRPB2IFAUR4ZOA", "error": "pieces error: v0pieceinfodb: sql: no rows in result set", "errorVerbose": "pieces error: v0pieceinfodb: sql: no rows in result set\n\tstorj.io/storj/storagenode/storagenodedb.(*v0PieceInfoDB).Get:131\n\tstorj.io/storj/storagenode/pieces.(*Store).MigrateV0ToV1:404\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:348\n\tstorj.io/storj/storagenode/pieces.(*Deleter).deleteOrTrash:185\n\tstorj.io/storj/storagenode/pieces.(*Deleter).work:135\n\tstorj.io/storj/storagenode/pieces.(*Deleter).Run.func1:72\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-07-05T16:09:07.936Z        ERROR   piecestore      download failed {"Process": "storagenode", "Piece ID": "PEXFMZOT47UTZLAWR3ML47CDHAPJWKPY4WSJOMYUTMXDIMXN57MA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:546\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-07-05T16:09:15.284Z        ERROR   piecestore      download failed {"Process": "storagenode", "Piece ID": "CP4AK6USV4TNUQWXJHMYGDDHLA63XXTQ4BDLVUQ564TADC7PDE4Q", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:546\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
2022-07-05T16:09:27.605Z        ERROR   piecestore      download failed {"Process": "storagenode", "Piece ID": "OBVCJNO6GTR73GNVDH2KMATZ6TWO2MCFVGXSGXFSSATVIXOBQ3PQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:546\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}

It seems like all the “download failed” items are not a big deal (based on the post “Error Codes: What they mean and Severity Level [READ FIRST]” )

The one quoted above does not seem to appear in that post, but it doesn’t sound like a big deal. Am I missing something? Or do you want to see more logs than that?

affects audit score immediately.

affects suspension score.

“GET” affects neither the audit score nor the suspension score; however, your node has obviously lost data.
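Since the reply distinguishes failures by their "Action" value, it can help to see how the missing pieces split across actions. Here is a minimal Python sketch, with a hypothetical function name, that collects the distinct Piece IDs of "download failed" lines whose error is "file does not exist", grouped by Action. It assumes each log line ends in a JSON object, as in the dump above.

```python
import json
from collections import defaultdict


def missing_pieces_by_action(lines):
    """Map each Action to the set of Piece IDs reported as missing."""
    result = defaultdict(set)
    for line in lines:
        if "download failed" not in line:
            continue
        brace = line.find("{")
        if brace == -1:
            continue
        try:
            fields = json.loads(line[brace:])
        except json.JSONDecodeError:
            continue
        # Skip other failures, e.g. "used serial already exists in store".
        if fields.get("error") != "file does not exist":
            continue
        result[fields.get("Action", "?")].add(fields.get("Piece ID", "?"))
    return dict(result)
```

On the log excerpt above, this would report seven distinct missing pieces under GET and two under GET_REPAIR. The counts say nothing about how much data is lost overall; they only summarize what happened to be requested in that window.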

OK. Yes, when I ran FSCK on the disk, it ran for about 24 hours and found numerous errors. (I’m not sure why! I do not think it is hardware failure.)

So, I thought that when it went back online, if the data was “complete” enough everything would start going back to normal. But, it seems that instead, it takes some days for continued audits, to determine whether or not the damage was too severe to continue?

I suppose I was fooled, by the fact that the node became operational again, and started ingesting new data etc. I took that as a very good sign.

What do you think is my best course of action? Simply leave it online, and wait and see whether its data loss is too much to prevent eventual disqualification? That would make sense to me. I am in the process of setting up a new server with a new drive, so I could start a new node in parallel, and if this one is disqualified, I would be a few steps along the path of starting over.

Audits happen all the time while your node is online, regardless of any failure state. As soon as you bring it online, it will be audited as usual.

The remaining satellites will keep paying your node while it is trusted (audit score greater than 60%), so you can keep running the node as long as it is not disqualified on all satellites.

Ok, that’s what I thought. But, to be more specific:

Is there anything I can do to improve my node, to reduce the possibility that it will be disqualified on more satellites? Or is “wait and see” my only real option?

Only restoring the lost data, without overwriting the existing data.
Filesystem corruption doesn’t happen without a reason. Something caused it: a power outage, a cable disconnection, a dying disk or controller, or an insufficient power supply (especially true for external drives: they must have an external power supply, because USB power alone is not enough in the long run).
So, if you can fix the hardware problems (for example, take the hard drive out of the enclosure and connect it internally, migrate the data and replace the disk if S.M.A.R.T. shows signs of failure, replace or reconnect the cable, replace or fix the power supply, and so on), then it should not have further issues. But if unrecoverable data loss has already happened, then only time will tell whether the node survives, because random parts of the data will be audited.

I see. I would like to move this drive (which is currently an external drive on a USB controller with a dedicated power supply, connected by USB 2) from the current system (a Mac Mini from 2007) to a new system (Dell from 2013) on internal SATA. I think I can probably figure out how to do that, and avoid any need to worry about the drive enclosure, its power supply, or the USB cable.

It may be that I shut down or rebooted the computer too many times without properly stopping the docker container first; that would be my first guess as to why the data got corrupted. I don’t think the drive is losing power or getting disconnected intermittently: I think that would show up in my logs, and I think I would have to restart storj manually each time.

But, maybe I am missing something. At any rate, I appreciate your suggestions, and I think just focusing on moving the drive from the Mac to a Dell (and therefore from USB2 to SATA) is maybe the best place to focus my energy. If it gets further disqualified, I am OK with just starting over with a new node, but I will hope it doesn’t come to that!

This is usually not required: the system normally takes care of docker when you reboot. But the system could disconnect the USB drive too early for the container to stop gracefully.

So, just to get some more specifics here: this is a snapshot from last night. As I pointed out before, US2 got disqualified last week. The other numbers have been all over the place, but they appear to have “stabilized” a little: since last night, all of them have improved slightly. Based on that and on what you’ve told me, I’m hopeful that the damage was a one-time event and may not have been severe enough to get me disqualified on other satellites.
[Image: dashboard snapshot]

As a side note, it sure would be nice if the dashboard had a facility for tracking these numbers over time, or at least for exporting them as a simple CSV for easier monitoring.
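Until something like that exists, the tracking can be done by hand. The following Python sketch appends a timestamped row per satellite to a CSV file and flags any satellite whose audit score is not above the 60% figure mentioned earlier in this thread. It is only an illustration: the function names and column layout are hypothetical, and the scores are assumed to be typed in (or fetched by some other means) rather than read from the node automatically.

```python
import csv
import os
from datetime import datetime, timezone

FIELDS = ["timestamp", "satellite", "audit", "suspension"]


def append_snapshot(csv_path, scores, now=None):
    """Append one row per satellite; write the header if the file is new.

    scores: {satellite_name: {"audit": float, "suspension": float}}
    """
    now = now or datetime.now(timezone.utc)
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        stamp = now.isoformat(timespec="seconds")
        for satellite, values in sorted(scores.items()):
            writer.writerow({"timestamp": stamp, "satellite": satellite, **values})


def below_threshold(scores, audit_threshold=60.0):
    """Names of satellites whose audit score is not above the threshold."""
    return sorted(n for n, s in scores.items() if s["audit"] <= audit_threshold)
```

Calling append_snapshot once a day with the numbers from the dashboard builds up a file that any spreadsheet can chart, which makes it easier to tell whether the scores are recovering or still drifting down.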