Failed to add bandwidth usage

Hello, I’m starting to get a lot of these errors and I’m not sure why or how to correct them, so I’m after any help please. I have an RPi4 with a 4TB HDD that has been running since December.

2022-03-09T12:22:06.684Z ERROR piecestore failed to add bandwidth usage {error: bandwidthdb: database is locked, errorVerbose: bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:722\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:434\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:220\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52}
2022-03-09T12:22:08.858Z ERROR piecestore failed to add bandwidth usage {error: bandwidthdb: database is locked, errorVerbose: bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:722\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:348\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:220\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52}
2022-03-09T12:22:12.149Z ERROR piecestore failed to add bandwidth usage {error: bandwidthdb: database is locked, errorVerbose: bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:722\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:434\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:220\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52}
2022-03-09T12:22:16.694Z ERROR piecestore failed to add bandwidth usage {error: bandwidthdb: database is locked, errorVerbose: bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:722\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:434\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:220\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52}
2022-03-09T12:22:22.159Z ERROR piecestore failed to add bandwidth usage {error: bandwidthdb: database is locked, errorVerbose: bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:722\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:434\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:220\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52}
2022-03-09T12:22:27.360Z ERROR piecestore failed to add bandwidth usage {error: bandwidthdb: database is locked, errorVerbose: bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:722\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:434\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:220\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:122\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:112\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52}

@tre4orbragg did you search for similar reports on the forum? One that looks very similar is Database bandwidthdb is locked - #2 by SGC or Weird node behaviour - #16 by YourHelper1

Yeah, but those didn’t have much clear direction on what to do. I’ve checked the database and have now updated config.yaml to set the max connections to 20; I’ll see how that goes.

As I said, I am facing a similar problem. The main reason, I think, is that I am using an SMR disk (4TB like you, btw) which can’t handle the load. You can check for yourself:

  1. whether this happens when you accept a lot of concurrent requests.
  2. how big the problem is, i.e. the percentage of errors you are getting. If those were all the errors for the day, then your node is more than fine. If the problem appears once in 1,000 or more log lines, then you should just forget it. Analyzing the data from the logs is nice, but overthinking every error will probably do nothing more than raise your blood pressure.
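To get a feel for point 2, a rough sketch like this can report the share of locked-database errors in a log file. The path, log format, and the `error_rate` helper here are just illustrative assumptions; adapt them to wherever your node writes its log (e.g. capture `docker logs storagenode 2>&1` into a file first):

```shell
#!/bin/sh
# Report what fraction of log lines are "database is locked" errors.
error_rate() {
    log="$1"
    total=$(( $(wc -l < "$log") ))                  # arithmetic strips wc's padding
    locked=$(grep -c "database is locked" "$log")
    echo "$locked of $total lines are locked-db errors ($(( locked * 100 / total ))%)"
}

# Demo on a tiny made-up sample:
sample=$(mktemp)
printf 'INFO piecestore uploaded\nERROR piecestore ... database is locked\nINFO piecestore uploaded\nINFO piecestore downloaded\n' > "$sample"
error_rate "$sample"   # 1 of 4 lines are locked-db errors (25%)
rm -f "$sample"
```

If the percentage stays in the low single digits, the node is almost certainly fine.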

Using an SMR disk seems like a classic rookie mistake (which I made too, of course), as everyone buys the cheapest disk to provide more storage but forgets about bandwidth, which is what matters most here. An SMR disk can maybe bring more benefits in the long run, though: by the time you have filled it (or almost filled it), you will be serving a lot of reads and hopefully not so many writes, so the disk will perform better, and of course it gives more space for its price compared to a more expensive CMR.

What you can do:

I currently have a limit of 7 concurrent requests (you can specify that in the config.yaml file). This of course means lower payouts, but this is how much my disk can handle; otherwise RAM gets filled up for no reason and then I get those errors you mentioned…

Hope this helped, but you also have to thank @michaln for tagging the right person in the right problem :wink:


Thanks, there had been a lot more errors, but it looks like a bit fewer after reducing the concurrent connections, so I’m playing with that to find a good balance between load and still actually getting traffic!


I’m using a Toshiba 6TB HDD and encountered the same error.
After adding this setting:

# Maximum number of simultaneous transfers
storage2.max-concurrent-requests: 20

Then it worked!
No more “Failed to add bandwidth usage” errors, and the node is no longer suspended.

Also see: Your Node is Suspended nothing obviously wrong? - #3 by StoreMe2

This is a bad option: it cancels customers’ transfers with a “node is overloaded” error (that is what they see). It’s not a recommended solution; it’s a workaround for weak devices and SMR disks. The better solution is to move the databases to an SSD:

But not everyone has the money to upgrade to a 6TB SSD…


Only the databases. They are (very) small.

One other option is to add a configuration option for the storagenode client to not create them in the first place, or to create them in RAM if some are needed to store transient data.

I would not mind losing statistics I don’t look at anyway if that means not having to buy even a small SSD and not sacrificing performance.

In fact, nothing prevents node operators from placing the databases on a ramdisk. Problem solved.

How? I see the storage dir contains many files, including many blobs; which ones should I move?

Only the *.db ones. However, I wouldn’t recommend using RAM to store them, or you would need some scripting to copy them back to the disk before shutdown and back to RAM before the node starts, which seems complicated to me.
If you do not have an SSD, that’s fine too; it was just a suggestion in case you have one.
SMR disks are unfortunately not good, and there is no good solution, except running several nodes, each on its own disk, to spread the load, or using this limiting option. Unfortunately it affects both ingress and egress, so it will reduce your earnings.


@Alexey, is this just to preserve local history for the node, or is there another reason? I was under the impression that if the databases don’t exist they will be recreated on launch. Is that not the case?

OK, I’ve moved all the *.db files onto the SSD and set storage2.max-concurrent-requests: 0. Now all the errors are gone!

Note: remember to leave the storage-dir-verification file in your storage dir; otherwise the node cannot start.
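For anyone following along, the corresponding config.yaml fragment might look like this. The database path is an example mount point, and this assumes a node version recent enough to support the storage2.database-dir option, which relocates only the databases:

```yaml
# config.yaml -- example only; adjust paths to your own mounts
storage2.database-dir: /mnt/ssd/storagenode-dbs   # *.db files live here now
storage2.max-concurrent-requests: 0               # 0 = no artificial limit
```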


Yes, if the databases do not exist, they will be recreated. However:

  • you will lose your previous and current stats;
  • you cannot disable the filewalker, because the database is empty, so it must be enabled; on an SMR disk it could take days to finish.

That’s not an issue.

That I did not know, thank you. In hindsight, it’s obvious!

One solution here is to persist only the chunks database on disk, while all the other, “unimportant” stats-related ones stay in RAM. This would reduce IO pressure on the drive, and may be enough.

On a separate note:

SMRs are horrific at writes, but reads should not be affected.

On the other hand, having filewalker access each chunk at the start effectively pre-warms the in-memory filesystem metadata cache, thereby improving subsequent time-to-first-byte on actual requests, ultimately contributing to better payouts by winning more races. So I would keep filewalker enabled if only to facilitate that.

But of course, if the hardware is so shoddy that even that is too much (i.e. a low-RAM microcontroller-based board with a software-driven USB-connected mass storage device), then I guess a cheap SSD to hold the databases, plus avoiding the filewalker, would definitely be the superior approach.

and more points of failure.

Unfortunately, not all SMRs are built equally; some perform their optimizations while you read, so reads become slow too.


But so does moving all the databases off the main drive.

Or do you mean keeping track of what is persisted where? That can be done outside of the storage node, just via symlinks. But ultimately, yes: while it’s extra things to keep track of, if the hardware is inadequate, some compromises must be made somewhere, either adding fragility or paying with performance.

I dislike the whole idea of running nodes on Odroids and other Raspberry Pis in the first place, but if an SNO has already gone that route, a few more points of failure won’t matter, especially if it helps avoid buying extra hardware (an SSD or a better drive).

Oh wow…

Yes to both :slight_smile: Also, in addition to the point of failure you add by moving the databases, you add another by moving them only partially. It even sounds complicated.
By the way, I’m not sure symlinks will work well for databases.


A post was split to a new topic: Moving the databases on Windows