Hi, I am getting a lot of “failed to add bandwidth usage” errors.
This file is almost 8 GB, which is a lot for a database.
I tried to rename it, but my node does not start with a new database (it keeps restarting).
2023-10-01T15:53:39+02:00 ERROR piecestore failed to add bandwidth usage {"process": "storagenode", "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:882\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func7.1:751\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func7:789\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:806\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
As far as I can see, the database is not corrupted; it’s locked, which means your disk subsystem is too slow.
If you follow the suggestions mentioned, you will lose the bandwidth statistics, including past periods.
If you are OK with losing the statistics, go ahead. But if you want to preserve them, you can try to dump the data from this database and load it into a new database:
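A minimal sketch of that dump-and-reload idea, using Python’s built-in sqlite3 module. The file names, table name, and columns below are made-up placeholders, not the official Storj procedure (which uses the sqlite3 CLI and the real database files):

```python
import sqlite3

# Create a throwaway source DB standing in for bandwidth.db
# (table name and columns are invented for illustration).
src = sqlite3.connect("bandwidth_old.db")
src.execute("CREATE TABLE IF NOT EXISTS usage (satellite TEXT, amount INTEGER)")
src.execute("INSERT INTO usage VALUES ('sat1', 123)")
src.commit()

# 1. Dump schema + data to plain SQL text.
with open("dump.sql", "w") as f:
    f.write("\n".join(src.iterdump()))
src.close()

# 2. Replay the SQL into a brand-new database file.
dst = sqlite3.connect("bandwidth_new.db")
with open("dump.sql") as f:
    dst.executescript(f.read())
print(dst.execute("SELECT * FROM usage").fetchall())  # → [('sat1', 123)]
dst.close()
```

The point of going through a text dump is that a readable dump can often be produced even from a database that misbehaves, and the reload gives you a freshly packed, unlocked file.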
Is this really due to the hard drive, or could it also be because the RAM usage is significantly higher? I capped the max RAM at 3 GB and yesterday it was almost full. Now with 6 GB I don’t see the error, with RAM usage at about 2.5 GB.
Maybe it can’t keep up because of fragmentation. 6-7 TB is a critical point, in NTFS as well.
Makes sense to me. Check fragmentation and try defragmentation.
I don’t know how to do it on Unraid with XFS, but there seem to be ways.
The node buffers data when the disk can’t keep up with the I/O. Limiting the node’s RAM is pointless; it will just get killed. You need to improve your filesystem performance. There is a lot of discussion on this forum about how to do that. Start by increasing the amount of RAM available for the filesystem cache; you want the metadata to fit entirely in RAM, which will remove a lot of I/O from the disk.
High RAM usage is the symptom here, not the root cause. Slow disk I/O will result in both a locked database and high RAM usage. Make your disk I/O faster and both symptoms will go away.
Sorry, I am unfamiliar with Unraid. You need to find the bottleneck in your setup. Given that this is supposedly a Linux-based system, you should probably start diagnosing by looking at the output of iostat, as described here.
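If iostat isn’t handy, here is a toy probe to get a rough feel for the disk’s sync-write latency (run it with the working directory on the disk in question; it is an assumption-laden sketch, not a substitute for proper iostat diagnosis):

```python
import os
import tempfile
import time

# Time a handful of small write+fsync cycles. On a healthy, idle disk
# each cycle usually takes a few milliseconds; on an overloaded disk
# the worst case can reach hundreds of milliseconds or more.
def worst_fsync_latency_ms(path=".", rounds=20):
    fd, name = tempfile.mkstemp(dir=path)
    worst = 0.0
    try:
        for _ in range(rounds):
            t0 = time.perf_counter()
            os.write(fd, b"x" * 4096)   # 4 KiB write, roughly a SQLite page
            os.fsync(fd)                # force it down to the device
            worst = max(worst, (time.perf_counter() - t0) * 1000)
    finally:
        os.close(fd)
        os.unlink(name)
    return worst

print(f"worst fsync latency: {worst_fsync_latency_ms():.1f} ms")
```

Consistently slow fsync while the node is running would fit the “database is locked” picture, since SQLite holds its lock for the duration of each commit.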
Do not use XFS; it is very slow on deletes, and the drive will struggle on each GC (garbage collection) run. It will only get worse as the node grows.
I made the same mistake and am currently migrating nodes from XFS to ext4, as GC takes much more time on XFS than it does on ext4. I’m not very happy, as it is very time-consuming, but with XFS, while GC was running, I had very low success rates, as low as 30% on server-grade drives.
That’s interesting, I didn’t know that XFS is slower than ext4. I thought they were similar regarding I/O, but XFS has some advantages (not used by the storage node, though).
Thanks
@Toyoo
I don’t get the database error anymore, but my RAM usage is really high, sometimes around 6 GB.
@daki82
Unraid OS runs in RAM and boots from a USB drive.
The disk only runs the node; everything else (Docker) is on a fast M.2 drive or in RAM.
I did not know I could defragment XFS. This will probably take a long time, and will I have to stop the node for it?
Moving to ext4 will probably take a long time, right? In the past I moved a 1 TB node to another disk and it took over a day. How does it compare to NTFS? NTFS would be a lot easier for me to handle.
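A rough back-of-the-envelope estimate, assuming the copy scales linearly with the amount of data (the 1 TB in ~1 day figure is from my earlier move; real migrations vary a lot with file count and fragmentation):

```python
# 1 TB took roughly 1 day (~24 h) to copy, i.e. about 12 MB/s.
rate_mb_s = 1_000_000 / (24 * 3600)            # MB per second
hours_for_7tb = 7_000_000 / rate_mb_s / 3600   # hours at the same rate
print(f"~{rate_mb_s:.0f} MB/s -> ~{hours_for_7tb / 24:.0f} days for 7 TB")
```

So at that rate a 7 TB node would take about a week of continuous copying.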
Please NEVER use NTFS under Linux (Unraid uses a Linux OS); things become even worse than with XFS. It will get corrupted, and to fix the issues you will be forced to connect the drive to a Windows machine.
@Alexey Can’t confirm. I am running one 7 TB NTFS node with zero problems and fewer “race lost” errors compared to XFS. The node dates from the start of Storj v3.
In short, nobody can say for sure whether it’s bad. I have a similar drive which also stuttered at about the same fill level of ~7 TB, but with NTFS.
If the drive is alright, nothing bad should happen if you run defrag for a week while the node is running.
But I set the node to full via the YAML config, to reduce the stress.
I also moved the DBs to a flash drive.