Hi, I am getting a lot of “failed to add bandwidth usage” errors.
This file is almost 8 GB, which is a lot for a database.
I tried to rename it, but my node does not start with a new database (it keeps restarting).
2023-10-01T15:53:39+02:00 ERROR piecestore failed to add bandwidth usage {"process": "storagenode", "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:60\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).beginSaveOrder.func1:882\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func7.1:751\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download.func7:789\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:806\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:251\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
As far as I can see, the database is not corrupted; it’s locked, which means your disk subsystem is too slow.
If you follow the suggestions mentioned, you will lose the bandwidth statistics, including past periods.
If you are OK with losing the statistics, go ahead. But if you want to preserve them, you can try to dump the data from this database and load it into a new database:
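A minimal sketch of that dump-and-reload idea, using Python’s built-in sqlite3 module. The file names, table name, and columns below are made-up placeholders, not the official Storj procedure (which uses the sqlite3 CLI and the real database files):

```python
import sqlite3

# Create a throwaway source DB standing in for bandwidth.db
# (table name and columns are invented for illustration).
src = sqlite3.connect("bandwidth_old.db")
src.execute("CREATE TABLE IF NOT EXISTS usage (satellite TEXT, amount INTEGER)")
src.execute("INSERT INTO usage VALUES ('sat1', 123)")
src.commit()

# 1. Dump schema + data to plain SQL text.
with open("dump.sql", "w") as f:
    f.write("\n".join(src.iterdump()))
src.close()

# 2. Replay the SQL into a brand-new database file.
dst = sqlite3.connect("bandwidth_new.db")
with open("dump.sql") as f:
    dst.executescript(f.read())
print(dst.execute("SELECT * FROM usage").fetchall())  # → [('sat1', 123)]
dst.close()
```

The point of going through a text dump is that a readable dump can often be produced even from a database that misbehaves, and the reload gives you a freshly packed, unlocked file.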
Is this really due to the hard drive, or could it also be because the RAM usage is significantly higher? I capped the max RAM at 3 GB and yesterday it was almost full. Now with 6 GB I don’t see the error, with RAM usage at about 2.5 GB.
Maybe it can’t keep up because of fragmentation. 6-7 TB is a critical point, in NTFS as well.
Makes sense to me. Check fragmentation and try defragmentation.
I don’t know how to do it on Unraid with XFS, but there seem to be ways.
The node buffers data when the disk can’t keep up with the I/O. Limiting the node’s RAM is pointless; it will just get killed. You need to improve your filesystem performance. There is a lot of discussion on this forum about how to do that. Start by increasing the amount of RAM available for the filesystem cache; you want the metadata to fit entirely in RAM, which will remove a lot of I/O from the disk.
High RAM usage is the symptom here, not the root cause. Slow disk I/O will result in both a locked database and high RAM usage. Make your disk I/O faster and both symptoms will go away.
Sorry, I am unfamiliar with Unraid. You need to find the bottleneck in your setup. Given that this is supposedly a Linux-based system, you should probably start diagnosing by looking at the output of iostat, as described here.
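If iostat isn’t handy, here is a toy probe to get a rough feel for the disk’s sync-write latency (run it with the working directory on the disk in question; it is an assumption-laden sketch, not a substitute for proper iostat diagnosis):

```python
import os
import tempfile
import time

# Time a handful of small write+fsync cycles. On a healthy, idle disk
# each cycle usually takes a few milliseconds; on an overloaded disk
# the worst case can reach hundreds of milliseconds or more.
def worst_fsync_latency_ms(path=".", rounds=20):
    fd, name = tempfile.mkstemp(dir=path)
    worst = 0.0
    try:
        for _ in range(rounds):
            t0 = time.perf_counter()
            os.write(fd, b"x" * 4096)   # 4 KiB write, roughly a SQLite page
            os.fsync(fd)                # force it down to the device
            worst = max(worst, (time.perf_counter() - t0) * 1000)
    finally:
        os.close(fd)
        os.unlink(name)
    return worst

print(f"worst fsync latency: {worst_fsync_latency_ms():.1f} ms")
```

Consistently slow fsync while the node is running would fit the “database is locked” picture, since SQLite holds its lock for the duration of each commit.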
Do not use XFS; it is very slow on deletes, and the drive will struggle on each GC (garbage collection) run. It will only get worse as the node grows.
I made the same mistake and am currently migrating nodes from XFS to ext4, as GC takes much more time on XFS than it does on ext4. I’m not very happy, as it is very time-consuming, but with XFS, while GC was running, I had very low success rates, as low as 30% on server-grade drives.
That’s interesting, I didn’t know that XFS is slower than ext4. I thought they were similar regarding I/O, but XFS has some advantages (not used by the storage node, though).
Thanks
@Toyoo
I don’t get the database error anymore, but my RAM usage is really high, sometimes around 6 GB.
@daki82
Unraid OS runs in RAM and boots from a USB drive.
The disk only runs the node; everything else (Docker) is on a fast M.2 drive or in RAM.
I did not know I could defragment XFS. This will probably take a long time, and will I have to stop the node for it?
Moving to ext4 will probably take a long time, right? In the past I moved a 1 TB node to another disk and it took over a day. How does it compare to NTFS? NTFS would be a lot easier for me to handle.
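A rough back-of-the-envelope estimate, assuming the copy scales linearly with the amount of data (the 1 TB in ~1 day figure is from my earlier move; real migrations vary a lot with file count and fragmentation):

```python
# 1 TB took roughly 1 day (~24 h) to copy, i.e. about 12 MB/s.
rate_mb_s = 1_000_000 / (24 * 3600)            # MB per second
hours_for_7tb = 7_000_000 / rate_mb_s / 3600   # hours at the same rate
print(f"~{rate_mb_s:.0f} MB/s -> ~{hours_for_7tb / 24:.0f} days for 7 TB")
```

So at that rate a 7 TB node would take about a week of continuous copying.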
Please NEVER use NTFS under Linux (Unraid uses a Linux OS); things become even worse than with XFS. It will get corrupted, and to fix the issues you will be forced to connect the drive to a Windows machine.
@Alexey Can’t confirm. I am running one 7 TB NTFS node with zero problems and fewer “race lost” errors compared to XFS. The node dates from the start of Storj v3.
In short, nobody can say for sure whether it’s bad. I have a similar drive which also stuttered at about the same fill level of ~7 TB, but with NTFS.
If the drive is alright, nothing bad should happen if you run defrag for a week while the node is running.
But I set the node to full via the YAML config, to reduce the stress.
I also moved the DBs to a flash drive.