Failed to add bandwidth usage

I also use unraid, but with the Unassigned Devices plugin and have formatted the HDDs in ext4.

On my system the nodes also sometimes use a lot of RAM, but as a buffer.
I found it a bit strange at first; if you look in the dashboard you can see how much RAM is really reserved.
In Linux there is no unused RAM.
If other services need more RAM, the buffer automatically shrinks.

With this command you can check your RAM writeback settings and adjust them if necessary:
sysctl -a | grep dirty

Adjust if necessary, e.g. with:
sysctl vm.dirty_ratio=15

The same applies to the other parameters.

I think Unraid presets the following:

vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 10
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200

So up to 10% of RAM is used as a write buffer. If you need more RAM, e.g. for a VM, the buffer shrinks.

For a permanent change, add the desired commands to the file /boot/config/go.
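For example, appending the sysctl calls to the end of the go file could look like this. This is a sketch using a stand-in file name so it can be tried anywhere; on Unraid the real target is /boot/config/go, and the values here are just examples:

```shell
# Stand-in for /boot/config/go in this demo; point it at the real file on Unraid.
GO_FILE="./go.example"
: > "$GO_FILE"                     # start with an empty demo file

# Append the sysctl calls; Unraid runs the go file on every boot,
# so these settings are reapplied automatically.
{
  echo 'sysctl vm.dirty_background_ratio=5'
  echo 'sysctl vm.dirty_ratio=15'
} >> "$GO_FILE"

grep -c '^sysctl vm.dirty' "$GO_FILE"   # → 2
```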


7 posts were split to a new topic: Is there any disadvantage by using a single rsync command for the whole storj folder?

I have no option in the Unraid Unassigned Devices plugin to format the drive in ext4 :frowning:

I read that mounting is possible, but I did not find any information about formatting.
Unraid is based on Slackware. I don't want to go back to XFS :smiley:

Debian 12 has a similar preset:

(screenshot of the Debian 12 dirty-page defaults)

Maybe this could help? I don't know Unraid or Linux.

@Alexey maybe zfs is an option here?

I mounted the HDDs in a Linux VM and formatted them with GParted in ext4.


Thank you so much @Chrysen! I tried it before over the command line, but it did not work.
My Lubuntu VM did the job and now I am ready to copy everything back :slight_smile:
I have to do this 3 more times, but with more ext4 drives it should be faster than XFS to XFS.

By the way, even NTFS is way faster than XFS on my server :rofl:


You have ZFS there in the list. What do you need ext4 for?

Sigh. People will always go out of their way to shoot themselves in the foot.

So you would prefer ZFS? What are the benefits and downsides?

I just want to add this post:

I don’t want to repeat the whole discussion we had in the other topic, but to summarize:

  • ext4 wins slightly in performance on small datasets and resource-constrained hosts. As the node grows, ext4 will choke unless you add enough RAM to fit the metadata; at that point ZFS starts winning massively, even in a default config.
  • With ZFS you can add an SSD as a special device to hold all metadata and small files. This offloads almost all IO from your HDD. No caching solution comes even close, especially for the storj use case: with SSD caching only the second and subsequent accesses are accelerated. With a special device, all of them are, including the first. And with storagenode you have many more firsts than seconds.
  • Edit: there are a number of other subtle advantages relevant to storagenode: compression by default (which helps avoid wasting space on "partial sectors"), a smarter in-RAM cache implementation (see Adaptive Replacement Cache), and support for snapshots and replication (moving storj datasets with differential snapshot replication is much faster than the rsync approach), to name a few.

In other words, when you have access to ZFS on appliances that are not IoT devices, and I assume
your unraid runs on something better than a potato, choosing anything else makes no sense.


I have gone through ZFS briefly and it looks like it is best suited for storage pools, something like Ceph but not distributed. It also looks like single-drive ZFS is a no-no. This adds complexity and cost, I would say, so unless the OP wants redundancy or a pool of drives, I would stay with ext4 for single-drive operation.

You should have learned that by now, I have tried my best…, my friend :rofl:

Thank you for the details :grinning:
Would I benefit from it without an SSD cache? My node drives are 12-14TB. Can I still use every drive standalone and get a performance benefit?
How complex is the setup? Sounds like more than just formatting to me.

My server runs a Ryzen 5 5600G with 32GB RAM, but has 30 Docker containers and 2 small Linux VMs.

Hard to tell. Just swapping ext4 for ZFS on a single disk may provide some benefits due to a more effective in-memory caching policy, some storage savings due to compression (not because of actually compressing data, but by not wasting space on partial sectors), and an effective data send/receive strategy if you need it in the future. An SSD as a special device is what helps the most for the storagenode use case. If you are not going to benefit from these, keep ext4. It works, right? If it's not broken, don't fix it.

However, if you run multiple nodes on a single host it makes very little sense to run them on separate drives: all your nodes share bandwidth for uploads and run in parallel for downloads (in storj terms). Artificially separating them into individual fixed space and performance buckets is counterproductive.

Instead, I would create a pool of those disks with one or more vdevs (redundant if you want; you probably want to use the storage for more than just storage nodes, and every other use case benefits from redundancy. Disks do die.) This alone will improve performance, because the pool can now load-balance the workload between multiple disks, and if you use multiple vdevs, also between the vdevs. You will never end up in the situation where one disk is choking while the others are idle.

As a next step you could add a single SSD (or a mirror) as a special device to hold metadata and small files, dramatically improving the responsiveness of your storage appliance. (Note: if you do this after the pool has been receiving data for some time, you might want to do a send/receive to a new dataset to redistribute the metadata of existing data.) If you had separate pools for each node, you would need to add SSDs to every one of them: extra cost, and quite pointless.

I can see the benefit if you run a node on an Odroid H4 or Raspberry Pi: you could have an HDD for data and an SSD for metadata, and get a very low-power, super responsive "storj appliance", just for fun.

For the sake of precision, let's clarify the terms here. ZFS does support an SSD read cache, called L2ARC, but this is not what we are discussing here. I don't recommend using L2ARC for a storage node if you can use a special device instead.

  • L2ARC: a (deliberately) slowly populating read cache, intended to accelerate random reads: ZFS L2ARC
  • SLOG: Separate Log Device, intended to offload some of the IO associated with synchronous writes from the main pool; it's write-only, and the only time it is read from is after a power loss. Storage node does not benefit from sync writes (you can turn them off), hence a SLOG is not useful. You can read more here: SLOG Screenshots
  • Special device: allows you to keep all metadata of the pool, and optionally small files (up to a specified size smaller than the record size), on an SSD. That's what provides the dramatic performance improvement. Disks are very bad at handling small files: seek time is much larger than transfer time, so they spend most of their time seeking and very little time transferring. An SSD, on the other hand, is very fast at random IO, but expensive for storing large blobs. A special vdev gives you the best of both worlds: SSD-level performance for small files, and HDD-level cost for large files (large files are … well … large, so seek time becomes a smaller fraction of the overall find-and-fetch time). As a result you are using each device for what it does best, in the most cost-effective way possible.
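To make the above concrete, a minimal sketch of creating such a pool. All pool/device names and the 64K threshold are examples, not a recommendation for your exact hardware; don't paste blindly:

```shell
# Example pool: one raidz1 data vdev plus a mirrored SSD special vdev.
# Device names are placeholders; substitute your own disks.
zpool create tank \
  raidz1 /dev/sda /dev/sdb /dev/sdc /dev/sdd \
  special mirror /dev/nvme0n1 /dev/nvme1n1

# Also store files up to 64K on the special vdev, not just metadata.
zfs set special_small_blocks=64K tank

# Storage node does not benefit from sync writes, so they can be disabled.
zfs set sync=disabled tank
```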

It's simple and reliable enough. You may also add nodes while you have resources. If one disk dies, you will lose only that small part of the common data. With an array without redundancy (which you seem to be suggesting), the whole node would be lost, i.e. all the common data.
So you do want redundancy, and that means at least one disk is sacrificed for it.

By the way, usually a choking drive can freeze the whole system, and ZFS is not a panacea; independent disks will likely continue to work in this case, unlike ZFS:

And also: how easy is it to extend a redundant array and rebalance it in ZFS (to increase performance, as usually happens with legacy RAID)?

I don't. Considering the array is used for something else, and only unused capacity is shared with the storagenode, it's assumed one already has redundancy. Either way, the array's reliability is driven by those other requirements; the node gets to use what's available. Running a node on dedicated hardware seems to be discouraged by storj: it's not "unused resources" if I'm running hardware for storj that would not otherwise be running.

An array without redundancy is fine too. The probability of failure is multiplied by the number of drives, and it's still low enough, considering that losing a node is 100% harmless. But this is far from the recommendation to use unused resources for the node if it's a whole bunch of disks spinning just for storj, be that separately or in a pool.

ZFS will offline a disk that is experiencing timeouts. I did not look into that other thread and what was misconfigured there. I've lived through a dying disk, and ZFS simply offlined it when IO times started exceeding the threshold. It would have been silly if a disk dying in the most common way could bring down the whole array, don't you think?

A few options. As background: a ZFS pool is a collection of virtual devices. Redundancy is achieved at the virtual-device level, not the pool level. For example, you can have 6 raidz1 vdevs of, say, 5 disks each in a single pool. Each such vdev will have 4 disks' worth of space and can tolerate 1 disk failure. The pool load-balances between these 6 virtual devices. The other types of virtual devices (L2ARC, SLOG, special) are added to the pool as needed and affect pool performance.

So, to add capacity to the pool you can either grow one of the vdevs by replacing each disk in it with a larger one (you can't add disks to a vdev, but you can replace them; unlike conventional RAID, the fault-tolerance level is maintained throughout the replacement), or add a whole new vdev to the pool. It does not have to have the same number of disks, but it should ideally have the same fault-tolerance level, because otherwise it's silly or wasteful.

After it's added, new data will mostly be sent to the new vdev, as it has more free space than the others. If you want to rebalance all data right away, you can zfs send | zfs receive to a new dataset and then delete the old dataset.
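In command form, the two growth paths could look roughly like this (pool, dataset, and device names are made up for illustration):

```shell
# Path 1: grow an existing vdev by replacing its disks one at a time;
# fault tolerance is maintained during each replacement.
zpool replace tank /dev/sda /dev/sde   # repeat for each disk in the vdev
zpool online -e tank /dev/sde          # expand once all disks are larger

# Path 2: add a whole new vdev (ideally with the same fault-tolerance level).
zpool add tank raidz1 /dev/sdf /dev/sdg /dev/sdh

# Optional immediate rebalance: copy into a fresh dataset, drop the old one.
zfs snapshot tank/storj@move
zfs send tank/storj@move | zfs receive tank/storj-new
zfs destroy -r tank/storj
```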


Yes, this looked weird to me, but I have seen it many times. I do not know why for sure, but software (and some hardware) RAID is very sensitive to hardware malfunctions. Maybe something is implemented wrongly on the motherboard/controller or in the OS. Sometimes it works as expected, but sometimes it just freezes.
I believe @littleskunk uses Linux, but maybe this problem is solved differently in FreeBSD and the choking drive did not freeze the whole system.


Possibly a controller issue: if the whole controller hangs when one of the disks stops responding, nobody can do anything. I'll go read that thread.

@Alexey a node is spamming the bandwidth "database is locked" error (8GB file). I read that it's okay to delete the file and let it be recreated (I don't care about stats). I renamed it and the node recreated the file, but the node keeps stopping:

22.10.2023 13:41:10
Error: Error during preflight check for storagenode databases: preflight: database "bandwidth": expected schema does not match actual:   &dbschema.Schema{
- 	Tables: []*dbschema.Table{
- 		(
- 			s"""
- 			Name: bandwidth_usage
- 			Columns:
- 				Name: action
- 				Type: INTEGER
- 				Nullable: false
- 				Default: ""
- 				Reference: nil
- 				Name: amount
- 				Type: BIGINT
- 				Nullable: false
- 				Default: ""
- 				Reference: nil
- 				Name: created_at
- 				Type: TIMESTAMP
- 				Nullable: false
- 			... // 12 elided lines
- 			s"""
- 		),
- 		(
- 			s"""
- 			Name: bandwidth_usage_rollups
- 			Columns:
- 				Name: action
- 				Type: INTEGER
- 				Nullable: false
- 				Default: ""
- 				Reference: nil
- 				Name: amount
- 				Type: BIGINT
- 				Nullable: false
- 				Default: ""
- 				Reference: nil
- 				Name: interval_start
- 				Type: TIMESTAMP
- 				Nullable: false
- 			... // 12 elided lines
- 			s"""
- 		),
- 	},
+ 	Tables: nil,
- 	Indexes: []*dbschema.Index{
- 		s`Index<Table: bandwidth_usage, Name: idx_bandwidth_usage_created, Columns: created_at, Unique: false, Partial: "">`,
- 		s`Index<Table: bandwidth_usage, Name: idx_bandwidth_usage_satellite, Columns: satellite_id, Unique: false, Partial: "">`,
- 	},
+ 	Indexes:   nil,
  	Sequences: nil,
  }
	storj.io/storj/storagenode/storagenodedb.(*DB).preflight:429
	storj.io/storj/storagenode/storagenodedb.(*DB).Preflight:376
	main.cmdRun:110
	main.newRunCmd.func1:32
	storj.io/private/process.cleanup.func1.4:402
	storj.io/private/process.cleanup.func1:420
	github.com/spf13/cobra.(*Command).execute:852
	github.com/spf13/cobra.(*Command).ExecuteC:960
	github.com/spf13/cobra.(*Command).Execute:897
	storj.io/private/process.ExecWithCustomOptions:113
	main.main:30
	runtime.main:250
2023-10-22 13:41:10,441 INFO exited: storagenode (exit status 1; not expected)
2023-10-22 13:41:11,442 INFO waiting for processes-exit-eventlistener, storagenode-updater to die
2023-10-22 13:41:12,443 WARN received SIGQUIT indicating exit request

Edit:
Never mind, I fixed it with your guide :slight_smile:
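For anyone hitting the same preflight error later: "Tables: nil" in the output above means the recreated file is an empty database with no schema, so the node refuses to start. The guide has you recreate the expected tables manually with sqlite3. Mechanically it looks roughly like this; the path is an example, the actual CREATE TABLE statements come from the guide (the schema file name here is hypothetical), and the node must be stopped first:

```shell
# Example only: adjust DB_DIR to your node's storage location.
DB_DIR=/mnt/user/storagenode/storage

# Keep the broken file around in case something goes wrong.
mv "$DB_DIR/bandwidth.db" "$DB_DIR/bandwidth.db.bak"

# Recreate the database with the schema from the guide
# (bandwidth_schema.sql is a hypothetical file holding those statements).
sqlite3 "$DB_DIR/bandwidth.db" < bandwidth_schema.sql
```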

@Alexey I made a stupid mistake with the last node. I stopped the node for the last --delete run, executed the command, and changed the path in the Unraid Docker GUI. Instead of Save I pressed Apply, which started the node. It ran for 3-4 minutes. How bad is this for the node? Can I clean this up? The node is now ready and I don't see any error messages for now.
Thank you for your help :slight_smile: