Metadata cache ext4 (RAM maxed)

Hi, I maxed out my system RAM and it's still not enough for my nodes: less than 1 GB per TB.
I have ext4 on an Ubuntu 22 server. How can I add a 4 TB SSD dedicated to metadata caching? First cache in RAM, then on SSD. My system is slowing down under all this massive garbage-collection deleting.


You could create a swap partition on your SSD, but I guess you would get better results with something like LVM.
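If you go the swap route, a minimal sketch (the SSD partition /dev/sdc1 is hypothetical; double-check your device names first):

```bash
# Hypothetical partition on the SSD.
sudo mkswap /dev/sdc1
sudo swapon /dev/sdc1

# Make it permanent:
echo '/dev/sdc1 none swap sw 0 0' | sudo tee -a /etc/fstab
```

Note that swap helps only indirectly, by paging out idle process memory and freeing RAM for the page cache; it doesn't cache filesystem metadata itself.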

This article on metadata devices is where a lot of people start (and it will give you an idea of sizing for Storj nodes).

Consumer motherboards have been capped at 128 GB of RAM forever, so it does feel easier to start using some SSD space for speed rather than switching to EPYC/Xeon/ThreadRipper etc. Good luck!


Isn't that all about ZFS, or am I wrong? I'm trying to add a metadata cache on ext4.

I don't think you can directly control what content is cached in RAM on ext4 the way you can in ZFS. Indirectly, you can tune the vfs_cache_pressure parameter.
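For reference, a minimal sketch of that tuning (10 is the value mentioned later in this thread; the default is 100, and lower values make the kernel hold on to inode/dentry caches longer):

```bash
# Bias the kernel toward keeping metadata (inode/dentry) caches in RAM.
sudo sysctl vm.vfs_cache_pressure=10

# Persist it across reboots (the file name is arbitrary):
echo 'vm.vfs_cache_pressure=10' | sudo tee /etc/sysctl.d/90-vfs-cache.conf
```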

As mentioned earlier, someone used LVM to store metadata on an SSD: Ext4 speedup by storing metadata and data on separate devices


My fault: I thought you wanted to switch to something that had metadata devices (like ZFS). I’m not sure what you could do if you stayed on ext4.

I have vfs_cache_pressure at 10 (it seems optimal for prioritizing metadata), but as I said, I don't have enough RAM and I cannot install more. So I need to implement an on-disk metadata cache. Bcache? Is someone using it for Storj?

Have you seen this?

Yes. I read the entire post, but I cannot move away from ext4.

I recommend LVM with ext4. I set mine up with a 28 GB SSD write and read cache.

The write cache gives you a buffer and slowly flushes to the HDD when the HDD has some spare I/O.
Disk metadata is cached as well; a lazy filewalk on a 4-5 TB node takes me about an hour.
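For anyone wanting to replicate this, a rough sketch of such a setup with lvmcache, assuming the node LV already lives in a VG (all names here are examples, not my actual layout):

```bash
# Hypothetical names: VG "storj", data LV "nodedata", SSD /dev/sdc.
sudo pvcreate /dev/sdc
sudo vgextend storj /dev/sdc

# Carve a 28 GB cache LV out of the SSD:
sudo lvcreate -n nodecache -L 28G storj /dev/sdc

# Attach it to the data LV; writeback mode caches both reads and writes:
sudo lvconvert --type cache --cachevol storj/nodecache \
     --cachemode writeback storj/nodedata
```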

4 TB is, uh, expensive. But it’s your choice. Unless you have >50 TB worth of node data, a 480 GB SSD should be enough.

The usual tools are LVMcache and bcache. Of the two, I prefer the former. I used to be a fan of bcache (it's faster), but it seems it doesn't handle unclean shutdowns well. And if you are already using LVM, adding LVMcache is easier.

In LVMcache, choose the cache mode deliberately.
In bcache, choose the writethrough mode: it's slower but more reliable, and good enough for nodes.
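For example (device, VG, and LV names are hypothetical):

```bash
# lvmcache: set or change the cache mode on an already-cached LV.
sudo lvchange --cachemode writethrough storj/nodedata

# bcache: switch an existing bcache device to writethrough.
echo writethrough | sudo tee /sys/block/bcache0/bcache/cache_mode
```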

If you are not using LVM already, implementing either of these will be difficult. If you are not comfortable with careful partitioning setup, I would advise against trying to do it in place and instead copy data somewhere else, then set up LVM and caching, and only then copy data back.
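The copy-out/copy-back flow could look roughly like this (all device names and mount points are hypothetical, and pvcreate destroys whatever is on the disk, so copy first):

```bash
# 1. Copy the node data somewhere safe:
rsync -aHAX --info=progress2 /mnt/node/ /mnt/spare/node/

# 2. Rebuild the emptied disk as LVM and recreate the filesystem:
sudo pvcreate /dev/sdb
sudo vgcreate storj /dev/sdb
sudo lvcreate -n nodedata -l 100%FREE storj
sudo mkfs.ext4 /dev/storj/nodedata

# 3. Mount, copy back, then attach the SSD cache as shown above:
sudo mount /dev/storj/nodedata /mnt/node
rsync -aHAX --info=progress2 /mnt/spare/node/ /mnt/node/
```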


I don't use LVM, unfortunately. I think I need to start a new setup with LVM and lvmcache. I have a lot more than 50 TB…
It will be a nightmare :blush: Thanks for the help.
I will consider XFS too…

Btw, for all of you… don't let your nodes grow beyond what your RAM can manage. 2 GB per TB is the minimum.

You can also play with ZFS and either a special device for storing metadata on SSD, or an L2ARC cache for caching metadata on SSDs. L2ARC can even be persistent between reboots, and it can be hot-removed without impacting the array, even on single-disk ZFS setups.
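A sketch of both options, assuming a pool called tank and spare SSDs (all names hypothetical; persistent L2ARC needs OpenZFS 2.0 or later):

```bash
# Special vdev: metadata lives on the SSDs permanently.
# Mirror it, because losing the special vdev loses the pool.
sudo zpool add tank special mirror /dev/sdd /dev/sde

# Or L2ARC: a removable read cache, optionally restricted to metadata.
sudo zpool add tank cache /dev/sdf
sudo zfs set secondarycache=metadata tank
sudo zpool remove tank /dev/sdf   # hot-removable at any time
```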

LVMcache is still better than ZFS for a storagenode. Everything can be removed and added without any downtime, and the cache persists between reboots as well. There is also the benefit of getting near-maximum speed when upgrading drives.
With ZFS, once you add a special device, you can't remove it; you would need to create a new pool and transfer the files over.
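For instance, detaching an lvmcache from a live LV is a single online command (VG/LV names hypothetical):

```bash
# Flushes dirty blocks and detaches the cache without unmounting anything.
sudo lvconvert --splitcache storj/nodedata
```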


That much? I have nodes around the 10 TB mark, and they are really not using much more than 1 or 2 GB of RAM at any one time.

The container itself, yes… but if you want to keep metadata in RAM to speed up the filewalker (FW) and garbage collection (GC), you need more. Much more!


I'm playing a little with LVMcache. I have six spare 2.5" SATA HDDs in RAID 1+0; I created everything and activated lvmcache in cache mode (4.9 TB in total, so chunk size set to 8M). I attached a new HDD for testing and everything worked. My question is: if I have a node on sdc, simply formatted as ext4 with data on it, can I attach it to the VG without losing data? Or do I need to move everything to another HDD and back after attaching it?
Thanks! We are trying to tune our nodes :slight_smile:

It is in theory possible to add LVM in-place, but it’s rather dangerous. It’s much safer to make a full copy.


You can create the new LVM layout, and then use

partclone.ext4

to clone the regular partition onto the LV. I did this and it works fine.
It requires downtime, but it's still faster than doing a full file-level copy.
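Roughly like this, with both devices unmounted (names hypothetical, and the target LV must be at least as large as the source):

```bash
# Device-to-device clone of the ext4 filesystem onto the new LV.
sudo partclone.ext4 -b -s /dev/sdc1 -o /dev/storj/nodedata

# Then let ext4 grow into the full LV:
sudo e2fsck -f /dev/storj/nodedata
sudo resize2fs /dev/storj/nodedata
```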

Yes… but… it’s extremely dangerous.
