We are all in the middle of a mess. GC here… trash cleanup there… growing ingress data…
I want to talk here about vfs_cache_pressure. I think it’s an important setting after all the filesystem tunings discussed here.
I tried 100… 80… never went lower (afraid of making a mistake). Has anyone done some tests?
100 is the default.
What issue are you seeing that you are trying to solve? Memory starvation? Reducing the value will make it worse.
To optimize RAM for Storj. I assumed that the server will only be used as a Storj node.
I want the RAM to be used primarily for metadata and not for file cache.
I’ve tried vfs_cache_pressure at 10 for a few weeks, and now changed it to 5. I’m not sure I can see any difference.
I only have one ext4 drive, and I’m migrating it to ZFS.
But messing with vfs_cache_pressure didn’t seem to make it use any more memory for cache or buffers.
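For anyone who wants to experiment with this knob, it lives under /proc and can be read without root. A minimal sketch (the value 10 is just an example from this thread, not a recommendation):

```shell
# Read the current value (the default is 100)
cat /proc/sys/vm/vfs_cache_pressure

# To change it at runtime (needs root):
#   sysctl -w vm.vfs_cache_pressure=10
# To persist it across reboots, drop a file into /etc/sysctl.d:
#   echo 'vm.vfs_cache_pressure = 10' > /etc/sysctl.d/90-vfs-cache.conf
```

Lower values make the kernel keep dentry/inode caches longer relative to page cache; 0 means never reclaim them, which is why enough RAM is mandatory at that setting.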
For ext4:
mounting with noatime is definitely easy to do and should help performance, and is fairly harmless.
formatting with 128-byte inodes is worth considering, but I haven’t tried it.
possibly mounting with data=writeback instead of the default data=ordered might help performance, at the risk of corruption if the machine loses power. It doesn’t seem to help with read activity (like a filewalker), though.
Other folks have worked on setting up bcache or lvmcache with an SSD paired with an ext4 drive.
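The ext4 suggestions above can be sketched as commands. This is only a sketch: it demonstrates on a scratch image file, and the device name and mount point for a real node are placeholders.

```shell
# Demonstrate on a scratch image; on a real node you would format the block
# device itself (e.g. mkfs.ext4 -I 128 /dev/sdX -- placeholder name, destroys data).
img=$(mktemp)
truncate -s 64M "$img"

# Format with 128-byte inodes (less metadata to cache per file)
mkfs.ext4 -q -F -I 128 "$img"
tune2fs -l "$img" | grep 'Inode size'

# Mounting on a real system (needs root):
#   mount -o noatime,data=writeback /dev/sdX /mnt/storj
# Equivalent /etc/fstab entry:
#   /dev/sdX  /mnt/storj  ext4  noatime,data=writeback  0  2
rm -f "$img"
```

Note that mkfs.ext4 warns about 128-byte inodes (no room for extended timestamps past 2038), which is part of why it is worth testing before committing a full drive to it.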
After a couple of days running with vfs_cache_pressure at 5:
time du -hs /node5/node5/storage/*
1.1T /node5/node5/storage/blobs
40M /node5/node5/storage/piece_expiration.db
148G /node5/node5/storage/trash
…
real 12m8.094s
user 0m4.851s
sys 0m41.259s
There’s something wrong with those figures.
/node5/node5/storage/trash# du -hs */*
3.8G pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/2024-07-28
16G pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/2024-08-01
70G pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/2024-08-03
647M qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/2024-07-30
20G ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-07-28
10G ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-07-29
8.2G ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-07-31
7.6G ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-08-02
3.7G v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/2024-07-28
2.0G v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/2024-07-30
3.4G v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/2024-08-01
3.5G v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/2024-08-04
With -h you get binary (IEC) units, so 1.1T is 1.1 TiB, which is 1.21 TB in the SI units used on the dashboard, and 148G is 148 GiB, which is 158.91 GB in SI. Next time I would suggest using --si instead of -h.
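The two unit styles and the conversion can be checked quickly (the sample file is just for illustration; the 148 GiB figure is the trash size from the listing above):

```shell
# Make a small sample directory to compare the two unit styles
dir=$(mktemp -d)
dd if=/dev/zero of="$dir/sample" bs=1M count=10 status=none

du -hs "$dir"      # binary (IEC) units: 1M = 1024*1024 bytes
du -s --si "$dir"  # decimal (SI) units, matching the dashboard

# 148 GiB converted to SI gigabytes: 148 * 1024^3 / 10^9
awk 'BEGIN { printf "%.2f GB\n", 148 * 1024^3 / 1e9 }'   # 158.91 GB
rm -r "$dir"
```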
However, if you’re running 1.109, it again has issues with updating the databases on delete.
In short: you need to enable the scan on startup if you disabled it (it’s enabled by default) and restart the node to correct the usage in the databases.
Right now, to reduce the time spent by the filewalkers, you may enable the experimental badger cache feature:
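If I remember the option name correctly (worth double-checking against your node’s `--help` output, since the feature is experimental), it is enabled with a single line in the storagenode’s config.yaml:

```yaml
# Experimental file-stat cache backed by badger; speeds up filewalkers.
# Option name as I recall it -- verify before relying on it.
pieces.file-stat-cache: badger
```

For docker setups the same thing can be passed as a command-line flag (`--pieces.file-stat-cache=badger`) after the image name in the run command.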
I understand that, thanks. I have 148GiB of trash and the dashboard shows 41.52GB because 1.109 is not updating the databases.
So, 12 minutes to du the whole 1.1 TiB plus 0.14 TiB of trash seems quick enough. Except I did notice ssh being clunky after the du run, like it was swapping, so I think vfs_cache_pressure=5 is possibly too low.
I’ve been using pressure=1 for a long time now. No problems at all.
How much RAM do you need to run at =1?
I don’t think that is relevant. It is not 0. With pressure=0 you MUST have enough RAM for everything.
RAM needs depend on whether you have one node or multiple nodes.
I wanted to add a point since it might help someone debug something. The trash figure is being updated (I have not restarted the node).
So I think/propose that when trash is removed from the trash folders the db is updated, but when data is moved into the trash folders the db is not updated.
The node has uptime 154h 50m so it hasn’t restarted itself either.
I wonder if the trash total will go negative?
Let me know if there is anything I can test for you; sorry, I can’t code.
I think at the moment we can do nothing except restart and allow the filewalker to fix the discrepancy. I would hope that the next release has a fix for the trash updates too.