High I/O during node startup

Egaria · October 4, 2023, 8:19pm

Hello, I noticed that the I/O was high for 2-3 days when restarting the node, this includes when an update is deployed since it restarts the node.

I would like to know if there is a possibility to counter this problem, because between 1.87.1 and 1.88.3 it took around 20 days and due to these updates the I/O was at full tilt for around 6 days…

Thanks.

arrogantrabbit · October 4, 2023, 8:23pm

This is file walker enumerating files on start.

If you have enough ram, this will take about an hour, (and if you have SSD caching or tiered storage, – minutes) not days, and is a good thing: it pre-warms the in-ram metadata caches and as a result your node becomes more responsive, wins more races, and you earn more money.

If you don’t have enough ram to fit metadata blocks in their entirety, or any other solution that can speed up random metadata fetches (SSD cache, tiered storage, special device on ZFS) then file walker does not provide performance benefit and is net negative; you can disable it in the configuration file. Your node will still be waisting time on metadata fetches and losing more races than necessary, but you’ll avoid this startup spike.

I don’t recommend doing so, instead, improve performance of your disk IO subsystem. For example, if you have at least 8GB of ram, switch to ZFS, even with one disk, and add SSD as a metadata special device. This will decimate IO hitting the disk and help win more races.

Egaria · October 4, 2023, 8:32pm

Adding an SSD drive will be complicated, but in terms of RAM it only uses around 300MB out of the 2GB. On the graph you can see the times when it has finished enumerating the files, and the RAM is not moving.

arrogantrabbit · October 4, 2023, 8:57pm

It’s not about what node is using, but instead what is available on the system unused: that’s what the system can use for the disk cache. If you have total 16GB and all processes + system use 4GB – then 12 GB is available to accelerate disk access.

This of course depends on the OS – some OSes are better than others in implementing the disk caching.

Egaria · October 4, 2023, 9:00pm

I don’t really understand what you mean.
On the system there is only one node running.
On the other hand, I just noticed that there was around 300MB of “used” RAM but 1.5GB of “cache + buffer” ram, is that what you’re talking to me about?

This VM with 2 cores (3.5ghz) and 2GB of ram

arrogantrabbit · October 4, 2023, 9:36pm

Yes. Each file access involves metadata lookup. This is extra IO. OS tried to cache filesystem data, including metadata, in available ram, so the next time the same file needs to be accessed — no disk IO will be required to fetch metadata.

Describe your setup. Is filesystem running in the VM or are you using NFS or other network protocols? Why are you using vm and not, say, container, or run natively, so that each app benefits from a single host managed cache?

If filesystem runs in VM it has vastly insufficient amount of ram to cache any significant amount of metadata and this explains why filewalker runs forever: most of accesses resultl in cache misses and require roundtrip to HDD.

For comparison, my node of about 9TB spends under 10 min scanning on start, with most IO at the beginning, first minute or two, and then subsiding to almost zero, as more and more pages are ingested into cache. In fact, the process is CPU bound, not disk IO, because there is no to little actual IO generated to HDDs— that host has metadata stored on an SSD and there is about 20GB of free ram available.

Egaria · October 5, 2023, 3:52pm

So from what I understand, if I increase the RAM, it will go faster.
I will go from 2GB to 30GB to see a change appear.

The node is run in a docker container on my VM.