Badger cache filewalker test results

What’s the needed version for this badger again?
What size would the database be for 21 TB of data?

I think no one knows yet; I only have data for 0.9 TB and the total cache folder size for that.

What if the node is restarted during the initial pass? What will happen to caching? Will the progress made be reset or saved? The next time the node starts, will caching continue from where it stopped or will it start from 0?

I think you’re conflating the used-space filewalker with the cache.

Basically, if the node needs the metadata of a piece, it will check the cache first. If it is there, it uses it; otherwise it reads the metadata from the filesystem/disk and then stores it in the cache.

So if the used-space filewalker is 50% done and the node restarts, it should be able to get through the first 50% quickly by using the cache, and then read metadata from the disk for the remaining 50% to fill the cache.
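
Roughly, that lookup path could be sketched like this (a minimal sketch assuming a badger-backed metadata cache; the key layout, struct, and paths are illustrative, not the storagenode’s actual code):

```go
package main

import (
	"encoding/json"
	"errors"
	"log"
	"os"
	"time"

	badger "github.com/dgraph-io/badger/v4"
)

// pieceStat is the metadata we want to avoid re-reading from disk.
type pieceStat struct {
	Size    int64     `json:"size"`
	ModTime time.Time `json:"modTime"`
}

// statWithCache checks the cache first and only falls back to the
// filesystem (and then fills the cache) on a miss.
func statWithCache(db *badger.DB, piecePath string) (pieceStat, error) {
	key := []byte(piecePath)
	var stat pieceStat

	// 1. Try the cache.
	err := db.View(func(txn *badger.Txn) error {
		item, err := txn.Get(key)
		if err != nil {
			return err // badger.ErrKeyNotFound on a miss
		}
		val, err := item.ValueCopy(nil)
		if err != nil {
			return err
		}
		return json.Unmarshal(val, &stat)
	})
	if err == nil {
		return stat, nil // cache hit: no disk metadata read needed
	}
	if !errors.Is(err, badger.ErrKeyNotFound) {
		return pieceStat{}, err
	}

	// 2. Cache miss: read the metadata from the filesystem.
	fi, err := os.Stat(piecePath)
	if err != nil {
		return pieceStat{}, err
	}
	stat = pieceStat{Size: fi.Size(), ModTime: fi.ModTime()}

	// 3. Store it so the next lookup (or a walk after a restart) is fast.
	val, err := json.Marshal(stat)
	if err != nil {
		return pieceStat{}, err
	}
	err = db.Update(func(txn *badger.Txn) error {
		return txn.Set(key, val)
	})
	return stat, err
}

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/filestatcache"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if s, err := statWithCache(db, "/tmp/example-piece.sj1"); err == nil {
		log.Printf("size=%d modtime=%s", s.Size, s.ModTime)
	}
}
```

This is also why a restart mid-walk should be cheap for the already-walked half: those keys are already in the cache, so only step 1 runs for them.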

I am seeing ~3 GB for 12 TB of data, so I’d estimate around 6 GB for 21 TB, though it could be more. 10 GB is probably safe.
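
For reference, the arithmetic behind that estimate (a rough back-of-the-envelope linear scaling, assuming the cache really does grow proportionally with stored data):

```go
package main

import "fmt"

func main() {
	// Observed: ~3 GB of cache for 12 TB of pieces.
	cacheGBPerTB := 3.0 / 12.0 // ≈ 0.25 GB per TB
	fmt.Printf("estimated cache for 21 TB: %.2f GB\n", cacheGBPerTB*21) // ≈ 5.25 GB
}
```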

But this is just an assumption. Has anyone checked how this will work in practice?

Try running the badger cache without running the used-space filewalker; you’ll see that over time the cache grows even without a filewalker.

I will move one of my 16TB nodes tomorrow and will give it a try.

What is the name of the file in which this cache is stored?

It looks like this; it’s not just one file.

Wait, what?? There is no recalculation, no deletes, updates, or shrinking of the badger DB? It just grows and grows?

The cache is populated without a filewalker running (i.e. during normal usage), hence “pre-warmed”.

So it also accounts for deletes? Meaning it removes the DB records of a piece that was deleted?

It should. I’m thinking of using badger if we get a custom directory to save it in (i.e. on an SSD), leaving the node running for a couple of weeks to get the cache filled up by normal usage + GC, and then running the used-space filewalker. That’s the ideal usage for me.
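
A minimal sketch of that setup, assuming the cache directory could be pointed at an SSD and that a deleted piece also gets its cache entry dropped (the paths and the delete call here are illustrative, not the storagenode’s actual behavior):

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v4"
)

func main() {
	// Hypothetical custom location on an SSD instead of the data directory.
	db, err := badger.Open(badger.DefaultOptions("/mnt/ssd/filestatcache"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// When a piece is deleted (e.g. trashed by GC), removing its cached
	// metadata keeps the DB from growing forever.
	deletedPiece := []byte("/data/blobs/xx/deleted-piece.sj1")
	if err := db.Update(func(txn *badger.Txn) error {
		return txn.Delete(deletedPiece)
	}); err != nil {
		log.Fatal(err)
	}
}
```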

The path should be configurable. It would be stupid if it were hardcoded to the data dir. The main reason we use custom paths for databases is to avoid lockups and malformed databases, and I expect this would be no different.
But, thinking about it, I wonder what the IOPS requirements of the badger DB are. Could a USB 3.0 stick cope with them?

Yeah, but the databases were in the storage directory (an inconvenient choice), so you couldn’t bind another path there. As far as I understood from other topics, there is a dedicated directory for this cache, meaning you can just add a specific bind/mount on that path.

See up here: Badger cache filewalker test results - #4 by DisaSoft

Don’t worry. We can slap on a BadgerDB index to cache the BadgerDB index files.

Yeah, before long we’re going to call it a butcher cache.

@elek, this information is mostly for you.
OK, I have been able to kill the badger cache: one abrupt restart, and that was all it took.
The node would not start; the log contained only two rows saying that the wallet and email were read from the config, that’s all.
I renamed the cache folder and the node started, generating a new filestatcache folder.

Here are some results from today’s SLC bloom filters. All of the following nodes have had the badger cache warmed up. Some of these nodes were also clearing trash during the GC process. All of these are on XFS, no special filesystem metadata sauce.

Node 1: Walked 14M pieces trashing 250k pieces in 2 hours.

Node 2: Walked 18M pieces trashing 250k pieces in 5.25 hours.

Node 3: Walked 15M pieces trashing 320k pieces in 6 hours.

Node 4: Walked 17M pieces trashing 280k pieces in 2.5 hours.

Node 5: Walked 12M pieces trashing 8k pieces in 5.5 hours. Low trashing as I have stopped ingress for this node.

Overall quite pleased, even though the GC walker doesn’t query the metadata for every piece.
