Badger cache: are we ready?

I see the previous poster’s point… with big slow nodes, it takes several runs to get a full badger cache and get through a walker. And a Windows user reported that he had a corrupted badger cache after Windows restarts.
It would be nice to protect the work already done by splitting the cache by satellite and by folder, so that a corruption doesn’t lose everything, and it would be nice if the code said “cache is corrupted, regenerating” rather than crash…
I think that while it is experimental and optional this is OK, but there are some good points here.


This needs to be investigated. I wouldn’t expect the cache to be corrupted by a normal restart.

By the way, I accidentally found a solution for the docker node when the badger cache is enabled: you shouldn’t enable the badger cache in the config.yaml file, you need to pass it as a command line option

after the image name. That way, when you exec the key issue command in the docker container, it doesn’t try to lock the file cache, because the cache isn’t enabled in the config.
This also explains why it works for my Docker node :joy: , since I enabled it exactly like that; it’s more convenient if you use docker compose:

services:
  storagenode2:
    container_name: storagenode2
    restart: always
    stop_grace_period: 300s
    image: storjlabs/storagenode:latest
...
    command:
      - --pieces.enable-lazy-filewalker=false
      - --pieces.file-stat-cache=badger
    #   - --healthcheck.details=true
    #   - "--operator.wallet-features=zksync"

Since I run the node with plain docker commands, can you check if I wrote them correctly? The -p and -e parts have been omitted.

sudo docker run -d --restart unless-stopped --stop-timeout 300 \
    --log-driver json-file \
    --log-opt max-size=10m \
    --log-opt max-file=5 \
    --mount type=bind,source="/volume2/storj/identity",destination=/app/identity \
    --mount type=bind,source="/volume2/storj/data",destination=/app/config \
    --mount type=bind,source="/volume2/storj/data/node.log",destination=/app/logs/node.log \
    --name storagenode storjlabs/storagenode:latest \
    --pieces.enable-lazy-filewalker=false \
    --pieces.file-stat-cache=badger \
    --log.output=/app/logs/node.log \
    --server.address=":28967" \
    --console.address=":14002" \
    --debug.addr=":5999" \
    --log.level=error \
    --filestore.write-buffer-size 4MiB \
    --pieces.write-prealloc-size 4MiB \
    --storage2.piece-scan-on-startup=true \
    --operator.wallet-features=zksync-era,zksync
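
For reference only (these are not the poster’s actual values): the omitted -p and -e options typically follow the pattern from the standard docker run command in the Storj docs and go before the image name. A sketch with placeholder values:

    # placeholders only; substitute your own wallet, email, external address and allocation
    -p 28967:28967/tcp \
    -p 28967:28967/udp \
    -p 127.0.0.1:14002:14002 \
    -e WALLET="0x..." \
    -e EMAIL="user@example.com" \
    -e ADDRESS="my.external.address:28967" \
    -e STORAGE="2TB" \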

Yes, all looks correct. However, this one:

should be changed to

--operator.wallet-features=zksync-era

because zkSync Lite isn’t supported any longer:

Also,

   --pieces.write-prealloc-size memory.Size                   deprecated (default 4.0 MiB)

Oh right! I should have changed that, thank you! Glad the rest is correct. Thanks Alexey!


What is the expected size of the badger cache directory?
Cheers

150-200 MB per stored TB
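So a node storing 10 TB, for example, would end up with a cache directory of roughly 1.5-2 GB.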

Badger ON and fully loaded.
A full used-space run on Saltlake usually takes me 3-4 hours. I noticed that the GC took much longer.

Perhaps the GC processed a bigger bloom filter? I saw a reduction in processing time, not much, but still.

Does GC also use the badger cache? I thought it would only be used for the used-space calculation?

It is used by any filewalker that requests file stats. However, it’s unlikely to speed up a move or a deletion, only the stat requests (to calculate the used space, filter by creation date, etc.). So it should still process a little bit faster; see my results there:

ext4? post must be 40 characters

Worse: NTFS over the 9p network protocol (WSL2).

Is the badger cache accessed/updated only on the startup piece scan?
Or is it continuously updated with each piece stored or deleted?
I want to know if it’s chewing on the SSD…
Regarding the filewalker, does it scan only the blobs, or the trash too?

It’s continuously updated with each requested piece, if the entry was missing. So it’s not much help for deletions, only for the stat requests, which are pretty fast anyway for a deletion; the deletion itself is a slow operation on any OS and, it seems, on any FS, even ZFS, see Unlink performance on FreeBSD.

Both. If you call the /mon/ps endpoint on the debug port, you can see that.
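
For example, assuming the debug address from the docker run command earlier in the thread (--debug.addr=":5999") and that this port is reachable from where you run the command (e.g. published with -p 127.0.0.1:5999:5999), something like this lists the currently running operations:

    # show currently running spans; active filewalker/GC runs appear here
    curl -s http://127.0.0.1:5999/mon/ps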

Wait a minute… does that mean we could have badger enabled but the used-space filewalker off, and still get a maybe small but growing benefit from the cache?

Yes, that’s the point. The cache is used for every request to the piece store and updated accordingly. So it improves not only the used-space-filewalker duration, but almost every other filewalker as well.
The change is not as significant as for the used-space filewalker, especially for the deletion duration, but it can speed up garbage collection too. I’m not sure whether there is any change for the TTL collector, but there should be (I just didn’t check).
Also, download requests should be processed a little bit faster, but I don’t have numbers and no idea how to compare.
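
A minimal sketch of that combination, reusing the flag names from the compose and docker run examples earlier in the thread (badger cache on, startup used-space scan off):

    command:
      - --pieces.file-stat-cache=badger
      - --storage2.piece-scan-on-startup=false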

well then, let’s see how this will do…


This means that if you keep it on an SSD, it will burn through that SSD very fast.
I think I will generate the badger cache on the SSD and then move it to the storage HDD.
I hope it will not run into some IOPS bottleneck and crash.

If you have enough RAM, the cache will mostly run in RAM.