Badger cache: are we ready?

I see the previous poster’s point… with big slow nodes, it takes several runs to get a full badger cache and get through a walker. And a Windows user reported that he had a corrupted badger cache after Windows restarts.
It would be nice to protect the work already done by splitting the cache by satellite and by folder, so that a corruption doesn’t lose everything, and it would be nice if the code said “cache is corrupted, regenerating” rather than crash…
I think that while it is experimental and optional this is OK, but there are some good points here.


This needs to be investigated. I wouldn’t expect the cache to be corrupted by a normal restart.

By the way, I accidentally found a solution for the docker node when the badger cache is enabled: you shouldn’t enable the badger cache in the config.yaml file, you need to pass it as a command line option

after the image name. That way, when you exec the key issue command in the docker container, it doesn’t try to lock the file cache, because the cache isn’t enabled in the config.
This also explains why it works for my Docker node :joy: , since I enabled it exactly like that; it’s more convenient if you use docker compose:

services:
  storagenode2:
    container_name: storagenode2
    restart: always
    stop_grace_period: 300s
    image: storjlabs/storagenode:latest
...
    command:
      - --pieces.enable-lazy-filewalker=false
      - --pieces.file-stat-cache=badger
    #   - --healthcheck.details=true
    #   - "--operator.wallet-features=zksync"

Since I run the node with plain docker commands, can you check if I wrote them correctly? The -p and -e parts have been omitted.

sudo docker run -d --restart unless-stopped --stop-timeout 300 \
    --log-driver json-file \
    --log-opt max-size=10m \
    --log-opt max-file=5 \
    --mount type=bind,source="/volume2/storj/identity",destination=/app/identity \
    --mount type=bind,source="/volume2/storj/data",destination=/app/config \
    --mount type=bind,source="/volume2/storj/data/node.log",destination=/app/logs/node.log \
    --name storagenode storjlabs/storagenode:latest \
    --pieces.enable-lazy-filewalker=false \
    --pieces.file-stat-cache=badger \
    --log.output=/app/logs/node.log \
    --server.address=":28967" \
    --console.address=":14002" \
    --debug.addr=":5999" \
    --log.level=error \
    --filestore.write-buffer-size 4MiB \
    --pieces.write-prealloc-size 4MiB \
    --storage2.piece-scan-on-startup=true \
    --operator.wallet-features=zksync-era,zksync
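
For reference only (these are not the poster’s actual values): the omitted -p and -e options typically follow the pattern from the standard docker run command in the Storj docs and go before the image name. A sketch with placeholder values:

    # placeholders only; substitute your own wallet, email, external address and allocation
    -p 28967:28967/tcp \
    -p 28967:28967/udp \
    -p 127.0.0.1:14002:14002 \
    -e WALLET="0x..." \
    -e EMAIL="user@example.com" \
    -e ADDRESS="my.external.address:28967" \
    -e STORAGE="2TB" \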

Yes, all looks correct. However, this one:

should be changed to

--operator.wallet-features=zksync-era

because zkSync Lite isn’t supported any longer:

Also,

   --pieces.write-prealloc-size memory.Size                   deprecated (default 4.0 MiB)

Oh right! I should have changed that, thank you! Glad the rest is correct. Thanks Alexey!


What is the expected size of the badger cache directory?
Cheers

150-200 MB per stored TB
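So a node storing 10 TB, for example, would end up with a cache directory of roughly 1.5-2 GB.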

Badger ON and fully loaded.
A full used-space run on Saltlake usually takes me 3-4 hours. I noticed that the GC took much longer.

Perhaps the GC processed a bigger bloom filter? I saw a reduction in processing time, not much, but still.

Does GC also use the badger cache? I thought it would only be used for the used-space calculation?

It is used by any filewalker that requests file stats. However, it’s unlikely to speed up a move or a deletion, only the stat requests (to calculate the used space, filter by creation date, etc.). So it should still process a little bit faster; see my results there:

ext4? post must be 40 characters

Worse: NTFS over the 9p network protocol (WSL2).

Is the badger cache accessed/updated only on the startup piece scan?
Or is it continuously updated with each piece stored or deleted?
I want to know if it’s chewing on the SSD…
Regarding the filewalker, does it scan only the blobs, or the trash too?

It’s continuously updated with each requested piece, if the entry was missing. So it’s not much help for deletions, only for the stat requests, which are pretty fast anyway for a deletion; the deletion itself is a slow operation on any OS and, it seems, on any FS, even ZFS, see Unlink performance on FreeBSD.

Both. If you call the /mon/ps endpoint on the debug port, you can see that.
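
For example, assuming the debug address from the docker run command earlier in the thread (--debug.addr=":5999") and that this port is reachable from where you run the command (e.g. published with -p 127.0.0.1:5999:5999), something like this lists the currently running operations:

    # show currently running spans; active filewalker/GC runs appear here
    curl -s http://127.0.0.1:5999/mon/ps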

Wait a minute… does that mean we could have badger enabled but the used-space filewalker off, and still get a maybe small but growing benefit from the cache?

Yes, that’s the point. The cache is used for every request to the piece store and updated accordingly. So it improves not only the used-space-filewalker duration, but almost every other filewalker as well.
The change is not as significant as for the used-space filewalker, especially for the deletion duration, but it can speed up garbage collection too. I’m not sure whether there is any change for the TTL collector, but there should be (I just didn’t check).
Also, download requests should be processed a little bit faster, but I don’t have numbers and no idea how to compare.
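
A minimal sketch of that combination, reusing the flag names from the compose and docker run examples earlier in the thread (badger cache on, startup used-space scan off):

    command:
      - --pieces.file-stat-cache=badger
      - --storage2.piece-scan-on-startup=false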

well then, let’s see how this will do…


This means that if you keep it on an SSD, it will burn through that SSD very fast.
I think I will generate the badger cache on the SSD and then move it to the storage HDD.
I hope it will not run into some IOPS bottleneck and crash.

If you have enough RAM, the cache will mostly run in RAM.