Wait, does that log mean that garbage collection for 1 satellite took 25 hours?
Yup… so if you're keeping score, in total that's about 58 hours, 58 minutes to move garbage, which will then need another run to delete it in 7 days. And it's probably only about 116GB of data at that. The retain rate is about 47GB per day, so it'd take about 1 month to clear ~1.4TB. Good thing TTL (SLC) is a thing, or it'd never even cycle before it all expired.
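The back-of-envelope math checks out; a quick sketch (numbers rounded from the figures above):

```shell
# Rough retain-rate math from the numbers above (all values approximate).
moved_gb=116        # garbage moved in one retain cycle
hours=59            # ~58h58m, rounded
backlog_gb=1400     # ~1.4TB waiting to be cleared

rate_gb_per_day=$(( moved_gb * 24 / hours ))
days_to_clear=$(( backlog_gb / rate_gb_per_day ))

echo "rate: ${rate_gb_per_day} GB/day, days to clear: ${days_to_clear}"
```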
2 potatoes
Retain is limited to 1 concurrent process: https://review.dev.storj.io/c/storj/storj/+/13081. It might be that with the badger cache on, more than one retain could run in parallel and speed things up.
And also there is this setting in config.yaml:
# how many piece delete workers
# storage2.delete-workers: 1
It sounds like this setting can change the number of parallel deletions. This might increase the speed of deletion too.
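For reference, a minimal config.yaml fragment with that setting uncommented might look like this (the value 5 is only an illustration; more workers mean more parallel IO, which helps only up to what the disk can absorb):

```yaml
# config.yaml (sketch): uncomment and raise the worker count
# how many piece delete workers (default is 1)
storage2.delete-workers: 5
```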
Yeah, it sure might increase speed, and/or put more parallel IO pressure on the disk, making it more efficient… to a point, then less.
I thought I would share my numbers. I set up badger on my most problematic, sad node.
Node setup: single core, only 1GB RAM (and 1GB swap), and the storage is mounted over NFS. Yes, Alexey has already scolded me for this. This node had occasional problems with filewalkers not finishing due to "context canceled", or just taking days and days to run and not finishing before a restart.
I used Docker and set up the badger cache on SSD (just like the Storj DB files already were). The data size for Storj is about 11TB.
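For anyone wanting to reproduce this: as far as I know, the cache is switched on via the pieces.file-stat-cache option in config.yaml (treat the exact option name as an assumption and check the config reference for your storagenode version):

```yaml
# config.yaml (sketch; option name as I understand it, verify for your version)
pieces.file-stat-cache: badger
```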
The node seemed to struggle under the increased load. It sometimes restarts (the "online" time in the dashboard resets, and the filewalkers, if previously running, start again without a completion or failure message in the log). I'm not sure what to look for to see why the restarts are happening.
But while the node is still operating, with the filewalkers running, the badger cache being built, and uploads still coming in… the node seems to get RAM constrained, in that the swap file fills up.
At the moment the node is in more of a steady state. There is still a used-space filewalker running, but the badger cache is mostly built and the disk is full, so there are no more uploads. The system now shows about 500MB of swap occupied but also 500MB of RAM free. Oh, and docker stats shows 330MB used by storagenode, but I've seen over 500MB and maybe 800MB when things were busier.
Size-wise, my badger directory is just over 900MB. That's with 11TB of Storj data, although a used-space filewalker is still running for US1. (Correction: 1.7GB after all filewalkers were done.)
Reliability-wise, I still have the spontaneous reboot issue, which may or may not be new, and I also still have filewalkers fail with "context canceled", which is definitely an old issue.
Subjectively, I definitely see more activity on the SSD hosting the badger cache: from virtually none to some pretty significant usage, often in short spikes.
Also, when the filewalker is running and using badger-cached info, it seems to actually work the storage array harder. The hard drive with the data and the cache SSD with the metadata show more transactions and a higher busy %. It's almost like running four non-badger filewalkers.
Oh, and time, the most important part.
Before, running the used-space filewalker for SLC took… really long? I wasn't even able to get a single run to finish. But at least 16 hours. At least.
After the badger cache is built, a used-space filewalker for SLC takes 1 hour.
So TL;DR: the cache hasn't really made the node more reliable, but the filewalkers are now so much faster that they actually have a chance to finish before an error.
Compared to sometimes running for days, or not completing at all: thatās a huge improvement!
If it's working in your configuration, I have no objections. It's just that 1GB of RAM is too small for any network filesystem plus storagenode, because storagenode will buffer all writes in RAM if the disk subsystem cannot keep up, and NFS unfortunately provokes this. For NFS you need at least 4GB of RAM to be stable. It also depends on the NFS configuration on both the server and the client. But it seems you already configured it correctly, since your node is still alive, just with too little RAM.
I guess it's either because of OOM (you may search journalctl for OOM) or because of a failed readability/writeability check (you may search for Unrecoverable and/or FATAL in your logs).
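In shell form, the two searches might look like this (the sample log line at the end is made up purely to demonstrate the filter):

```shell
# Check the kernel log for OOM kills (requires systemd/journalctl):
#   journalctl -k | grep -i "oom-kill"
# Check the node log for failed readability/writeability checks:
#   grep -E "Unrecoverable|FATAL" /path/to/storagenode.log
# The same filter, demonstrated on a hypothetical sample line:
echo "2024-08-23T20:00:00Z FATAL piecestore writeability check failed" \
  | grep -cE "Unrecoverable|FATAL"
```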
This is a good result, thank you! I didn't expect that with NFS it could use such a low amount of RAM.
Ah, that explains what I've seen when the NFS connection goes down: the storagenode starts buffering, the Docker container runs out of RAM, and then the whole node sort of runs out of RAM and becomes almost unusable. It often requires a full reboot. Of course, if the NFS connection goes down, then no amount of RAM will save the node. If only someone had warned against using NFS mounts…
Yes, but in most setups NFS provokes higher RAM usage: Search results for 'memory usage #nfs order:latest' - Storj Community Forum (official)
So your setup seems a little better (perhaps you used a separate network for the NFS shares).
here we go: Step 1. Understand Prerequisites - Storj Docs
Good guess, journalctl shows this OOM thingy:
kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=init.scope,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-fbf712c8388ece9816713d7808d15995045edb6b5e38da9512b68092b819493f.scope,task=storagenode,pid=38792,uid=0
Do you know whether this is the whole OS running out of RAM, or is it controlled by the limit I set in Docker? (I had set an 800M limit.) It feels like it's OS level, but I'm pretty ignorant on this stuff.
try this config option:
filestore.write-buffer-size: 256.0 KiB
…or 128 KiB, whatever.
And/Or:
storage2.max-concurrent-requests: 5
… or 50, whatever.
default: 0 (infinite, or 1000 probably)
So your node isnāt forced to max out and go Boom!
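Combined in config.yaml, those two knobs might look like this (the values are illustrative starting points, not recommendations):

```yaml
# config.yaml (sketch): reduce memory pressure on a constrained node
# smaller per-upload write buffer means less RAM per concurrent upload
filestore.write-buffer-size: 128.0 KiB
# reject new uploads beyond this many concurrent requests (0 = unlimited)
storage2.max-concurrent-requests: 5
```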
2 cents
This is on a 19.5TB node. The initial filewalk with badger took 5+ days; it had multiple restarts due to updates, so I don't have the initial filewalk total time. This is after the badger cache was built, and it took a little less than 2 hours to finish:
2024-08-23T20:49:18Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-08-23T20:52:42Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Lazy File Walker": false, "Total Pieces Size": 979614147840, "Total Pieces Content Size": 978551361792}
2024-08-23T20:52:42Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-08-23T20:53:36Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Lazy File Walker": false, "Total Pieces Size": 192093495552, "Total Pieces Content Size": 191733043456}
2024-08-23T20:53:36Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-08-23T22:10:19Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Lazy File Walker": false, "Total Pieces Size": 11891298817280, "Total Pieces Content Size": 11869725311744}
2024-08-23T22:10:19Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-08-23T22:42:52Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Lazy File Walker": false, "Total Pieces Size": 5919990784810, "Total Pieces Content Size": 5902267177258}
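A quick way to turn those log timestamps into durations (GNU date; for example, the first satellite above took 3m24s):

```shell
# Subtract two log timestamps (GNU date) to get a filewalker's runtime.
start="2024-08-23T20:49:18Z"
end="2024-08-23T20:52:42Z"
secs=$(( $(date -ud "$end" +%s) - $(date -ud "$start" +%s) ))
echo "${secs}s"
```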
It could be both. But if you have OOM (which I would expect with NFS and only 1GB of RAM), then the OS perhaps does not even have 800MB free (not all OSes are equal…).
Theoretically, the limit should advertise only that amount of available RAM to the container, and the application could take it into account (run GC more often when it's close to the limit, evict some buffers, etc.), so the application could manage how much RAM it actually takes… Not sure that this is implemented in storagenode…
Itās infinite by default (0).
Thanks, you just confirmed my assumption that even if the node cannot manage to finish the filewalker in one go, with a badger cache it would actually be able to finish after several restarts (because each time it would get further).
One thing I see is that after activating the badger cache, Windows uses a big amount of RAM; I think it is caching these files, so it's indirect usage.
It is a server with 17 nodes.
It also does the same for the docker node:
comment CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
badger=true, lazy=false d8aab63f299d storagenode2 10.15% 1.135GiB / 24.81GiB 4.57% 251GB / 102GB 0B / 0B 93
badger=false, lazy=true 3d20fef76e67 storagenode5 7.08% 136MiB / 24.81GiB 0.54% 54.9GB / 21.9GB 0B / 0B 86
$ free
total used free shared buff/cache available
Mem: 26010664 2481748 12575016 11488 10953900 23109976
Swap: 7340032 0 7340032
However, it seems it's not like that for memory-constrained systems, like @snorkel's or @jammerdan's. But I want to re-verify when their systems finish the used-space-filewalker with the badger cache enabled. It would be interesting to see the memory and CPU footprint. I would expect it to return to normal when all scans are completed.
Correcting myself: my badger cache for 11TB of data is about 1.7GB, now that all filewalkers have definitely finished.
The badger cache seems to be running but I cannot spot a log entry associated with it.
Am I missing something?
Log level Info? There is not a lot that goes to the log about badger itself.