It could log at startup that it is activated. I did not see anything about it.
When I briefly had my log level at DEBUG, I think I did see messages about the file stat cache. But I don't usually run at debug.
I just look to see if the size of the badger DB is growing, and I can look for activity on the separate SSD I have the cache living on (mounted separately via Docker).
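A quick check like that can be sketched as follows. The cache path here is an assumption; adjust it to wherever your badger cache is actually mounted:

```shell
# Hypothetical cache location; adjust to your own mount point.
CACHE_DIR="${CACHE_DIR:-/mnt/ssd/filestatcache}"

# Total size of the cache directory and the number of files in it --
# if both keep growing while the node is busy, the cache is in use.
du -sh "$CACHE_DIR"
find "$CACHE_DIR" -type f | wc -l
```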
So just this afternoon, I had two different nodes which were using the badger cache throw errors that prevented the whole node from working.
Not sure what was special, except that the nodes had been doing a lot of garbage collection:
2024-08-28T13:07:09-07:00 ERROR db.filestatcache Received err: file does not exist for table 350. Cleaning up... {"Process": "storagenode"}
2024-08-28T13:07:09-07:00 ERROR failure during run {"Process": "storagenode", "error": "Error opening database on storagenode: file does not exist for table 350\n\tstorj.io/storj/storagenode/storagenodedb.cachedBlobstore:231\n\tstorj.io/storj/storagenode/storagenodedb.OpenExisting:250\n\tmain.cmdRun:67\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:271", "errorVerbose": "Error opening database on storagenode: file does not exist for table 350\n\tstorj.io/storj/storagenode/storagenodedb.cachedBlobstore:231\n\tstorj.io/storj/storagenode/storagenodedb.OpenExisting:250\n\tmain.cmdRun:67\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:271\n\tmain.cmdRun:69\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:271"}
Error: Error opening database on storagenode: file does not exist for table 350
storj.io/storj/storagenode/storagenodedb.cachedBlobstore:231
storj.io/storj/storagenode/storagenodedb.OpenExisting:250
main.cmdRun:67
main.newRunCmd.func1:33
storj.io/common/process.cleanup.func1.4:392
storj.io/common/process.cleanup.func1:410
github.com/spf13/cobra.(*Command).execute:983
github.com/spf13/cobra.(*Command).ExecuteC:1115
github.com/spf13/cobra.(*Command).Execute:1039
storj.io/common/process.ExecWithCustomOptions:112
main.main:34
runtime.main:271
2024-08-28 13:07:09,793 INFO exited: storagenode (exit status 1; not expected)
2024-08-28 13:07:11,797 INFO spawned: 'storagenode' with pid 47
I was able to work around it by moving all the badger cache files to a different folder and effectively starting over from scratch.
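That workaround can be sketched like this. The paths are assumptions for illustration, and the storagenode must be stopped first:

```shell
# Assumed paths -- adjust to your setup -- and stop the storagenode first.
CACHE_DIR="/mnt/ssd/filestatcache"
BACKUP_DIR="${CACHE_DIR}.broken.$(date +%Y%m%d)"

# Move the corrupted cache files aside; the node rebuilds the badger
# cache from scratch on the next start.
mkdir -p "$BACKUP_DIR"
mv "$CACHE_DIR"/* "$BACKUP_DIR"/
```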
Was it the update to 1114? Are your 2 nodes Windows?
Another set of results. Cache on SSD. Sizes are different but improvements are so obvious that this does not matter much.
Without cache:
2024-08-28T11:55:23Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-08-31T06:40:12Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Lazy File Walker": false, "Total Pieces Size": 3612477414725, "Total Pieces Content Size": 3598692620101}
Cache enabled:
2024-08-31T15:14:08Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-08-31T15:19:54Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Lazy File Walker": false, "Total Pieces Size": 1038449295202, "Total Pieces Content Size": 1035869582690}
Without cache:
2024-08-31T06:40:12Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-08-31T07:39:53Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Lazy File Walker": false, "Total Pieces Size": 414286876928, "Total Pieces Content Size": 413831412480}
Cache enabled:
2024-08-31T15:19:54Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-08-31T15:20:19Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Lazy File Walker": false, "Total Pieces Size": 412080359680, "Total Pieces Content Size": 411629355776}
Without cache:
2024-08-31T07:39:53Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-08-31T09:18:26Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Lazy File Walker": false, "Total Pieces Size": 190795233536, "Total Pieces Content Size": 190500340992}
Cache enabled:
2024-08-31T15:20:19Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-08-31T15:20:20Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Lazy File Walker": false, "Total Pieces Size": 37880249344, "Total Pieces Content Size": 37864313344}
Without cache:
2024-08-31T09:18:26Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-08-31T09:37:48Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Lazy File Walker": false, "Total Pieces Size": 68187273728, "Total Pieces Content Size": 68070129664}
Cache enabled:
2024-08-31T15:20:20Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-08-31T15:20:22Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Lazy File Walker": false, "Total Pieces Size": 60985184256, "Total Pieces Content Size": 60885697024}
Now you see why I keep saying the cache location should be configurable, so that we can put it on an SSD.
I think it's not what @jammerdan has tested. Those are results without the badger cache and with the badger cache, not with it on HDD versus on SSD.
@jammerdan please correct me if I'm wrong.
Correct.
"Without cache": without any badger cache.
"Cache enabled": with the badger cache. But the badger cache is on an SSD, so the results would likely be quite a bit slower with the badger cache on an HDD.
You may be correct, because I noticed recently that on the node with the badger cache enabled (it's on the same HDD as the data) the US1 satellite took a lot of time to scan (not days like before, but longer than Saltlake):
2024-08-29T18:15:15Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"}
2024-08-29T18:15:18Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Lazy File Walker": false, "Total Pieces Size": 110854656, "Total Pieces Content Size": 110829568}
2024-08-29T18:15:18Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-08-29T20:03:23Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Lazy File Walker": false, "Total Pieces Size": 1766839000320, "Total Pieces Content Size": 1763175659264}
2024-08-29T20:03:23Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-08-29T20:23:57Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Lazy File Walker": false, "Total Pieces Size": 137937089536, "Total Pieces Content Size": 137714464256}
2024-08-29T20:23:57Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-08-30T10:50:51Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Lazy File Walker": false, "Total Pieces Size": 2360423466410, "Total Pieces Content Size": 2354016213418}
2024-08-30T10:50:51Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-08-30T12:53:46Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Lazy File Walker": false, "Total Pieces Size": 661815707136, "Total Pieces Content Size": 660938693632}
The node is still busy with the GC.
Did you reboot your server after each test? If not, these results might not be very meaningful because after the first run of the used-space-filewalker without Badger, the metadata would already be in RAM. This could boost the performance of subsequent runs of the used-space-filewalker even without Badger.
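On Linux, a full reboot isn't strictly needed to get a cold-cache run; the kernel's page cache (which holds that metadata) can be dropped between tests. A sketch, requiring root:

```shell
# Flush dirty pages to disk, then drop the page cache, dentries, and
# inodes so the next filewalker run starts cold (no metadata in RAM).
# Requires root; harmless otherwise but will temporarily slow all I/O.
sync
echo 3 > /proc/sys/vm/drop_caches
```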
They have a low-power device with little RAM, so it's unable to cache it all fully.
But you are of course correct, which is why I compare results taken several days apart.
That is an excellent thought.
I must admit I did not reboot or manually clear the cache before running the filewalker with the cache enabled. However, the timestamps in the logs show that almost 9 hours passed between finishing without the cache and starting with the cache enabled. I doubt that much metadata would still be left in the RAM cache after that time, but I don't know for sure.
But as coincidence strikes, I have new data for at least one satellite on that node, which happened to run the filewalker again:
2024-09-02T21:51:56Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-09-03T04:58:05Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Lazy File Walker": false, "Total Pieces Size": 1028755594490, "Total Pieces Content Size": 1026224779002}
Yes, it took more time, but it was still much, much faster than the initial filewalker without any caching. And the number of pieces has changed, so if it had to read new data from the slow HDD into the cache, some extra time would be expected. So I think the results are still good (on a heavily used HDD).
Edit:
Additional results are flowing in:
2024-09-03T04:58:05Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-09-03T05:24:26Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Lazy File Walker": false, "Total Pieces Size": 425165638400, "Total Pieces Content Size": 424707436800}
2024-09-03T05:24:26Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-09-03T05:41:53Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Lazy File Walker": false, "Total Pieces Size": 33298512640, "Total Pieces Content Size": 33283112704}
2024-09-03T05:41:53Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-09-03T05:42:45Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Lazy File Walker": false, "Total Pieces Size": 59897545472, "Total Pieces Content Size": 59799487232}
You are likely correct. If you don't have a special device for metadata, it was likely evicted during that time.
I agree with that.
My first test results…
Synology DS216+ 8GB RAM, 2 nodes on 2 Ironwolves 8TB, ext4, noatime, no SSD, no USB drive.
Logs, databases, badger cache on storage drives.
Tested on only one drive; the other has FW disabled. Hot start, no DSM reboot.
First FW run, badger cache in the making, lazy off, 4.4 TB of data: 18.5 h.
Second FW run - 1h and 7min.
This is impressive. The problem is the instability of this cache if the UPS fails to shut down the node properly during an outage.
The other problem is that files are created in the filestatcache dir each day, taking up space;
after 3 days, I have 262 files, taking 3.25 GB in total. They are numbered 000xxx.sst, where xxx is a constantly increasing number; it's at 806 now. Each file is anywhere from a few KB to a few MB, even 15 MB, in size. Is this normal? Will they be cleaned up anytime soon?
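Tracking the SST count and their combined size over a few days shows whether badger's background compaction is keeping up (the .sst numbering increasing is normal; old files should get merged away). The cache path is an assumption:

```shell
# Hypothetical cache dir; adjust to your own mount point.
CACHE_DIR="${CACHE_DIR:-/mnt/ssd/filestatcache}"

# Count the SST files and report their combined size. If compaction is
# working, the count should plateau rather than grow without bound.
find "$CACHE_DIR" -maxdepth 1 -name '*.sst' | wc -l
du -ch "$CACHE_DIR"/*.sst | tail -n 1
```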
It's normal. About 4–5 GB for 10–12 TB nodes.
It can decrease if the node's data decreases too… of course.