Badger cache: are we ready?

I have already turned on the badger cache, but it lives on the same disk where the node is running. In the meantime, I found a spare SSD at home. If I stop the node, copy the contents of the cache to a folder on the SSD, and point the docker parameter there, will it continue from where it left off? Or should I start over with the new mount point? Can multiple node caches go on this SSD in separate folders? Thanks

I have 108 Windows GUI nodes, no docker

1 Like

As long as you mount the new folder correctly, it should resume just fine with the already generated cache.

This should also work fine. You cannot point multiple nodes to the same cache, but multiple caches work fine on the same drive (at least I have not encountered any issues yet).
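For a docker node, the move described above can be sketched roughly as follows (the paths `/mnt/storagenode` and `/mnt/ssd/node1` and the container name `storagenode` are placeholders for your own setup):

```shell
# Stop the node so the cache is not being written to during the copy
docker stop -t 300 storagenode

# Copy the existing cache into a per-node folder on the SSD
mkdir -p /mnt/ssd/node1/filestatcache
cp -a /mnt/storagenode/storage/filestatcache/. /mnt/ssd/node1/filestatcache/

# Re-create the container with an extra bind mount so the node finds
# the cache at its usual in-container path
docker rm storagenode
docker run -d --name storagenode \
  --mount type=bind,source=/mnt/ssd/node1/filestatcache,destination=/app/config/storage/filestatcache \
  ...   # all other parameters stay exactly as before
```

A second node would get its own folder (e.g. `/mnt/ssd/node2/filestatcache`) and its own bind mount; the cache folders must not be shared between nodes.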

1 Like

Thanks for the reply, I’ll try to set it up

Nice, I will test it, too.

Some questions:
→ If the cache is on an SSD and the SSD dies, will the node survive without the cache?
→ Are there any estimates of the cache size in relation to the node size?

Thanks in advance :slight_smile:

1 Like

@elek I have now hit my first problem with the badger cache; it looks like it broke during a Windows update/restart. But that is not the main problem. After it broke, the node would no longer start and did not write any log entries indicating that it was broken. After deleting the cache folder, the node started and is working again.

You can set your log to debug level to see which messages are logged.
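For reference, the log level is controlled by the `log.level` option in the node's `config.yaml` (restart the node after changing it):

```yaml
# config.yaml
log.level: debug
```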

Since I knew what the problem was, I fixed it right away; the node should be up as much as possible.
Next time I will try that.

2 Likes

Yes, it will. But you would likely need to delete what's left of the cache and restart the node to fill it again.

Does anybody know what to expect from the cache db in a low memory environment?

I have it enabled on around 80 nodes now from about 2 weeks ago, so far everything is running fine and the cache dir is growing nicely. For a large (10TB) node the size seems to end up being around 2GB.

Some of these nodes have only 1GB of RAM though. Since the DB is mostly random access, will I gain any performance even if most of the DB is not in RAM?

I have the option of placing the cache dir on an SSD. I presume in docker using

--mount type=bind,source="/mnt/ssd/filestatcache",destination=/app/config/storage/filestatcache

would be correct?

2-3 days after launching, my nodes started to crash.
With debugging turned on, there is either nothing in the log, or at most this:

2024-08-15T20:19:24+03:00 DEBUG db.filestatcache First key="{-\xe9\xd7,.\x93_\x19\x18\xc0Xʯ\x8e\xd0\x0f\x05\x81c\x90\bps\x17\xff\x1b\xd0\x00\x00\x00\x005W-Q\xd9_\xb0\x11I\x96\xf4\xfb\xd5%\x9cyX\xcf\x19\xc2\xe7\xe1̄or\xa8{n\xb1\x1c*\xff\xff\xff\xff\xfeJW\x19"

#2

2024-08-15T20:22:30+03:00	DEBUG	db.filestatcache	First key="\xa2\x8bO\x04\xe1\v\xae\x85\xd6\x7fLl\xb8+\xf8\xd4\xc0\xf0\xf4z\x8e\xa7&'RM\xebn\xc0\x00\x00\x00M\xab\xaa\xb2\x05\xad\x9a\xbb:\xf7'\xb9\xbf\x95\xa7L'Q\xbf\xaf9\xe5\xec\x1a\xeb\xa3\xc0\xc1\x18XB\xf9\xff\xff\xff\xff\xfe\xb5\x00\xf7"

2024-08-15T20:22:34+03:00	DEBUG	db.filestatcache	51 tables out of 187 opened in 3.017s

#3

2024-08-15T17:14:58+03:00	DEBUG	db.filestatcache	First key="\xa2\x8bO\x04\xe1\v\xae\x85\xd6\x7fLl\xb8+\xf8\xd4\xc0\xf0\xf4z\x8e\xa7&'RM\xebn\xc0\x00\x00\x00;\x19\xac\xeb[\x84\xad\xc1\b\xb6\xc3Ô\x10\xca\xdcV\x18\xb2\xe2\a2j\xb9\x1b\xda<\x16F\x16\xb7\xf6\xff\xff\xff\xff\xfe\xea\x1b\xbc"

2024-08-15T17:14:59+03:00	ERROR	db.filestatcache	Received err: Opening table: "D:\\filestatcache\\000963.sst" error: failed to initialize table error: failed to read index. error: failed to verify checksum for table: D:\filestatcache\000963.sst error: actual: 2662894890, expected: 431282182 error: checksum mismatch
github.com/dgraph-io/badger/v4/y.init
	/go/pkg/mod/github.com/dgraph-io/badger/v4@v4.2.0/y/checksum.go:29
runtime.doInit
	/usr/local/go/src/runtime/proc.go:6527
runtime.doInit
	/usr/local/go/src/runtime/proc.go:6504
runtime.doInit
	/usr/local/go/src/runtime/proc.go:6504
runtime.doInit
	/usr/local/go/src/runtime/proc.go:6504
runtime.doInit
	/usr/local/go/src/runtime/proc.go:6504
runtime.doInit
	/usr/local/go/src/runtime/proc.go:6504
runtime.main
	/usr/local/go/src/runtime/proc.go:233
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1598. Cleaning up...
2024-08-15T17:14:59+03:00	ERROR	failure during run	{"error": "Error opening database on storagenode: Opening table: \"D:\\\\filestatcache\\\\000963.sst\" error: failed to initialize table error: failed to read index. error: failed to verify checksum for table: D:\\filestatcache\\000963.sst error: actual: 2662894890, expected: 431282182 error: checksum mismatch\ngithub.com/dgraph-io/badger/v4/y.init\n\t/go/pkg/mod/github.com/dgraph-io/badger/v4@v4.2.0/y/checksum.go:29\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6527\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\n\tstorj.io/storj/storagenode/storagenodedb.cachedBlobstore:231\n\tstorj.io/storj/storagenode/storagenodedb.OpenExisting:250\n\tmain.cmdRun:67\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tstorj.io/common/process.ExecWithCustomConfig:72\n\tstorj.io/common/process.Exec:62\n\tmain.(*service).Execute.func1:107\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78", "errorVerbose": "Error opening database on storagenode: Opening table: \"D:\\\\filestatcache\\\\000963.sst\" error: failed to initialize table error: failed to read index. 
error: failed to verify checksum for table: D:\\filestatcache\\000963.sst error: actual: 2662894890, expected: 431282182 error: checksum mismatch\ngithub.com/dgraph-io/badger/v4/y.init\n\t/go/pkg/mod/github.com/dgraph-io/badger/v4@v4.2.0/y/checksum.go:29\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6527\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\n\tstorj.io/storj/storagenode/storagenodedb.cachedBlobstore:231\n\tstorj.io/storj/storagenode/storagenodedb.OpenExisting:250\n\tmain.cmdRun:67\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tstorj.io/common/process.ExecWithCustomConfig:72\n\tstorj.io/common/process.Exec:62\n\tmain.(*service).Execute.func1:107\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n\tmain.cmdRun:69\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tstorj.io/common/process.ExecWithCustomConfig:72\n\tstorj.io/common/process.Exec:62\n\tmain.(*service).Execute.func1:107\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-08-15T17:14:59+03:00	FATAL	Unrecoverable error	{"error": "Error opening database on storagenode: Opening table: \"D:\\\\filestatcache\\\\000963.sst\" error: failed to initialize table error: failed to read index. error: failed to verify checksum for table: D:\\filestatcache\\000963.sst error: actual: 2662894890, expected: 431282182 error: checksum mismatch\ngithub.com/dgraph-io/badger/v4/y.init\n\t/go/pkg/mod/github.com/dgraph-io/badger/v4@v4.2.0/y/checksum.go:29\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6527\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\n\tstorj.io/storj/storagenode/storagenodedb.cachedBlobstore:231\n\tstorj.io/storj/storagenode/storagenodedb.OpenExisting:250\n\tmain.cmdRun:67\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tstorj.io/common/process.ExecWithCustomConfig:72\n\tstorj.io/common/process.Exec:62\n\tmain.(*service).Execute.func1:107\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78", "errorVerbose": "Error opening database on storagenode: Opening table: \"D:\\\\filestatcache\\\\000963.sst\" error: failed to initialize table error: failed to read index. 
error: failed to verify checksum for table: D:\\filestatcache\\000963.sst error: actual: 2662894890, expected: 431282182 error: checksum mismatch\ngithub.com/dgraph-io/badger/v4/y.init\n\t/go/pkg/mod/github.com/dgraph-io/badger/v4@v4.2.0/y/checksum.go:29\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6527\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6504\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\n\tstorj.io/storj/storagenode/storagenodedb.cachedBlobstore:231\n\tstorj.io/storj/storagenode/storagenodedb.OpenExisting:250\n\tmain.cmdRun:67\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tstorj.io/common/process.ExecWithCustomConfig:72\n\tstorj.io/common/process.Exec:62\n\tmain.(*service).Execute.func1:107\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n\tmain.cmdRun:69\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tstorj.io/common/process.ExecWithCustomConfig:72\n\tstorj.io/common/process.Exec:62\n\tmain.(*service).Execute.func1:107\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

@elek The badger cache sometimes does not shut down cleanly. I turned off the server from Windows to move it into a new case; after powering it back on, the cache of 4 nodes was broken and those nodes would not start. After deleting the cache files, they are working again.

Seems like corruption. I would suggest stopping the node, checking and fixing the filesystem, then removing the cache and restarting the node.
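On a Windows GUI node those steps could look roughly like this (the `D:\filestatcache` path is taken from the logs above and the service name `storagenode` is the default; adjust both to your setup and run from an elevated prompt):

```powershell
# Stop the node service
Stop-Service storagenode

# Check and repair the filesystem that holds the cache
chkdsk D: /f

# Remove the corrupted cache; the node will rebuild it over time
Remove-Item -Recurse -Force D:\filestatcache

# Start the node again
Start-Service storagenode
```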

The file system is fine (there are no problems and there never were); deleting the cache or disabling it solved the problem.
BUT I posted the log here as feedback for the developers that there is a problem and it needs to be fixed.

2 Likes

For that we need to know what caused the corruption.
Did you have restarts or power cuts?

There were no power outages.
Some of the nodes stopped after a restart, and some stopped during operation and the service could not be restored.

How were they restarted? Due to some error, did you issue the command, did the server restart, or was it restarted by the updater?

There were no power outages.
There were no emergency situations.
All restarts were only initialized by me for maintenance purposes.

Interesting. I restarted my node with the badger cache enabled and it survived the restart.
Something weird.