We use an in-memory cache for writes to the database; by default it syncs every hour.
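(For illustration only: a minimal Go sketch of such a write-behind cache with a periodic flush. All names here are invented; the actual storagenode implementation may look quite different.)

```go
package writecache

import (
	"sync"
	"time"
)

// Cache buffers database writes in memory and flushes them periodically.
type Cache struct {
	mu      sync.Mutex
	pending []string // hypothetical serialized records waiting to be written
}

// Add queues a record; it becomes durable only at the next flush.
func (c *Cache) Add(record string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pending = append(c.pending, record)
}

// Run flushes the buffered records once per interval (e.g. hourly),
// ideally as a single database transaction per batch.
func (c *Cache) Run(interval time.Duration, flush func([]string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		c.mu.Lock()
		batch := c.pending
		c.pending = nil
		c.mu.Unlock()
		if len(batch) > 0 {
			flush(batch)
		}
	}
}
```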
It's not about an in-memory cache by itself.
It's about the fact that their solution to the sqlite lock contention issue
produced the file-walker random IO issue.
It's basically impossible these days to run a node on a single HDD,
whether dedicated or shared with a low-IO process.
The solution of adding a file-system metadata cache in memory or on separate dedicated flash storage seems more like a workaround than anything else.
What I could extract from the forum is that a node requires around 1GB of fs-cache per 1TB of data to overcome the file-walker issue. Low-end hardware like a Raspberry Pi 3/4/5 already does not meet this requirement, and running the node as a background task on a server's spare/backup drives seems not feasible.
Then there is the issue of directory fragmentation (ext4 and maybe others).
This could be solved by restructuring the storage directory so that files are only added to the current-week directory, while files from previous weeks can only be deleted, never added.
But this would require a file-locator component based on a fast directory (a database).
If required, even the current hash-map directory layout could be used in parallel.
I know that simpler is better, but answering the question of the current inventory
with a full scan in random order over the drive makes little sense.
But that’s what the badger cache is all about, I thought? It serves the same purpose as a read cache like an LVM hotspot cache or L2ARC.
I mean: aren’t you solving a problem for which a solution is already in trial mode?
I even wrote this to him: the badger cache would resolve the file-walker random IO issue.
But not the directory fragmentation, or the stale/ghost data from partially uploaded chunks.
The badger cache would need to be used as a file-locator, but then it would not be a “cache” anymore. It would then be more of a “honey badger”.
The main idea is to drop new chunks only into the latest directory, for example: storage/2024-32/0/chunkId
This would require a lookup in a fast map/index via a file-locator.
Benefits of a file-locator in the “honey badger” (a rough sketch follows the list):
- only the latest directory would require a check on restart or after a node crash.
- at the start of a new week, the previous week's directory would become read- and delete-only, and could be garbage collected or optimized differently in the future
- multiple drives could be used per node, to extend a node or to slowly copy “weeks” over
- a flash+HDD mode: flash for new/hot data, HDDs for the archive
- most likely others …
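To make the idea concrete, here is a rough Go sketch of a week-bucketed layout plus a file-locator interface. Everything here (package name, function names, the one-character prefix) is invented for illustration; only the storage/2024-32/0/chunkId layout comes from the post above.

```go
package weekstore

import (
	"fmt"
	"path/filepath"
	"time"
)

// weekDir returns the week bucket for new pieces, e.g. "2024-32".
func weekDir(t time.Time) string {
	year, week := t.ISOWeek()
	return fmt.Sprintf("%d-%02d", year, week)
}

// Locator maps a piece ID to the week directory it was stored in. It could be
// backed by badger or any other fast key-value index.
type Locator interface {
	Put(pieceID, week string) error
	Get(pieceID string) (week string, err error)
}

// piecePath builds the on-disk path for a new piece:
// <root>/<year-week>/<prefix>/<pieceID>, mirroring storage/2024-32/0/chunkId.
func piecePath(root, pieceID string, now time.Time) string {
	return filepath.Join(root, weekDir(now), pieceID[:1], pieceID)
}

// storePiece records the piece location in the locator and returns the path
// where the piece data should be written.
func storePiece(loc Locator, root, pieceID string) (string, error) {
	now := time.Now()
	if err := loc.Put(pieceID, weekDir(now)); err != nil {
		return "", err
	}
	return piecePath(root, pieceID, now), nil
}
```

The point of the week prefix is that a restart check only has to walk the current week's directory; everything older is immutable apart from deletes.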
The filewalker updates the database only when it has finished the scan, so it usually should not lock the database while working.
We implemented a different cache with badger; it is used to cache metadata and speed up all filewalkers.
The badger cache should help low-power devices survive. However, it requires disabling the lazy mode (because badger is even more sensitive to multiple accesses from different processes).
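For readers unfamiliar with badger: below is a minimal sketch of what a badger-backed metadata cache can look like. The badger API calls are real (github.com/dgraph-io/badger/v4), but the surrounding package and method names are invented; this is not the actual storagenode code.

```go
package metacache

import (
	"encoding/binary"

	badger "github.com/dgraph-io/badger/v4"
)

// Cache stores per-piece metadata (here just the size) keyed by piece ID,
// so filewalkers can avoid a stat() per file on the slow HDD.
type Cache struct {
	db *badger.DB
}

func Open(dir string) (*Cache, error) {
	// Badger requires exclusive access to its directory, which is one reason
	// the lazy filewalker (a separate process) has to be disabled.
	db, err := badger.Open(badger.DefaultOptions(dir))
	if err != nil {
		return nil, err
	}
	return &Cache{db: db}, nil
}

func (c *Cache) Close() error { return c.db.Close() }

func (c *Cache) PutSize(pieceID []byte, size uint64) error {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, size)
	return c.db.Update(func(txn *badger.Txn) error {
		return txn.Set(pieceID, buf)
	})
}

func (c *Cache) GetSize(pieceID []byte) (uint64, error) {
	var size uint64
	err := c.db.View(func(txn *badger.Txn) error {
		item, err := txn.Get(pieceID)
		if err != nil {
			return err
		}
		return item.Value(func(val []byte) error {
			size = binary.BigEndian.Uint64(val)
			return nil
		})
	})
	return size, err
}
```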
and
I don't understand this problem, or rather: a journal (“event-sourcing”) would solve it.
In dotnet I could build this in a handful of days, but I am a noob at golang.
Maybe I will implement a tech-demo fork.
For now I will go with the workaround on a new RPi 5: zfs with a special metadata device at a 5GB/1TB ratio, on a partition of an NVMe.
Maybe you can just try writing high-level pseudocode? Might be enough to show the idea.
There is a limitation in the badger cache implementation, which requires exclusive access. This is also why it's an experimental feature.
I forgot to ask: did the disabled sqlite implementation create/open the db file with the WAL
feature enabled?
By default, sqlite creates/opens a db file in legacy, 2001-style mode with WAL
disabled. Writes will modify the file data in place and will block reads until the transaction finishes.
With WAL
enabled, regular writes will not block reads on the db file.
https://sqlite.org/wal.html
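For anyone who wants to check or switch the mode themselves, here is a small Go sketch, assuming the mattn/go-sqlite3 driver and a local db file; the node's own code may configure this differently:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3" // assumption: any sqlite driver would do
)

func main() {
	db, err := sql.Open("sqlite3", "bandwidth.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Switch the database to write-ahead logging so writers no longer block
	// readers. The pragma returns the resulting journal mode ("wal" on success).
	var mode string
	if err := db.QueryRow("PRAGMA journal_mode=WAL;").Scan(&mode); err != nil {
		log.Fatal(err)
	}
	fmt.Println("journal mode:", mode)
}
```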
This would cover more than half of my approach described above, and most likely this was the issue with the storj sqlite implementation.
The other optimizations/features that a journal offers would only be necessary in high-throughput scenarios (>50k op/s), where one file upload would generate around 2 + chunks
operations. I don't really think that even the big nodes see that kind of load, or do they?
For completeness, here are the features a journal would add beyond sqlite-WAL (a rough interface sketch follows the list):
- pass-through: events do not even get committed into the journal and are applied/processed instantly. This feature is nearly the same as a file upload keeping its state in memory and writing to the sqlite db only once at the end.
- differentiating pub+sync and pub+flush: this relates to the pub/sub and event-sourcing topic. The producer, in our case the file-upload process, knows best when it needs critical events/data to be persisted (flush) and processed at a later stage (can be minutes later), or when it needs to know that all published events have been fully processed (sync). As an example for sync, a file upload needs to respond to the uploader and guarantee that a follow-up read of the file by the uploader would succeed. As an example for flush, a delete-files operation would publish multiple events followed by a single flush.
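Here is a hedged sketch of what such a journal interface could look like in Go. All names are invented for illustration; it only encodes the pass-through / flush / sync semantics described above.

```go
package journal

// Event is a single state change, e.g. "chunk X uploaded" or "file Y deleted".
type Event struct {
	Kind    string
	Payload []byte
}

// Journal sketches the publish semantics discussed above.
type Journal interface {
	// Publish appends an event; durability is not yet guaranteed.
	Publish(e Event) error

	// Flush guarantees that everything published so far is persisted in the
	// journal; processing may still happen minutes later.
	Flush() error

	// Sync guarantees that everything published so far has been fully
	// processed, so a follow-up read by the client will succeed.
	Sync() error

	// Apply is the pass-through path: the event is processed immediately and
	// never committed to the journal (like keeping upload state in memory and
	// writing to sqlite only once at the end).
	Apply(e Event) error
}

// Example usage for the two cases described above:
//
//   upload:       Publish(chunkEvents...); Sync()   // uploader can read back immediately
//   delete-files: Publish(deleteEvents...); Flush() // durable now, processed later
```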
Why do you think so?
$ ls -l /mnt/x/storagenode2/storage/*.db*
-rw-r--r-- 1 root root 75304960 Oct 12 09:59 /mnt/x/storagenode2/storage/bandwidth.db
-rw-r--r-- 1 root root 32768 Oct 12 10:29 /mnt/x/storagenode2/storage/bandwidth.db-shm
-rw-r--r-- 1 root root 4152 Oct 12 10:29 /mnt/x/storagenode2/storage/bandwidth.db-wal
I can see the wal files.
You may also check the code.
The sqlite-based file catalog got removed from the source and was replaced by the file-walker
approach, with the reasoning that the sqlite db was bottlenecking the file uploads.
It is not removed, please check the code.
SQLite databases are still used.
However, we implemented a badger cache:
And the ultimate feature of
Please, use them!
Here’s me complaining (yet again!) without any single reason, just out of the blue. 4 months later and we (=baremetal) still don’t have a way to configure where badger is stored.
Yeees. But if you are so concerned, and you run the node without docker, you can still reroute it with symlinks.
I can help; please post a description of your environment.
Not that concerned, thank you.

-i 65536
Hi @Toyoo, do you think the -i 65536 argument on mke2fs is still a safe value today? I ask because you wrote in another thread (Recommended GB of RAM / TB stored? - #7 by Toyoo) that the average piece size has gone down a lot.
Looking at the numbers on my nodes I do think it should be good enough for the foreseeable future, but I would probably step down any new file systems to ~45000.
With hashstore on the horizon this number probably matters less now though.
The average piece size for all of my nodes is over 200K, so no problem.
Thanks @Toyoo and @alpharabbit
Merry Christmas everybody
With the startup piece scan now reporting the total pieces size and the number of pieces, it's easy to calculate the average piece size.
For my 6TB node, the average piece size is: 203845 bytes.
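For reference, the arithmetic behind the -i (bytes-per-inode) question, as a tiny Go sketch using the numbers quoted in this thread (65536 bytes per inode, ~203845 bytes average piece size):

```go
package main

import "fmt"

func main() {
	const bytesPerInode = 65536.0 // mke2fs -i 65536: one inode per 64 KiB of capacity
	avgPieceSize := 203845.0      // average piece size in bytes, as reported above

	// Inodes only run out if the average file becomes smaller than the
	// bytes-per-inode ratio. A headroom factor above 1 means the ratio is safe.
	headroom := avgPieceSize / bytesPerInode
	fmt.Printf("headroom factor: %.2f\n", headroom) // ~3.11 for these numbers

	// Example: a 6 TB node at this average piece size holds roughly this many pieces...
	capacity := 6e12
	fmt.Printf("approx pieces: %.0f\n", capacity/avgPieceSize) // ~29 million
	// ...while -i 65536 provides about 3x more inodes than that.
	fmt.Printf("approx inodes: %.0f\n", capacity/bytesPerInode) // ~92 million
}
```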