So I moved the SQLite databases of my storage node to a separate SSD to improve performance, and after that I noticed the lifespan of my SSD is decreasing like crazy, around 1% every 4 days.
Using the `inotifywait` command I do see a huge amount of disk activity on the *.wal db files:

`inotifywait -m -e modify dbpath`
So my question is: is there any setting/config to mitigate/avoid this?
The wal files are indeed written to frequently, but these are not synchronous writes, so they should not be contributing that much to SSD wear. So you will see a lot of them in inotify, but the kernel should be smart enough to only send a write command after collecting several writes to these files. If this is not the case, then you indeed have a problem—but that problem is with kernel resources, not node settings.
As such, I’d try looking for other sources of writes.
I think this is the issue in my case. To test my assumption, I already moved the SQLite db files to a separate HDD, and from that point my SSD's lifespan does not decrease at that rate anymore.
From my understanding (so correct me if I'm wrong), *.wal files are normal physical db files which need to be written to disk, so with a storagenode holding hundreds of thousands of small segments, the amount of data it needs to write to disk is huge, I think.
No, I use a Crucial SSD.
.wal files are written sequentially, and with no forced synchronous writes, so in theory they shouldn’t be a problem. But if moving the database files elsewhere solved the problem, then this clearly was a problem for your case.
Frankly, I would really like to learn why it happened.
What file system are you using on that SSD? Do you have any RAID there, or any other volume management subsystems?
This looks like write amplification. Perhaps your SSD pretends to have 512 byte sector size.
This is about the wal file: Write-Ahead Logging. You may be able to control it somewhat.
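If you want to experiment with that, SQLite exposes a few WAL-related knobs via pragmas. A minimal sketch using the `sqlite3` CLI against a throwaway database (the path is made up, and the storagenode may reapply its own settings when it reopens its databases):

```shell
DB=/tmp/example.db   # throwaway database, not a real storagenode db

sqlite3 "$DB" <<'SQL'
PRAGMA journal_mode=WAL;        -- enable (or confirm) write-ahead logging
PRAGMA wal_autocheckpoint=4000; -- checkpoint every ~4000 pages instead of
                                -- the default 1000, batching more writes
PRAGMA synchronous=NORMAL;      -- in WAL mode, fsync only at checkpoints
SQL
```

Note that journal_mode persists in the database file, but wal_autocheckpoint and synchronous are per-connection settings, so the node's own connections won't inherit them.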
If I were you I would put databases to tmpfs.
Micron SSDs vary drastically in how they handle writes. They sell a separate class of SSDs optimized for small writes; whatever they use for the consumer Crucial brand is likely the bottom of the barrel.
That might be true as well…
Yeah, I also really want to know. I'm using ZFS and I set the record size to 4KB. Based on arrogantrabbit's suggestion, I think I should set it to 512?
That is a way to go, I think, but I don't see any setting in the config file that allows that; the storagenode config only has a setting for dbpath.
I would not.
By messing with sqlite you will introduce instability. Since the databases storj creates have negative utility value (zero pros, all cons), you can keep them in RAM on tmpfs instead.
It will be more stable and use zero SSD endurance.
This will radically solve the problem.
If for some reason you want them to persist, you can copy them to the ramdisk at start and copy them back to persistent storage on shutdown.
But realistically, how often do you shutdown…
> how often do you shutdown…

Usually every 1 or 2 months.
I mean I want to move all the *.wal files to RAM but still keep all the *.db files persisted on disk.
Moving wal to ram is entirely pointless. It’s a journal, to save the day when power is lost during transaction. Keeping it in ram defeats the purpose.
Read them into a ramdisk on boot and offload them back to persistent storage on shutdown. Two lines of script.
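A minimal sketch of that approach (all paths here are assumptions; point DB_DIR at your node's dbpath and RAM_DIR at a tmpfs mount, e.g. one created with `mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk`):

```shell
# Copy the node's databases into a ramdisk before start, and back to
# persistent storage after shutdown. Paths are placeholders.
DB_DIR=${DB_DIR:-/mnt/storj/dbs}   # persistent copy of the *.db files
RAM_DIR=${RAM_DIR:-/mnt/ramdisk}   # tmpfs mount the node's dbpath points at

load_dbs() {   # run this before starting the node
    cp -a "$DB_DIR"/*.db "$RAM_DIR"/
}

save_dbs() {   # run this after stopping the node
    cp -a "$RAM_DIR"/*.db "$DB_DIR"/
}
```

The dbpath setting mentioned earlier would then point at RAM_DIR; anything not copied back before a power loss is gone, which is the trade-off being discussed here.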
Ah, sorry, I can’t help with zfs. But there are some experts on zfs on the forum…
4 kb record size is too small. Don’t change defaults unless you have a very good reason to do so. Default on ZFS is 128k, and compression on.
I understand your SSD is in its own separate pool?
You would need to destroy it and create a new one with a 4096-byte sector size (`zpool create -o ashift=12 …`). You can look up the current pool's ashift with `zpool get ashift …`.
The *.wal files are deleted every time I stop the node, so I think I could change the recordsize and start the node again.
This is not about record size.
ashift hints ZFS about the physical sector size of the underlying storage device. It's a setting for storage devices, not a specific dataset.
Recordsize is the maximum size of a logical block used to store data. It's a setting for individual datasets.
You cannot change the ashift without removing the device first.
Here is an interesting article (I skimmed through it, did not read it thoroughly), but here is a relevant quote:

> ZFS admin is strongly advised to be aware of the actual sector size of his or her devices, and manually set ashift accordingly. If ashift is set too low, an astronomical read/write amplification penalty is incurred—writing 512 byte “sectors” to a 4KiB real sector means having to write the first “sector”, then read the 4KiB sector, modify it with the second 512 byte “sector”, write it back out to a new 4KiB sector, and so forth, for every single write.
>
> In real world terms, this amplification penalty hits a Samsung EVO SSD—which should have ashift=13, but lies about its sector size and therefore defaults to ashift=9 if not overridden by a savvy admin—hard enough to make it appear slower than a conventional rust disk.
Ah OK, I think I read and checked this before; I might need to check it again for my SSD.
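For reference, one way to see what sector sizes a drive reports to the Linux kernel (the device name is a placeholder; substitute your SSD, e.g. nvme0n1):

```shell
# Print the sector sizes the kernel sees for one drive. A drive that
# reports 512/512 here but uses larger sectors internally is the
# "lying" case from the quote above.
DEV=${DEV:-sda}   # placeholder device name
for f in logical_block_size physical_block_size; do
    path="/sys/block/$DEV/queue/$f"
    if [ -r "$path" ]; then
        echo "$f: $(cat "$path")"
    fi
done
```

`smartctl -i /dev/sda` also prints a "Sector Sizes" line, though for SSDs both numbers may still be the emulated values rather than the real flash page size.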