So I moved the SQLite databases of my storage node to a separate SSD to improve performance, and after that I noticed the lifespan of my SSD is decreasing like crazy, around 1% every 4 days.
Using the `inotifywait` command I do see a huge amount of disk activity on the *.wal db files:

`inotifywait -m -e modify dbpath`
So my question is: is there any setting/config to mitigate/avoid this?
The wal files are indeed written to frequently, but these are not synchronous writes, so they should not be contributing that much to SSD wear. So you will see a lot of them in inotify, but the kernel should be smart enough to only send a write command after collecting several writes to these files. If this is not the case, then you indeed have a problem—but that problem is with kernel resources, not node settings.
As such, I’d try looking for other sources of writes.
I think this is the issue in my case. To test my assumption, I already moved the SQLite db files to a separate HDD, and from that point my SSD's lifespan does not decrease at that rate anymore.
From my understanding (so correct me if I'm wrong), *.wal files are normal physical db files which need to be written to disk, so with a storagenode holding hundreds of thousands of small segments, the amount of data it needs to write to disk is huge, I think.
No, I use a Crucial SSD.
.wal files are written sequentially, and with no forced synchronous writes, so in theory they shouldn’t be a problem. But if moving the database files elsewhere solved the problem, then this clearly was a problem for your case.
Frankly, I would really like to learn why it happened.
What file system are you using on that SSD? Do you have any RAID there, or any other volume management subsystems?
This looks like write amplification. Perhaps your SSD pretends to have 512 byte sector size.
This is about the wal file: Write-Ahead Logging. You may be able to control it somewhat.
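If you want to experiment with that, SQLite exposes a few WAL-related knobs via pragmas. A minimal sketch using the `sqlite3` CLI against a throwaway database (the path is made up, and the storagenode may reapply its own settings when it reopens its databases):

```shell
DB=/tmp/example.db   # throwaway database, not a real storagenode db

sqlite3 "$DB" <<'SQL'
PRAGMA journal_mode=WAL;        -- enable (or confirm) write-ahead logging
PRAGMA wal_autocheckpoint=4000; -- checkpoint every ~4000 pages instead of
                                -- the default 1000, batching more writes
PRAGMA synchronous=NORMAL;      -- in WAL mode, fsync only at checkpoints
SQL
```

Note that journal_mode persists in the database file, but wal_autocheckpoint and synchronous are per-connection settings, so the node's own connections won't inherit them.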
If I were you I would put databases to tmpfs.
Micron SSDs vary drastically in how they handle writes. They sell a separate class of SSDs optimized for small writes; whatever they use for the consumer Crucial brand is likely the bottom of the barrel.
That might be true as well…
Yeah, I also really want to know. I'm using ZFS and I set the record size to 4KB. Based on arrogantrabbit's suggestion, I think I should set it to 512?
That is a way to go, I think, but I don't see any setting in the config file that allows that; the storagenode config only has a setting for dbpath.
I would not.
By messing with sqlite you will introduce instability. Since the databases storj creates have negative utility value (zero pros, all cons), you can keep them in RAM on tmpfs instead.
It will be more stable and use zero SSD endurance.
This will radically solve the problem.
If for some reason you want them to persist, you can copy them to the ramdisk at start and copy them back to persistent storage on shutdown.
But realistically, how often do you shutdown…
> how often do you shutdown…

Usually every 1 or 2 months.
I mean I want to move all the *.wal files to RAM but still keep all the *.db files persisted on disk.
Moving wal to ram is entirely pointless. It’s a journal, to save the day when power is lost during transaction. Keeping it in ram defeats the purpose.
Read them into a ramdisk on boot and offload them back to persistent storage on shutdown. Two lines of script.
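A minimal sketch of that approach (all paths here are assumptions; point DB_DIR at your node's dbpath and RAM_DIR at a tmpfs mount, e.g. one created with `mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk`):

```shell
# Copy the node's databases into a ramdisk before start, and back to
# persistent storage after shutdown. Paths are placeholders.
DB_DIR=${DB_DIR:-/mnt/storj/dbs}   # persistent copy of the *.db files
RAM_DIR=${RAM_DIR:-/mnt/ramdisk}   # tmpfs mount the node's dbpath points at

load_dbs() {   # run this before starting the node
    cp -a "$DB_DIR"/*.db "$RAM_DIR"/
}

save_dbs() {   # run this after stopping the node
    cp -a "$RAM_DIR"/*.db "$DB_DIR"/
}
```

The dbpath setting mentioned earlier would then point at RAM_DIR; anything not copied back before a power loss is gone, which is the trade-off being discussed here.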
Ah, sorry, I can’t help with zfs. But there are some experts on zfs on the forum…
4 kb record size is too small. Don’t change defaults unless you have a very good reason to do so. Default on ZFS is 128k, and compression on.
I understand your SSD is in its own separate pool?
You would need to destroy it and create a new one with a 4096-byte sector size (`zpool create -o ashift=12 …`). You can look up the current pool's ashift with `zpool get ashift …`.
The *.wal files are deleted every time I stop the node, so I think I could change the recordsize and start the node again.
This is not about record size.
ashift hints ZFS about the physical sector size of the underlying storage device. It's a setting for storage devices, not a specific dataset.
Recordsize is the maximum size of a logical block used to store data. It's a setting for individual datasets.
You cannot change the ashift without removing the device first.
Here is an interesting article (I skimmed through it, did not read it thoroughly), but here is a relevant quote:

> ZFS admin is strongly advised to be aware of the actual sector size of his or her devices, and manually set ashift accordingly. If ashift is set too low, an astronomical read/write amplification penalty is incurred—writing 512 byte “sectors” to a 4KiB real sector means having to write the first “sector”, then read the 4KiB sector, modify it with the second 512 byte “sector”, write it back out to a new 4KiB sector, and so forth, for every single write.
>
> In real world terms, this amplification penalty hits a Samsung EVO SSD—which should have ashift=13, but lies about its sector size and therefore defaults to ashift=9 if not overridden by a savvy admin—hard enough to make it appear slower than a conventional rust disk.
Ah OK, I think I read and checked this before; I might need to check it again for my SSD.
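For reference, one way to see what sector sizes a drive reports to the Linux kernel (the device name is a placeholder; substitute your SSD, e.g. nvme0n1):

```shell
# Print the sector sizes the kernel sees for one drive. A drive that
# reports 512/512 here but uses larger sectors internally is the
# "lying" case from the quote above.
DEV=${DEV:-sda}   # placeholder device name
for f in logical_block_size physical_block_size; do
    path="/sys/block/$DEV/queue/$f"
    if [ -r "$path" ]; then
        echo "$f: $(cat "$path")"
    fi
done
```

`smartctl -i /dev/sda` also prints a "Sector Sizes" line, though for SSDs both numbers may still be the emulated values rather than the real flash page size.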