What is the worst that would happen if I delete all of my SQLite databases and restart storagenode?

If that is the case, then ZFS is lying and sync() does not actually commit the data to storage, which would make ZFS unsuitable for any database that wishes to provide durability, and I have a really hard time believing that.

When SQLite issues a sync(), it is asking the storage layer to commit the data to persistent storage. This system call must not return until the relevant write buffers have been committed. On ZFS, that means committing the write buffer in RAM to the SSD cache; later, ZFS can migrate the data from the SSD cache to a lower storage tier (but that’s not relevant here).
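To make the sequence concrete, here is a minimal Go sketch of what a sync write amounts to at the application level (illustrative only, not the actual SQLite or storagenode code):

```go
package main

import (
	"log"
	"os"
)

func main() {
	// Append a record, the same way a write-ahead log is appended to.
	f, err := os.OpenFile("wal-example.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := f.Write([]byte("record\n")); err != nil {
		log.Fatal(err)
	}

	// Sync wraps fsync(2): it must not return until the filesystem has
	// committed the data to stable storage (for ZFS with a SLOG, that means
	// the ZIL on the log device). On a ramdisk there is no stable storage
	// behind the call, so it is effectively free.
	if err := f.Sync(); err != nil {
		log.Fatal(err)
	}
}
```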

My point is that not only does RAM offer higher throughput and lower latency, but a sync() on a ramdisk is effectively a no-op, since there is no persistent storage behind it.

A SQLite write requires at least two sync()s: one to commit the contents of the WAL, and another to commit the contents of the database after applying the change. ZFS must push the write buffers to the SSD when sync() is issued, or the database cannot guarantee durability (a power cut would then leave the database either corrupt or missing information).
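For reference, the journal mode and sync behaviour are driven by PRAGMAs. A rough Go sketch, assuming the common mattn/go-sqlite3 driver (the storagenode may configure SQLite differently):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // assumption: any SQLite driver would do
)

func main() {
	db, err := sql.Open("sqlite3", "example.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	db.SetMaxOpenConns(1) // keep the PRAGMAs and writes on one connection

	// WAL mode: each commit appends pages to the -wal file; a later
	// checkpoint writes those pages back into the main database file.
	if _, err := db.Exec("PRAGMA journal_mode=WAL"); err != nil {
		log.Fatal(err)
	}
	// synchronous=FULL syncs on every commit; NORMAL defers the main-database
	// sync to checkpoint time at a small durability cost.
	if _, err := db.Exec("PRAGMA synchronous=FULL"); err != nil {
		log.Fatal(err)
	}

	if _, err := db.Exec("CREATE TABLE IF NOT EXISTS t (v TEXT)"); err != nil {
		log.Fatal(err)
	}
	// This commit waits for at least one fsync() before Exec returns.
	if _, err := db.Exec("INSERT INTO t (v) VALUES ('x')"); err != nil {
		log.Fatal(err)
	}
}
```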

All the discussion about how amazing ZFS’ various caching mechanisms are is entirely irrelevant to this point. A sync() on a persistent filesystem must move the data to persistent storage somewhere.

This double sync is the slowest part of a SQLite write (unless a large amount of data is being written), as it must wait for the disk to store the information and report success. That latency contributes to lock contention, and it would be substantially faster on a ramdisk since there is nothing to do. That’s my point.
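An easy way to see that cost is to time a single-row commit against a file-backed database versus an in-memory one. A hypothetical sketch using the same driver as above; absolute numbers will vary wildly with hardware:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/mattn/go-sqlite3" // assumption: same driver as above
)

// timeCommit measures how long a single INSERT-and-commit takes for the given DSN.
func timeCommit(dsn string) time.Duration {
	db, err := sql.Open("sqlite3", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	db.SetMaxOpenConns(1) // required for ":memory:" to see its own table

	db.Exec("PRAGMA journal_mode=WAL")
	db.Exec("PRAGMA synchronous=FULL")
	if _, err := db.Exec("CREATE TABLE IF NOT EXISTS t (v TEXT)"); err != nil {
		log.Fatal(err)
	}

	start := time.Now()
	if _, err := db.Exec("INSERT INTO t (v) VALUES ('x')"); err != nil {
		log.Fatal(err)
	}
	return time.Since(start)
}

func main() {
	// The file-backed commit is dominated by fsync latency on spinning disks;
	// the in-memory database has no persistent storage to wait for.
	fmt.Println("file-backed:", timeCommit("bench.db"))
	fmt.Println("in-memory:  ", timeCommit(":memory:"))
}
```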


Just because you write something to disk doesn’t mean you have to drop it from memory, and just because you write it to disk doesn’t mean you can’t write it to the SLOG first and flush it out later, since the change has already been noted in the SLOG… so until the flush, reads effectively come from the SLOG… or from RAM if it wasn’t evicted in the meantime…
And no matter what happens, anything written to the SLOG is eventually written to the HDDs; even if you pull the plug and power the server on again, it simply replays the SLOG to the HDDs when the pool is imported.

It does get kind of advanced, but there aren’t really many safer ways of dealing with the issue… even though there might be some comparable methods.

But it’s the SLOG that needs to report the success… and with a PLP (power-loss protection) SSD that would be basically instant… I think mine usually answers in microseconds… not sure it’s PLP though… but PLP would most likely just make it faster, and you can get dedicated RAM I/O cards for this that use supercaps and a flash memory backup to flush to in case of a power loss…

So it’s really just a matter of whether you want to be both fast and safe. Sure, you might see an improvement running it in RAM, but I doubt it will make your storagenode better aside from being faster at sync writes… which then aren’t really sync writes, which makes the whole sync argument moot; you could just as well run ZFS with sync disabled.

It’s fast though, but stuff acts odd at times… xD The storagenode will run on it for days without issue…
I didn’t test longer than that… because I finished what I was doing, and my web GUIs were really acting up; I guess they couldn’t figure things out with sync disabled.

And from what I could see, the storagenode only ended up being slower… maybe also because it was waiting for sync acks and not getting them… I don’t know exactly how ZFS deals with running with sync disabled.

It’s fast though… uncomfortably fast.

Which is where lock contention comes from. :slight_smile:

Agreed.

You need sync writes for the piece data, though. Unless you’re proposing having a separate volume for the database with sync off… in which case… you might as well use a ramdisk since you’re not concerned with durability. :slight_smile:

Well, the only time my node would get “db locked” was when it wanted to do deletions; it would spam something like 6000 deletions and that would lock up the database or something…
I’m still running the old version, and I turned off my max concurrent setting to see if the db locked thing is still an issue.
Granted, this old piece of junk cannot keep up with newer hardware, but it should be fast enough to handle this limited workload pretty well… It also seems I actually managed to get my ZFS pool, running ashift=12, to write in 512-byte blocks… lol O.o Though my success makes me mildly worried, because 2 of my 6TB drives are 4Kn… so that’s not going to be good long term… but it will be interesting to see what happens… lol

What does a ZFS pool do when it runs out of space on 2 out of 5 drives in one of its raidz1 vdevs?

It would be nice if the software itself stored all the info in temp databases located in RAM and, after a period (30 min, 1 hour, whatever), merged all that info into the storage node databases.

That would mean you lose all orders for the past 30 minutes if you restart the storage node.

I would only expect that behaviour in a power failure. Let’s reduce those 30 min to 5 min… merging databases takes milliseconds anyway, and I would rather lose those 5 minutes of orders than get DQed because of a locked db.
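For what it’s worth, SQLite can do that kind of periodic merge with ATTACH DATABASE. A rough Go sketch; the paths and the unsent_order table name are made up for illustration and are not the real storagenode schema:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // assumption: same driver as in the sketches above
)

func main() {
	// Durable database on the pool (hypothetical path).
	db, err := sql.Open("sqlite3", "/storage/node/orders.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	db.SetMaxOpenConns(1) // the ATTACH below is per-connection

	// Attach the temporary database kept on a ramdisk (hypothetical path).
	if _, err := db.Exec("ATTACH DATABASE '/mnt/ramdisk/orders_tmp.db' AS tmp"); err != nil {
		log.Fatal(err)
	}

	// Copy everything accumulated in RAM into the durable database and clear
	// the temp table, inside one transaction. "unsent_order" is a made-up
	// table name, not the real storagenode schema.
	tx, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}
	if _, err := tx.Exec("INSERT OR IGNORE INTO unsent_order SELECT * FROM tmp.unsent_order"); err != nil {
		tx.Rollback()
		log.Fatal(err)
	}
	if _, err := tx.Exec("DELETE FROM tmp.unsent_order"); err != nil {
		tx.Rollback()
		log.Fatal(err)
	}
	if err := tx.Commit(); err != nil {
		log.Fatal(err)
	}
}
```

Run on a timer, something like this would cap the worst-case loss at one merge interval, which is exactly the trade-off being discussed.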

I have to say I have not seen a “db locked” error for many, many months across all my >8TB nodes, but it seems to be an open discussion nowadays. I just wanted to share an idea.

order db != serials db

Okay, here goes. I was reorganizing the NICs on my PCIe slots and had to take the storagenode offline briefly. Still running 1.3.3, so not sure how useful this actually is…
Uptime had been 70 hours before I shut it down briefly.

The big spike is when I started the node. My ZFS pool isn’t optimal yet, but it’s still pretty powerful. I’m also running a scrub in the background.

As you can see, this is nothing like normal compared to the last 24 hours… The increase after noon is from the scrub I started, which is still running… scanning my pool at 500-700 MB/s, and it peaked at 1 GB/s earlier.
It sustains about 500 MB/s read at somewhere around 1200 IOPS.

The point is the IO caused by the storagenode booting; this is what ran through the logs: basically just deletions.
When that process ended, the IO wait on my CPU, caused by my HDD latency, dropped back to normal ranges…

The activity on ingress and egress was minimal… 200k to 300k either way.
I didn’t get any “db locked” errors that I noticed, but I would almost bet that with max concurrent on I would have.
I don’t know exactly what the storagenode is doing there… but it seems excessive. I’ll try again tomorrow with max concurrent at my regular 20, and if I get db locked then I’ll upgrade to 1.4.2…
Whatever the storagenode is doing during boot looks like a good candidate for optimization, or it should at the very least be spread over a longer time frame or given a lower priority so it doesn’t kill people’s systems.

   2020-05-23T15:52:35.347Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "LUEF5DFV7W4JSZHX33VVC3JTZGLJTKU4VWT2H6MBKMQYUUB645AQ", "Status": "enabled"}
2020-05-23T15:52:35.437Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "7H6TRQAFC7XRBSDXMJWDMARG22KM32ZTVW47VXZ4QRB26T3CSSAQ", "Status": "enabled"}
2020-05-23T15:52:36.691Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "HNGAADWIOLBU7HOVJA3CI43HHICH3CJRXS2F7N2YUU5BJCDCBE7A", "Status": "enabled"}
2020-05-23T15:52:36.815Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "SURFVYP3D5N7QFPAJPWW5RAYLIXLKYE3M42MUVJYEEDHFUU3N7DA", "Status": "enabled"}
2020-05-23T15:52:36.894Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "7BCLPG63LRRTTY4E2BNOQE3CCTLB5MCJHOJCZETME4EB7VE2U4NQ", "Status": "enabled"}
2020-05-23T15:52:37.401Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "SAZ3CH6DBKVBBKYSPAATFFMA4N7PXZXY3ONVLHS5RXQM2CTQ3EHA", "Status": "enabled"}
2020-05-23T15:52:38.679Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "CFOJC22EICA4HMAU4LJFOU5EQQSWBPIHIZXD4N4QLVC7CZMHBVJQ", "Status": "enabled"}
2020-05-23T15:52:38.725Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "MHZRW7PO4NO27N74KNAHPZW7ZFLWNCUC7YZRDMQM55AUU6K7OS3A", "Status": "enabled"}
2020-05-23T15:52:38.760Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "JOLPUJFPCRKIU54HY4A3JUWALLS6RQGARVMJ7UAKMBZHR22TN6QQ", "Status": "enabled"}
2020-05-23T15:52:38.955Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "5CHQPA3S73H5HNPGBBCN52N5JYLTWBSGQOV6HTTPGWBKNROGPYBQ", "Status": "enabled"}
2020-05-23T15:52:39.178Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "XWWTS2KJV7KRY6HTPN73I5NSTE5HXSI3VGZMBG5OVOLDXCP7MSTQ", "Status": "enabled"}
2020-05-23T15:52:39.416Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "CLQ3EELWYITUVPELGY6HFEXOVGFG3GM4NAQOD2ZQYWS7ZFYPS6PA", "Status": "enabled"}
2020-05-23T15:52:39.623Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "JELMW5VLYPWTRW2RL5PMPCMHBYGMCO7WSD4UP53YWBW225PH3TDA", "Status": "enabled"}
2020-05-23T15:52:39.911Z        INFO    piecestore      upload started  {"Piece ID": "V4H42QEGBAWRXJ55KXOLE4BPQ6QWE5HMOHK74H76N4JNUERPZNZQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT", "Available Space": 15667071544098}
2020-05-23T15:52:40.232Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "2ZHNUI667OEQMH4VVTF3EPG5Q2SE6CT6Y2MKP25RNECP6DE6NJIQ", "Status": "enabled"}
2020-05-23T15:52:41.261Z        INFO    piecestore      uploaded        {"Piece ID": "V4H42QEGBAWRXJ55KXOLE4BPQ6QWE5HMOHK74H76N4JNUERPZNZQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT"}
2020-05-23T15:52:42.767Z        INFO    piecestore      upload started  {"Piece ID": "ZS34WZK5BHZDOWBSPNJMQVWE72R7IXETGC35ZU6HKEPFLTM532JQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "PUT_REPAIR", "Available Space": 15667069496866}
2020-05-23T15:52:43.175Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "2PB4JF7UOB2JTETWEVF734UYQSWUYIPFTL2M3X2RCMYEISZMPSFA", "Status": "enabled"}
2020-05-23T15:52:43.625Z        INFO    piecestore      upload canceled {"Piece ID": "ZS34WZK5BHZDOWBSPNJMQVWE72R7IXETGC35ZU6HKEPFLTM532JQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "PUT_REPAIR"}
2020-05-23T15:52:44.981Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "UOZRAOICPLJ4NYNBC4VVV62VNT366UNDINN2V7HV525P2BA4XQOA", "Status": "enabled"}
2020-05-23T15:52:45.074Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "RHRYB7NZJEIUH6QQYQWDQQD3X6OZXAFX3HYGSMJLJKI5VMDUTHYA", "Status": "enabled"}
2020-05-23T15:52:45.486Z        DEBUG   retain  About to delete piece id        {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "IYHW32P6RJG7Q2QJQB7HQWBVPEABZ5B4EOILLE346CWH5CZO4RBA", "Status": "enabled"}
2020-05-23T15:52:46.472Z        DEBUG   retain  Deleted pieces during retain    {"num deleted": 211, "Retain Status": "enabled"}
2020-05-23T15:52:47.404Z

Good to know. The idea was about increasing the performance of the SQLite DBs.
Excuse me if I don’t know much about the Storj DBs.

Lovely



Is this still true today, on version 1.95.1? Are there any updates on reducing the inconvenience of losing databases?

Still true.

Orders are not stored in an SQLite database anymore, but in a separate set of files, so unless you delete them specifically, the system will still account for these orders.

No idea about this one.

This likely described the “v0” piece files, which haven’t been created for new pieces in years, so it does not apply anymore.

The “used space file walker” fixes this problem, but it needs to finish before disk space accounting works. Until it finishes, which sometimes takes days, you are at risk.


You are correct.

This should not happen anymore, since orders were moved out of the databases, unless the node loses those order files from the filesystem.
