Has it ever been considered to use in-memory databases?

I still fail to see your point. You are advocating for significantly increasing the complexity of storage nodes simply to cater for the lack of proper OS support for basic tools. I would like someone to ask Microsoft to actually implement decent RAM disk support.

Perhaps an even better approach would be to turn off writes to the databases altogether, with a command line switch.

I personally don’t care about that data or the pretty dashboard plots. At all.

The node does not need that data either; as far as I understand, everything mission-critical relies on the filesystem.

Then why waste resources writing it in the first place?

2 Likes

That was part of my idea. If the databases were memory-only, with a setting that let the SNO control how frequently to back them up, then he could set it to something like 0, meaning never.
So the databases would never touch the disks and would be 100% memory-only.

1 Like

My idea is to rely on the built-in sqlite functions that are present on all nodes instead of third-party or OS functions that aren’t even available everywhere, on Windows for example.
I don’t know how much this would increase the complexity of the node software, or whether it can even be done, hence my question in the thread title. But to me it sounds like the better approach to do it at the storagenode level, because the storagenode controls everything around the databases: creation, access, connections. And it has access to these built-in functions. This would also ensure that such a solution keeps working across updates, restarts, failures etc., and across different OSes and setups like Docker or non-Docker. Trying to achieve this with third-party scripts and solutions that have no access to the built-in sqlite functions does not sound like the better option to me.
I admit that as a SNO I would very much like not to have to think about RAM disks or SSDs and moving databases back and forth, and instead rest assured that the databases run at maximum speed in memory, barely touching the disks, without having to worry whether the solution keeps working across updates, crashes or whatever other issues a node might encounter.

1 Like

Right, but since the databases serve a purely cosmetic purpose, why not go a step further and turn them off completely, eliminating whatever computational and memory overhead they add?

2 Likes

I am not sure I follow you completely. How do you monitor the node’s progress without that data?

What progress? And why does it need to be monitored?

All issues are reported in the log (it would actually be nice to surface errors and warnings in the UI).

Connectivity status also does not require databases.

How much data the node transferred and stored is visible on the firewall and the disk array respectively. And that is non-actionable. So why should I want to know it, let alone have the node calculate it, when I have external tools for that?

Payment information — I’ll know when I get paid. What’s the point in seeing “projected payout”?

Basically, all that work goes into providing useless/non-actionable data in a UI most users never look at anyway, wasting IOPS and/or RAM.

I believe a lot of SNOs use the earnings script, and I think that relies on the databases.
Recently I had to check the online performance of a node, and the API access came in handy to see, on a day-by-day basis, when the online score would increase again.

Generally I like to see the performance over time, and I am a fan of the projected payout.
So I would probably not abandon them completely. But an option to have fewer writes to disk, so that maximum IOPS go to customer data rather than the databases, would be great.

1 Like

Oh, absolutely, I’m not saying yank all that code.

Just provide a command line option to skip anything that has to do with collecting local stats in general and writing the databases in particular, for those who don’t care about stats but do care about performance (and maybe SSD writes).

It would be a cleaner solution than dancing around with RAM disks and other workarounds.

Maybe even more: some debug information, like the currently running tasks (filewalker etc.), would be nice as well.

1 Like

The database writes are not all cosmetic. For example, the orders your node has accepted from uplinks are stored in sqlite (for now). Without keeping track of those, your node would not be able to submit the orders back to the satellite for payment.

3 Likes

I thought it was moved to a filesystem orders “database” (BoltDB?).

Has it? I didn’t think so, but maybe it changed while I wasn’t looking. Either way, what I mean is that not all writes to the “file-backed databases” are cosmetic.

https://www.sqlite.org/c3ref/backup_finish.html

The backup API copies the content of one database into another. It is useful either for creating backups of databases or for copying in-memory databases to or from persistent files.

Sounds like the way to go. Like:

  1. Create in-memory databases when the node starts
  2. Copy the data from the persistent files into the in-memory databases with the backup API
  3. Periodically (configurable by the SNO) call the backup API to save the state of the in-memory databases to disk
  4. Do a final backup of the in-memory databases to the persistent disk files on node stop
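
A minimal sketch of what that workflow could look like in Go, assuming the mattn/go-sqlite3 driver (which exposes SQLite’s online backup API via `SQLiteConn.Backup`); the `bandwidth.db` file name and the 5-minute interval are placeholders, not how the storagenode actually manages its databases:

```go
// Rough sketch of: restore into memory → run in memory → periodic backups → final backup.
// Assumes github.com/mattn/go-sqlite3; names, paths and intervals are illustrative only.
package main

import (
	"context"
	"database/sql"
	"log"
	"os"
	"os/signal"
	"time"

	"github.com/mattn/go-sqlite3"
)

// copyDatabase copies the full contents of src into dst using SQLite's
// online backup API (sqlite3_backup_init/step/finish).
func copyDatabase(ctx context.Context, dst, src *sql.DB) error {
	dstConn, err := dst.Conn(ctx)
	if err != nil {
		return err
	}
	defer dstConn.Close()
	srcConn, err := src.Conn(ctx)
	if err != nil {
		return err
	}
	defer srcConn.Close()

	return dstConn.Raw(func(dstRaw interface{}) error {
		return srcConn.Raw(func(srcRaw interface{}) error {
			backup, err := dstRaw.(*sqlite3.SQLiteConn).Backup("main", srcRaw.(*sqlite3.SQLiteConn), "main")
			if err != nil {
				return err
			}
			// Step(-1) copies all remaining pages in one pass; Finish releases the handle.
			if _, err := backup.Step(-1); err != nil {
				backup.Finish()
				return err
			}
			return backup.Finish()
		})
	})
}

func main() {
	ctx := context.Background()

	// 1. In-memory database; "cache=shared" keeps all pooled connections
	//    pointed at the same in-memory database.
	memDB, err := sql.Open("sqlite3", "file:bandwidth?mode=memory&cache=shared")
	if err != nil {
		log.Fatal(err)
	}
	defer memDB.Close()

	diskDB, err := sql.Open("sqlite3", "bandwidth.db")
	if err != nil {
		log.Fatal(err)
	}
	defer diskDB.Close()

	// 2. Restore the last persisted state into memory.
	if err := copyDatabase(ctx, memDB, diskDB); err != nil {
		log.Fatal(err)
	}

	// 3. Periodic backups; the interval would be configurable by the SNO.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, os.Interrupt)
	ticker := time.NewTicker(5 * time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if err := copyDatabase(ctx, diskDB, memDB); err != nil {
				log.Println("backup failed:", err)
			}
		case <-stop:
			// 4. Final backup on node stop.
			if err := copyDatabase(ctx, diskDB, memDB); err != nil {
				log.Println("final backup failed:", err)
			}
			return
		}
	}
}
```

One caveat with this sketch: if every connection to the shared in-memory database is closed, its contents are gone, so a real implementation would have to manage the connection pool carefully.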

@Alexey
Can you pass this idea on to the team for an answer? I would really like to know whether using in-memory databases instead of persistent database files has been considered or would even be possible.
At least it sounds like something like this could significantly reduce the disk I/O required for database operations.

1 Like

And in case the node is killed by a timeout on stop (because the stop took longer, for example), this in-memory database will be lost, and probably the database file too.

And the whole process is too complicated in my opinion. You could test how long it would take, for example, to copy bandwidth.db and/or piece_expiration.db.

P.S. I shared it with the team, even if you didn’t convince me :slight_smile:

Probably. But that is also one reason why I think trying to catch those cases with external scripts is much harder than doing it from inside the storagenode code. Maybe some cases would require an additional “emergency” backup before failure.
And that is also why I am suggesting these periodic backups. I don’t know how long a backup takes, but if you can do them every 60/90 seconds or even every 5 minutes, that is all you would lose in such an event.

I don’t know how efficient the backup API function that sqlite provides is, that’s true.
On the other hand, it was said in this thread that the workflow could even be scripted with third-party tools and RAM disks already today. So maybe it can be done quickly.

Thank you. I am not a coder, so I cannot tell whether it can be done, whether it makes sense, or how much work it would be. But we talk a lot about moving databases to SSDs for performance reasons, and nodes do get bigger these days. So I believe it is natural to think about moving the databases entirely into the even faster RAM instead of hammering the disks unrelentingly with I/O.

I guess the main load on the database should be the orders table, which we definitely wanted people to persist, not for functionality, but because losing the information there would mean not getting paid for the traffic provided. It could of course be a configurable trade-off that a SNO could make.

With regard to orders, there could be a solution to flush in batches, i.e. only save to disk when you have 500 orders (or it hasn’t been flushed for 15 minutes). This is easier to implement than trying to mirror the whole database from disk.
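
As a rough illustration only, a batched flush could look something like the sketch below; the `Order` type, the thresholds and the `persist` callback are placeholders, not actual storagenode code:

```go
// ordersbuffer.go — illustrative sketch of flushing orders in batches:
// orders accumulate in memory and are persisted only when the batch
// reaches 500 entries or 15 minutes have passed since the last flush.
package main

import (
	"sync"
	"time"
)

// Order stands in for whatever an accepted order record contains.
type Order struct {
	SerialNumber string
	Amount       int64
}

type OrdersBuffer struct {
	mu        sync.Mutex
	pending   []Order
	lastFlush time.Time

	maxBatch int                  // e.g. 500 orders
	maxAge   time.Duration        // e.g. 15 minutes
	persist  func([]Order) error  // writes the batch to disk
}

func NewOrdersBuffer(maxBatch int, maxAge time.Duration, persist func([]Order) error) *OrdersBuffer {
	return &OrdersBuffer{maxBatch: maxBatch, maxAge: maxAge, persist: persist, lastFlush: time.Now()}
}

// Add queues an order in memory and flushes if either threshold is reached.
func (b *OrdersBuffer) Add(o Order) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.pending = append(b.pending, o)
	if len(b.pending) >= b.maxBatch || time.Since(b.lastFlush) >= b.maxAge {
		return b.flushLocked()
	}
	return nil
}

// Flush forces a write, e.g. on graceful shutdown.
func (b *OrdersBuffer) Flush() error {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.flushLocked()
}

func (b *OrdersBuffer) flushLocked() error {
	if len(b.pending) == 0 {
		b.lastFlush = time.Now()
		return nil
	}
	if err := b.persist(b.pending); err != nil {
		return err // keep the pending orders so a later flush can retry
	}
	b.pending = b.pending[:0]
	b.lastFlush = time.Now()
	return nil
}
```

A real implementation would also need a background timer so that a quiet node still flushes on schedule, plus a final flush on shutdown.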

With regard to the piece_expiration table, that should be going away soon so that we can implement object retention on the satellite side, i.e. deletion would be managed completely by garbage collection and piece expiration information wouldn’t be sent to storage nodes any more.

With regard to something like bandwidth, there could be two approaches.

  1. Keep in-memory tracking of the latest hour or 15 minutes and only save the traffic once per interval. During a graceful shutdown/restart it would be flushed; in a crash or OOM, that interval would be lost.
  2. Remove internal metrics tracking from the storagenode entirely and instead create a way for the SNO to send metrics to a designated metrics handler.

The second approach seems nice, especially in terms of reducing maintenance on the Storj side, but it creates a barrier for less experienced operators.
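
For illustration, the first approach could look roughly like the sketch below; the key structure, the flush callback and the interval are assumptions for the example rather than the storagenode’s actual schema:

```go
// bandwidthtracker.go — rough sketch of approach 1: bandwidth usage is
// accumulated in memory and written out only once per interval (and on
// graceful shutdown). A crash or OOM loses at most the current interval.
package main

import (
	"context"
	"sync"
	"time"
)

type usageKey struct {
	SatelliteID string
	Action      int
}

type BandwidthTracker struct {
	mu    sync.Mutex
	usage map[usageKey]int64          // bytes accumulated since the last flush
	flush func(map[usageKey]int64) error // writes the counters to the database
}

func NewBandwidthTracker(flush func(map[usageKey]int64) error) *BandwidthTracker {
	return &BandwidthTracker{usage: make(map[usageKey]int64), flush: flush}
}

// Add records transferred bytes in memory only; no disk I/O happens here.
func (t *BandwidthTracker) Add(satelliteID string, action int, bytes int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.usage[usageKey{satelliteID, action}] += bytes
}

// Run flushes once per interval and a final time when ctx is cancelled
// (i.e. during a graceful shutdown).
func (t *BandwidthTracker) Run(ctx context.Context, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if err := t.Flush(); err != nil {
				return err
			}
		case <-ctx.Done():
			return t.Flush()
		}
	}
}

// Flush writes the accumulated counters and resets them.
func (t *BandwidthTracker) Flush() error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if len(t.usage) == 0 {
		return nil
	}
	if err := t.flush(t.usage); err != nil {
		return err
	}
	t.usage = make(map[usageKey]int64)
	return nil
}
```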


As for a UPS, it’s not sufficient to make a RAM disk resilient. Storage nodes also need to handle storagenode process crashes, out-of-memory situations and OS-level crashes. In other words, as long as the information is only in RAM it should be treated as lost information; the question becomes what it is reasonable to lose.


I guess the first step in deciding what to tackle is to figure out which of the databases cause the most IOPS.

This sounds very interesting.

I agree. Orders should not get lost (easily). But are we still using the orders db? I thought we had moved to files. At least I have an orders folder on my nodes. Or do they first get written into the database and then flushed to files?

This sounds good. Maybe the SNO could adjust the value as they like?

How can I do that?

Orders should not get lost (easily). But are we still using the orders db?

Ah indeed, I haven’t touched the storagenode side for a while. It’s using a flat file, which should be better.

How can I do that?

I haven’t done that myself, but the first search results I got were for dtrace or SystemTap.

So, unfortunately, I don’t have a “here run this command” to recommend.

If the location of the persistent databases remains configurable by the SNO, we could select an SSD. That would/should make it fast.