Why did you choose to use a sqlite db?

Alexey · July 1, 2019, 9:16pm

Why did you choose to use a sqlite db? Since you ship the software with docker, you could use an internal mariadb database? Would need a little bit more RAM but would be a lot more performant I assume
https://community.storj.io/channel/chat?msg=thi7A4evX8swtZ7ZW

jocelyn · July 2, 2019, 2:14am

@jocelyn tagging #community

stefanbenten · July 2, 2019, 9:41am

There are a few simple requirements and reasons for why we chose SQLite as the database backend for the Storagenodes.

List of Requirements:

Needs to be easy to install (preferable not require any user interaction)
Easy to migrate/move (have the ability to move the database in case of a failure prediction of the hardware its running on
Have the ability to be used with standalone binaries besides/instead of docker and not requiring other installs like MariaDB, MongoDB, PostgreSQL and similar on your PC
(This means, that when someone runs our binary in the near future, we dont like to force the user to have to install those additional databases, especially as it makes the operation much more complicated)

Besides these requirements that just came to my mind (there are probably more), we know that the current implementation (as of right now) has problems regarding the performance.
The engineers are working on solving that as soon as possible. In fact, a properly designed (and used) SQLite database can be at least on par with an alternative database operated on the same machine (given the resource constraints are not endless).

KernelPanick · July 2, 2019, 11:40pm

Those interested in following / learning more about the database performance can watch this issue

BlackDuck · July 3, 2019, 4:54am

The reasons already described by @stefanbenten SQLite has its limitations, but slow is not one of them. As long you don’t need multiple connections over a network it is a great choice.

The main performance problem is the chosen dbchema. At the last time i looked in the db i saw that near 30% of the data are used only by the id of one satellite which is written over and over in one table.

The db structure need a bit of normalization, not to much or everything, because normalization could decrease performance, but in some points it will result in a smaller and faster db design.

Second it should be checked if everything in this database is really needed.

BrightSilence · July 3, 2019, 3:35pm

In my opinion the biggest issue is the file level write lock when one thread is writing data to the db. SQLite locks the entire file for writes when a thread is writing to it. This essentially makes the db writes single threaded even though the transfers are not. Making the file smaller would speed up those writes but wouldn’t necessarily solve the issue of simultaneous writes.

jocelyn · July 3, 2019, 4:35pm

hey @BrightSilence! It is great to see you posting here – very glad to have you on the thread!!

zeebo · July 3, 2019, 5:10pm

You are correct that SQLite has a very coarse-grained locking policy around writes. That said, I believe we’ll have plenty of low hanging fruit before we’re limited by how many writes we can perform. To @BlackDuck’s point, I also think the schema and how the tables are used are causing the majority of the problems.

jocelyn · July 3, 2019, 5:18pm

good point @zeebo …

Eioz · July 12, 2019, 10:12pm

Hello,

Just store infos on txt file with s3
Then get info on txt file with s3
Distribute db (plain text) files on storj nodes

This method ensure high performance when combined to cache and load balancing.
Also store db architecture at different geographic zones levels and maintain the blockchain like this.

Plus : add tiered storage cache to your network like 200 TB 2 X with multiple disk storage speed.

Hope this can help

littleskunk · July 13, 2019, 3:56am

Lets play that game for a moment. A customer would like to upload data to your storage node. Now you open a connection to S3 and tell the customer to wait a second? Blockchain would be even worse because that would mean you have to wait for confirmations before you can tell the customer that the file is successfully stored.

We want to outperform s3. A requirement for that is a low latency for uploads and downloads. Everything that keeps the latency low is good. Any additional round trip like s3 or a blockchain is bad.

stanfieldr · July 17, 2019, 11:28pm

If infodb is the bottleneck for a node which causes uploads to more likely fail, can an optional option be added to use tmpfs as a sort of buffer for the upload?

BlackDuck · July 18, 2019, 4:45am

In most time the bandwidth or latency to the target node will be the problem/limiting factor.

Pentium100 · July 20, 2019, 6:46pm

Except during high load for those who have fast connections.
AFAIK, the db still has these problems:

Limited to 1 concurrent write (sqlite limit). The last updates improved the performance, but I wonder if the problem is going to show up again at, say, 100mbps.
Can be easily corrupted - the number of SNOs that had a corrupted database is a bit too high for my liking. If the db can be corrupted by a crash or just accessing it (reason for stopping the node and copying the db before running the earnings script) it does not inspire confidence.

It looks though as the node can kinda-function without it (at least in some cases deleting info.db seems to be a solution for a corrupt db) and not fail audits.

BlackDuck · July 20, 2019, 7:21pm

These kind of limitations are shared in all Databases, sqlite lock the hole database, other systems doing this by table or row depending of the kind of request. To get some kind of data integrity looking is necessary. The devs did already a lot, and the end of optimisation is not reached. Btw there is no need for concurrent access from other apps, this will mean writes can easily queued within the SN.
I fully aggree to you it is to high, but i wouldn’t call it a database corruption although it feels like that and is also called that here. The database byitself is not corrupted, the data within the database has not the state which the SN expect. Some of the problems are logical problems that are caused by incorrect error handling. And this is fixable. Removing the database will not be the solution in that case, it will only move the problem else were. For example i saw errors that tables were missing, the point is this should not be possible, for that are the migration scripts and db levels there. How could it be that a db level got increased without apply the migration? Or other current issues the SN tries to write a data set which is already there and will break a constraint. Booth will not be fixed by removing the database, the logical issue is still there, it will come up with a other face.

Pentium100 · July 20, 2019, 7:33pm

Locking a row vs locking the whole table is a big difference for multithreaded applications, provided that the application is written in such a way that takes advantage of it.
The migration problems would happen no matter the DBMS I agree, but it seems that you can have problems with the current database just by having the power fail etc at a bad moment. Or just accessing it read-only while the node is running (say, by running the earnings script). At least with MySQL/InnoDB you can always do a SELECT while other processes are writing to the database.

Having the ability to keep the database on SSD (RAID1) may also improve performance.

littleskunk · July 20, 2019, 11:21pm

It would be a bad trade. Our application needs short repsonse time. SQLite is best for that. Switching to another database might improve the database locks but that has a downside. If you don’t belive that I would like to see a trace file from you. Show us how much time the storage node is loosing because of SQLite.

You can even run a test on a SSD and compare the trace files.

curl localhost:7777/mon/trace/svg?regex=Upload > upload.svg
(By default the storage node will open a random debug port)

On my storage node with a HDD i see a query for used_serials and it takes only 260µs. The other queries in my trace file are executed after the upload is finished.

BlackDuck · July 21, 2019, 11:06am

SQLite is highly resistant to corruption even agains powerfail SQLite Corruption. But wrong or missing transactions could be a problem. Other things that named in this link are similar to other databases.

As littleskunk said we are talking about µs of locks. A storage node needs to be easy and slim, it is just a piecestore.

No one limits you to put your database from putting on a SSD.

Running unofficial tools is your risk!

BrightSilence · July 21, 2019, 9:16pm

Especially if those unofficial tools specifically warn to run it on a copy of the db.

With recent updates I don’t think the database is a bottleneck anymore. There may be optimizations that can still be done, but it’s not SQLite that’s the problem, but rather the database design/usage itself.

As for the uniqueness error, it is my believe that these occur because the rollup insert and the subsequent delete are not executed in a transaction and as a result the insert can finish while the delete gets interrupted by something. However, transactions have a performance impact as well.

As for the missing tables, I haven’t looked into the code, but it might similarly be that the migration does not happen in a transaction and it could partially complete.

nitiso · November 23, 2020, 1:34pm

SQLite in comparison with other popular databases is very lightweight (that’s clear from the name). It means that it is easy to learn, perfomance is excellent, though size is limited. SQLite is a nice decision when we don’t want to dive into world of databases, but want only one realiable solution.