So on the one hand, we have an employee stating that it’s safer to make a copy of the database before running a read-only script against it, because running the script directly could corrupt the database. This seems to indicate that SQLite is pretty fragile, as a read-only script shouldn’t break a database. (I suspect the root cause would be journal recovery during writes by another process, as I don’t think `flock()` advisory locks on aufs propagate to the host or to other aufs mounts that use the same directory.)
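If a copy is taken at all, it is safer to let SQLite produce it than to `cp` the file while it may be mid-write. A minimal sketch, assuming the `sqlite3` CLI is available and using hypothetical paths (not Storj’s actual layout):

```shell
# Take a consistent snapshot via SQLite's online backup mechanism; this
# honors SQLite's own locking instead of copying a possibly half-written
# file. The paths here are examples only.
sqlite3 /storage/bandwidth.db ".backup '/tmp/bandwidth-snapshot.db'"

# On SQLite 3.27+, VACUUM INTO is an alternative that also compacts the copy:
sqlite3 /storage/bandwidth.db "VACUUM INTO '/tmp/bandwidth-compact.db'"
```

Note that even this only helps if it runs in a process that shares the same mount (and therefore the same locks) as the writer, which is exactly the aufs question discussed here.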
On the other hand, we have another employee claiming that any database can get corrupted (true, but we were/are talking about likelihood, not mere possibility) as an argument that SQLite is fine and that Postgres/etc. would not solve the problem.
Only one of those can really be true in this specific situation: database servers are specifically designed to handle multiple connections and concurrent readers/writers, and they would increase reliability in exactly these kinds of situations.
SQLite, particularly in combination with Docker’s lack of `flock()` propagation across shared volumes (due to the aufs limitation), is a ticking time bomb for anything that tries to use the database concurrently, even though a reasonable person without knowledge of aufs (as I was before researching this specific issue) would assume that SQLite’s locking mechanism handles concurrent access.
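Whether `flock()` locks are actually visible across two views of the same file (host vs. container, or two containers sharing a volume) can be checked directly with the Linux `flock(1)` utility. A diagnostic sketch, with a hypothetical shared path:

```shell
# On side A: take an exclusive advisory lock on the shared file and hold it.
flock --exclusive /shared/test.db --command "sleep 30" &
sleep 1   # give side A time to acquire the lock

# On side B (the other mount of the same directory): try a non-blocking lock.
# If this succeeds, side B cannot see side A's lock -- the aufs problem above.
if flock --nonblock /shared/test.db --command "true"; then
    echo "lock NOT propagated: concurrent access is unsafe"
else
    echo "lock propagated: locking works across this boundary"
fi
```

Run side A and side B on the two sides of the boundary you want to test; on a single ordinary filesystem the second `flock` should fail, showing the lock is visible.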
As a side note, this is the same argument I have regarding the advice to use `-t 300` when stopping the container to avoid database corruption: a database should not be corrupted by an unclean shutdown; that’s what journals are for. So either we don’t have to use `-t 300`, or SQLite is too fragile to be used in production like this. The advice from Storj is conflicting on this point (“SQLite is fine” on the one hand, but “use `-t 300`” on the other: pick one, it can’t be both).
It seems rather clear, at least to me, that more robust options should be offered in addition to SQLite, or the `flock()` problem should be solved by abandoning Docker.
At any rate, all of this discussion aside, it should be 100% safe to run the script against the database concurrently from within the storagenode container, since everything would then use the same aufs mount and therefore see each other’s locks. Perhaps the advice for using these scripts should be to `docker cp` them into the storagenode container and `docker exec` them there.
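Concretely, that advice might look like the following sketch; the container name `storagenode` matches the discussion above, but the script name and paths are hypothetical:

```shell
# Copy the script into the running storagenode container...
docker cp ./check.sh storagenode:/tmp/check.sh

# ...and run it there, so it operates on the same aufs mount as the node
# itself (and therefore sees the same flock() locks).
docker exec storagenode /bin/sh /tmp/check.sh
```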