Just remove it, and you will find out. But there's a chance the node won't start. Then you probably have to remove them all, if you're not able to recover this particular database. And probably other databases will turn out to be corrupted as well. Then you're losing stats indeed. But it's a question of what is more important to you now: a running node or full stats.
But I actually think the databases are really a pain in the ass of STORJ. I never came across an application with so many failing databases. The developer team should consider another solution, in my opinion. It's probably even in the settings they use, because sqlite itself is used very widely.
Well, the node actually did not start, yes. I removed them all and started a new clean one. Not really a big deal, because this node was only about 10-15 days old and the reason for the failure was a corrupted ext4 fs, which I fixed with fsck. But still, for the future: if a node has been running for 6+ months, it will be really sad if some databases crash with no way to restore them, only delete.
Thread may be closed
You can just remove all databases; they will be recreated as soon as you restart the node. You don't have to start the node all over, as in from scratch.
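For anyone finding this later, a minimal sketch of that procedure (assuming a docker-based node named "storagenode" and a storage directory under /mnt/storj; adjust both to your own setup):

```bash
docker stop -t 300 storagenode                               # give the node time to exit cleanly
mkdir -p /mnt/storj/storage/db-broken
mv /mnt/storj/storage/*.db* /mnt/storj/storage/db-broken/    # keep the old files, just in case
docker start storagenode                                     # fresh, empty databases are created on start
```

You lose the history in those databases, but the node itself keeps running.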
My experience up to now is that databases are usually not corrupted on their own, but multiple at the same time. So I deliberately put them on a different disk than the data files (because of the many writes), which is partitioned as btrfs and is being snapshotted. In the end I only lose some hours of stats, but that's it.
In my opinion, STORJ should take those back-up measures on its own. For example: copy all sqlite databases to a tmpfs device at start, write them back under another name from time to time, and rename that copy to the original name as soon as the file has been successfully closed.
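Something like this write-back cycle could even be scripted outside the node today; a rough sketch (the paths are made up, and the raw copy is only safe while the node is stopped or when sqlite3 makes the copy itself):

```bash
SRC=/tmp/storj-db/bandwidth.db        # working copy, e.g. on tmpfs
DST=/mnt/ssd/storj-db/bandwidth.db    # persistent location
cp "$SRC" "$DST.tmp"                  # write under a temporary name first
sync "$DST.tmp"                       # make sure it actually reached the disk
mv "$DST.tmp" "$DST"                  # atomic rename: the old copy stays valid until this succeeds
```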
But just out of curiosity: are there plans to handle those database errors automatically? Especially because they come along that frequently? I mean:
Just recreating missing databases on the fly isn't that hard…?
Running a pragma check with a recovery attempt, or recreating an empty database if that doesn't work, wouldn't be that hard to automate either (see the sketch after this list)?
Working with more of a back-up idea, like I suggested in the previous post? (E.g. copying the database to a working file like database.db to database-working.db, closing them every N hours, checking in the folder whether the file is still well-formed, renaming the file over the previous database, restarting the cycle, …) Or creating a tmpfs disk in memory, which even brings down IO to the disks. If anything in the process hits a problem mentioned in the sqlite manual, you're off without a problem in most cases.
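To make the second point concrete, here is roughly what such automation could do, written out by hand (assuming sqlite3 is installed, the node is stopped, and the databases live under /mnt/ssd/storj-db; all of that is just an example):

```bash
for db in /mnt/ssd/storj-db/*.db; do
  if ! sqlite3 "$db" "PRAGMA integrity_check;" | grep -q '^ok$'; then
    echo "$db is malformed, trying to salvage what is readable"
    sqlite3 "$db" ".recover" | sqlite3 "$db.recovered" \
      && mv "$db.recovered" "$db" \
      || rm -f "$db"        # worst case: drop it and let the node recreate it empty
  fi
done
```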
Just trying to improve the workflow, because SNO isn't the most well-paid job, and I'm not kidding: I never came across a program with as many sqlite problems as STORJ. As you can see, many run into trouble by just power cycling their node, as if the pragma settings aren't right or something. Recoverability isn't guaranteed. Speaking for myself, it already happened 3 times in the last 2 months on the six nodes I'm running, and none of them were recoverable. My nodes are all behind a UPS; only one time did it happen because of an OOM failure, the other times it just happened after power cycling. I had already put the databases on another drive (SSD) in advance, because I saw this error come along on the forum so many times. I finally decided to use BTRFS for the host filesystem, so it would take me as little time as possible to recover from such errors while keeping as many stats as possible.
These corruptions are a litmus test. A wake-up call. Silencing them would be akin to silencing a smoke alarm. The fire will still get you even if you make the fire alarm reset itself. This is data loss, and it needs to be prevented by the operator. If the operator cannot make sure their node does not reset abruptly, perhaps they should not be hosting a node. It's the bare minimum that is required: keep it running. Even a graceful restart of the storage appliance is an ordeal that should virtually never happen, much less an abrupt one.
Lol. Those who don't power cycle their nodes and don't kill the storagenode process don't run into these issues… so… how about… don't do it!?
That is three times too many. What changes did you implement to prevent this in the first place? Or at the very least, after the first time?
It is against the ToS to prevent the node from using the specified amount of memory. If the disk subsystem is insufficiently performant, the node can use an abnormal amount of memory; either the disk subsystem needs to be fixed, or more memory provided. Under no circumstances shall the node be killed.
Also, don't power cycle compute devices, much less storage appliances. If you want to do that, you need to ensure all writes are atomic, but if you do, you won't be able to run the node, as you will run out of IOPS even at the current utilization.
Ensure that a graceful OS shutdown waits long enough for the process to exit, and does not kill it abruptly (this is often an issue on Windows).
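On Linux with systemd this is just configuration; a sketch for a systemd-managed node (the unit name "storagenode" and the 15-minute limit are assumptions, not Storj defaults):

```bash
sudo mkdir -p /etc/systemd/system/storagenode.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/storagenode.service.d/override.conf
[Service]
# allow up to 15 minutes for the node to flush and exit before systemd gives up
TimeoutStopSec=900
EOF
sudo systemctl daemon-reload
```

For docker-managed nodes the equivalent knob is the -t/--time value passed to docker stop.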
Do those SSDs support PLP? How will putting databases on an SSD help with abrupt power resets? Data in flight will still get lost.
Advice not to use BTRFS was posted on the forum many times too…
If you are hoping that snapshots will save you, they won't. You can't snapshot an open database and expect it to be in a consistent state in the snapshot. If you want to back up databases, you need to either have all client connections closed and the disks synced at the time of snapshotting, or you'll need to back up data exported from the live databases.
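For SQLite specifically, the supported way to copy a live database is to let sqlite itself do it, e.g. (the paths are hypothetical):

```bash
sqlite3 /mnt/ssd/storj-db/bandwidth.db ".backup '/mnt/backup/bandwidth.db'"
# or, with sqlite 3.27+:
sqlite3 /mnt/ssd/storj-db/bandwidth.db "VACUUM INTO '/mnt/backup/bandwidth-copy.db'"
```

Both produce a consistent copy even while the node keeps writing, unlike a raw file copy or a snapshot of an open database.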
I strongly believe automatic repair shall not be implemented. Data loss is a catastrophe: it shall be noticed, measures shall be taken to prevent it from occurring again, and only then shall the node be started again. Otherwise it's a massive waste of time.
I now understand why it's an arrogant rabbit ;p
This is data loss, but selectively in the databases. These nodes have never missed out on an audit, so I doubt it has the same root cause. Also because 2 of the 3 times, there were no *.db-{wal,shm} files.
At the beginning, I was tweaking some things concerning networking. Besides, in the end you sometimes end up rebooting your PC due to some update or something. Cheap advice isn't hard to give…
Kind of annoying, but they all have at least 2 GiB assigned. In this case it was a hung process on the host, which is why one qemu VM was killed.
No, but it's just normal rebooting of the system that is already happening. So I hypothesized it might be due to hung/slow processes taking longer than the timeout; maybe worsened by the fact that the data and the databases were on the same (hard) drive. Besides, an SSD is faster at writes, so the data spends less time in flight.
Besides, I did a full memtest, which turned up no issues.
I even pinned my VMs to CPUs, so they would be less likely to stall each other.
So you advise me to stop the node for a while?
No, but these btrfs snapshots are filesystem-consistent, so no syncing troubles. Besides, the *.db-{wal,shm} files are snapshotted too. Both times I used this approach, sqlite was able to pick them up again. And otherwise there is another snapshot from 6h before…
But again, I've never heard of or seen an application with so many sqlite database troubles. And welcome to real life, where updates sometimes mess with your connections and force you to reboot, where there may be power glitches whatever you do to prevent them, hardware defects or other unforeseeable events, none of which are signs of bad SNO-ship per se. In those situations it would be helpful if the databases weren't giving you trouble.
I believe there is a difference between node data and stats. This probably doesn't have the same root cause as data loss from the node, which you would notice by missing out on audits. And as I already argued, I am almost sure it's a different thing, because I never missed an audit (or at least the score is still 100%). So I believe it should be implemented, and otherwise I'll be doing it myself. Not least because of the strong opinions conveyed in this thread.
Besides, the relation between data loss of the blobs and the malformed-database issue isn't mentioned in the forum or in the manual.
Dangit! My cover's been blown! Quick! Regroup and disappear into the sunset…
But even if I exaggerated, it was just a tiny bit.
Databases are written to constantly. Node data - once, ever. Also, audit is very weak in terms of detection; it's a last resort: if it catches anything, the node is probably already dead. You can have a massive amount of objects missing and not have an audit failure for years.
Databases, on the other hand - those you'll see immediately.
Data loss is data loss. It does not matter what is lost, because if anything is lost at all - game over, the system is broken.
Great that you brought it up. Journaling only works if the underlying filesystem does not lie about write atomicity. But if you keep sync writes on, your node will choke: too many IOPS. So the sensible advice is to disable sync on the mount where storj keeps its data, and especially the databases. This unlocks tremendous performance, but now you can't afford a reset. It's a fair trade-off.
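What that looks like in practice depends entirely on the filesystem; on ZFS (which is an assumption here, and the pool/dataset name is made up - other filesystems have no direct equivalent) it would be:

```bash
zfs set sync=disabled tank/storj     # fsync() returns immediately; dirty data stays in RAM until the next transaction group
zfs get sync tank/storj              # verify the setting
```

With that set, an abrupt reset can throw away the last few seconds of writes, which is exactly why it only makes sense if resets never happen.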
On reboot, the OS shall send a graceful exit signal to processes (varies by OS) and wait for them to exit. Patiently. Storagenode can take a long time to exit - limited by drive IOPS. Desktop OSes may have lower thresholds for patience and kill the process. That's a problem.
Less is not zero. Data loss probability must be eliminated, not merely reduced.
Those are once-in-a-lifetime events, each of them. You said you had three of them in a row. That means your setup is inadequate (no offense intended here).
A stable system does not need updates more often than once per quarter. And it does not need to be rebooted if one subsystem fails. A UPS keeps it powered forever and manages graceful shutdowns when power is lost. A graceful shutdown waits for apps to exit and syncs the disks before halting. There is zero opportunity for data loss of any kind. This is the reality I'm experiencing.
CPU is not a problem here, only disk latency.
Yes. Or export data from the live database, if you rightfully don't want to stop it. That's done atomically. But none of that would be required if your hardware and software did not allow data loss in the first place.
See above: you can lose quite a lot of data and never fail an audit, because node data is write-once; databases are constantly rewritten, so corruption is immediately evident. If the databases are on an SSD without PLP, your chances of losing data are much higher, by an order of magnitude, because of the read-modify-write nature of SSDs. Instead of putting the databases on an SSD, I would turn off sync for the mountpoint and ensure there are no abrupt resets.
I completely understand your point and wholeheartedly disagree with it: time will be better spent, and with a better outcome, on preventing data loss than on concocting wicked recovery techniques.
Why would that matter? Data loss of any kind is bad and shall not happen. If it does, the appliance in its current form is not suitable for storing data.
In one of the threads I suggested keeping the databases on tmpfs: it evaporates on reboot and will always be consistent. The metrics are useless anyway; I personally never look at them.
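A rough illustration of that idea (the mount point, the size and the use of storage2.database-dir are assumptions to check against your own config.yaml):

```bash
mkdir -p /mnt/storj-dbs
mount -t tmpfs -o size=2G tmpfs /mnt/storj-dbs
# then point the node at it in config.yaml:
#   storage2.database-dir: /mnt/storj-dbs
```

The databases then start empty after every reboot, which is the whole point: no stats, but also nothing to corrupt.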
But this will be with the understanding that your storage is unreliable crap, pardon, just reliable enough to host storage node data, where some data loss is acceptable and accounted for in the design.
But there is no other use case that can tolerate such a total-garbage, low-reliability storage service; which means you are running the node on hardware configured specifically to run a node, which goes against the recommendation to only use unused resources.
So, my node is on an array that guarantees data consistency, where all my other data lives. The databases are on the same array. I have never had data loss. Not with storj since day one last year, nor ever in the past 15 years of running servers at home… (and I'm not planning to).
This is why any data loss is a catastrophe if you are running a storage node as designed.
This is quite a bit of exaggerating indeed. Take my smallest node, with about 250 GB of data now (started one week ago), on which I currently get 25 audits daily. If you lose a substantial share of files like 0.1%, 0.5% or even 1%, the chance it won't be detected within a week is 84, 41 and 17% (= {0.999, 0.995, 0.99}^(7 × 25)) respectively, and 47, 2 and 0.05% (= {0.999, 0.995, 0.99}^(30 × 25)) within a month. For bigger nodes with the same relative amount of data loss, these figures are considerably lower, because they get more audits.
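The general form behind those percentages, with f the fraction of pieces lost, a the audits per day and d the number of days:

$$P(\text{undetected after } d \text{ days}) = (1 - f)^{a \cdot d}$$

For example, f = 0.005 and a = 25 gives 0.995^175 ≈ 0.41 after a week, matching the 41% above.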
Well, it actually does matter, because data loss of the blobs is taken care of and accounted for in the STORJ design. That's also the reason why it's advised not to run STORJ nodes with RAID > 0.
The reason I brought it up is that if the node has made a clean exit, the *.db-wal and *.db-shm files are cleaned up and merged back into the databases, and you end up with only *.db files. That's also the reason why I doubt it has to do with the timeout running out or something.
I'm working on Debian (without a desktop). The timeout is given by docker -t 300, so 5 minutes, which is the same for the qemu VM. Usually more than enough, because it usually takes less than 30 s with about 10 drives.
That's the opinionated part I spoke about, because that isn't even advice from STORJ itself. I refer you to the part stated above. It also won't ever become zero, because everything has its lifetime.
Oh, the first month I was for sure fiddling around and trying to optimize the whole thing. So I may have rebooted sometimes 10 times a day in the first weeks. But even then, I have never had any application using sqlite (of which there are very many around) that I ran into trouble with. For example, Syncthing, Plex Media Server and Home Assistant are running on the same systems, using much bigger databases (and not just one table per file) with the same sqlite driver. Those applications have seen many more reboots and even unanticipated power cycles, but I've never had any problem with them.
So yeah, I could be the problem here. As I see it, SNOs are often the problem and I won't be an exception. Also because there are not always clear guides on how to make a rock-solid node, so you sometimes have to tweak a little bit. But there's also a peculiarity going on at the side of STORJ, if you ask me.
Fine, but IO can be quite stalling for the CPU. Besides, pinning the whole process to one CPU keeps the system from moving memory around. And I can tell you, since I did it the CPU load has decreased by over 20%.
I can imagine this, although it might be written to disk when the node exits (and maybe every now and then in the meantime).
I myself was thinking of installing anything-sync-daemon, which is kind of the same thing.
This is making assumptions. As I wrote before in this reply, Home Assistant, Open Media Vault, Syncthing and Plex Media Server are also running on these systems, aside from some VPN servers. And even if I were using dedicated hardware, what would be the problem if I saw it as a hobby, or lived in a place where I could earn money with it because electricity and hardware aren't the same price everywhere?
That's great. But it's also against the recommendations, as I wrote before. I also never suffered any data loss on Home Assistant, Open Media Vault, Syncthing, and so on. Because of having them in RAID1 in the first place, but I'm also tempted to think these applications are quite a bit more stable.
In my world there is a difference between an irritation, a little problem, a problem, a big problem, a small disaster, a disaster, a big disaster, a small catastrophe, a catastrophe… STORJ never rose above a problem, data loss never above a little problem.
Therefore, every day 0.02% of files get checked, at random. This also happens to be the probability of detecting one single corrupted block in a day. Therefore, corruption can stay undetected on average for 2500 days, or about 6 years.
Yes. For storj, it's OK to lose some of the blob data. But that's it. We are talking about your appliance, which also runs storj. And your appliance demonstrably loses data. Full stop. Fix that.
This is misguided advice; it contradicts the guidelines of re-using existing hardware, and has nothing to do with the present discussion: RAID only addresses rot, bad sectors and other media failures, not filesystem promises. See below.
You are making a few implicit assumptions here that may not be true (and likely aren't, seeing that you see corruption). The most common ones are:
broken file locking: Docker had (still has?) flock not working across bind mounts, which breaks the assumptions sqlite makes.
broken fsync: same deal, with docker not fully implementing this.
So no, a clean journal means nothing unless you satisfy these prerequisites and promises. And if you use docker, you already don't.
Again, we are talking about your appliance, which you also happen to use for storj. You manage to lose data, so until you get to the bottom of it, you cannot trust your appliance to host important data.
It absolutely does become zero if configured correctly. Literally zero. Correct data or no data. Never corrupted data.
This actually confirms the point. Syncthing, Plex, and Home Assistant write very little to their databases. But just look at the write traffic storj sends to its own. It's massive; it's the largest contributor to IOPS from the storagenode. So drastically lower traffic is one reason you haven't yet seen an issue with them.
Another reason could be how you configure and run them (without docker, with correctly configured mounts, etc)
And yet, it does not matter. No amount of absence of issues proves anything, but even one failure proves the existence of a problem. So either storj has a bug in its use of sqlite (you can review the code) or your appliance is violating the assumptions and misbehaves.
I would start with throwing away docker. Run storagenode directly on your host OS. There is no benefit in containerization for go applications.
This is a brand new sentence that makes zero sense. Zero. All CPU cores have access to the same memory controller and shared cache.
CPU load is irrelevant, and many other things could have changed to contribute to the apparent load reduction, including bugs in monitoring software.
SQLite does this internally anyway. You are suggesting reinventing the same wheel around the database, instead of fixing the underlying issues that cause the corruption in the first place. The suggestion of a ramdisk was to get rid of 100% useless IOPS.
Different write pressure, so irrelevant.
Not sure what you mean. You are running it on a shared appliance, and that appliance allows SQLite, of all things, to lose data under a moderate load that happens to come from storj. You need to root-cause that and fix it.
Again, storj losing data is not a problem. Storj here is a canary showing you that your appliance is capable of losing data. In my world there are two types of storage devices: those that lose data and those that don't. If the device lost 1 byte of data, I need to know why and prevent it, or I cannot trust this appliance any more.
You seem to sleep well at night knowing that your storage appliance is misconfigured to the point of losing data under moderate conditions; kudos to you. I can't. I need to find the culprit and fix it. Not for storj, but for all my other data; and I would be immensely grateful to storj for generating the usage pattern that uncovered this vulnerability.
It's not. Your example is just one corrupted block of data, and I was talking about 0.1, 0.5 or 1.0% of the data. That's comparing apples and pears. Or to put it otherwise: if you lose 0.1, 0.5 or 1.0% of the data today, the chance you won't see it in the audit score even within a day is {0.999, 0.995, 0.99}^2482 = {8, 3E-4, 1.5E-9}%, and it is about zero within a week. But that one piece might take forever… (if not deleted before being audited).
Since STORJ is made fault-tolerant precisely to cope with data loss, this is real nonsense. Like even using RAID > 0, which isn't stated anywhere as an obligation or even a recommendation.
Whoever wrote the SNO handbook, I don't know. But the advice on Hardware Requirements - STORJ SNO Book even contradicts the official advice not to use RAID5. So, if you want to RAID all the data, that's fine by me. I choose to start some additional storage nodes over time.
My appliances run other things apart from STORJ in their own VMs, and those haven't suffered any data loss whatsoever. That's the whole point. Besides, the database of Home Assistant is over 3 GiB, and it writes about 10 GiB a day (measured as the increase in TBW of the SSD, which differs a bit from the real amount of data written). That's really an awful lot more than the STORJ database, which in my case contributes less than 1 GiB a day to the TBW per node (remember: databases and data are on different disks in my case). Don't worry about data loss there: it's RAID1 and backed up every day to the openmediavault server (also RAID1, on other drives).
So indeed, different write pressure. But is it then that unbelievable that STORJ manages to fail on me, while Home Assistant, for example, doesn't?
It will never become zero. For example, the chance of both drives failing on the same day, assuming a lifetime of 10 years: that would be 7.5E-6%, assuming these are independent variables. In practice it is much higher, because these drives are more often than not about the same age. And external influences like a fire, lightning strike, flooding, war, a nuclear bomb, … total earth destruction whatsoever will probably make them fail together. After all, a RAID isn't a back-up. For that I'm using Syncthing, syncing my really important files to two other locations (family members living elsewhere). But even then, my chance of losing the data isn't 0%. It's small, but never zero…
I already cited the sqlite "how to make it crack" manual some posts before.
But not using docker is a really good point, especially since I'm running those storage nodes in separate VMs anyway. Is there any manual lingering around on how to do this on Debian/Ubuntu(-derived) Linux? I actually can't find a recent one online, only this oldie.
Yup, as long as it only pertains to a hobby project, I'm really fine with it. And considering the whole topic, I'm increasingly convinced it's a STORJ issue. Also because I find the Plex Media Server database to be over 5 GiB and the Syncthing database over 3.5 GiB (which turns out to be a LevelDB, BTW), and those are rebooted / power cycled the same way as the storage nodes. Aside from the already mentioned Home Assistant, with a bigger file size and higher write pressure on the database.
Yes, let's stop talking about this: I said in every post that storj does not care about data loss, but it losing data means your appliance can lose data, which means it's not safe to store your important data either.
Let's dig into this further. Does each service run in a separate VM? How is the storage that hosts the databases provided to the VM? Disk passthrough or some mount point? I.e. who manages the filesystem on that device - the VM or the host? Is the situation the same for Home Assistant and storj?
I was (and still am) talking about the possibility of returning bad data. If both drives died, the appliance would return no data. Remember: either correct data or no data. Never corrupted data.
Funny how you said RAID isn't backup (correct, it isn't), but then you say you use Syncthing for backup (which isn't either). But let's not go into that off-topic.
This might be the root of your problem. File locks do not work across kernel boundaries. Using docker instead of a VM here would actually be an improvement.
There is no need for a manual. Storagenode is a command-line utility; run it with --help and it will tell you what to do. If you want step by step - here is my "tutorial" in the form of a script for FreeBSD, which is literally the list of things I did manually based on the --help output, written into a shell file so I don't have to do it manually again. You can adapt it easily to systemd on Linux: freebsd_storj_installer/install.sh at 93c882b08e4ee724b63114d4c84598640dd6b7eb · arrogantrabbit/freebsd_storj_installer · GitHub
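A very rough systemd adaptation of the same idea (the binary path, user and config directory are assumptions; run storagenode setup --config-dir ... once before enabling it):

```bash
cat <<'EOF' | sudo tee /etc/systemd/system/storagenode.service
[Unit]
Description=Storj storage node
After=network-online.target

[Service]
User=storj
ExecStart=/usr/local/bin/storagenode run --config-dir /var/lib/storagenode
Restart=on-failure
TimeoutStopSec=900

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload && sudo systemctl enable --now storagenode
```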
You are missing the point. Unless you fully understand why the storj databases on your device get corrupted, you can't be sure the same underlying issue does not affect your other data.
I'm increasingly convinced it's your configuration issue. If this were a storj issue, everyone would have been affected, not just a minority of users. After all, it's really hard to screw up the sqlite API, especially since they just use Go bindings from a single process. But it's very easy to screw up everything else around it in the filesystem and environment.
Plex barely writes to its database. Storj writes all the time. If the issue affects writes, you are unlikely to see it with Plex or Syncthing, but you will see it with storj.
I'm not familiar with Home Assistant, but I doubt it writes gigabytes of data daily. Where does that data come from? If you are judging by fast SSD life decrease, it could be write amplification due to a sector size mismatch (did you force a 4k sector size when adding the SSD to the pool?). But if it genuinely writes gigabytes of garbage per day, and you share the SSD the same way as with storj (across kernel boundaries) - run a check on its databases too. I bet they're corrupted as well.
Usually database corruption happens when the write cache is enabled but the node was abruptly stopped, or if you use an unstable filesystem like BTRFS, or network filesystems of any kind, or some configurations of Unraid.
Of thousands of SNOs, only dozens have problems with database corruption; this suggests a non-optimal configuration in these cases.
For example, I have not had a database corruption since I started in 2019, but my nodes run on NTFS and ext4, no RAID. The NTFS ones have a UPS, the ext4 one does not (RPi3). One Windows node is a binary node, the two others are Docker for Windows (so even worse: these two use a network filesystem (p9) to access the disks).
So, depends on the setup.
This is not considered a normal situation; it requires investigation, because missing databases are only the smallest of the problems in such a case.
Dumb automation doesn't help, it will just hide the problem. And we would likely get posts like "I found a bug - my stats suddenly disappeared!"
Again, it's not a common and normal situation. I am against such automation, which hides problems.
Unlikely. Usually you also have data corruption in other places, just not discovered yet. Audits do not check the entire dataset; they are random checks of random parts of pieces, used only to determine whether the satellite can trust your node, not to verify the integrity of all the data. If you have corruption, it will be caught eventually and your node could be disqualified.
This suggests a slow disk subsystem or other hardware issues, if the process cannot be stopped normally even after a 300-second timeout - assuming your OS respects this timeout at all.
Yep, you may read the Unraid forum for SQLite corruption in any application. So, it depends. And not always on the application; it also depends on the underlying setup. Why have I not had an SQLite corruption for the last 4 years? And the remaining thousands of SNOs?
Then you probably should place the databases back on the data disk, if it's more reliable than having them on the SSD.
Each process is running in a separate VM, using qemu.
Partitions are being passed through, so no caching (it isn't supported anyway), and the filesystems are all managed by the guests.
The situation is the same for Home Assistant and for the storage nodes: the filesystems are all in RAID1 (so two partitions on two different drives are passed through for the root filesystem), formatted as BTRFS in order to have snapshots and scrubbing.
Two nodes reside on exactly the same drives as Home Assistant; the other four nodes are on different systems but have the same configuration.
Root file systems are all on internal drives.
Most data drives for STORJ are USB drives.
Same stance here for personal data, therefore RAID + back-up of essential data.
What do you mean? Are you pointing to the fact that it's not used on a frozen (but running) filesystem, or something? Just out of curiosity.
As far as I'm concerned, the Syncthing use case is that all our phones are synced in real time so photos aren't lost accidentally. And since we're using multiple systems, our documents are the same on each system (of which we're using one at a time anyway). The Syncthing system is snapshotted at regular intervals, going back up to 3 months. Besides, once a day (at night) the whole system is synced with the NASes of two family members. And once every one or two months, I sync an external drive with all the data. The biggest concern in this is that Syncthing doesn't validate the data itself, which it easily could do, but that has been turned down as something the filesystem should be doing.
Home Assistant is a platform you can use to connect smart home devices from different brands that usually don't work together. The whole point is that every state update of all systems is stored, which in my case is about 300-500/min, aside from the logs that are written to it. Many complaints about wear-out of micro-SDs and SSDs have come up over the last years. It used to write 40 GiB/day per drive before I implemented some measures (such as reducing logs and more selective event logging); since then it's 10 GiB/day. Almost fully attributable to database writes.
Apart from commit time, no other caching is enabled, since these are passthrough block devices.
The first one was running on ext4 and probably choked on too few IOPS, because it was an external hard drive a few years old. A reboot often took >3 min. After one reboot, almost all databases turned out to be corrupted. Since I also lost some data on it, and saw so many database errors on the forum, I decided to move the databases to the internal SSD drive (BTRFS in RAID). Besides, I had some other crappy drives lingering around and decided to combine them into one storage node using mergerfs with func.create=pfrd to distribute the IOPS. That makes this storage node a "testing" node, since I consider it the most unreliable one (multiple old drives, accumulating the chance of failure). If a modification doesn't knock out this node, it probably won't knock out the other nodes either. Funnily enough, this node is getting about the most ingress of them all now.
The second one already started with the DB on the data disk (an external SSD this time), which I formatted as XFS. It was running fine until one day, after a reboot, three databases turned out to be corrupted. All has been running fine since I moved the databases to the internal drive.
The third one was already on the internal drive. But one day a process on the host caused an OOM situation, in which the node process was killed. After that, the databases turned out to be corrupted (my other post, in which I overlooked the database error).
For sure, but as @arrogantrabbit already correctly postulates, some single failures will take ages to detect.
Great, that's something I could underline.
I really doubt whether you see them all. For example, I already had three occurrences and only reported one on the forum. People who don't care about their stats just throw away the databases and will have a fully running and functional node afterwards in most situations.
don't cross kernel boundaries (i.e. don't host the databases on the host when the node runs in a VM)
If you do want to back up these databases, you need to export the data from them and back up that data. But personally I don't see a point in saving them. They are purely cosmetic. I would love an option to avoid generating them; I don't really care about nice graphs in the dashboard.