My “bandwidth” database is corrupt… yay… file is not a database. GREAT
Utter bullcrap.
I’m out!
Edit: This was written in anger when my setup failed out of the blue. Mostly thanks to the responses here and people like beast and littleskunk, I still seem to have a few tries left in me.
Here is what was pointed out for me to change on my next try:
Sorry for the initial charged post, but in the end I'm happy to have posted it, as people like these truly inspire me to keep at it.
Like I said, it's the 4th time this has happened. I did all the steps and spent a long time with support trying to figure things out… every time… oh well, nothing left to do but request a new auth token and start over.
It fits that definition of insanity: doing the same thing over and over while expecting different results.
I am aware of the sqlite3 integrity-check / dump / recreate procedures. None of them have worked.
All of my databases have survived one power outage and at least 5 hard resets (out of memory and not responding to restart attempts). I never had a single database corruption even under these bad circumstances. I am not sure how you managed to corrupt your database, but it can't be the storage node itself, because then it would have hit me as well.
What did you do during the other 3 db corruptions? Did you wait until it happened again, or did you try something to fix the underlying problem?
I also had some hard resets, HDD disconnects (because I use 2 external HDDs), killing the nodes, etc. Still working fine.
Doesn't mean it can't happen, but 4 times? There has to be more to it than just "randomness".
“If an application crash, or an operating-system crash, or even a power failure occurs in the middle of a transaction, the partially written transaction should be automatically rolled back the next time the database file is accessed. The recovery process is fully automatic and does not require any action on the part of the user or the application.”
Extracted from https://www.sqlite.org/howtocorrupt.html
Power outages and hard reset are not the worst things.
More likely it's memory or hard-disk sync issues related to the storage node's hardware.
It started completely randomly at ~140+ h uptime. No discernible change.
Then it gradually got worse, until it stopped working altogether. Periodic errors, but only from 18UWP.
Most uploads / downloads completed correctly.
Although I use my computer regularly, this HDD is used only by STORJ. I have 32 GB of RAM and mostly use only about 10-15 GB.
Do you have error messages you can present? Otherwise it’s just an empty claim nobody will be able to help you with.
What made you think that you have periodic errors?
It looked like this satellite's messages caused the most issues. They tried to access the bandwidth.db file.
Every incoming and outgoing message of your node accesses the bandwidth.db file, so if the satellite 18UWP was the most active, then of course this one caused the most issues.
For this reason, SQLite does a “flush” or “fsync” operation at key points. SQLite assumes that the flush or fsync will not return until all pending write operations for the file that is being flushed have completed. We are told that the flush and fsync primitives are broken on some versions of Windows and Linux. This is unfortunate. It opens SQLite up to the possibility of database corruption following a power loss in the middle of a commit. However, there is nothing that SQLite can do to test for or remedy the situation. SQLite assumes that the operating system that it is running on works as advertised. If that is not quite the case, well then hopefully you will not lose power too often.
A solution may be to rewrite the SNO software to be database-agnostic and make the default suggestion that the node operator use a more robust database such as PostgreSQL.
I made this suggestion a few weeks ago and have been looking at the problem myself on and off since then. There are various libraries that could be used to connect to and work with PostgreSQL databases rather than the currently hard-wired SQLite.
It was brought up that use of PostgreSQL would require a network connection. However, this is not the case on GNU/Linux hosts: the connection can be made over Unix domain sockets. Here's a sample implementation in Go. Unix domain sockets do not use the networking stack; the connection goes through the filesystem and so has no networking overhead. This is the default connection method for local PostgreSQL users on GNU/Linux systems, and it is very fast.
I will continue to look at how to put it all together for a more robust DB connection… but I'm just another SNO and have other responsibilities in the real world.