I'm done... This is the 4th time the database has randomly gotten corrupted, FTS. I'm out!

My “bandwidth” database is corrupt… yay… “file is not a database”. GREAT
Utter bullcrap.

I’m out!

Edit: I wrote this in anger when my setup failed out of the blue. Mostly thanks to the responses here, and people like beast and littleskunk, I think I still have a few more attempts in me.

Here is what people pointed out I should change on my next try:

Sorry for the initially charged post, but in the end I’m happy I posted it, because people like the ones here truly inspire me to keep at it.


Quitting is easy. Finding the problem and applying the solution takes time. Are you using the same disk and operating system as in the last 3 attempts?

Like I said, this is the 4th time it has happened. I followed all the steps and spent a long time with support trying to figure things out… every time… oh well, nothing left to do but request a new auth token and start over.

It fits that definition of insanity: doing the same thing over and over while expecting different results.

I am aware of the sqlite3 fix, dump and recreate procedures. None of them worked.

All of my databases have survived 1 power outage and at least 5 hard resets (out of memory and not responding to restart attempts). I never had a single database corruption, even under these bad circumstances. I am not sure how you managed to corrupt your database, but it can’t be the storage node itself, because then it would have hit me as well.


classic admin answer “works for me” :wink:


Did you have 1 power outage and 5 hard resets? I am asking a serious question. My storage node should be the one that gets corrupted, but it is not!


What did you do during the other 3 db corruptions? Wait until it happens again or did you try something to fix the problem?

I also had some hard resets, HDD disconnects (because I use 2 external HDDs), killing the nodes, etc. Still working fine.
That doesn’t mean it can’t happen, but 4 times? There has to be more to it than just “randomness”.


@kajar9 , please, tell us, how is your HDD connected to the host with storagenode?
What is your host OS?
What is the filesystem on your HDD?


“If an application crash, or an operating-system crash, or even a power failure occurs in the middle of a transaction, the partially written transaction should be automatically rolled back the next time the database file is accessed. The recovery process is fully automatic and does not require any action on the part of the user or the application.”
Extracted from https://www.sqlite.org/howtocorrupt.html

Power outages and hard resets are not the worst things.
More likely culprits are memory problems or disk sync (flush) issues related to the storage node’s hardware.

Regular SATA3 HDD, few months old, fully checked for bad blocks, single disk, single node.
Windows 10 Enterprise, NTFS

It started completely randomly at ~140+ h uptime. No discernible change.
Then it gradually got worse, until it stopped working at all. Periodic errors, but only from 18UWP

Most uploads / downloads completed correctly.

Although I use my computer regularly, this HDD is used only by STORJ. I have 32GB of RAM and usually use only about 10-15GB.

I had 1 crash, more than 3 weeks ago. Otherwise fine. UPS protected.

In my opinion these should be ideal conditions, yet this has happened over and over.

Periodic errors, but only from 18UWP

What does that mean? Errors about the hard drive or the database are not satellite-dependent… What kind of errors?

Storage Node Recovery mode is advisable.

I might have missed them for the other satellites, but it looked like this satellite’s messages caused the most issues,
namely the ones that tried to access the bandwidth.db file.

What commands do I need to enter that recovery mode? It certainly did not do that automatically for me.

Do you have error messages you can present? Otherwise it’s just an empty claim nobody will be able to help you with.
What made you think that you have periodic errors?

it looked like this satellite’s messages caused the most issues,
namely the ones that tried to access the bandwidth.db file.

Every incoming and outgoing message of your node accesses the bandwidth.db file, so if satellite 18UWP was the most active, then of course it caused the most issues.

There is no recovery mode so far. I suggested it as a nice feature to avoid losing Storage Node data because of a database corruption.


This may be the underlying problem:

SQLite Atomic Commit Documentation

For this reason, SQLite does a “flush” or “fsync” operation at key points. SQLite assumes that the flush or fsync will not return until all pending write operations for the file that is being flushed have completed. We are told that the flush and fsync primitives are broken on some versions of Windows and Linux. This is unfortunate. It opens SQLite up to the possibility of database corruption following a power loss in the middle of a commit. However, there is nothing that SQLite can do to test for or remedy the situation. SQLite assumes that the operating system that it is running on works as advertised. If that is not quite the case, well then hopefully you will not lose power too often.

A solution may be to rewrite the SNO software to be database-agnostic, with the default suggestion that node operators use a more robust database such as PostgreSQL.

I made this suggestion a few weeks ago and have been looking into the problem on and off since then. There are various libraries that could be used to connect to PostgreSQL databases instead of the current hardwired SQLite.
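“Database-agnostic” would mostly mean that node code talks to an interface instead of to SQLite directly. A minimal Go sketch of that idea follows; `BandwidthStore`, `memoryStore` and `recordTransfer` are hypothetical names for illustration, not the storage node’s real API, and the in-memory type stands in for what would be SQLite- or PostgreSQL-backed implementations using `database/sql`.

```go
package main

import (
	"fmt"
	"sync"
)

// BandwidthStore is a hypothetical interface for this sketch; node logic
// would depend only on these methods, not on a specific database.
type BandwidthStore interface {
	Add(satelliteID string, amount int64) error
	Total(satelliteID string) (int64, error)
}

// memoryStore is an in-memory stand-in. A SQLite- or PostgreSQL-backed
// type would implement the same two methods.
type memoryStore struct {
	mu     sync.Mutex
	totals map[string]int64
}

// Compile-time check that memoryStore satisfies BandwidthStore.
var _ BandwidthStore = (*memoryStore)(nil)

func newMemoryStore() *memoryStore {
	return &memoryStore{totals: make(map[string]int64)}
}

func (m *memoryStore) Add(satelliteID string, amount int64) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.totals[satelliteID] += amount
	return nil
}

func (m *memoryStore) Total(satelliteID string) (int64, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.totals[satelliteID], nil
}

// recordTransfer is written against the interface, so switching the
// backend does not require touching this code.
func recordTransfer(store BandwidthStore, satelliteID string, amount int64) error {
	if amount < 0 {
		return fmt.Errorf("negative transfer size %d", amount)
	}
	return store.Add(satelliteID, amount)
}

func main() {
	var store BandwidthStore = newMemoryStore()
	_ = recordTransfer(store, "example-satellite", 4096)
	_ = recordTransfer(store, "example-satellite", 1024)
	total, _ := store.Total("example-satellite")
	fmt.Println(total) // prints 5120
}
```

The backend would then be chosen once at startup (for example from a config value), which is where a “PostgreSQL recommended” default could live.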

It was brought up that using PostgreSQL would require a network connection. However, this is not the case on GNU/Linux hosts: a connection can be made over Unix domain sockets. Here’s a sample implementation in Go. Unix domain sockets do not go through the network stack; the connection is made through a socket file in the filesystem, so there is no networking overhead. This is the default connection method for local PostgreSQL users on GNU/Linux systems, and it is very fast.

I will continue to look at how to put it all together for a more robust DB connection… but, I’m just another SNO and have other responsibilities in the real world.

I also had problems with one drive. Try CrystalDiskInfo: it’s free and lets you read all the SMART attributes. Perhaps it’s the drive and not the database.