So long and thanks for all the fish

Well, after 2+ years of running my 6TB storagenode, the colocation of the sqlite files and the blob storage has finally killed my node. I attempted to come up with a workaround, but it isn’t possible w/o a code change.

My “issue” was identified some time ago in this GitHub ticket but hasn’t gotten any love. Not being able to separate the slower blob storage from the more time-sensitive sqlite DB files is why the moderators and the documentation always say that you should use local storage only. Well, if storj is really a “rent your unused storage” kinda project, my excess storage is on (usually) fast attached storage.

I’ve added my notes there and subscribed to updates hoping it’ll get some love so I can return to the storagenode family. I believe the fix is quite easy so any upvotes on the issue would be appreciated.
Until then, best of luck folks!

Someone can correct me if I am wrong, but I think this was implemented through a setting in config.yaml

# directory to store databases. if empty, uses data path
# storage2.database-dir: ""

Just mount the path via your run command and point the SN to it?

There are databases in 2 directories – the root data dir and the storage subdirectory:

➜  find . -type f -name "*.db"
./storage/pricing.db
./storage/notifications.db
./storage/bandwidth.db
./storage/storage_usage.db
./storage/piece_spaced_used.db
./storage/pieceinfo.db
./storage/piece_expiration.db
./storage/used_serial.db
./storage/info.db
./storage/reputation.db
./storage/heldamount.db
./storage/orders.db
./storage/satellites.db
./revocations.db

Looks like there is a config option for the revocations.db as well

# url for revocation database (e.g. bolt://some.db OR redis://127.0.0.1:6378?db=2&password=abc123)
# server.revocation-dburl: bolt://config/revocations.db

Although I haven’t seen anyone talk about this one yet.

The last access time for the revocations.db was 2 days ago during the last reboot of my node. You don’t need to worry about that one. It won’t impact IO. It’s also not an SQLite database. I don’t think it’s sensitive to the issues the other db’s have either. It’s also non-vital.

I am already using the separate db location on 2 nodes and it works like a charm. One of them would definitely not be able to work if I didn’t have the option to store db’s elsewhere.

Look into this thread, here is a solution.

OK, I moved all the DB files onto the fast primary disk (a directory I’m mounting to /app/config/dbs in the docker container). Identity is in the usual place. Everything else is on the /app/config mount (slower disk), including the config.yaml.
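For anyone repeating the move, here is a minimal sketch of the copy step. It assumes the node is already stopped and that the databases sit at the top level of the storage directory, as in the find listing earlier in the thread. `copy_node_dbs` is a hypothetical helper name, not something that ships with storagenode, and the paths in the example are the same placeholders used for the mounts.

```shell
#!/bin/sh
# Sketch only: copy the SQLite databases out of the slow storage directory.
# Assumes the node is stopped, so nothing has the databases open for writing.
# copy_node_dbs is a hypothetical helper, not part of storagenode.
copy_node_dbs() {
    src="$1"   # directory currently holding the *.db files
    dst="$2"   # directory on the fast disk
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dst" || return 1
    # Copy only the top-level *.db files; anything in subdirectories stays put.
    find "$src" -maxdepth 1 -type f -name '*.db' -exec cp -p {} "$dst"/ \;
    # List what landed in the destination so it can be compared with the source.
    find "$dst" -maxdepth 1 -type f -name '*.db' | sort
}

# Example with placeholder paths:
# copy_node_dbs "<SLOW_DRIVE_DIR>/storage" "<FAST_DRIVE_DIR>"
```

The sketch copies rather than moves, so the originals stay behind as a fallback until the node has been seen running from the new location. It does not touch revocations.db, which lives in the root data dir and is not an SQLite file. If any -wal or -shm files sit next to the databases after shutdown, they belong to those databases and would need to travel with them.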

Changed these 2 settings:

server.revocation-dburl: bolt://config/dbs/revocations.db
storage2.database-dir: /app/config/dbs

It is important that the second one is an absolute path – a relative path puts it under storage, which sits on the slow disk and is exactly what needs to be avoided.

Checking lsof for open file descriptors paints an OK picture. I do see files showing up in the blob storage directory. I’ll run it like this for a while and let you know.
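Besides lsof, a quick host-side check can confirm the split actually happened. The sketch below is an assumption-laden helper, not an official tool: `fs_of` and `check_db_split` are hypothetical names, and it simply compares the filesystem that `df -P` reports for the two directories.

```shell
#!/bin/sh
# Sketch only: confirm the database directory is separate from the blob storage.
# fs_of and check_db_split are hypothetical helpers, not part of storagenode.

# Print the filesystem (first column of df -P) that backs a path.
fs_of() {
    df -P "$1" | awk 'NR==2 {print $1}'
}

# check_db_split STORAGE_DIR DB_DIR
# Returns 0 if DB_DIR holds at least one *.db file and is on a different
# filesystem than STORAGE_DIR, 1 if no databases are found, 2 if both
# directories share a filesystem.
check_db_split() {
    storage_dir="$1"
    db_dir="$2"
    count=$(find "$db_dir" -maxdepth 1 -type f -name '*.db' | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "no databases found in $db_dir" >&2
        return 1
    fi
    if [ "$(fs_of "$storage_dir")" = "$(fs_of "$db_dir")" ]; then
        echo "warning: $db_dir is on the same filesystem as $storage_dir" >&2
        return 2
    fi
    echo "ok: $count databases in $db_dir, separate from $storage_dir"
}

# Example with placeholder paths, run on the host rather than in the container:
# check_db_split "<SLOW_DRIVE_DIR>/storage" "<FAST_DRIVE_DIR>"
```

A return code of 2 would suggest the fast directory is really just another folder on the slow disk, which is the situation the absolute path setting is meant to avoid.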

But I think you may have cracked it @baker!

Launch configuration for others should they care:

    --mount type=bind,source="<IDENTITY_DIR>",destination=/app/identity \
    --mount type=bind,source="<SLOW_DRIVE_DIR>",destination=/app/config \
    --mount type=bind,source="<FAST_DRIVE_DIR>",destination=/app/config/dbs \

Looks like we’re not getting your graceful exit or repair traffic after all. :wink:

Good to see this could help. Do report back after you’ve had some experience with this!

Yea would have loved to exit gracefully, but the corrupt DB files had other ideas. Can’t exit if you can’t start :wink:

Did you do it just the way @baker mentioned above?

# directory to store databases. if empty, uses data path
# storage2.database-dir: ""

I use SSDs (sometimes even fast NVMe stuff) as OS drives in all my nodes - should I put the db files there?

I did, but if you’re not having problems don’t do this. It’s easy to mess up the procedure and even if you do it right, you’ve just introduced another point of failure for your node. So only do this if your setup relies on it.

The only reason I’m using it is because I have that node running on an old Drobo unit, which is notoriously slow, connected over USB 2 and formatted as NTFS on Linux. A perfect storm of horribleness. Pretty sure that wouldn’t have worked without this fix. If that’s not your situation, don’t bother with this.

That was sort of the thing I expected to hear, thanks