Node suspended / usedserialsdb: database is locked

My node has been suspended from:
satellite.stefan-benten.de:7777

The only audit log error which repeats daily for a period of time is:

2020-05-13T11:16:09.355Z ERROR piecestore download failed {"Piece ID": "KMVZZYRK6DTB6RQBQOCDZWSUYGSIS7NPDZWZTPAOFUQDD2HSFXHA", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

There isn’t anything I can do about a “database is locked” error, and it is the only cause of the audit failures across all of the satellites (happy to provide the entire grep output).
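
For reference, this is roughly the grep I’m running against the container logs (the container name “storagenode” is assumed):

```
# Pull only the audit failures caused by the locked database
docker logs storagenode 2>&1 | grep GET_AUDIT | grep "database is locked"
```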

Thanks!

Jeff

This issue is already being looked into; there is indeed not much you can do. You can try to vacuum and defragment the databases, but unfortunately that doesn’t help for everyone.
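
If you want to try it, here is a minimal sketch of the vacuum step, assuming sqlite3 is installed on the host and the databases sit under /mnt/storj/storage (adjust the path to your own storage mount):

```
# Stop the node first so nothing is writing to the databases
docker stop -t 300 storagenode

# VACUUM rewrites each database file, reclaiming free pages
for db in /mnt/storj/storage/*.db; do
  sqlite3 "$db" "VACUUM;"
done

docker start storagenode
```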

Forgot to mention: I have started manually vacuuming the database(s) and did one just this past weekend, with the Storj node shut down of course. The first vacuum on a system that had been running for 10 months reduced the database sizes by ~1/3. Currently my orders/pieceinfo/usedserial/bandwidth DBs are 540/322/56/15 MB respectively, and everything else is < 1 MB…
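
Those sizes come from a quick listing, sorted largest first (same assumed storage path as above):

```
ls -lhS /mnt/storj/storage/*.db
```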

Jeff

Welcome to the forum @jeffdoo!

How is your HDD connected?

That is a little bit involved… and I admit I probably did it wrong…

At the hardware level I have four 3TB drives connected via SAS and exposed as JBOD.

The host OS is FreeBSD 12.1, with the four drives pooled together via ZFS and exposed as a “device” that is passed through to a bhyve VM running Ubuntu 18.04.4 LTS (per the recommendation in the FreeBSD handbook). The VM image file is stored on a second set of four 6TB SAS drives, also exposed as JBOD and configured as a ZFS mirror (2x2). Within Ubuntu I created a “storj” ZFS dataset on the passed-through device. The docker image lives inside the VM’s img file, while the actual Storj storage location is the “storj” dataset. The VM has been given 6 CPUs and 16GB of RAM, with 12GB dedicated to ZFS ARC.
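
For the curious, the ARC cap and the dataset are the only non-default bits inside the VM; a sketch, with the pool name “tank” and the mountpoint being illustrative:

```
# Inside the Ubuntu VM (ZFS on Linux): cap ARC at 12 GiB (value is in bytes)
echo "options zfs zfs_arc_max=12884901888" | sudo tee /etc/modprobe.d/zfs.conf

# Create the dataset that holds the actual Storj data
sudo zfs create -o mountpoint=/mnt/storj tank/storj
```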

Eight hours after stopping the node, performing a vacuum (not much space recovered this time), and restarting it, the web dashboard is no longer showing a suspended warning. All satellites show 99.9% Uptime and 100% Audit, except stefan-benten.de, which shows 99.8% Uptime and 100% Audit.

Jeff

Big OOF on the use of JBOD there… I guess you like risky setups. But that’s beside the point for this discussion. I don’t think your setup has a significant impact on disk IO, so it seems you did what you could to work around this issue. Let’s hope they fix this problem soon.

Well, technically speaking, the LSI SAS2X28 controller is in IT mode, so the drives really are just straight pass-through…

Jeff

From what you’re saying, the disks are passed through the SAS2X28 expander to some HBA (which I’m assuming uses a SAS2008 controller) in IT mode, straight to FreeBSD.

What is the vdev configuration for this? Just RAID0? If so, I agree with @BrightSilence: this is not safe at all.

The rest of the config, though it may seem convoluted, should work fine.
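
If you want to double-check the layout, `zpool status` and `zpool list -v` on the FreeBSD host will show the vdevs (pool name “tank” is illustrative):

```
# One top-level vdev per disk means a stripe ("RAID0"); "mirror" rows mean redundancy
zpool status tank
zpool list -v tank
```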

Yes, it is basically RAID0… In this case it’s a “to each their own” scenario: I made a decision to repurpose these “proven” drives for Storj without redundancy and instead go for maximum storage (10TB allotted, which is almost full). All the local ZFS instances I personally care about are mirrors with periodic snapshots, and my remote backup server is ZFS RAID1 with its own snapshot schedule (my wife would have a fit if we lost ~19 years of digital pictures; the collection dates back to a Coolpix 990).

As for the convoluted configuration, it’s simply because I could not figure out how to get Storj to run natively on FreeBSD (my preferred server OS), so I opted for the VM route with Ubuntu, where I understood docker to be the preferred deployment method. It works… and I have AT&T Gigabit Fiber (1 Gbps up/down)…
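
The deployment inside the VM is just the standard docker run from the Storj docs; roughly this shape, with the wallet, address, and paths below being placeholders for my actual values:

```
docker run -d --restart unless-stopped --stop-timeout 300 \
  --name storagenode \
  -p 28967:28967 -p 14002:14002 \
  -e WALLET="0xYOURWALLET" \
  -e EMAIL="you@example.com" \
  -e ADDRESS="your.hostname.net:28967" \
  -e STORAGE="10TB" \
  --mount type=bind,source=/mnt/storj/identity/storagenode,destination=/app/identity \
  --mount type=bind,source=/mnt/storj,destination=/app/config \
  storjlabs/storagenode:latest
```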

Thanks!

Jeff