I’ve been measuring I/O on both nodes on this system and I’m at a loss to explain what I’m seeing. Both nodes are seeing a very similar ingress rate. Here is 60 seconds of iostat averages:
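For reference, averages like this can be captured with something along these lines (a sketch; my exact invocation may have differed):

# Extended per-device statistics; the first report covers the time
# since boot, the second covers the 60-second interval.
iostat -dx 60 2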
sdc is the disk for the node that’s having trouble. This is after vacuuming and defragging all databases, and limiting the node on sdc to three concurrent uploads. (Before, sdc was seeing 300+ tps and I think this was causing the lock timeouts – the disk was just too busy.)
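For anyone repeating the vacuum step, it looked roughly like this (a sketch only: it assumes the container is named storagenode and that the databases live under /mnt/storj/storage; adjust both to your setup):

# Stop the node so nothing holds the databases open,
# then VACUUM each SQLite database, then restart.
docker stop -t 300 storagenode
for db in /mnt/storj/storage/*.db; do
  sqlite3 "$db" "VACUUM;"
done
docker start storagenode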
Both HDDs are exactly the same model. Something about the storagenode on sdc is doing a ton of write I/O and I’ve no idea why. It doesn’t make sense.
No, that’s not it as far as I can tell. Note that the read kB/s doesn’t match the written kB/s. Over a 60-second window, if GC were the issue I’d expect those to be approximately equal, or at least much closer than 53 and 14,172. I also suspected it was related to mass trash deletions, so I monitored the output of find trash/ -type f | wc -l and it didn’t change over the course of several hours.
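Concretely, I just re-ran that count on an interval, along the lines of (run from the node’s storage directory):

# Re-count files under trash/ every 10 minutes; a shrinking
# count would indicate mass trash deletions in progress.
while true; do
  date
  find trash/ -type f | wc -l
  sleep 600
done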
2020-07-07T19:28:24.676-0300 ERROR piecestore failed to add order {"error": "ordersdb error: database is locked", "errorVerbose": "ordersdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*ordersDB).Enqueue:52\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).saveOrder:665\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:312\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:328\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:996\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:56\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-07-07T19:28:26.255-0300 ERROR piecestore failed to add order {"error": "ordersdb error: database is locked", "errorVerbose": "ordersdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*ordersDB).Enqueue:52\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).saveOrder:665\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:409\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:996\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:56\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-07-07T19:28:27.465-0300 ERROR piecestore failed to add order {"error": "ordersdb error: database is locked", "errorVerbose": "ordersdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*ordersDB).Enqueue:52\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).saveOrder:665\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:409\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:996\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:56\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-07-07T19:28:28.823-0300 ERROR piecestore failed to add order {"error": "ordersdb error: database is locked", "errorVerbose": "ordersdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*ordersDB).Enqueue:52\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).saveOrder:665\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:409\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:996\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:56\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-07-07T19:28:31.277-0300 ERROR piecestore failed to add order {"error": "ordersdb error: database is locked", "errorVerbose": "ordersdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*ordersDB).Enqueue:52\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).saveOrder:665\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:409\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:996\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:56\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
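These errors repeat every second or two. To gauge how often they occur overall, the occurrences can be counted in the container log (assuming a Docker node with the default container name):

# Count "database is locked" occurrences in the container log.
docker logs storagenode 2>&1 | grep -c "database is locked"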
What kind of errors are these? Should I be worried? What should I look at to correct these errors and make sure they don’t happen again?
My HDD is connected via SATA 3
I am very new to Storj and just set up my first node tonight. I am running on Debian 10 (Buster) with Docker and running into the same “database is locked” error. I run sudo docker run ... and have tried it both with and without the --privileged option; the behavior is the same error in both cases.
The “database is locked” error is not related to “failed to start”, so I moved this to the appropriate topic.
You do not need to specify the --privileged option unless you are on QNAP. The QNAP issue is being investigated in a separate topic and has nothing in common with your problem.
The “database is locked” error is related to a slow disk subsystem or to network-attached drives.
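A quick way to check whether the disk subsystem is too slow is a synchronous small-write test on the drive that holds the databases (a rough sketch; adjust the path to your storage location and remove the test file afterwards):

# 1000 x 4 KiB synchronous writes; if this takes much more than
# a few seconds on an otherwise idle disk, the drive is likely
# too slow for the node's database traffic.
dd if=/dev/zero of=/mnt/storj/latency-test bs=4k count=1000 oflag=dsync
rm /mnt/storj/latency-test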
How is the HDD connected to your PC?
My HDD is connected by USB 3 to a Pi 4 4GB and I have the same problem.
Could it be that the “database is locked” error causes the restarts that reset the uptime?
And could an SMR disk be the problem?
I am running in a VM and had the mount via a CIFS share for some other setup work I was doing. I’ve since moved the device to be locally attached, which has solved the issue. Docker starts successfully now.
Thank you for pointing me in the right direction.
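For anyone else making the same change: the storage mounts in the docker run command just need to point at the locally attached paths. An illustrative fragment only (it omits the required ports, environment variables, and other options, and the paths are placeholders):

# Bind-mount identity and storage from the locally attached disk
# rather than a CIFS/NFS path (fragment; not a complete command).
docker run -d --name storagenode \
  --mount type=bind,source=/mnt/local-disk/identity,destination=/app/identity \
  --mount type=bind,source=/mnt/local-disk/storage,destination=/app/config \
  storjlabs/storagenode:latest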
CIFS is not compatible with SQLite, and thus not with storagenode. NFS will not work either.
The only working network protocol is iSCSI.
But even then your node will lose the race for almost all pieces - there are a lot of nodes with directly attached drives.
So, as you figured out - a locally attached drive is the only working solution.
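To verify that the storage location is on a local filesystem and not on a network share, check the mount backing the path (assuming /mnt/storj is the storage location):

# "cifs" or "nfs" in FSTYPE means the node is on a network share.
findmnt -T /mnt/storj -o TARGET,SOURCE,FSTYPE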