Storage node database is locked

My storage node has one big problem at the moment. At the top of every hour it sends all orders to the satellite. This process takes a few minutes and locks my sqlite3 database the whole time. My log file shows a lot of upload, download and audit requests arriving, but the node does not respond to them for 5-20 minutes.

Let's take a look at what's happening in the background. By default the storage node opens a random debug port, and you can get a lot of information from it. You can use a specific port by adding debug.addr: ":7777" to your config file.

curl localhost:7777/mon/ps | less shows what the storage node is doing at the moment. In my case it showed many uploads and downloads blocked by a database query for 5 minutes:

[562667889748747363] storj.io/storj/storagenode/piecestore.(*Endpoint).Upload() (elapsed: 4m12.620860502s)
 [1612100082723656965] storj.io/storj/storagenode/piecestore.(*Endpoint).VerifyOrderLimit() (elapsed: 4m12.62057045s)
  [784185589668157969] storj.io/storj/storagenode/storagenodedb.(*usedSerials).Add() (elapsed: 4m12.619908614s)

We looked into it and identified multiple issues:
1.) The database is locked and that is blocking everything.
2.) On each request we are checking used space with a database query that will sum up more than 1 million entries.
3.) On each request we are checking used bandwidth. Again a table with more than 1 million entries.

Now we will try to fix these issues one by one. Stay tuned :slight_smile:


I’ve seen the same on my node, nice work identifying it!


Maybe this is worth an issue or an idea. I would expect used space and bandwidth to be loaded or calculated only once at startup and tracked in RAM from then on.
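
A rough sketch of that idea in Go, to make it concrete. This is not the storage node's actual code: usedSpaceCache and its methods are hypothetical names, and the starting total stands in for whatever a single query at startup would return.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// usedSpaceCache is a hypothetical in-memory counter for used bytes.
// It replaces a per-request SUM over the database with an atomic read.
type usedSpaceCache struct {
	used int64 // accessed only through sync/atomic
}

// newUsedSpaceCache seeds the counter with a total that is computed
// once at startup, for example with one SUM query.
func newUsedSpaceCache(initialBytes int64) *usedSpaceCache {
	return &usedSpaceCache{used: initialBytes}
}

// Add records a newly stored piece of n bytes.
func (c *usedSpaceCache) Add(n int64) { atomic.AddInt64(&c.used, n) }

// Remove records a deleted piece of n bytes.
func (c *usedSpaceCache) Remove(n int64) { atomic.AddInt64(&c.used, -n) }

// Used returns the current total without touching the database.
func (c *usedSpaceCache) Used() int64 { return atomic.LoadInt64(&c.used) }

func main() {
	cache := newUsedSpaceCache(1000) // pretend the startup query returned 1000 bytes
	cache.Add(250)                   // an upload finished
	cache.Remove(100)                // a piece was deleted
	fmt.Println(cache.Used())        // prints 1150
}
```

The trade-off is that the counter can drift from what is on disk if an update is missed, so a real implementation would probably want to recalculate it from time to time.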


Great thought. Also remember that you can submit ideas via our ideas portal at https://ideas.storj.io if you want our product manager to see them quickly!

Possible solutions are:

1.) Don't lock the database for so long: https://github.com/storj/storj/pull/2410
2.) Calculate the used space on startup and update it in memory.
3.) Roll up the bandwidth usage table. Instead of more than 1 million entries we can achieve the same goal with only 60 entries (2 months) per satellite.

If you have any other ideas on how to improve the storage node performance, I am happy to create issues for them :slight_smile:
