Overall health of my nodes

Hi,

My main and first node is about 9 months old now. I have a fairly substantial held amount (>$170) and want to make sure my node won't die before I can recover half of it (at month 15).

So I want to do an overall check-up.
If I understood correctly, I can do it with the following command (source):
for sat in $(docker exec -i storagenode wget -qO - localhost:14002/api/sno | jq -r '.satellites[].id'); do docker exec -i storagenode wget -qO - "localhost:14002/api/sno/satellite/$sat" | jq '.id, .audit'; done
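
For readability, here is the same loop split across lines with comments (it assumes the default container name storagenode, as above):

# Get the list of satellite IDs from the node's local API...
for sat in $(docker exec -i storagenode wget -qO - localhost:14002/api/sno | jq -r '.satellites[].id'); do
  # ...then print each satellite's ID together with its audit stats
  docker exec -i storagenode wget -qO - "localhost:14002/api/sno/satellite/$sat" | jq '.id, .audit'
done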

Below is the result:

"118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"
{
  "totalCount": 24764,
  "successCount": 24756,
  "alpha": 19.99999999999995,
  "beta": 0,
  "score": 1
}
"1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"
{
  "totalCount": 6345,
  "successCount": 6341,
  "alpha": 19.99999999999995,
  "beta": 0,
  "score": 1
}
"121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"
{
  "totalCount": 12694,
  "successCount": 12689,
  "alpha": 19.99999999999995,
  "beta": 0,
  "score": 1
}
"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"
{
  "totalCount": 22157,
  "successCount": 22149,
  "alpha": 19.99999999999995,
  "beta": 0,
  "score": 1
}
"12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"
{
  "totalCount": 29041,
  "successCount": 29031,
  "alpha": 19.99999999999995,
  "beta": 0,
  "score": 1
}
"12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"
{
  "totalCount": 52,
  "successCount": 52,
  "alpha": 18.680586036442538,
  "beta": 0,
  "score": 1
}

I don’t know what to make of it…
Could anyone help me understand it?
Is my node in good shape?
Should I run any other commands to make sure everything is fine so far?

Thanks a lot!

Your scores look good. If you want to check, you can search your log for "download failed" entries with the GET_AUDIT action; those are failed audits. If the dashboard has no notifications about suspension, then you are all good.
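
For example, a minimal check along those lines (assuming a Docker-based node with the default container name storagenode):

# List audit downloads that failed; an empty result means no failed audits
docker logs storagenode 2>&1 | grep 'download failed' | grep GET_AUDIT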

Thanks.

Actually, I have some failed downloads but also some failed recoverable audits. What should I make of them?

Unrecoverable failed audits: docker logs storagenode 2>&1 | grep GET_AUDIT | grep failed | grep -c open

0

Recoverable failed audits: docker logs storagenode 2>&1 | grep GET_AUDIT | grep failed | grep -vc open

14

Failed downloads: docker logs storagenode 2>&1 | grep '"GET"' | grep -c failed

2961

I had to shut down the node a long time ago for some minutes or hours. Could the failed downloads be related to that?
Is my number of recoverable failed audits significant? What should I do about them?

Failed "GET" downloads are not a problem; it simply means your node was slower than other nodes.
Failed recoverable audits are not a problem either, because those will be retried. Only failed unrecoverable audits are a problem.
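
To put a failure count in context, you can compare it against the number of successful downloads. A rough sketch, assuming your node logs the message "downloaded" for completed transfers (the counterpart of "download failed"):

# Successful customer downloads ("downloaded" does not match "download failed")
docker logs storagenode 2>&1 | grep '"GET"' | grep -c downloaded

# Failed customer downloads, for comparison
docker logs storagenode 2>&1 | grep '"GET"' | grep -c 'download failed'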

Can you show these?

docker logs storagenode 2>&1 | grep GET_AUDIT | grep failed | grep -v open

Without the -c it will show the error lines instead of the count. They’re probably "database is locked" errors; that’s something being looked into currently, but I just want to make sure there is nothing more serious.

Thanks for your help.
Below is the result:

2020-05-02T07:01:07.559Z  ERROR  piecestore  download failed  {"Piece ID": "KX6C7OTQKUL4T7DM7MK3UWKVGH44G56UMVPKRYMCHOM4SYX3BVJQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-05-02T09:01:08.656Z  ERROR  piecestore  download failed  {"Piece ID": "6BBTYYOQLQTKOEH2MCJD25DGUF7VR2M4MZAGI6OTTNXQK34YUWFA", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-05-03T23:39:59.967Z  ERROR  piecestore  download failed  {"Piece ID": "ZVX2YQDSOF3FMIQ7EEGUFRT46VDA2HSSI2FDRLYDQVTIC2TPHJEA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-05-04T07:01:43.315Z  ERROR  piecestore  download failed  {"Piece ID": "B5UT3VNBA2A3DCQ67WX6TOUSEMUSQW4MXUKFHJD6BMZ7B556FNLQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-05-05T07:01:22.121Z  ERROR  piecestore  download failed  {"Piece ID": "2JYYSUHBPFPDPU5GE4N6AD3L362WYAF2XFXXYQCKQC4VLGLQJRLQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-05-06T08:01:35.237Z  ERROR  piecestore  download failed  {"Piece ID": "PGSUGJWYVAVI6PG3EPZHDDHW5MRKSWWTGCYKQYAQBLFRYHABH7JA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-05-06T10:01:51.825Z  ERROR  piecestore  download failed  {"Piece ID": "BDIEQOYYE7VM52XVWEFJGTEZAN6SAHNCZI7GIGKGA7ZTGKV6YDMA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-05-06T11:01:35.728Z  ERROR  piecestore  download failed  {"Piece ID": "J7YVO2G7TPJZ2NIMSDRGCO44FBGYIQIODUAJTTTFZ6DYOEGZ5TBQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-05-07T07:01:22.701Z  ERROR  piecestore  download failed  {"Piece ID": "ISCR4YPSNVJ2QJTASGJMSB6NHZFUGSQB6ORZLRTLDYXOKGT6EWLA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-05-07T12:01:17.778Z  ERROR  piecestore  download failed  {"Piece ID": "YKUJT5BKSRMCTFGNZLY2ZAISFW4AW3CCDLUZMBXKFKVMHE4W6FVA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-05-09T23:45:00.285Z  ERROR  piecestore  download failed  {"Piece ID": "TDHF4EKR6BAMK3XJLDZ7UTBBRIIZP2XP2OUSVA2O57W2YMHCRMGQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-05-10T00:45:03.043Z  ERROR  piecestore  download failed  {"Piece ID": "RHHJKLJP7Q5JCYKI52J4IWOTJRUIXNO7ADBQAUJNZCZHE7I7XAFA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-05-10T07:01:05.096Z  ERROR  piecestore  download failed  {"Piece ID": "YFM3SHQJORQ6VCAUWOIWRYN4MY7W6ZFXXRBHCKV5SHDCKDLJYQLA", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-05-11T07:01:12.809Z  ERROR  piecestore  download failed  {"Piece ID": "75N4PNRBYVG2QQN65PQQAPRL7P2VX5MBV7JZHHZ6V5FDULGVM25A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

That’s exactly what I expected. There’s not much you can do about these and they seem to be infrequent enough to not worry about it. It’s being worked on. So it seems your node is doing well!

Thanks!

Just for my information: why is the database locked?

Probably because another thread is still using it. The node’s databases are SQLite, which allows only one writer at a time, so this tends to happen more when I/O is constrained.
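
You can reproduce the error with SQLite itself, outside the node (using a hypothetical scratch database /tmp/demo.db, not one of the storagenode files):

# Terminal 1: create a table, then open a session that holds the write lock
sqlite3 /tmp/demo.db 'CREATE TABLE t (x INTEGER);'
sqlite3 /tmp/demo.db
sqlite> BEGIN IMMEDIATE;  -- takes the write lock and holds it until COMMIT

# Terminal 2: any other writer now fails with the same "database is locked" error
sqlite3 /tmp/demo.db 'INSERT INTO t VALUES (1);'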

I had this error with orders.db while I/O was constantly at 99%. The file was heavily fragmented (more than 180,000 extents), and after defragmenting it the problem went away. It could have been a coincidence, but I don’t think it was.
Depending on the OS and filesystem you’re using, you might be able to defragment while the node is running. I use BTRFS and now defragment all database files daily.
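
For reference, a minimal sketch of that routine (the path /mnt/storagenode/storage is an assumption; adjust it to your setup, and note that filefrag can under-report extents on compressed BTRFS files):

# Check how fragmented a database file is (reports the extent count)
filefrag /mnt/storagenode/storage/orders.db

# Defragment the node's SQLite databases in place
sudo btrfs filesystem defragment -v /mnt/storagenode/storage/*.db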