Your node has been suspended

4ich · May 2, 2020, 6:13am

Hello storjlings
I have a Raspberry Pi with a 4 TB drive on it.
The node id is 1UfHZAzNqFRS9yjsgSeTeERXDrLo7dUw1f714qTgZXvzrq71Fu
I checked it today: it got the 1.3.3 update, but also the message that it was suspended on all but the US satellite.
Can you check why, and whether I can fix it?
Thanks in advance

This log entry worries me:
05-02T06:13:32.500Z ERROR piecestore download failed {"Piece ID": "V2KZNMBJP2XGP2MXK5JJECC6B2CLE6IRGR2IO3LWER4WQG5VOR7Q", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

pietro · May 2, 2020, 6:22am

The same here:

This morning the node was upgraded and got the suspension. My node has been working fine since September, when I started. It’s on a UPS and there has been no fault in the last week.

4ich · May 2, 2020, 6:59am

I just restarted my Pi and it seems to be getting some ingress/egress again.
In the logs I am also finding this entry:
05-02T06:49:50.505Z ERROR piecestore failed to add bandwidth usage {"error": "bandwidthdb error: database is locked", "errorVerbose": "bandwidthdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:59\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).saveOrder:728\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:448\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:216\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:987\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

Madbrain · May 2, 2020, 7:27am

Same for me... I received the same message:

Your node has been suspended on 118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs . If you have any questions regarding this please check our Node Operators thread on Storj forum.

What should I do?

nerdatwork · May 2, 2020, 7:37am

Check your log. Does it have something like

littleskunk · May 2, 2020, 9:23am

Search for GET_AUDIT instead.

Unfortunately, no. On the satellite side, we noticed that some nodes were not working properly, and we had to implement suspension mode because of that (~4 weeks ago). So your node was already not working properly, but for various reasons you didn’t notice it before. → It is a feature and not a bug.

The error message you posted is unrelated. In particular, search for any "download failed" entry with GET_AUDIT.

Find out why you fail audits.
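
For anyone unsure how to run that search, here is a minimal sketch in Python. It assumes each log line ends with a JSON object using plain straight quotes (the forum renders them as curly quotes) and that failed audits look like the "download failed" entries posted in this thread. The function names `failed_audits` and `summarize` are made up for this example and are not part of any Storj tool.

```python
import json
import re
from collections import Counter

# Matches the JSON object at the end of a log line (assumed layout).
_JSON_TAIL = re.compile(r"\{.*\}\s*$")


def failed_audits(lines):
    """Yield the parsed JSON payload of every failed GET_AUDIT download."""
    for line in lines:
        if "ERROR" not in line or "download failed" not in line:
            continue
        match = _JSON_TAIL.search(line)
        if match is None:
            continue
        try:
            payload = json.loads(match.group(0))
        except json.JSONDecodeError:
            continue  # skip truncated or mangled lines
        if payload.get("Action") == "GET_AUDIT":
            yield payload


def summarize(lines):
    """Count failed audits per error message."""
    return Counter(p.get("error", "unknown") for p in failed_audits(lines))
```

To use it, pass an open log file, for example `summarize(open("node.log"))` if your log is written to a file with that name, and print the resulting counts. If the only error that shows up is "database is locked", that points at the database rather than at missing pieces.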

Madbrain · May 2, 2020, 9:29am

Find out... and how do I do that? Do I need to search node.log for the problem?

pietro · May 2, 2020, 10:28am

OK, I noticed a lot of "database is locked" errors, both on GET and on GET_AUDIT.

So, what do I have to do to fix this problem?

I already bought new hardware (8 cores), and a new SATA drive will arrive on Tuesday next week. At this point I don’t know whether it’s better to move this node to the new HD, or to start a new node and perform a graceful exit on the current one.

Please, let me know, I’m here to help you to better support your client, but I need a little help from Storj to fix this situation.

Thanks again
Pietro

Mikey89 · May 2, 2020, 10:53am

This happened to me as well. Was it because I erased my logs?

4ich · May 2, 2020, 12:09pm

I see the same output: the GET_AUDIT "download failed" entries contain the "database is locked" messages.

pietro · May 2, 2020, 12:27pm

I’m rotating the logs. Perhaps that’s the cause?

BrightSilence · May 2, 2020, 4:54pm

This tends to happen when there is an IO bottleneck. There can be several reasons for this, for example if you use a network protocol or USB2. It also happens with SMR drives that can’t keep up.
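
To illustrate where the message comes from: "database is locked" is the error SQLite reports when a writer cannot get the write lock before its busy timeout expires. Assuming the node keeps its bandwidth and serial-number records in SQLite files (which the error text suggests), a slow disk makes each write hold the lock longer, so other writes time out. The sketch below reproduces the error with Python's standard `sqlite3` module; the table name and the function `demo_locked` are invented for the demonstration.

```python
import os
import sqlite3
import tempfile


def demo_locked(timeout_seconds=0.1):
    """Return the error a second writer sees while another connection holds the write lock."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None gives us manual control over transactions.
    holder = sqlite3.connect(path, isolation_level=None)
    holder.execute("CREATE TABLE usage (amount INTEGER)")
    holder.execute("BEGIN IMMEDIATE")  # take the write lock and keep it
    holder.execute("INSERT INTO usage VALUES (1)")

    # The second connection waits at most timeout_seconds for the lock.
    waiter = sqlite3.connect(path, timeout=timeout_seconds, isolation_level=None)
    try:
        waiter.execute("INSERT INTO usage VALUES (2)")
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        holder.execute("ROLLBACK")
        waiter.close()
        holder.close()
```

Here the first connection holds the lock on purpose. On a real node nothing holds it deliberately; the assumption is simply that writes on a saturated USB2 or SMR disk take long enough to have the same effect.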

Neither would be my suggestion. Since it’s likely an IO bottleneck, you might be best off running a node on each HDD. That will cut the IO per HDD in half because the nodes would be sharing the same traffic.

Suspension is not definitive. You can recover from it if you can fix the issues. So hopefully this post gives you some ideas of what to look for.

No to both. The logs don’t impact the node’s performance. Removing or rotating logs can’t have any effect on this.

pietro · May 2, 2020, 5:29pm

You’re right, my HD is running on a USB2 interface. That’s why I ordered a new SATA drive, which will be connected directly to the SATA port of my new board (Odroid HC2).

Thank you, the node is up and running now, but I’ve understood that this setup cannot cope with the workload required. I’ve invested some money in new hardware, both board and disk, next week I will deploy a more efficient node.

So I can continue rotating logs, can’t I?

Thanks again for your support.

Regards,
Pietro

Alexey · May 2, 2020, 5:39pm

Yes, you can. Logs have nothing to do with the “database is locked” issue.

4ich · May 2, 2020, 8:16pm

So is there a way to improve my Pi 3 so that it can resume, or is a hardware change the only way?

CyborgCat · May 3, 2020, 9:17am

Hi,
I have the same problem: “Your node has been suspended on 118U …”. I use a Raspberry Pi 4 with an 8 TB USB 3.0 HDD. I think this is a major problem, considering that many people run nodes on an RPi; if it is not solved, the Storj network will suffer. How can we solve this problem?
Thanks.

Alexey · May 3, 2020, 11:03am

Please search for failed audits in your log.

CyborgCat · May 3, 2020, 11:10am

2020-05-03T10:43:19.293Z ERROR piecestore download failed {"Piece ID": "EJ…", "Satellite ID": "121…", "Action": "GET_AUDIT", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

"Database is locked": in version 1.1.1 there were no such problems. I still have a node that has not been updated to v1.3.3, and it works normally.

Alexey · May 3, 2020, 11:13am

I have submitted an internal issue for this exact error; it’s not solved yet.
This problem has been here for a long time. It only became visible after 1.3.3.

https://static.zoonar.com/img/www_repository2/95/dc/ed/10_bc7337c698e8323587944f650de9a8ac.jpg
But it is not a solution

CyborgCat · May 3, 2020, 11:17am

Let’s find a solution so that it works normally on RPi.