Nodes' online score doesn't increase anymore

Hi,

My nodes have been online without interruption for 2 weeks, but the online % hasn't changed at all.

And the logs don't look good.

|2023-06-05T01:25:40.024Z|ERROR|piecestore|upload failed|{process: storagenode, Piece ID: WZZC24U5NPTZWZISYA26CCAC6K5ZGPS6B777LJU4M64EIIPIILVQ, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, error: context canceled, errorVerbose: context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:504\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35, Size: 9472, Remote Address: 172.17.0.1:51636}|
|---|---|---|---|---|
|2023-06-05T01:25:40.025Z|ERROR|piecestore|upload failed|{process: storagenode, Piece ID: QUF6SZHRXMGO55TWJUZI3BUUOVQPNAGR2NLTP6KO3IUVJ5PZHFBQ, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, error: context canceled, errorVerbose: context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:529\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35, Size: 290304, Remote Address: 172.17.0.1:53500}|
|2023-06-05T01:25:40.089Z|ERROR|piecestore|upload failed|{process: storagenode, Piece ID: G2Z4R53NKAEVBXFRGYRIUR4O7KUFXSGSV7VKLF3XNZVGP4PVUXDA, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, error: context canceled, errorVerbose: context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:529\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35, Size: 75264, Remote Address: 172.17.0.1:39274}|
|2023-06-05T01:25:40.339Z|ERROR|piecestore|upload failed|{process: storagenode, Piece ID: WX26QO7EXABJQ4HBIVD7QNQOV6JHYGKJ5KJJTZDRK7QGIHLPXNZA, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, error: context canceled, errorVerbose: context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:504\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35, Size: 36864, Remote Address: 172.17.0.1:38178}|
|2023-06-05T01:25:40.375Z|ERROR|piecestore|upload failed|{process: storagenode, Piece ID: VAVZO5WULD43NJRM6QV4R3CU7QQH6BM5736ICEN2IBLV44AJIFGQ, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT_REPAIR, error: context canceled, errorVerbose: context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:529\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35, Size: 2048, Remote Address: 172.17.0.1:45416}|
|2023-06-05T01:25:40.387Z|ERROR|piecestore|upload failed|{process: storagenode, Piece ID: YYTNO67GBX2H66P7SZ2HNFYOKKJV2J2LW4QAQZI6SJBMDUHZU5HA, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, error: context canceled, errorVerbose: context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:529\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35, Size: 175104, Remote Address: 172.17.0.1:39698}|
|2023-06-05T01:25:40.517Z|ERROR|piecestore|upload failed|{process: storagenode, Piece ID: HV6ZX6JWUNLZVGJKVUII2NV656LHA5I3B7H3PX4NDIJMVYFHFDWA, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, error: context canceled, errorVerbose: context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:504\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35, Size: 2560, Remote Address: 172.17.0.1:36468}|
|2023-06-05T01:25:40.606Z|ERROR|piecestore|upload failed|{process: storagenode, Piece ID: VRD46UYPOJZMFVN7QI6PFXAJMOXT3OYTZLIZT3ZLMDZBFXJFV42A, Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Action: PUT, error: context canceled, errorVerbose: context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:504\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35, Size: 10496, Remote Address: 172.17.0.1:57660}|
|2023-06-05T01:25:40.617Z|ERROR|piecestore|upload failed|{process: storagenode, Piece ID: 6DBTVTL2ZJWJL7WWNULMJRQ6ULTLID3EZEFKNZIITYSEH6ZTBALQ, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT, error: context canceled, errorVerbose: context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:529\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35, Size: 183808, Remote Address: 172.17.0.1:32882}|
|2023-06-05T01:25:42.577Z|ERROR|piecestore|upload failed|{process: storagenode, Piece ID: SKNW3U4Q2CZAGVUKPQ5GIHAJ5YXLUE4VHFBRFESJBUDDFWBAPHPQ, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT, error: context canceled, errorVerbose: context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:504\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35, Size: 21504, Remote Address: 172.17.0.1:44730}|
|2023-06-05T01:25:52.464Z|ERROR|piecestore|upload failed|{process: storagenode, Piece ID: ROM4KBKTPFYY23MOAOZB2BFE3FC4WVGVDMMF4O5BCI5KAAKL5QTA, Satellite ID: 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs, Action: PUT, error: context canceled, errorVerbose: context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:75\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func5:498\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:529\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35, Size: 290304, Remote Address: 172.17.0.1:51636}|

What's happening? :)
Thank you

Your node should be online for the next 30 days to fully recover the online score. Each downtime requires an additional 30 days online to recover. So, keep it online.

That's the usual long-tail cancellation: your node was slower than the others.

Though this one is not good:

perhaps your disk is unable to keep up.

Do you have restarts? Or some fatal errors in the logs?

PUT_REPAIR isn’t that much of an issue, is it? I thought it was only GET_REPAIR failures that were bad.

I have the same problem on 2 nodes at different locations. Due to the ISP moving fiber optics, I lost internet access for 2-3 days, but for the last 2 weeks everything has been resolved and the nodes are online and kicking. My logs show the same lost races, and the online score doesn't recover; worse, it seems to get lower and lower each day. The internet is not the problem, because in one location I also changed the ISP, so even if one ISP were doing something odd, it couldn't be the same thing in the other location. Uptime Robot doesn't pick up anything at a 5-minute check interval, both locations have a UPS, and there have been no power outages. I checked the RAM on one of them with memtest, and the internal tests report no problems. Both are Synologys with 18 GB of RAM and Exos 16 TB disks. I am quite sure it is not a hardware or internet problem. The routers are OK, no changes there.
The databases are OK.
BTW, memtest on a Synology takes 6 hours for 18 GB of non-ECC, unregistered RAM.

I think the problem is that most ingress comes from the US satellite, and nodes closer to it win the upload races in most cases. There were also some deletions of old data a few days ago; on my node I see around 3-4 TB gone.

I don't have a problem with lost races or small ingress; that is normal. The online score is the problem. Why is it getting smaller? I have more nodes in different locations, and none of them have problems with the score: same ISPs, pretty much the same settings, hardware, and software. They didn't lose internet, so their scores are above 99.9% online. It seems that after dropping below a certain value, the score keeps going down without recovering. And this is a recent thing. Last year I moved a node, with 5 days of downtime, and the scores recovered in 30 days.

Hi, to check whether you are still actively losing online points, you can try this URL from your network:
http://NODEIP:14002/api/sno/satellite/12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S

This specific link shows you the online stats for the US1 satellite. You can then check how many online checks were attempted and how many your node answered successfully.


If you scroll all the way down, you should see the most recent timeframes. If your node has been running for the last few days, you can check there whether it answered all online checks in that timeframe or not.
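If you'd rather not eyeball the JSON, here is a small sketch that sums the check windows from that endpoint. This is my own helper, not an official tool, and it assumes the response contains an `auditHistory.windows` list with `totalCount` and `onlineCount` fields, as the dashboard JSON typically shows; `NODEIP` is a placeholder for your node's dashboard address.

```python
import json
import urllib.request

# Placeholder: replace NODEIP with your node's dashboard IP.
# The satellite ID below is US1, as mentioned above.
URL = ("http://NODEIP:14002/api/sno/satellite/"
       "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S")

def online_summary(data):
    """Sum attempted vs. answered online checks over all daily windows."""
    total = success = 0
    for window in data.get("auditHistory", {}).get("windows", []):
        total += window.get("totalCount", 0)
        success += window.get("onlineCount", 0)
    return total, success

def fetch(url=URL):
    """Fetch the per-satellite stats JSON from the node dashboard."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Usage (from your own network):
#   total, success = online_summary(fetch())
#   print(f"online checks answered: {success}/{total}")
```

If `success` is well below `total` in the most recent windows, the node is still missing online checks despite appearing up.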


I don't think the firewall can influence the online score, but for the record, I have had the firewall disabled on all my nodes for the last 2-3 months.
@Anon22 thanks for the link. I will report back.

I just discovered the JSON Viewer in my browser :hugs:.
It seems that the online checks are all accounted for on both machines. Let’s just wait and see in 30 days.
These are now:

Indeed; however, the repair worker should have more patience than the uplink :slight_smile:

@snorkel you may check when the satellites were unable to contact your node:

Since the online score uses a rolling 30-day window, your node should stay online for the next 30 days after the downtime to recover the online score.
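A toy model makes the rolling-window behaviour easier to see (this is a simplification for intuition, not Storj's actual scoring formula): if the score were simply the average online fraction over the last 30 days, a few offline days would keep dragging the score down until they fall out of the window, 30 days later.

```python
from collections import deque

def rolling_online_score(daily_online, window=30):
    """Toy model: score = mean online fraction over the last `window` days."""
    win = deque(maxlen=window)
    scores = []
    for frac in daily_online:
        win.append(frac)
        scores.append(sum(win) / len(win))
    return scores

# 30 days fully online, 3 days offline, then online again:
history = [1.0] * 30 + [0.0] * 3 + [1.0] * 30
scores = rolling_online_score(history)
# The score dips right after the downtime and only returns to 1.0
# once the offline days have left the 30-day window.
```

In this model the score bottoms out at 27/30 = 0.9 right after the 3-day outage, then climbs back to 1.0 exactly 30 days after the downtime ended, which matches the "keep it online for 30 days" advice above.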

It seems that I was wrong. The online score does increase, as it's supposed to. Just keep the node online for a few days and you will see the difference. Here is the first one from above.