Node shows Online in red

thebadcat · January 23, 2023, 9:45pm

I just noticed this today, where saltlake.tardigrade.io is in red for the Online status. I’ve stopped and restarted the node but it is still showing in red. Is there something I need to do to fix this?

BTW, this is what the status shows for the node:

Alexey · January 24, 2023, 4:47am

You can check when your node was not available for this satellite:

To fully recover, your node should stay online for the next 30 days; each downtime requires additional 30 days to recover.

Ruskiem · January 24, 2023, 12:55pm

Alex, can you explain?
“each downtime requires additional 30 days to recover.”
Like, if I get 5 minutes or 30 minutes of downtime, it will require an additional 30 days to recover?!

Besides that, I noticed the same:

1) I have several nodes, and just noticed that almost all of them got hit on the saltlake satellite: its online score dropped disproportionately compared to the other satellites on the same node, which are at 95-100%. (That's not physically possible! If there is, for example, 5h of downtime, it's the same 5h for all satellites, obviously! What's going on?)

https://imgur.com/a/9E1P2ko

2) It affected some nodes' payments, e.g. from $6-7 a month down to $1.5-2, because saltlake was the main source of income.

3) Some nodes have 100% on saltlake, but their payments are still down, from $3 a month to about $1.5.

thebadcat · January 24, 2023, 2:24pm

I check my messages multiple times daily and all my nodes were at 99%+ online until yesterday. Something has happened that caused this. I am now down to 88%, so I am worried that this will mean I lose this node, which has been running for many years. What do I need to do to get this fixed?

thebadcat · January 24, 2023, 2:48pm

I have no idea what this response means… my node is fine on all satellites except for that one. WHAT DO I NEED TO DO TO FIX IT? I check my nodes multiple times daily and it was fine the day before. WHAT HAS CHANGED? I do not want to lose this node and start from scratch for something I have no control over.

Alexey · January 25, 2023, 3:28am

Yes. This is due to the 30-day rolling window. A downtime stops affecting your online score once it falls outside this window, so you need an additional 30 days online for each downtime.
However, the online score affects your ingress only when it falls below 60%; then your node will be suspended (no ingress until the online score recovers above 60%).
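
To make the rolling window concrete, here is a minimal sketch in Python. It is a simplified model, not the satellite's actual calculation: it assumes the online score is just the fraction of successful checks during the last 30 days, and the names `online_score` and `is_suspended` are made up for this illustration. Only the 30-day window and the 60% threshold come from the explanation above.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)       # rolling window length, as described above
SUSPENSION_THRESHOLD = 0.60       # below this the node is suspended (no ingress)


def online_score(checks, now):
    """Fraction of successful checks inside the last 30 days.

    `checks` is a list of (timestamp, was_online) tuples. This is a
    simplified assumption about how the score is derived.
    """
    recent = [ok for ts, ok in checks if now - WINDOW < ts <= now]
    if not recent:
        return 1.0  # no checks in the window: nothing counts against the node
    return sum(recent) / len(recent)


def is_suspended(score):
    """True if the score is below the 60% suspension threshold."""
    return score < SUSPENSION_THRESHOLD
```

With one check per day and five failed days, the score in this model stays below 100% until the last failed day is more than 30 days old, which is why each downtime needs its own 30 days to clear.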

No, it’s not. Each satellite checks your node independently and at a different time. You can check when your node was not available with the provided scripts.

Your income depends on usage by the customers. The only affected usage will be the ingress traffic (unpaid): if your online score is less than 60%, there is no new data to store ($1.5/TB). However, the egress traffic ($20/TB and $10/TB) is still there, with the exception of the time when your node was offline. But of course it’s better to fix the problem by figuring out why your node was not available during this time.

The most interesting part would be to figure out why your node was not available for this satellite during this time.
You can search your logs for “ping satellite failed”, excluding lines with “ratelimit”, and see what the reason was.
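
If you would rather script that search, a rough Python sketch could look like the one below. It assumes each log entry is a single line that starts with a timestamp and ends with a JSON object holding the fields "Satellite ID" and "error"; that layout is an assumption about your log format, and `failed_pings` is a made-up helper name.

```python
import json


def failed_pings(lines):
    """Yield (timestamp, satellite_id, error) for failed satellite pings.

    Lines mentioning "ratelimit" are skipped, since they do not point to
    a connectivity problem.
    """
    for line in lines:
        if "ping satellite failed" not in line or "ratelimit" in line:
            continue
        # Assumption: the structured fields are a JSON object at the end of the line.
        fields = {}
        start = line.find("{")
        if start != -1:
            try:
                fields = json.loads(line[start:])
            except json.JSONDecodeError:
                fields = {}
        # Assumption: the first whitespace-separated token is the timestamp.
        timestamp = line.split(None, 1)[0]
        yield timestamp, fields.get("Satellite ID"), fields.get("error")
```

You would feed it the lines of your own log, for example an open file object for wherever your node writes its log, and look at the error text of each result.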

Alexey · January 27, 2023, 4:12am

10 posts were merged into an existing topic: Your Node was suspended - saltlake

thebadcat · January 25, 2023, 4:24pm

@Alexey Following your comment above, I’ve checked the log and I see these messages:

2023-01-25T13:40:25.161593644Z 2023-01-25T13:40:25.160Z	ERROR	contact:service	ping satellite failed 	{"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 1, "error": "ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp 173.35.195.118:28967: connect: connection refused", "errorVerbose": "ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp 173.35.195.118:28967: connect: connection refused\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:145\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:100\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
2023-01-25T13:40:26.563408779Z 2023-01-25T13:40:26.562Z	ERROR	contact:service	ping satellite failed 	{"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 2, "error": "ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp 173.35.195.118:28967: connect: connection refused", "errorVerbose": "ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp 173.35.195.118:28967: connect: connection refused\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:145\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:100\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
2023-01-25T13:40:28.930403151Z 2023-01-25T13:40:28.929Z	ERROR	contact:service	ping satellite failed 	{"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 3, "error": "ping satellite: check-in ratelimit: node rate limited by id", "errorVerbose": "ping satellite: check-in ratelimit: node rate limited by id\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:139\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:100\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
storj@pine64-storj00:~$ date
Wed Jan 25 11:11:26 EST 2023
storj@pine64-storj00:~$ ping 173.35.195.118
PING 173.35.195.118 (173.35.195.118) 56(84) bytes of data.
64 bytes from 173.35.195.118: icmp_seq=1 ttl=64 time=1.77 ms
64 bytes from 173.35.195.118: icmp_seq=2 ttl=64 time=2.31 ms
64 bytes from 173.35.195.118: icmp_seq=3 ttl=64 time=2.26 ms
64 bytes from 173.35.195.118: icmp_seq=4 ttl=64 time=1.72 ms
^C
--- 173.35.195.118 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 1.726/2.020/2.318/0.277 ms

I can ping the address as you can see so I am not sure why the ping request is failing. I can also successfully lookup and ping the saltlake.tardigrade.io address as well:

storj@pine64-storj00:~$ dig saltlake.tardigrade.io

; <<>> DiG 9.11.3-1ubuntu1.13-Ubuntu <<>> saltlake.tardigrade.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62476
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1220
; COOKIE: 7715aa7febbc7a22971230f863d157382a646c6179269278 (good)
;; QUESTION SECTION:
;saltlake.tardigrade.io.		IN	A

;; ANSWER SECTION:
saltlake.tardigrade.io.	60	IN	A	35.236.104.196

;; Query time: 48 msec
;; SERVER: 64.71.255.204#53(64.71.255.204)
;; WHEN: Wed Jan 25 11:22:16 EST 2023
;; MSG SIZE  rcvd: 95

storj@pine64-storj00:~$ ping saltlake.tardigrade.io
PING saltlake.tardigrade.io (35.236.104.196) 56(84) bytes of data.
64 bytes from 196.104.236.35.bc.googleusercontent.com (35.236.104.196): icmp_seq=1 ttl=106 time=71.5 ms
64 bytes from 196.104.236.35.bc.googleusercontent.com (35.236.104.196): icmp_seq=2 ttl=106 time=73.5 ms
64 bytes from 196.104.236.35.bc.googleusercontent.com (35.236.104.196): icmp_seq=3 ttl=106 time=74.2 ms
^C
--- saltlake.tardigrade.io ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 71.549/73.118/74.277/1.150 ms

Note: my Online percentage continues to decline and is now at 87%. I do not want this node to be DQ’ed, since it has been running for a number of years without incident. From what I’ve seen in other posts, you guys have no contingencies for reinstating DQ’ed nodes other than creating a new one and starting over from scratch, which is utterly unacceptable.

Knowledge · January 25, 2023, 4:32pm

Looks like the connection is being blocked. Are you running antivirus software that may have blocked connections from the satellite? It could also be blocked by your ISP before reaching you.

thebadcat · January 25, 2023, 5:33pm

No, no antivirus software… and if it is being blocked by my ISP, then why only that 1 out of 6 satellites? The other 5 seem fine and as I said it was fine on Sunday then when I checked on Monday it was in the red.

Knowledge · January 25, 2023, 7:18pm

Well, they go to different URLs, and if traffic was high from a particular IP/port, perhaps they thought it was unusual and blocked it.

Usually it happens locally, where antivirus software thinks the traffic is from a suspect IP and blocks it. Some ISPs can also behave that way.

thebadcat · January 25, 2023, 7:43pm

Maybe my understanding of blocked and yours is different. If the address was blocked would I still be able to ping it? I would think not but maybe I’m wrong. Wouldn’t the ICMP packet get rejected/blocked as well?

Knowledge · January 25, 2023, 7:49pm

Perhaps it was temporary and it is now working properly.

Alexey · January 26, 2023, 3:18am

Something is blocking connections to your node: either your external IP was different, or your firewall is blocking connections, or your router (DDoS protection, throttling or whatever), or your ISP.
The “ping satellite” message here is not an ICMP ping; it involves a dRPC request to your storagenode and its response. So it requires not only network availability (which is all an ICMP ping checks), but also working, responding software listening on its port.
In this case the satellite tried to contact the node’s port and did not get a response from your storagenode. An ICMP ping will work even if your node is stopped, so it’s useless for this kind of troubleshooting. You may also set up external monitoring for your port.
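
As a rough illustration of the difference, a TCP connection attempt to the node's port is closer to what the satellite does than an ICMP ping. The Python sketch below only checks that something accepts TCP connections on the port; it does not send a dRPC request, so it is still a weaker test than the satellite's check-in. `port_is_open` is a made-up helper name, and the check is only meaningful when run from outside your own network.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing answers on that port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling `port_is_open` with your external address and port 28967 (the port shown in the error above) from a machine outside your LAN would return False in the "connection refused" case, even while an ICMP ping to the same address succeeds.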

Alexey · January 27, 2023, 4:13am

A post was merged into an existing topic: Your Node was suspended - saltlake

thebadcat · January 26, 2023, 4:14pm

I’ve set up the UptimeRobot monitor and it doesn’t indicate that there are any problems.

So, I still do not know what would be causing my online percentage to be decreasing.

Stob · January 26, 2023, 5:34pm

@elek could this be the same issue?

Alexey · January 27, 2023, 4:07am

It has to be set up before the problem occurs in order to catch it. So from now on you will be notified if your port goes offline.
Or did you set it up more than 30 days ago?