What happened? Server uptime is 19 days. I checked the nodes when the system booted, and they were all online with 100% or 99.x% stats. I checked them today: they have been offline for 4 hours, and according to the alert in the top right, everything was suspended months ago.
Am I suspended? What happened?
If I’m not suspended, as my stats all show I’m still at 100% or damn near minus these 4 hours, how do I get back online?
A sampling of the logs looks good; I see nothing in the last tail of each node that suggests failure. Bandwidth and utilization are still good. Each node still shows activity with a current timestamp even though it is offline. Everything looked good until 4 hours ago, when I noticed it went down and saw the suspension note.
Dug deeper, found this, but still am unsure if this is the cause:
Failed to dial storage node, and 2021-05-10T03:26:10.779Z WARN contact:service Your node is still considered to be online but encountered an error.
Edit: Sorry for all the edits, trying to add in as much info as I can before someone responds. Digging deeper here… All of the nodes are offline; some report suspension alerts, others have no alerts. The ones that have suspension alerts have logs and information similar to what is shown above. Nothing obvious stands out, and they are still “working”.
Going to bed, checked the tail of each node log just now, they all look like the above image, current timestamps, no issues reported, downloads started, gets, etc…Everything is still “Offline.” For something “Suspended” in February, it is still reporting Current Month earnings that seem reasonable, and I have payouts for April and March. What is going on here?
ddclient runs hourly. The IP in the log matches the result of dig +short myip.opendns.com @resolver1.opendns.com, which also matches my Google domain (which loads without issue over https).
The port table is still there; all storj ports/IPs are noted, along with my other services (https, etc.) (unchanged since early 2020).
All local devices have statically assigned IPs (unchanged for years).
I see nothing in the logs, no issues with ports being open or closed. According to https://www.yougetsignal.com/tools/open-ports/, my IP/port for the storj nodes is open.
Correct as of posting last night, PC runs on UTC, I was posting at midnight NYC time, screenshots above show 0400 hours. That seems right.
3 successful pings, all less than 22ms.
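The DNS part of the checks above (does the dynamic hostname still resolve to the expected public IP?) can be sketched in Python. This is a minimal sketch, not part of any Storj tooling: `resolve_ipv4` and `hostname_points_to` are hypothetical helper names, and the hostname and IP in the usage comment are placeholders.

```python
import socket


def resolve_ipv4(hostname: str) -> set[str]:
    """Return every IPv4 address the local resolver reports for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is (address, port).
    return {info[4][0] for info in infos}


def hostname_points_to(hostname: str, expected_ip: str) -> bool:
    """True if expected_ip is among the addresses hostname resolves to."""
    return expected_ip in resolve_ipv4(hostname)


# Usage (placeholders, replace with your own DDNS name and public IP):
# print(hostname_points_to("mynode.example.com", "203.0.113.10"))
```

This only confirms what the local resolver sees. A satellite resolves the name from its own location, so a stale record cached elsewhere would not show up here.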
Edit: I still see clean docker logs with traffic, but it still reads, “Offline” on the dash. This log snippet is the matching pair of the dashboard image in this post. Timestamps are within seconds of this edit.
Still very confused.
Edit Edit: Looked back at the log, saw the satellite ID, and tracked it down on the node dashboard. It still shows as “Offline”; scrolling down, Suspension/Audit are both 100%.
Could you please check your firewall? It should not block incoming traffic to your node’s port and should not block any outgoing traffic.
Please check your identity: Identity - Node Operator
If you moved identity from the default path, please use this new path in the checking commands.
Incoming traffic is not blocked. I show that the port is open online and can ping it successfully from yougetsignal.com (pm for domain/IP). Also, the docker logs show that it is downloading happily with “INFO piecestore downloaded” notes every few minutes
The checks from the Identity - Node Operator link you sent return 2 and 3, respectively.
Then something is blocking the traffic, because the satellite is unable to reach your node. It sends a message, not an ICMP ping; if it gets no response, it considers your node offline.
Please note, each satellite checks your node independently.
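Since a successful ICMP ping says nothing about whether the node port answers, a plain TCP connection attempt is a closer check. The sketch below is only an approximation: it does not speak the satellite's protocol, it just shows whether the port accepts a connection. `tcp_port_accepts` is a hypothetical helper, the host and port in the usage comment are placeholders, and the check is only meaningful when run from outside your own network.

```python
import socket


def tcp_port_accepts(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Usage (placeholders, use your external hostname and forwarded node port):
# print(tcp_port_accepts("mynode.example.com", 12345))
```

A result of True means only that something is listening on that port. The node can still fail to answer the satellite's actual request.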
Please try the Chrome browser in Incognito mode for your dashboard, and make sure that you are looking at the dashboard of the node in question.
It still shows, “4 hours ago”, which seems strange, all satellites show 100/100 on suspension/audit for that node. Why is it pinned at “4 hours ago” for the last 16 hours?
PM and you can probe the server all you like. I can’t figure out what changed.
I just ran docker stop [all the nodes], then docker start [all the nodes]. The uptime that previously was shown as 16h 6m, is now…4 hours. It should be zero. …it’s still, “Offline”
Would this prevent me from gracefully exiting the network?
Last contact is the timestamp of the last successful contact between your node and the satellites.
The satellite does not use an ICMP ping to determine whether your node is available; instead, it sends a message to the node and expects a response.
The last successful response was 4 hours ago.
If your node keeps showing its old uptime even after a restart, that doesn’t look normal.
Please do not only docker stop the containers, but docker rm them too, and then run them again.
Brought them up one by one, they all came up without error. Returning to the dashboard, they are still “Offline”, and the times still read 4hr and 4hr. If the 4 hour was accurate, it should have drifted by now…seeing as how it is a few hours later if that was really the “last contact”.
It’s dangerous in this case. If your node is considered offline, Graceful Exit will fail after a few days offline and the node will be disqualified.
    # curl -L http://localhost:14002/api/sno | jq '.lastPinged'
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100    44  100    44    0     0  44000      0 --:--:-- --:--:-- --:--:-- 44000
    100  1416  100  1416    0     0   460k      0 --:--:-- --:--:-- --:--:--  460k
    "2021-05-10T18:18:38.593081582Z"
    # date
    Mon 10 May 2021 06:18:43 PM UTC
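To make the comparison between lastPinged and the system clock explicit, here is a minimal Python sketch that computes the gap using the two values from the output above. `parse_utc_timestamp` and `seconds_since` are hypothetical helper names; the parser assumes the timestamp always ends in "Z", as it does in this output.

```python
from datetime import datetime, timezone


def parse_utc_timestamp(ts: str) -> datetime:
    """Parse a UTC timestamp such as 2021-05-10T18:18:38.593081582Z.

    Digits beyond microsecond precision are truncated, because datetime
    stores at most six fractional digits.
    """
    if not ts.endswith("Z"):
        raise ValueError("expected a UTC timestamp ending in 'Z'")
    whole, _, frac = ts[:-1].partition(".")
    micro = int((frac + "000000")[:6]) if frac else 0
    parsed = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micro, tzinfo=timezone.utc)


def seconds_since(ts: str, now: datetime) -> float:
    """Seconds elapsed between the timestamp ts and the aware datetime now."""
    return (now - parse_utc_timestamp(ts)).total_seconds()


# Values copied from the curl and date output above.
last_pinged = "2021-05-10T18:18:38.593081582Z"
checked_at = datetime(2021, 5, 10, 18, 18, 43, tzinfo=timezone.utc)
age = seconds_since(last_pinged, checked_at)
print(f"lastPinged was {age:.1f} s before the date command")
```

With these two values the gap is under five seconds, so the API itself reports a very recent ping even while the dashboard shows a last contact of about 4 hours.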
…still shows as “Offline”, last contact is +4 hr, online for +4:23hr…which I read as “recently contacted ‘just now’, and online for 23 minutes”. Still don’t get the “Offline” thing though.
I believe we’ve seen this before when the local machine has a different time zone. The web dashboard seems to compare against local time, so this is especially relevant if you’re opening the web dashboard on a remote machine.
Even if the time looks correct, make sure the local machine is also set to the correct time zone. If the time zone is wrong but the displayed time is correct, the local system will derive UTC incorrectly.
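The effect can be illustrated with a little arithmetic. The sketch below is not the dashboard's code; it only shows how a clock whose displayed time equals true UTC, but whose configured zone is UTC-4, ends up deriving a UTC value that is 4 hours ahead. The UTC-4 offset and the timestamps are illustrative assumptions, and `derived_utc` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def derived_utc(wall_clock: datetime, configured_offset_hours: int) -> datetime:
    """Interpret a naive wall-clock reading in the configured zone, then convert to UTC."""
    zone = timezone(timedelta(hours=configured_offset_hours))
    return wall_clock.replace(tzinfo=zone).astimezone(timezone.utc)


# The clock face reads 18:18:43, which happens to be the true UTC time.
wall = datetime(2021, 5, 10, 18, 18, 43)
# A last contact that really happened 5 seconds earlier.
last_contact = datetime(2021, 5, 10, 18, 18, 38, tzinfo=timezone.utc)

# Zone correctly configured as UTC: the contact looks 5 seconds old.
correct = derived_utc(wall, 0) - last_contact
# Zone wrongly configured as UTC-4 (assumed offset): the contact looks 4 hours old.
skewed = derived_utc(wall, -4) - last_contact

print("zone set to UTC:  ", correct)
print("zone set to UTC-4:", skewed)
```

With a constant offset like this, the apparent age stays near the size of the offset as long as real contacts keep happening, instead of growing over time.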