Node Online status

I noticed the online score on one of my nodes dropping even though its status shows as online. How do I check what is causing it? My other two are at 100% and also show as online.

I do have the ports forwarded in the router and the port configured on the server. I also just rebooted the server to see if that corrects it.

Your node was offline. You can see the history of audit requests:

What would have caused them to report as offline? Certainly no configuration issues as all 3 are not reporting the same - the other two are fine - as noted.

What should I be looking for in that url?

Times and dates. If you use a browser, open it in Firefox; it will generate a pretty view for you.
Or you can use curl and jq in bash.
Here is an example for PowerShell:

For bash:

curl http://localhost:14002/api/sno/satellite/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs | jq .auditHistory.windows

Ok. I see time and dates. Again, why would it report as offline when it was not?

Because your node didn't answer audits. Search for dates where the total number of audits doesn't match the number of online audits.
On those dates and times your node was not reachable from outside.
It could be your ISP, or your dynamic IP changed but your DDNS hostname was not updated in time.
I would suggest using Uptimerobot.com for monitoring.
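
For a quick manual check alongside an external monitor, here is a minimal sketch of a TCP reachability test in Python. It only tells you something useful when run from a machine outside your own network. The hostname and port in the usage comment are placeholders I made up for illustration, not values from this thread.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


# Usage (placeholder hostname and port, replace with your own DDNS name
# and the port you forwarded in the router):
# print(is_port_open("my-node.example.com", 28967))
```

A successful connection only shows that the port is reachable at that moment; it does not prove that the node answered audits earlier.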

This is my point - there is no indication they did not match.

If it was, as you say, the ISP or a dynamic IP change with the DDNS hostname not updating, all the servers would be experiencing the same thing, which has not happened.

That said, what does a drop in the online scores (which have not yet changed for some of the satellites since reporting) result in?

The online percentage is a rolling 30 day window, so it can take up to 30 days for the outage/downtime to no longer be reflected in the figures.

Make sure the percentages don’t continue to drop, otherwise you may have a problem which needs fixing!

Clearly there is a problem that needs fixing - as of today, some are lower - so clearly there is an issue with something and the storj connection.

Certainly wish someone from storj would assist with resolving the issue.

You don’t appear to have provided any log files or audithistory scores. Those would help StorJ employees and other members of the forum to assist in diagnosing further.

Edit - Using my own node as an example the audithistory shows an issue on the 16th June 2021:

image

With that knowledge I can then go to the full log file and check for ERROR or FATAL:

The log file quite clearly shows a problem. In my case I found out my router stopped port forwarding, so I had to reboot the router.
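
As a rough sketch of that log search, the Python below keeps only the lines that contain ERROR or FATAL, optionally limited to one day. The log path in the usage comment and the idea that each line contains its date as plain text are assumptions; adjust them to however your node writes its log.

```python
from typing import Iterable, List


def find_problem_lines(lines: Iterable[str], day: str = "") -> List[str]:
    """Return lines containing ERROR or FATAL; if day is given, only lines that also contain it."""
    levels = ("ERROR", "FATAL")
    return [
        line.rstrip("\n")
        for line in lines
        if any(level in line for level in levels) and day in line
    ]


# Usage (hypothetical log path and date):
# with open("node.log", encoding="utf-8", errors="replace") as log:
#     for line in find_problem_lines(log, day="2021-06-16"):
#         print(line)
```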

Could you please provide results of audit checks?

Am I supposed to have done so? Did not see any request for that, unless I missed it.

Audits across all servers show 100% (under Suspension & Audit), as reflected in the original post attachment.

Please provide the list of audit checks returned by the command for the satellite in question.

It is like pulling teeth with storj - half a dozen posts and still at step 1 of the concern. I’ve seen a substantial amount of people frustrated with storj and the whole process - but now that I am experiencing it, I understand why people leave all the time.

Again, as reflected in the original post attachment.

Please execute the command on your PC (I do not know what your OS is) and post the results:

PowerShell

((Invoke-WebRequest -UseBasicParsing http://localhost:14002/api/sno/satellite/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs).Content | ConvertFrom-Json).auditHistory.windows | Where-Object { $_.totalCount -ne $_.onlineCount }

bash

curl -s http://localhost:14002/api/sno/satellite/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs | jq '.auditHistory.windows[] | select(.totalCount != .onlineCount)'

The longer you wait, the more data will disappear, because it's a 30-day rolling window.
I added a filter to show only the windows where the counts differ.
On those dates your node was not fully available.
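
If neither PowerShell nor jq is handy, the same filter can be sketched in Python using only the standard library. The field names (auditHistory, windows, totalCount, onlineCount) are taken from the commands above; the helper names are my own and the rest of the response layout is not assumed.

```python
import json
from urllib.request import urlopen


def offline_windows(satellite_data: dict) -> list:
    """Return the audit windows where totalCount and onlineCount differ."""
    windows = satellite_data.get("auditHistory", {}).get("windows") or []
    return [w for w in windows if w.get("totalCount") != w.get("onlineCount")]


def fetch_satellite(url: str) -> dict:
    """Download and parse the JSON document served at url."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


# Usage (run on the machine that hosts the node dashboard):
# url = "http://localhost:14002/api/sno/satellite/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"
# for window in offline_windows(fetch_satellite(url)):
#     print(window)
```

An empty result means no window in the retained history had missed audits for that satellite.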

Your node has all the needed data. It was not reachable by external services, and you can request this data to see when it happened. You can also configure uptimerobot.com monitoring to get a nice GUI.

On Windows, could you please open a PowerShell window, paste the command there, and then copy the results?
It will not work in cmd.exe.

Sorry, I do not know what OS you are running; you didn't say in this thread, so I'm forced to guess.

The command will show you when your node was offline, but not why.
It could be your ISP, router, PC, a power outage, an OS upgrade, a BSOD or kernel panic - anything.
The reason is impossible to figure out from the storagenode alone. You need to check all of your infrastructure. The storagenode can only show when the problem occurred.

The storagenode stores detailed data for audit checks in its databases, and it uses a rolling window.
As of today the earliest record would be 2021-06-08T21:07:00Z, so all records before that are already gone.
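
To see where such a cutoff comes from, here is a small Python sketch of the date arithmetic. It assumes the 30-day window mentioned earlier in the thread and a hypothetical current time; how the node actually trims its history may differ in detail.

```python
from datetime import datetime, timedelta, timezone


def earliest_retained(now: datetime, window_days: int = 30) -> datetime:
    """Return the oldest timestamp still inside a rolling window ending at now."""
    return now - timedelta(days=window_days)


# Hypothetical "now", chosen for illustration only.
now = datetime(2021, 7, 8, 21, 7, tzinfo=timezone.utc)
print(earliest_retained(now).strftime("%Y-%m-%dT%H:%M:%SZ"))  # 2021-06-08T21:07:00Z
```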

Does it mean you went offline if the uptime tracker near the top of your dashboard resets?

No. It is reset only when your service is restarted (update, reboot, power failure, etc.).
However, with Docker for Windows with Hyper-V or Docker for Mac, a reboot of the host may not affect the uptime, because these versions of Docker use a VM, and the hypervisor can save the state of the VM across reboots, so Docker won't actually restart and all running containers will continue to run.