ERROR preflight:localtime unable to get satellite system time

I have a problem on 2 out of 3 nodes. All of a sudden the docker containers are restarting in an infinite loop, generating this error message:
ERROR preflight:localtime unable to get satellite system time
ERROR Failed preflight check. {"error": "system clock is out of sync: system clock is out of sync with all trusted satellites"

Time sync works the same way on all nodes, and they show exactly the same time.
Only 2 nodes went offline, and their containers won’t start.

I compared the configuration on all nodes. There is no difference between the one that works and the 2 that are crashing.

Any ideas where else to look for clues?
The firewall has exactly the same configuration, and the network is configured exactly the same.
All machines have the same spec, as I’m running them as virtual machines.

I removed all SNO docker images and downloaded them again; still the same.

Running out of ideas at the moment.

Please post the errors that appear after start. The “clock out of sync” error is the last check, and it’s usually a consequence of connectivity issues. For example, your node cannot ping or request the time from any satellite, and thus the clocks don’t match.
I agree, we should not check clocks if we are unable to connect to the satellite, but unlike ping, the clock out of sync is a fatal error, and it’s checked after all satellites have been pinged. The storagenode should crash if no satellite is available, and the “clock is out of sync” error will do that.
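
To illustrate why a pure connectivity failure ends up reported as a clock error, here is a rough sketch of the logic described above. It is not the actual storagenode code: the function name `preflight_clock_check`, the `MAX_SKEW` tolerance, and the data layout are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance, chosen only for this illustration.
MAX_SKEW = timedelta(minutes=10)


def preflight_clock_check(local_now, satellite_times):
    """Return the name of the first satellite that confirms the local clock.

    satellite_times maps a satellite name to the time it reported,
    or to None when the satellite could not be reached.
    Raises RuntimeError when no satellite confirms the clock.
    """
    for name, remote_time in satellite_times.items():
        if remote_time is None:
            # Unreachable satellite: an error on its own, but not fatal yet.
            continue
        if abs(local_now - remote_time) <= MAX_SKEW:
            return name
    # Nobody confirmed the clock. This also covers the case where every
    # satellite was unreachable, so a network problem surfaces as a clock error.
    raise RuntimeError("system clock is out of sync with all trusted satellites")
```

With every satellite unreachable (all values `None`), the function raises even though the local clock may be perfectly fine, which matches the symptom in this thread.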

It does seem to be a network issue.

When I try to telnet to the hosts from the logs on port 7777 or any other port, there is no response.
The same telnet from other servers works.
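
The telnet test can also be scripted, which makes it easy to run the same check on every node and compare the results. A minimal sketch, assuming Python 3 is available on the host; `can_connect` and `check_all` are hypothetical helper names, and the 5 second timeout is arbitrary.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


def check_all(targets, timeout=5.0):
    """Map each (host, port) pair to True (reachable) or False (not reachable)."""
    return {(host, port): can_connect(host, port, timeout) for host, port in targets}
```

For example, `check_all([("35.228.10.185", 7777)])` tries one of the satellite addresses that appear in the startup log. A result of `False` on the affected nodes and `True` on the working one would confirm that TCP is blocked somewhere, even though ping works.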

However, no changes were made on the server before this happened.
I checked with the network provider, who confirmed that there were no changes and that no traffic is blocked on their side.

I disabled all other services (including the firewall) with no luck.
I can ping the target IP, but can’t connect to any of the services.

At this stage I’ve run out of ideas for what else to check, as the system logs show nothing unusual.
The only thing left now is to reinstall the OS :confused:

Here are the startup logs, basically showing that the node can’t communicate with the remote hosts on port 7777:

2021-03-31T06:30:27.966Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
2021-03-31T06:30:27.985Z        INFO    Operator email  {"Address": "XXXXXX"}
2021-03-31T06:30:27.986Z        INFO    Operator wallet {"Address": "XXXXXX"}
2021-03-31T06:31:03.117Z        INFO    Telemetry enabled       {"instance ID": "XXXXXX"}
2021-03-31T06:31:03.202Z        INFO    db.migration    Database Version        {"version": 51}
2021-03-31T06:31:18.958Z        INFO    preflight:localtime     start checking local system clock with trusted satellites' system clock.
2021-03-31T06:31:38.958Z        ERROR   preflight:localtime     unable to get satellite system time     {"Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "error": "rpc: dial tcp 35.228.10.185:7777: i/o timeout", "errorVerbose": "rpc: dial tcp 35.228.10.185:7777: i/o timeout\n\tstorj.io/common/rpc.TCPConnector.DialContextUnencrypted:106\n\tstorj.io/common/rpc.TCPConnector.DialContext:70\n\tstorj.io/common/rpc.Dialer.dialEncryptedConn:180\n\tstorj.io/common/rpc.Dialer.DialNodeURL.func1:101\n\tstorj.io/common/rpc/rpcpool.(*Pool).Get:87\n\tstorj.io/common/rpc.Dialer.dialPool:146\n\tstorj.io/common/rpc.Dialer.DialNodeURL:100\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).getSatelliteTime:110\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check.func1:67\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2021-03-31T06:31:38.959Z        ERROR   preflight:localtime     unable to get satellite system time     {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "error": "rpc: dial tcp 35.194.133.253:7777: i/o timeout", "errorVerbose": "rpc: dial tcp 35.194.133.253:7777: i/o timeout\n\tstorj.io/common/rpc.TCPConnector.DialContextUnencrypted:106\n\tstorj.io/common/rpc.TCPConnector.DialContext:70\n\tstorj.io/common/rpc.Dialer.dialEncryptedConn:180\n\tstorj.io/common/rpc.Dialer.DialNodeURL.func1:101\n\tstorj.io/common/rpc/rpcpool.(*Pool).Get:87\n\tstorj.io/common/rpc.Dialer.dialPool:146\n\tstorj.io/common/rpc.Dialer.DialNodeURL:100\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).getSatelliteTime:110\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check.func1:67\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2021-03-31T06:31:38.959Z        ERROR   preflight:localtime     unable to get satellite system time     {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "error": "rpc: dial tcp 35.205.38.124:7777: i/o timeout", "errorVerbose": "rpc: dial tcp 35.205.38.124:7777: i/o timeout\n\tstorj.io/common/rpc.TCPConnector.DialContextUnencrypted:106\n\tstorj.io/common/rpc.TCPConnector.DialContext:70\n\tstorj.io/common/rpc.Dialer.dialEncryptedConn:180\n\tstorj.io/common/rpc.Dialer.DialNodeURL.func1:101\n\tstorj.io/common/rpc/rpcpool.(*Pool).Get:87\n\tstorj.io/common/rpc.Dialer.dialPool:146\n\tstorj.io/common/rpc.Dialer.DialNodeURL:100\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).getSatelliteTime:110\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check.func1:67\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
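
Entries like these are hard to scan by eye, so it can help to reduce them to the satellite, the address, and the reason for the failure. A minimal sketch, assuming the log format shown above; `unwrap` and `failed_dials` are hypothetical helper names, and the regular expression only targets the `dial tcp` errors seen in this thread.

```python
import re

# A physical line that does not start with a timestamp is treated as the
# continuation of the previous entry (for logs hard-wrapped by a terminal).
WRAPPED_NEWLINE = re.compile(r"\n(?!\d{4}-\d{2}-\d{2}T)")

FAILED_DIAL = re.compile(
    r'"Satellite ID":\s*"(?P<satellite>[^"]+)",\s*'
    r'"error":\s*"rpc: dial tcp (?P<host>[\d.]+):(?P<port>\d+): (?P<reason>[^"]+)"'
)


def unwrap(log_text):
    """Join hard-wrapped continuation lines back onto their log entry."""
    return WRAPPED_NEWLINE.sub("", log_text)


def failed_dials(log_text):
    """Return (satellite_id, host, port, reason) for every failed dial."""
    return [
        (m["satellite"], m["host"], int(m["port"]), m["reason"])
        for m in FAILED_DIAL.finditer(unwrap(log_text))
    ]
```

Run against the excerpt above, this should list three satellites, all on port 7777 and all failing with `i/o timeout`, which points at dropped packets rather than a refused connection.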

Yes, as expected, this node cannot reach the satellites and fails. Something is blocking traffic to/from the satellites.
I’m not sure that switching off a firewall will disable the rules it created in iptables (or whatever the firewall uses to do the filtering).
Perhaps you need to check the rules on the working hosts and copy them to the affected ones.
If it’s a VPS, then maybe move to a different one (taking the storage and identity with you).
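
To compare the rules on a working host with those on an affected one, one option is to dump them with `iptables-save` on each host and diff the two dumps. A minimal sketch, assuming the dumps are already available as text; `normalize` and `rule_diff` are hypothetical helper names. Comment lines and packet counters are ignored, because they differ between hosts even when the rules are identical.

```python
import difflib
import re

# Chain headers look like ":INPUT ACCEPT [1234:5678]"; the counters in
# brackets change constantly and would only add noise to the diff.
COUNTERS = re.compile(r"\[\d+:\d+\]")


def normalize(dump):
    """Drop comments and blank lines, and zero out packet/byte counters."""
    lines = []
    for line in dump.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        lines.append(COUNTERS.sub("[0:0]", line))
    return lines


def rule_diff(working_dump, affected_dump):
    """Return unified-diff lines between two iptables-save dumps."""
    return list(
        difflib.unified_diff(
            normalize(working_dump),
            normalize(affected_dump),
            fromfile="working",
            tofile="affected",
            lineterm="",
        )
    )
```

An empty result means the rule sets match after normalization. Lines starting with `+` exist only on the affected host, and lines starting with `-` exist only on the working one.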

If you use docker, try to reinstall or update it; that could fix the routing issue.

I went through the iptables config. All updates are applied, including docker.
Nothing unusual. The host blocks nothing either; I checked that.
As the nodes are not operational, I will back up the identity on one and reinstall it.
It is interesting, especially since it happened overnight without any changes to the systems.
I will keep posting findings.

And the data!
Otherwise the node will be disqualified instantly.