macOS system clock is out of sync: system clock is out of sync with all trusted satellites

I had a very bad day: after routine maintenance, neither of my 2 nodes, which had been running for months, passed the preflight check. I did not find the root cause, but I wanted to share my investigation. First, I looked through all the previous posts for an answer, and none of them matched my context.
I finally fixed the problem by moving my nodes to another Mac.
I was afraid of losing my oldest nodes and of being disqualified (DQ), after a few hours of going in circles and suspecting that the storage filesystem/DB was corrupted. The cause is still unexplained, but this might interest people with the same config.

Pattern
The node loops indefinitely on a fatal error at startup and stays in offline status.
I made a few attempts to troubleshoot this issue:

  • restarted Docker; rebooted {no effect}
  • changed the NTP server in Date & Time preferences to time.google.com; time.apple.com; pool.ntp.org {no effect}
  • added a few lines to config.yaml by uncommenting the timeout and preflight check variables {no effect}
  • finally moved the disks to the backup Mac mini, then carefully cleaned up the containers and recreated them on the backup machine {success}

Question: Is there a way to check the machine without risking DQ by running the storage node container?
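One low-risk check that does not involve the storage node at all is to compare the host clock against an NTP server. The sketch below is only an illustration in Python: `parse_transmit_time` and `clock_offset` are hypothetical helper names, and `pool.ntp.org` is simply one of the servers tried above. It sends a minimal SNTP client request and compares the server's transmit timestamp with the midpoint of the local send and receive times. It does not reproduce the node's own preflight check against the satellites.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def parse_transmit_time(packet):
    """Return the server transmit timestamp (Unix seconds) from an NTP reply."""
    if len(packet) < 48:
        raise ValueError("NTP reply is shorter than 48 bytes")
    # The transmit timestamp occupies bytes 40..47: 32-bit seconds, 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(server="pool.ntp.org", timeout=5.0):
    """Rough offset in seconds (server time minus local time)."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(request, (server, 123))
        packet, _ = sock.recvfrom(512)
        received = time.time()
    # Compare against the midpoint of the local send and receive times.
    return parse_transmit_time(packet) - (sent + received) / 2
```

Calling `clock_offset()` on the Mac gives a number close to zero when the clock is fine. If it is, the "out of sync" message is probably caused by something other than the clock itself.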

Configuration details
Node version 1.6.4
macOS 10.15 Catalina
2 full 1.9TB nodes running on a single machine
Full logs (INFO level) are available on request

Logs extract
2020-07-22T08:46:38.627Z INFO Telemetry enabled
2020-07-22T08:46:38.690Z INFO db.migration Database Version {"version": 42}
2020-07-22T08:46:48.918Z WARN trust Failed to fetch URLs from source; used cache {"source": "https://tardigrade.io/trusted-satellites", "error": "HTTP source: Get https://tardigrade.io/trusted-satellites: dial tcp: lookup tardigrade.io on 192.168.65.1:53: read udp 172.17.0.2:57347->192.168.65.1:53: i/o timeout", "errorVerbose": "HTTP source: Get https://tardigrade.io/trusted-satellites: dial tcp: lookup tardigrade.io on 192.168.65.1:53: read udp 172.17.0.2:57347->192.168.65.1:53: i/o timeout\n\tstorj.io/storj/storagenode/trust.(*HTTPSource).FetchEntries:63\n\tstorj.io/storj/storagenode/trust.(*List).fetchEntries:90\n\tstorj.io/storj/storagenode/trust.(*List).FetchURLs:49\n\tstorj.io/storj/storagenode/trust.(*Pool).fetchURLs:240\n\tstorj.io/storj/storagenode/trust.(*Pool).Refresh:177\n\tstorj.io/storj/storagenode.(*Peer).Run:688\n\tmain.cmdRun:200\n\tstorj.io/private/process.cleanup.func1.4:359\n\tstorj.io/private/process.cleanup.func1:377\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.ExecCustomDebug:70\n\tmain.main:320\n\truntime.main:203"}
2020-07-22T08:46:48.924Z INFO preflight:localtime start checking local system clock with trusted satellites' system clock.
2020-07-22T08:46:58.931Z ERROR preflight:localtime unable to get satellite system time {"Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "error": "rpccompat: dial tcp: lookup europe-north-1.tardigrade.io on 192.168.65.1:53: read udp 172.17.0.2:44234->192.168.65.1:53: i/o timeout", "errorVerbose": "rpccompat: dial tcp: lookup europe-north-1.tardigrade.io on 192.168.65.1:53: read udp 172.17.0.2:44234->192.168.65.1:53: i/o timeout\n\tstorj.io/common/rpc.Dialer.dialTransport:290\n\tstorj.io/common/rpc.Dialer.dial:267\n\tstorj.io/common/rpc.Dialer.DialNodeURL:177\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).getSatelliteTime:110\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check.func1:67\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-07-22T08:47:33.033Z FATAL Failed preflight check. {"error": "system clock is out of sync: system clock is out of sync with all trusted satellites", "errorVerbose": "system clock is out of sync: system clock is out of sync with all trusted satellites\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check:96\n\tstorj.io/storj/storagenode.(*Peer).Run:692\n\tmain.cmdRun:200\n\tstorj.io/private/process.cleanup.func1.4:359\n\tstorj.io/private/process.cleanup.func1:377\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.ExecCustomDebug:70\n\tmain.main:320\n\truntime.main:203"}

That looks more like a problem with your connection to the satellites. Otherwise the message would print the time offset; instead it says that it couldn't reach the satellite.
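To make that distinction quickly on a long log, something like the following Python sketch could help. It is only an illustration: `summarize_preflight` is a hypothetical helper, and the matched substrings are copied from the log extract above, so they may differ in other node versions. It counts the "unable to get satellite system time" lines and collects the hostnames and DNS servers that appear in lookup timeouts.

```python
import re

# Substrings as they appear in the log extract of this thread.
UNREACHABLE = "unable to get satellite system time"
DNS_TIMEOUT = re.compile(r"lookup (\S+) on (\S+?):53: .*?i/o timeout")


def summarize_preflight(log_lines):
    """Summarize satellite reachability and DNS timeouts found in node log lines."""
    unreachable = 0
    hosts, servers = set(), set()
    for line in log_lines:
        if UNREACHABLE in line:
            unreachable += 1
        match = DNS_TIMEOUT.search(line)  # first lookup timeout on the line
        if match:
            hosts.add(match.group(1))
            servers.add(match.group(2))
    return {
        "unreachable_satellites": unreachable,
        "timed_out_hosts": sorted(hosts),
        "dns_servers": sorted(servers),
    }
```

If `unreachable_satellites` is non-zero and every timeout points at the same DNS server, the clock is probably fine and name resolution is the thing to look at.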

The DNS server at 192.168.65.1 seems to be non-functional?
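A quick way to test name resolution without starting the node could look like the Python sketch below. The hostnames are the two that time out in the log; `check_dns` is a hypothetical helper, and the `resolve` parameter exists only so the function can be exercised without a network. Note the assumption: run on the Mac itself, this uses the macOS resolver, so it would have to run inside a container to exercise the resolver the node actually sees.

```python
import socket

# Hostnames copied from the log extract above.
HOSTS = ["tardigrade.io", "europe-north-1.tardigrade.io"]


def check_dns(hosts, resolve=socket.getaddrinfo):
    """Map each hostname to its resolved addresses, or to an error string."""
    results = {}
    for host in hosts:
        try:
            # The port is only a placeholder required by getaddrinfo.
            infos = resolve(host, 443)
            results[host] = sorted({info[4][0] for info in infos})
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results[host] = f"FAILED: {exc}"
    return results
```

Calling `check_dns(HOSTS)` and printing the result shows at a glance whether the lookups fail the same way they do in the node log.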


Thanks for your post @mike. The logs do indeed show a local DNS server. I checked my network preferences and there is no reference to a DNS server at 192.168.65.1. My gateway is my router at 192.168.1.1 and I have no local DNS configuration. I have no idea where it comes from. Could it be the Docker configuration?

Looks like it. Try restarting Docker from the Docker Desktop application.

I already tried that too; restarting Docker Desktop and/or rebooting has no effect. I had a 6-hour outage trying to fix this. Moving to another Mac mini saved the day, but this could happen again.
Thanks @Alexey for your proposal.