I had a very bad day: after routine maintenance, neither of my two nodes, which had been running for months, passed the preflight check. I did not find the root cause, but I wanted to share my investigation. First, I went through all the previous posts looking for an answer, but none of them matched my setup.
In the end, I fixed the problem by moving my nodes to another Mac.
I was afraid of losing my oldest nodes and being disqualified (DQ) after spending a few hours going in circles, suspecting that the storage filesystem or databases were corrupted. The cause is still unexplained, but this might interest people with the same configuration.
Pattern
infinite restart loop on a fatal error at startup; the node stays offline
I made a few attempts to troubleshoot the issue:
- restarted Docker; rebooted the machine {no effect}
- changed the NTP server in Date & Time preferences to time.google.com, time.apple.com, and pool.ntp.org {no effect}
- edited config.yaml: uncommented the timeout and preflight check variables {no effect}
- finally, moved the disks to the backup Mac mini, carefully removed the old containers, and recreated them on the backup machine {success}
Question: is there a way to check the machine without risking disqualification by running the storage node container?
Configuration details
Node version 1.6.4
macOS 10.15 Catalina
two full 1.9 TB nodes running on a single machine
full logs available on request (INFO level)
Log extract
2020-07-22T08:46:38.627Z INFO Telemetry enabled
2020-07-22T08:46:38.690Z INFO db.migration Database Version {"version": 42}
2020-07-22T08:46:48.918Z WARN trust Failed to fetch URLs from source; used cache {"source": "https://tardigrade.io/trusted-satellites", "error": "HTTP source: Get https://tardigrade.io/trusted-satellites: dial tcp: lookup tardigrade.io on 192.168.65.1:53: read udp 172.17.0.2:57347->192.168.65.1:53: i/o timeout", "errorVerbose": "HTTP source: Get https://tardigrade.io/trusted-satellites: dial tcp: lookup tardigrade.io on 192.168.65.1:53: read udp 172.17.0.2:57347->192.168.65.1:53: i/o timeout\n\tstorj.io/storj/storagenode/trust.(*HTTPSource).FetchEntries:63\n\tstorj.io/storj/storagenode/trust.(*List).fetchEntries:90\n\tstorj.io/storj/storagenode/trust.(*List).FetchURLs:49\n\tstorj.io/storj/storagenode/trust.(*Pool).fetchURLs:240\n\tstorj.io/storj/storagenode/trust.(*Pool).Refresh:177\n\tstorj.io/storj/storagenode.(*Peer).Run:688\n\tmain.cmdRun:200\n\tstorj.io/private/process.cleanup.func1.4:359\n\tstorj.io/private/process.cleanup.func1:377\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.ExecCustomDebug:70\n\tmain.main:320\n\truntime.main:203"}
2020-07-22T08:46:48.924Z INFO preflight:localtime start checking local system clock with trusted satellites' system clock.
2020-07-22T08:46:58.931Z ERROR preflight:localtime unable to get satellite system time {"Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "error": "rpccompat: dial tcp: lookup europe-north-1.tardigrade.io on 192.168.65.1:53: read udp 172.17.0.2:44234->192.168.65.1:53: i/o timeout", "errorVerbose": "rpccompat: dial tcp: lookup europe-north-1.tardigrade.io on 192.168.65.1:53: read udp 172.17.0.2:44234->192.168.65.1:53: i/o timeout\n\tstorj.io/common/rpc.Dialer.dialTransport:290\n\tstorj.io/common/rpc.Dialer.dial:267\n\tstorj.io/common/rpc.Dialer.DialNodeURL:177\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).getSatelliteTime:110\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check.func1:67\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-07-22T08:47:33.033Z FATAL Failed preflight check. {"error": "system clock is out of sync: system clock is out of sync with all trusted satellites", "errorVerbose": "system clock is out of sync: system clock is out of sync with all trusted satellites\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check:96\n\tstorj.io/storj/storagenode.(*Peer).Run:692\n\tmain.cmdRun:200\n\tstorj.io/private/process.cleanup.func1.4:359\n\tstorj.io/private/process.cleanup.func1:377\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.ExecCustomDebug:70\n\tmain.main:320\n\truntime.main:203"}