Now that’s weird. I have just rsynced a node to different hardware. Both servers are in the same timezone and perfectly time-synced.
Yet when I try to start the node on the new server I get:
DEBUG Unrecoverable error {"Process": "storagenode", "error": "system clock is out of sync: system clock is out of sync with all trusted satellites", "errorVerbose": "system clock is out of sync: system clock is out of sync with all trusted satellites\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check:96\n\tstorj.io/storj/storagenode.(*Peer).Run:876\n\tmain.cmdRun:251\n\tstorj.io/private/process.cleanup.func1.4:378\n\tstorj.io/private/process.cleanup.func1:396\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfigAndLogger:93\n\tmain.main:479\n\truntime.main:255"}
2022-09-02 13:02:47,009 INFO waiting for storagenode, processes-exit-eventlistener to die
2022-09-02 13:02:47,011 INFO stopped: storagenode (exit status 1)
2022-09-02 13:02:47,011 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)
How can that be, and how can I get the node back online?
2022-09-02T13:17:49.096Z DEBUG db Database version is up to date {"Process": "storagenode", "version": 54}
2022-09-02T13:17:49.872Z DEBUG trust Fetched URLs from source; updating cache {"Process": "storagenode", "source": "https://www.storj.io/dcs-satellites", "count": 6}
2022-09-02T13:17:49.872Z DEBUG trust Satellite is trusted {"Process": "storagenode", "id": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2022-09-02T13:17:49.872Z DEBUG trust Satellite is trusted {"Process": "storagenode", "id": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2022-09-02T13:17:49.872Z DEBUG trust Satellite is trusted {"Process": "storagenode", "id": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2022-09-02T13:17:49.872Z DEBUG trust Satellite is trusted {"Process": "storagenode", "id": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2022-09-02T13:17:49.872Z DEBUG trust Satellite is trusted {"Process": "storagenode", "id": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2022-09-02T13:17:49.872Z DEBUG trust Satellite is trusted {"Process": "storagenode", "id": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2022-09-02T13:17:49.872Z INFO preflight:localtime start checking local system clock with trusted satellites' system clock. {"Process": "storagenode"}
2022-09-02T13:17:49.926Z ERROR preflight:localtime unable to get satellite system time {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "error": "context canceled"}
2022-09-02T13:17:50.003Z ERROR preflight:localtime unable to get satellite system time {"Process": "storagenode", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "error": "context canceled"}
2022-09-02T13:17:50.226Z ERROR preflight:localtime unable to get satellite system time {"Process": "storagenode", "Satellite ID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "error": "context canceled"}
2022-09-02T13:17:50.236Z ERROR preflight:localtime unable to get satellite system time {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "error": "context canceled"}
2022-09-02T13:17:50.346Z ERROR preflight:localtime unable to get satellite system time {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "error": "context canceled"}
2022-09-02T13:17:50.552Z ERROR preflight:localtime unable to get satellite system time {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "error": "context canceled"}
2022-09-02T13:17:50.552Z ERROR Failed preflight check. {"Process": "storagenode", "error": "system clock is out of sync: system clock is out of sync with all trusted satellites", "errorVerbose": "system clock is out of sync: system clock is out of sync with all trusted satellites\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check:96\n\tstorj.io/storj/storagenode.(*Peer).Run:876\n\tmain.cmdRun:251\n\tstorj.io/private/process.cleanup.func1.4:378\n\tstorj.io/private/process.cleanup.func1:396\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfigAndLogger:93\n\tmain.main:479\n\truntime.main:255"}
2022-09-02 13:17:51,553 INFO waiting for storagenode, processes-exit-eventlistener to die
2022-09-02T13:17:52.284Z DEBUG Unrecoverable error {"Process": "storagenode", "error": "system clock is out of sync: system clock is out of sync with all trusted satellites", "errorVerbose": "system clock is out of sync: system clock is out of sync with all trusted satellites\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check:96\n\tstorj.io/storj/storagenode.(*Peer).Run:876\n\tmain.cmdRun:251\n\tstorj.io/private/process.cleanup.func1.4:378\n\tstorj.io/private/process.cleanup.func1:396\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfigAndLogger:93\n\tmain.main:479\n\truntime.main:255"}
2022-09-02 13:17:52,286 INFO stopped: storagenode (exit status 1)
2022-09-02 13:17:52,286 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)
I think it’s just the restart that fixes it. None of these files have anything to do with time. Only the last one has something to do with connecting to satellites, but I’m pretty sure you can safely remove that one and it will just be recreated.
If I remember correctly, this error doesn’t necessarily have anything to do with time, because the time check is also just the first thing your node talks to the satellites about. You could try removing the trust cache file and restarting if it happens again. Or, in case it really is a time issue and you’re using an old motherboard, the CMOS battery may be dead, causing the clock to drift out of sync.
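If it helps, here is roughly what that would look like. The container name and the cache file name are assumptions (the default is `trust-cache.json` under the config directory, but check your `config.yaml`); the snippet only simulates the removal on a throwaway directory so nothing real is touched:

```shell
# Rough sketch, not a drop-in command list. Assumed setup: a Docker node
# named "storagenode" and a trust cache file "trust-cache.json" in the
# config directory -- adjust both to your own installation.
#
#   docker stop storagenode
#   rm /path/to/config/trust-cache.json   # the node rebuilds it on start
#   docker start storagenode
#
# Simulated below on a temporary directory:
CONF="$(mktemp -d)"
touch "$CONF/trust-cache.json"            # stands in for the real cache file
rm -f "$CONF/trust-cache.json"            # safe to delete: recreated on start
[ ! -e "$CONF/trust-cache.json" ] && echo "trust cache removed"
```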
The hardware is perfectly fine.
This time I rsynced the node files on the same hardware, just from one disk to another.
It went from working perfectly fine to an unrecoverable error, which is a bit weird.
Next time I’ll delete the files one by one.
I wasn’t suggesting the hardware was bad. But batteries drain, and that usually impacts your system clock first.
I think this is a distraction. If the time only syncs every once in a while, the node is just going to work at random moments. And these files have very specific functions that have nothing to do with the error you’re getting.
I got that. It’s just that I don’t believe it is a hardware issue.
It could be a total coincidence, but it has now happened twice almost in a row: the error shows up, and right after deleting these files the node starts without the issue.
2022-09-04T05:07:53.636Z ERROR Failed preflight check. {"Process": "storagenode", "error": "system clock is out of sync: system clock is out of sync with all trusted satellites", "errorVerbose": "system clock is out of sync: system clock is out of sync with all trusted satellites\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check:96\n\tstorj.io/storj/storagenode.(*Peer).Run:876\n\tmain.cmdRun:251\n\tstorj.io/private/process.cleanup.func1.4:378\n\tstorj.io/private/process.cleanup.func1:396\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfigAndLogger:93\n\tmain.main:479\n\truntime.main:255"}
2022-09-04 05:07:54,637 INFO waiting for storagenode, processes-exit-eventlistener to die
Error: system clock is out of sync: system clock is out of sync with all trusted satellites
2022-09-04 05:07:55,268 INFO stopped: storagenode (exit status 1)
2022-09-04 05:07:55,268 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)
It is getting more and more interesting. Believe it or not, here is what happened:
I waited for an hour or so, repeatedly restarting and removing the container. It never started.
Then I rsynced only the identity folder again, but kept the old identity folder.
Now guess what? With the new identity folder, the node starts. With the old one it does not start and shows the error.
It is reproducible: when I point the working node back at the old identity folder, it stops working.
Now I am either doing something wrong, or rsync is messing with the identity files. I have no clue yet.
Rsync would not detect any changes.
I have to delete/rename the identity folder manually and then perform a new rsync.
Then it works.
I have no idea why.
It seems that this error occurs when identity files of a previous node are present at the destination and rsync is expected to delete them. In that case the resulting identity files are reproducibly corrupt, or at least not usable by the node. Apparently the old identity files have to be deleted manually before rsyncing.
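For what it’s worth, this lines up with rsync’s default behavior: without `--delete`, files that exist only in the destination are never removed, so identity files from a previous node can survive a plain `rsync -a`. A throwaway demo (all file names here are made up for illustration):

```shell
# Two scratch directories stand in for the old and new node locations.
SRC="$(mktemp -d)"; DST="$(mktemp -d)"
touch "$SRC/identity.cert"                # the identity we actually want
touch "$DST/stale-identity.key"           # leftover from a previous node

# Plain archive copy: rsync never removes destination-only files by default.
rsync -a "$SRC"/ "$DST"/
[ -e "$DST/stale-identity.key" ] && echo "stale file survives plain rsync"

# --delete makes the destination an exact mirror of the source.
rsync -a --delete "$SRC"/ "$DST"/
[ ! -e "$DST/stale-identity.key" ] && echo "stale file gone with --delete"
```

So if the destination already held an older node’s files, a plain rsync would leave them in place. Whether that fully explains what you saw is another question, but it would explain why deleting the folder first and rsyncing fresh behaves differently.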