Unrecoverable error: System clock is out of sync

jammerdan · September 2, 2022, 1:07pm

Now that’s weird. I have just rsynced a node to another hardware. Same timezone both servers perfectly time synced.
Now when I try to start the node on the new server I get:

DEBUG   Unrecoverable error     {"Process": "storagenode", "error": "system clock is out of sync: system clock is out of sync with all trusted satellites", "errorVerbose": "system clock is out of sync: system clock is out of sync with all trusted satellites\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check:96\n\tstorj.io/storj/storagenode.(*Peer).Run:876\n\tmain.cmdRun:251\n\tstorj.io/private/process.cleanup.func1.4:378\n\tstorj.io/private/process.cleanup.func1:396\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfigAndLogger:93\n\tmain.main:479\n\truntime.main:255"}
2022-09-02 13:02:47,009 INFO waiting for storagenode, processes-exit-eventlistener to die
2022-09-02 13:02:47,011 INFO stopped: storagenode (exit status 1)
2022-09-02 13:02:47,011 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

How can that be and how can I get the node back on?

jammerdan · September 2, 2022, 1:21pm

Here is the whole thing:

2022-09-02T13:17:49.096Z        DEBUG   db      Database version is up to date  {"Process": "storagenode", "version": 54}
2022-09-02T13:17:49.872Z        DEBUG   trust   Fetched URLs from source; updating cache        {"Process": "storagenode", "source": "https://www.storj.io/dcs-satellites", "count": 6}
2022-09-02T13:17:49.872Z        DEBUG   trust   Satellite is trusted    {"Process": "storagenode", "id": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2022-09-02T13:17:49.872Z        DEBUG   trust   Satellite is trusted    {"Process": "storagenode", "id": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2022-09-02T13:17:49.872Z        DEBUG   trust   Satellite is trusted    {"Process": "storagenode", "id": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2022-09-02T13:17:49.872Z        DEBUG   trust   Satellite is trusted    {"Process": "storagenode", "id": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2022-09-02T13:17:49.872Z        DEBUG   trust   Satellite is trusted    {"Process": "storagenode", "id": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB"}
2022-09-02T13:17:49.872Z        DEBUG   trust   Satellite is trusted    {"Process": "storagenode", "id": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo"}
2022-09-02T13:17:49.872Z        INFO    preflight:localtime     start checking local system clock with trusted satellites' system clock.        {"Process": "storagenode"}
2022-09-02T13:17:49.926Z        ERROR   preflight:localtime     unable to get satellite system time     {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "error": "context canceled"}
2022-09-02T13:17:50.003Z        ERROR   preflight:localtime     unable to get satellite system time     {"Process": "storagenode", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "error": "context canceled"}
2022-09-02T13:17:50.226Z        ERROR   preflight:localtime     unable to get satellite system time     {"Process": "storagenode", "Satellite ID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "error": "context canceled"}
2022-09-02T13:17:50.236Z        ERROR   preflight:localtime     unable to get satellite system time     {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "error": "context canceled"}
2022-09-02T13:17:50.346Z        ERROR   preflight:localtime     unable to get satellite system time     {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "error": "context canceled"}
2022-09-02T13:17:50.552Z        ERROR   preflight:localtime     unable to get satellite system time     {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "error": "context canceled"}
2022-09-02T13:17:50.552Z        ERROR   Failed preflight check. {"Process": "storagenode", "error": "system clock is out of sync: system clock is out of sync with all trusted satellites", "errorVerbose": "system clock is out of sync: system clock is out of sync with all trusted satellites\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check:96\n\tstorj.io/storj/storagenode.(*Peer).Run:876\n\tmain.cmdRun:251\n\tstorj.io/private/process.cleanup.func1.4:378\n\tstorj.io/private/process.cleanup.func1:396\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfigAndLogger:93\n\tmain.main:479\n\truntime.main:255"}
2022-09-02 13:17:51,553 INFO waiting for storagenode, processes-exit-eventlistener to die
2022-09-02T13:17:52.284Z        DEBUG   Unrecoverable error     {"Process": "storagenode", "error": "system clock is out of sync: system clock is out of sync with all trusted satellites", "errorVerbose": "system clock is out of sync: system clock is out of sync with all trusted satellites\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check:96\n\tstorj.io/storj/storagenode.(*Peer).Run:876\n\tmain.cmdRun:251\n\tstorj.io/private/process.cleanup.func1.4:378\n\tstorj.io/private/process.cleanup.func1:396\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfigAndLogger:93\n\tmain.main:479\n\truntime.main:255"}
2022-09-02 13:17:52,286 INFO stopped: storagenode (exit status 1)
2022-09-02 13:17:52,286 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

ItsHass · September 2, 2022, 2:12pm

be interested to know what caused this one… seems odd…

jammerdan · September 2, 2022, 2:14pm

Same, it is still not running. Need help.

jammerdan · September 2, 2022, 2:29pm

Ok, solved it:

Removed literally everything except for the big data folders blobs and trash.
Did an rsync again
Started it

Back!

jammerdan · September 3, 2022, 2:48pm

It just happened again.
This time I only deleted and re-rsynced the config files + identity:

Identity folder
Config.yaml
revocations.db
trust-cache.json

One of these is the culprit. After re-syncing, the node started normally.

BrightSilence · September 3, 2022, 3:03pm

I think it’s just the restart that fixes it. None of these files have anything to do with time. Only the last one has something to do with connecting to satellites, but I’m pretty sure you can safely remove that one and it will just be recreated.

If I remember correctly this error doesn’t always necessarily have something to do with time, because it’s also just the first thing your node talks to satellites about. You could try removing the trust cache file and restarting if it happens again. Or in case it does have to do with time and you’re using an old motherboard, it may be that the CMOS battery on the motherboard is dead and the clock drifts out of sync as a result.

jammerdan · September 3, 2022, 3:08pm

The hardware is perfectly fine.
This time I have rsynced the node files on the same hardware just from one disk to another.
It went from working perfectly fine to unrecoverable error. It is a bit weird.
Next time I’ll delete the files one by one.

Toyoo · September 3, 2022, 3:29pm

Is is a VM? Virtual machines have surprisingly many problems with accurate time keeping, VMWare dedicated a whole paper to the issue.

BrightSilence · September 3, 2022, 4:51pm

I wasn’t suggesting it was bad. But batteries drain and that does impact your system clock first usually.

I think this is a distraction. If time is synced every once in a while it’s just gonna work at random moments. And these files have very specific functions that have nothing to do with the error you’re getting.

jammerdan · September 3, 2022, 5:06pm

No, it’s bare metal.

I got that. It’s just that I don’t believe it is a hardware issue.

Could be totally coincidentally. But I had it now 2 times almost in a row that this error shows up and right after deleting of these files, the node starts without the issue.

jammerdan · September 4, 2022, 5:09am

And again:

2022-09-04T05:07:53.636Z        ERROR   Failed preflight check. {"Process": "storagenode", "error": "system clock is out of sync: system clock is out of sync with all trusted satellites", "errorVerbose": "system clock is out of sync: system clock is out of sync with all trusted satellites\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check:96\n\tstorj.io/storj/storagenode.(*Peer).Run:876\n\tmain.cmdRun:251\n\tstorj.io/private/process.cleanup.func1.4:378\n\tstorj.io/private/process.cleanup.func1:396\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfigAndLogger:93\n\tmain.main:479\n\truntime.main:255"}
2022-09-04 05:07:54,637 INFO waiting for storagenode, processes-exit-eventlistener to die
Error: system clock is out of sync: system clock is out of sync with all trusted satellites
2022-09-04 05:07:55,268 INFO stopped: storagenode (exit status 1)
2022-09-04 05:07:55,268 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)

jammerdan · September 4, 2022, 6:36am

It is getting more and more interesting. Believe it or not here is what happened:

I waited for an hour or so and kept restarting and removing the container. It did not start, never.
Then I rsynced only the identity folder again, but kept the old identity folder.
Now guess what? With the new identity folder, the node starts. With the old one it does not start and shows the error.
It is reproducible. When I move the working node back to the old identity folder, it stops working.
Now I am either doing something wrong, or rsync is messing with the identity files. I have no clue yet.

Alexey · September 4, 2022, 7:56am

Try to rewrite the old folder with a new one, does it change anything?

jammerdan · September 4, 2022, 7:58am

Rsync would not detect any changes.
I have to delete/rename the identity folder manually and then perform a new rsync.
Then it works.
I have no idea why.

Alexey · September 4, 2022, 8:00am

Could be a permissions issue?

jammerdan · September 4, 2022, 8:00am

But how? 1st rsync → not working, 2nd rsync → working without any changes.

BrightSilence · September 4, 2022, 9:30am

Maybe check your file system for issues. Though it would be super weird if it somehow kept corrupting the same files.

jammerdan · September 6, 2022, 4:32am

It seems that this error occurs when identity files of a previous node are present at the same location and rsync is supposed to delete them. It appears to be reproducible that in this case the new identity files are corrupt in a way or not usable for the node. It seems to be explicitly required to delete the identity files by manually.