Storage Node Preflight Checks

In the satellite logs we noticed that a few storage nodes are in a bad state, and their operators might not even be aware of it. In the long term we will have to disqualify these storage nodes. To give storage node operators a chance to avoid disqualification, we have added storage node preflight checks for the system clock and for the SQLite databases.

We are especially interested in two situations:

  1. It looks like Docker on Windows is unable to get the time zone from the host system, so the storage node runs with the wrong clock. Please watch for log messages like "system clock is off: clock off by 2.01461". If you see one, is your system clock actually correct? If so, please contact us with some additional information about your setup: Are you using Windows or a different OS? Which Docker version are you running? Did you find any way to fix the issue? Is switching to the Windows installer an option for you?
  2. The SQLite database check will also catch schema mismatches. In v0.30.5 this check is disabled by default because we want to be sure it works as expected and doesn't catch storage nodes by surprise. It will be enabled with the next release. Please use this time to test it on your storage node by adding "preflight.database-check: true" to your config.yaml. Does it show any error messages? If so, does your storage node work when you disable the check again? I would call that a false positive, and I would like to know why it happens.
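The clock check in item 1 comes down to comparing the local clock with the time reported by trusted satellites. Here is a minimal Python sketch of that comparison. The 30-minute tolerance and the rule that one agreeing satellite is enough are assumptions for illustration, not the storage node's real values, and the satellite timestamps are passed in rather than fetched over the network.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Iterable

# Hypothetical tolerance; the real preflight threshold is not given in this thread.
MAX_OFFSET = timedelta(minutes=30)


def clock_offset(local: datetime, reference: datetime) -> timedelta:
    """Signed offset of the local clock relative to a reference clock."""
    return local - reference


def clock_in_sync(local: datetime, satellite_times: Iterable[datetime],
                  max_offset: timedelta = MAX_OFFSET) -> bool:
    """Assumed rule: pass if the local clock is within max_offset of at
    least one trusted satellite's reported time."""
    return any(abs(clock_offset(local, sat)) <= max_offset for sat in satellite_times)


if __name__ == "__main__":
    satellite = datetime(2020, 1, 23, 19, 41, 6, tzinfo=timezone.utc)
    # A two-hour error, the kind a time zone mix-up could produce.
    drifted = satellite + timedelta(hours=2)
    print(clock_in_sync(satellite + timedelta(seconds=1), [satellite]))  # True
    print(clock_in_sync(drifted, [satellite]))                           # False
    print(clock_offset(drifted, satellite) / timedelta(hours=1))         # 2.0
```

A time zone error shifts the clock by whole hours, so it fails any tolerance measured in minutes. Ordinary drift of a few seconds passes easily.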

About half of my nodes report a "malformed" database error, but the dashboard says everything is OK.
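"Malformed" is SQLite's own wording for a damaged database file, and SQLite's built-in "PRAGMA integrity_check" is a common way to find out which file is affected. The Python sketch below runs that pragma on every *.db file in a directory. It is not the storage node's preflight code. It assumes the node's databases sit together in one directory, so pass your own storage path. Stop the node before running it so nothing writes to the files during the check.

```python
from __future__ import annotations

import sqlite3
import sys
import tempfile
from contextlib import closing
from pathlib import Path


def integrity_check(db_path: Path) -> list[str]:
    """Run PRAGMA integrity_check; a healthy database returns ["ok"]."""
    # Open read-only through a URI so the check never writes to the file.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    return [row[0] for row in rows]


def check_directory(storage_dir: Path) -> dict[str, list[str]]:
    """Check every *.db file in storage_dir and collect results by file name."""
    results: dict[str, list[str]] = {}
    for db in sorted(storage_dir.glob("*.db")):
        try:
            results[db.name] = integrity_check(db)
        except sqlite3.DatabaseError as exc:
            # Files SQLite cannot read at all (e.g. "file is not a database")
            # raise an exception instead of returning rows.
            results[db.name] = [f"error: {exc}"]
    return results


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for name, result in check_directory(Path(sys.argv[1])).items():
            print(name, result)
    else:
        # Self-contained demo: one valid database and one corrupt file.
        with tempfile.TemporaryDirectory() as tmp:
            demo = Path(tmp)
            with closing(sqlite3.connect(demo / "good.db")) as conn:
                conn.execute("CREATE TABLE t (x INTEGER)")
                conn.commit()
            (demo / "broken.db").write_bytes(b"not a database\n" * 20)
            for name, result in check_directory(demo).items():
                print(name, result)
```

A healthy file reports ['ok']. A damaged file either lists the problems SQLite found or, if SQLite cannot read the file at all, shows the error it raised.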


Great. Now is the time to fix it, because with the next release the storage node will not start anymore.

Of course I’ll do it.
But how many operators will not do this because they do not read the forum?
I think you need to send out a newsletter.

The storage node will stop working and they will be forced to fix it. That is the plan.

If 99.9% of the storage nodes are running just fine, do we need to send everyone an email? The preflight check is designed to identify the 0.1% and force them to fix the issue.

My goal here is to make sure the preflight check doesn’t get triggered by some unknown bugs. Please give us feedback, especially about the two points I mentioned.


How are we supposed to fix it if we run into an issue?

Search for the issue here in the forum or create a post yourself.

I am requesting feedback and don’t expect to be able to solve all of your problems. I know you will not like the news, but my job is only to force you :smiley:

Haha, OK, I’ll give it a go and hope for the best! :smiley:


I see no errors after enabling it in config.yaml, but the log file should say that the databases were checked and no errors were found, just as it does for the system clock:

preflight:localtime start checking local system clock with trusted satellites’ system clock.
preflight:localtime local system clock is in sync with trusted satellites’ system clock.

I see no errors on any of my 3 nodes either. A success message would have been nice, though :smiley:


Did you update your nodes manually?

Only this time. Generally I wait for Watchtower.

My Watchtower happened to have already updated my node on its own, so I went ahead and enabled the preflight database check. Everything seems to be fine. I agree, a success message would be nice. Maybe in the next version, once the check is enforced?


My node is still at 0.29.3. How do I force the Windows updater to do its thing?

Update: I went to restart the update service, but it seems that it was already in the process of installing the update.

Same here, nothing about the database in the log.

2020-01-23T19:41:05.627Z        INFO    preflight:localtime     start checking local system clock with trusted satellites' system clock.
2020-01-23T19:41:06.610Z        INFO    preflight:localtime     local system clock is in sync with trusted satellites' system clock.

I guess this means the db is OK?


Hello @LarsOS,
Welcome to the forum!

You shouldn’t force the update. We do not want to take the whole network offline at once when we release a new version.
Just wait until it updates itself.


Here is an example of the success messages:

2020-01-23T20:26:30.180775047Z 2020-01-23T20:26:30.180Z INFO    preflight:localtime     start checking local system clock with trusted satellites' system clock.
2020-01-23T20:26:31.113173554Z 2020-01-23T20:26:31.112Z INFO    preflight:localtime     local system clock is in sync with trusted satellites' system clock.
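At this version the database check logs nothing when it succeeds. The only positive signal in the log is therefore the clock check's "in sync" line. The Python sketch below scans a saved log for that line. It accepts both the plain format and the Docker format shown above, whose extra leading timestamp appears to come from "docker logs -t". The container name "storagenode" in the usage comment is an assumption. Detecting failures relies only on the text "system clock is off" quoted in the first post, not on a known log format for the failure.

```python
from __future__ import annotations

import re
import sys
from typing import Iterable, NamedTuple, Optional

# Optional Docker timestamp (as added by `docker logs -t`), then the node's own
# timestamp, level, logger name and message, separated by whitespace.
PREFLIGHT_RE = re.compile(
    r"^(?:\S+Z\s+)?"
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\S+Z)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<logger>preflight:\S+)\s+"
    r"(?P<msg>.*)$"
)


class PreflightEntry(NamedTuple):
    timestamp: str
    level: str
    logger: str
    message: str


def parse_preflight(line: str) -> Optional[PreflightEntry]:
    """Return the parsed entry if the line comes from a preflight logger."""
    m = PREFLIGHT_RE.match(line.strip())
    if m is None:
        return None
    return PreflightEntry(m["ts"], m["level"], m["logger"], m["msg"])


def clock_status(lines: Iterable[str]) -> str:
    """Summarize the clock preflight result found in a storage node log."""
    status = "no clock check found"
    for line in lines:
        # Failure text as quoted in the announcement; its logger is not assumed.
        if "system clock is off" in line:
            return "off"
        entry = parse_preflight(line)
        if entry and entry.logger == "preflight:localtime" and "is in sync" in entry.message:
            status = "in sync"
    return status


if __name__ == "__main__":
    # Usage (assumed container name): docker logs storagenode > node.log 2>&1
    #                                  python preflight_log.py node.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            print(clock_status(fh))
```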

@littleskunk I enabled preflight.database-check: true on all storage nodes that have already updated and see no issues with this option (no additional bugs, and that’s true positive news :smiley:)


So far it is looking positive. Does nobody have a clock offset? If your clock was out of sync, please send a short message describing how you fixed it. That would help us as well.

Once my nodes update, I’ll let you know whether my Windows node came up fine.

No problem:
An accurate clock is my default standard on every server. It is extremely important for virtual machines and other non-bare-metal systems.
How to check the status on Debian (or Ubuntu):
timedatectl status
The "NTP service:" line should read "active".
If it is not active, check the configuration file /etc/systemd/timesyncd.conf:

#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See timesyncd.conf(5) for details.

[Time]
#NTP=
#FallbackNTP=0.debian.pool.ntp.org 1.debian.pool.ntp.org 2.debian.pool.ntp.org 3.debian.pool.ntp.org
#RootDistanceMaxSec=5
#PollIntervalMinSec=32
#PollIntervalMaxSec=2048

And enable it with:
timedatectl set-ntp true
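The status check above can also be scripted. This Python sketch parses the "Key: value" lines printed by "timedatectl status". It treats the clock as healthy when "System clock synchronized" is "yes" and "NTP service" is "active", which are the labels printed by recent systemd versions. The fallback to the older "NTP synchronized" and "Network time on" labels is an assumption worth verifying on your own system.

```python
from __future__ import annotations

import subprocess


def parse_timedatectl(output: str) -> dict[str, str]:
    """Turn 'Key: value' lines into a dict, splitting on the first colon only."""
    fields: dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def ntp_ok(fields: dict[str, str]) -> bool:
    """True if the clock is synchronized and NTP is enabled."""
    # Newer systemd labels first; the older labels are an assumed fallback.
    synced = fields.get("System clock synchronized", fields.get("NTP synchronized"))
    service = fields.get("NTP service", fields.get("Network time on"))
    return synced == "yes" and service in ("active", "yes")


def read_timedatectl() -> str:
    """Run 'timedatectl status'; return '' if it is missing, fails or hangs."""
    try:
        result = subprocess.run(["timedatectl", "status"], capture_output=True,
                                text=True, timeout=10, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout


if __name__ == "__main__":
    fields = parse_timedatectl(read_timedatectl())
    print("NTP OK" if ntp_ok(fields) else "NTP not OK: check timedatectl status")
```

Splitting on the first colon only keeps values such as "Thu 2020-01-23 20:30:00 UTC" intact.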
