Node daily restart

TowerBR · July 11, 2024, 11:31am

I rotate the node log with logrotate daily, using a script that also restarts the node. This script takes 10~20 seconds to execute.

Is this daily restart harmful to satellite statistics? I believe it is beneficial for the node’s performance, correct?

jammerdan · July 11, 2024, 11:42am

Expect all sorts of strange things to happen.

In the past, nodes were not able to resume various tasks in progress after a restart, and efforts to improve this are still ongoing.

nerdatwork · July 11, 2024, 11:49am

What is the maximum allowed size for your log before it rotates?

TowerBR · July 11, 2024, 11:53am

Thanks for the advice, I hadn’t seen that other topic.

Checking the logs, I only have upload errors, for example, in a daily log with 231550 lines I have 1223 upload errors.
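For anyone wanting to reproduce this kind of count, here is a minimal sketch. The function name and the default pattern `upload failed` are assumptions for illustration; check how your node actually words these messages.

```sh
#!/bin/sh
# Print total lines, matching lines, and the share of matches in a log file.
# The default pattern "upload failed" is an assumption about the log wording.
log_error_share() {
    logfile="$1"
    pattern="${2:-upload failed}"
    # tr strips the padding some wc implementations add
    total=$(wc -l < "$logfile" | tr -d ' ')
    # grep -c exits non-zero when nothing matches, but still prints 0
    errors=$(grep -c -- "$pattern" "$logfile" || true)
    awk -v t="$total" -v e="$errors" 'BEGIN {
        share = (t > 0) ? 100 * e / t : 0
        printf "%d lines, %d matches (%.2f%%)\n", t, e, share
    }'
}
```

With the figures above (1223 matches in 231550 lines) the share works out to roughly 0.53%.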

Node statistics show “full 100%”:

TowerBR · July 11, 2024, 11:55am

It’s configured to run daily, regardless of size. Compressed log files are around 10 MB in size.

arrogantrabbit · July 11, 2024, 7:36pm

Why?

If the problem is that it does not release the handles on SIGHUP, you can run it with logs redirected under daemon, which does process SIGHUP and otherwise works correctly with logrotate/newsyslog.

An example here:

https://github.com/arrogantrabbit/freebsd_storj_installer/blob/310a10e8c295d102c1cb85e42883cd7699a13622/overlay/usr/local/etc/rc.d/storagenode#L22

          load_rc_config $name
          : "${storagenode_enable:=yes}"
          : "${storagenode_executable:="/usr/local/bin/storagenode"}"
          : "${storagenode_msg:="Nothing started."}"
          : "${storagenode_storage_path:="/mnt/storagenode"}"
          : "${storagenode_config_dir:="${storagenode_storage_path}/config"}"
          : "${storagenode_identity_dir:="${storagenode_storage_path}/identity/storagenode"}"
          
          pidfile="/var/run/${name}.pid"
          command="/usr/sbin/daemon"
          
          # Create log file with 644 permissions and storagenode as owner. Daemon utility by default uses 600
          logfile="/var/log/${name}.log"
          touch "${logfile}" && chown storagenode:storagenode "${logfile}" && chmod 644 "${logfile}"
          
          command_args="-r -f -H \
            -o \"${logfile}\" \
            -P \"${pidfile}\" \
            -u storagenode \"${storagenode_executable}\" run \
              --config-dir \"${storagenode_config_dir}\" \

Alexey · July 12, 2024, 6:53am

You may also configure logrotate to truncate the log instead (copytruncate). Of course you may lose some lines while it copies the content, but in exchange the node will not need to be restarted.
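As a rough sketch of the truncate approach with logrotate: the helper below prints a stanza that uses the `copytruncate` directive. The helper name, the log path and the retention of 7 files are assumptions, not anything taken from a real node setup.

```sh
#!/bin/sh
# Print a logrotate stanza that rotates by copy-and-truncate.
# The default log path and "rotate 7" are illustrative assumptions.
print_logrotate_stanza() {
    logfile="${1:-/var/log/storagenode.log}"
    cat <<EOF
${logfile} {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
EOF
}
```

Something like `print_logrotate_stanza > /etc/logrotate.d/storagenode` would install it. With `copytruncate`, logrotate copies the file and then truncates the original in place, so the node keeps writing to the same handle.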

The downside of daily restarts is a constantly running used-space-filewalker, especially in lazy mode (where it takes days to finish), so your disks will always be 100% busy unless you have a cache solution for your disk subsystem.

TowerBR · July 12, 2024, 3:33pm

I prefer to have logs in a more granular way, on a daily basis, and stored externally. This makes it easier to search if something happens, as I don’t have to deal with a 500 MB log file and millions of lines.

Considering what was exposed here and the content of the topic mentioned above, I modified the script to copy the log without restarting the node, truncating the log file. It seems to be working fine.
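A script along these lines might look as follows. This is only a sketch under assumptions: the function name, the archive layout and the use of gzip are illustrative, not the actual script.

```sh
#!/bin/sh
# Copy the live log to a dated archive, then truncate it in place.
# The node keeps its open file handle, so no restart is needed.
# Lines written between the copy and the truncate are lost.
rotate_copytruncate() {
    logfile="$1"
    archive_dir="$2"
    stamp=$(date +%Y-%m-%d)
    archive="${archive_dir}/$(basename "$logfile").${stamp}"
    cp "$logfile" "$archive" || return 1
    : > "$logfile"          # truncate without replacing the file
    gzip -f "$archive"
}
```

Called from cron as, for example, `rotate_copytruncate /var/log/storagenode.log /mnt/logs` (both paths hypothetical). Truncating in place behaves well only if the writer opened the log in append mode; otherwise it keeps writing at its old offset.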

arrogantrabbit · July 12, 2024, 3:47pm

This was a question about why you restart the node, not why you want to rotate logs.

I have configured logrotate with compression, and use bzgrep to search the logs when needed, with on-the-fly decompression.

I would not do that, because of the combination of possibly lost lines and Murphy's law.

If your logging solution can drop lines, you can no longer trust it, and therefore it’s useless. For example, when looking at an audit failure: was the missing chunk deleted by the satellite before checking (true story, literally my first message on this forum) while your log solution dropped the message, or is it some other failure? You can’t know for sure if your log can skip messages.

TowerBR · July 12, 2024, 3:58pm

Oh, okay. In my first logrotate setup I thought it would be a good opportunity to restart the node, and that this would be “healthy”.

But if I don’t truncate the log it will grow indefinitely, right? Or am I missing something?

arrogantrabbit · July 12, 2024, 4:56pm

Utilities like logrotate and newsyslog move the existing log file and send the HUP signal to the application, in response to which the application is supposed to close and re-open its log files. Thus nothing is lost, and the separated log file can be compressed and saved for later.
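To make the mechanism concrete, here is a toy sketch: a stand-in "application" that re-opens its log on SIGHUP, and a miniature rotation step that renames the file and sends the signal. Both function names are made up for illustration, and this is not how storagenode behaves.

```sh
#!/bin/sh
# Toy "application": appends a line per tick and re-opens its log on SIGHUP.
run_logger() {
    logfile="$1"
    exec 3>>"$logfile"                  # open the log once, keep the handle
    trap 'exec 3>>"$logfile"' HUP       # on SIGHUP: re-open the log by path
    i=0
    while [ "$i" -lt 50 ]; do
        echo "tick $i" >&3
        i=$((i + 1))
        sleep 0.1
    done
}

# What logrotate/newsyslog do, in miniature: rename, then signal.
rotate_with_hup() {
    logfile="$1"
    pid="$2"
    mv "$logfile" "$logfile.0"          # writer still holds the renamed file
    kill -HUP "$pid"                    # writer re-creates "$logfile"
}
```

Run the logger in the background (`run_logger /tmp/app.log &`), then call `rotate_with_hup /tmp/app.log $!`. Older lines stay in `/tmp/app.log.0`, newer ones land in a fresh `/tmp/app.log`, and nothing is dropped in between because the writer keeps its old handle until it re-opens.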

Storagenode, instead, dies on SIGHUP, so this is a no-go. @Alexey’s suggested log truncation is one workaround, but it can lose data.

Another, arguably better, workaround is to have storagenode log to stdout instead of a file, and run it under another utility that will handle the logs by redirecting to file, and properly managing log file handles in response to SIGHUP. daemon(8) is one such utility:

-H, --sighup Close output_file and re-open it when signal SIGHUP is received, for interoperability with newsyslog(1) and similar log rotation / archival mechanisms. If --output-file is not specified, this flag is ignored.
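Roughly, the two pieces fit together as sketched below for FreeBSD. The helpers only print a daemon(8) command line and a matching newsyslog.conf(5) entry, without running anything. Paths follow the rc script linked earlier in the thread; the helper names, the retention count, and the schedule are assumptions, and the remaining storagenode arguments are left out.

```sh
#!/bin/sh
# Sketch for FreeBSD: print (not run) a daemon(8) command line and a
# matching newsyslog.conf(5) entry. Count and schedule are illustrative.
name="storagenode"
logfile="/var/log/${name}.log"
pidfile="/var/run/${name}.pid"

print_daemon_cmd() {
    # -H re-opens the -o file on SIGHUP; -P records the supervisor's pid
    echo "/usr/sbin/daemon -r -f -H -o ${logfile} -P ${pidfile}" \
         "-u storagenode /usr/local/bin/storagenode run"
}

print_newsyslog_entry() {
    # fields: file owner:group mode count size when flags pidfile
    # keep 7 archives, rotate daily at midnight, compress with bzip2 (J);
    # newsyslog sends SIGHUP to the pid in pidfile by default
    echo "${logfile} storagenode:storagenode 644 7 * @T00 J ${pidfile}"
}
```

The point of the pairing is that newsyslog signals the daemon supervisor (the pid written by `-P`), not storagenode itself, so the node never sees the SIGHUP.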

TowerBR · July 12, 2024, 7:00pm

Yes, solutions involving SIGHUP and SIGUSR1 seem to kill the node, based on quick tests I did.

When choosing between the available solutions, I’m considering the balance of risk versus implementation complexity, along with the remote chance of Murphy’s law causing me to lose the exact lines I need in the event of an unlikely problem. Therefore, I believe I will stick with the “truncation” solution for now.

arrogantrabbit · July 12, 2024, 10:41pm

There is a fourth, even better solution: submit a PR to have the node handle SIGHUP.

pangolin · July 19, 2024, 1:04pm

The question is, why have the Storj devs not managed to implement such basic functionality? How many years has this problem been around?

Alexey · July 20, 2024, 7:48am

Because you still have not submitted a PR, I would assume?
Priorities. We are still a small team, the help from the Community is very welcome, especially for storagenode improvements (features).