Node daily restart

I rotate the node log with logrotate daily, using a script that also restarts the node. This script takes 10 to 20 seconds to execute.

Is this daily restart harmful to satellite statistics? I believe it is beneficial for the node’s performance, correct?

Expect all sorts of strange things to happen.

In the past, nodes were not able to resume various tasks in progress after a restart, and efforts to improve this are still ongoing.

What is the maximum allowed size for your log before it rotates?

Thanks for the advice, I hadn’t seen that other topic.

The only errors in my logs are upload errors. For example, a daily log with 231,550 lines contains 1,223 upload errors.
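Counts like these can be reproduced with a short Python sketch that scans a log and counts matching lines. The pattern (`ERROR` followed by `upload failed` on the same line) is an assumption about the storagenode log format, and `count_upload_errors` / `error_rate` are hypothetical helpers; adjust the regex to what your log actually contains.

```python
import re
from typing import Iterable, Tuple

# Assumed log format: a failed upload is one line containing the level
# "ERROR" and the phrase "upload failed". Adjust to match your own log.
UPLOAD_ERROR = re.compile(r"\bERROR\b.*\bupload failed\b")


def count_upload_errors(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total_lines, upload_error_lines) for an iterable of log lines."""
    total = errors = 0
    for line in lines:
        total += 1
        if UPLOAD_ERROR.search(line):
            errors += 1
    return total, errors


def error_rate(total: int, errors: int) -> float:
    """Fraction of lines that are upload errors (0.0 for an empty log)."""
    return errors / total if total else 0.0


if __name__ == "__main__":
    import sys

    # Usage: python count_errors.py node.log
    with open(sys.argv[1], errors="replace") as f:
        total, errors = count_upload_errors(f)
    print(f"{errors} upload errors in {total} lines ({error_rate(total, errors):.2%})")
```

With the figures above, 1,223 / 231,550 is roughly 0.0053, i.e. about 0.53% of the lines in that day's log.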

Node statistics show “full 100%”:

It’s configured to run daily, regardless of size. Compressed log files are around 10 MB in size.

Why?

If the problem is that the node does not release its log file handle on SIGHUP, you can run it under daemon with its output redirected to a file; daemon does process SIGHUP and otherwise works correctly with logrotate/newsyslog.

An example here:

You may also configure logrotate to truncate the file instead (copytruncate). Of course, you may lose some lines written while it copies the content, but in exchange the node does not need to be restarted.
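logrotate's `copytruncate` option performs these two steps for you. A minimal Python sketch of the same steps (a hypothetical `copy_truncate` helper, not logrotate itself) makes the loss window visible: anything the node appends after the copy finishes and before the truncation runs is gone. It also assumes the node opened its log in append mode; a writer that did not would keep writing at its old offset after the file is emptied.

```python
import os
import shutil
import tempfile


def copy_truncate(log_path: str, archive_path: str) -> None:
    """Copy the live log to archive_path, then empty the live log in place.

    The writer keeps its open handle, so no restart is needed, but any line
    it appends after copyfile() returns and before truncate() runs is lost.
    """
    shutil.copyfile(log_path, archive_path)
    # "r+b" opens the existing file without recreating it, so the inode the
    # writer holds stays the same; truncate(0) just drops its contents.
    with open(log_path, "r+b") as f:
        f.truncate(0)


if __name__ == "__main__":
    # Small self-contained demo in a throwaway directory.
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "node.log")
        with open(log, "w") as f:
            f.write("line 1\nline 2\n")
        copy_truncate(log, log + ".1")
        with open(log + ".1") as f:
            print(os.path.getsize(log), f.read().count("\n"))
```

The demo prints the size of the live log after truncation and the number of lines preserved in the copy.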

The downside of daily restarts is that the used-space filewalker, which starts again on every restart, runs almost constantly, especially in lazy mode (in that mode it can take days to finish). Your disks will therefore always be 100% busy unless you have a caching solution for your disk subsystem.

I prefer to have logs in a more granular way, on a daily basis, and stored externally. This makes it easier to search if something happens, as I don’t have to deal with a 500 MB log file and millions of lines.

Considering what was discussed here and the content of the topic mentioned above, I modified the script to copy the log and then truncate it, without restarting the node. It seems to be working fine.

This was a question about why restart the node, not why you want to rotate logs :slight_smile:

I have configured logrotate with compression and use bzgrep to search the rotated logs when needed, decompressing them on the fly.
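bzgrep works because bzip2 data can be decompressed as it is read. The same idea as a hypothetical Python `bzgrep` helper, assuming logrotate is configured to compress rotated files with bzip2 (a `.bz2` suffix) while the current log stays uncompressed:

```python
import bz2
import re
from typing import Iterable, Iterator, Tuple


def bzgrep(pattern: str, paths: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (path, line) for every line matching pattern.

    Files ending in .bz2 are decompressed on the fly; any other file is read
    as plain text, so the live log and rotated archives can be mixed.
    """
    rx = re.compile(pattern)
    for path in paths:
        opener = bz2.open if path.endswith(".bz2") else open
        with opener(path, "rt", errors="replace") as f:
            for line in f:
                if rx.search(line):
                    yield path, line.rstrip("\n")


if __name__ == "__main__":
    import sys

    # Usage: python bzgrep.py PATTERN FILE [FILE ...]
    for path, line in bzgrep(sys.argv[1], sys.argv[2:]):
        print(f"{path}:{line}")
```

Nothing is written to disk: each archive is streamed through the decompressor line by line, which is what keeps searching a month of compressed daily logs cheap.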

I would not do that because of a combination of the lost lines mentioned above and Murphy's law.

If your logging solution can drop lines, you can no longer trust it, and therefore it is useless. For example, when investigating an audit failure: was the missing chunk deleted by the satellite before the check (a true story, literally my first message on this forum) and your logging solution simply dropped the message, or is it some other failure? If your log can skip messages, you can't know for sure.

Oh, okay. When I first set up logrotate, I thought it would be a good opportunity to restart the node, and that doing so would be “healthy” :upside_down_face:

But if I don’t truncate the log it will grow indefinitely, right? Or am I missing something?

Utilities like logrotate and newsyslog move the existing log file and send the HUP signal to the application, in response to which the application is supposed to close and re-open its log files. Thus nothing is lost, and the rotated log file can be compressed and saved for later.
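A minimal Python sketch of the behaviour these utilities expect (not storagenode code, which is written in Go; `ReopeningLog` is a hypothetical class). After the rename, the old handle keeps writing into the renamed file, so new lines only reach the fresh file once the application reopens the path. The signal handler just sets a flag, and the reopen happens on the next write, which keeps the handler itself trivial.

```python
import os
import signal


class ReopeningLog:
    """Append-only log writer that reopens its path after SIGHUP."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.f = open(path, "a")
        self._reopen = False
        signal.signal(signal.SIGHUP, self._on_hup)

    def _on_hup(self, signum, frame) -> None:
        # Only note that a reopen was requested; do the work outside the handler.
        self._reopen = True

    def write(self, line: str) -> None:
        if self._reopen:
            # The rotator has already moved the old file away, so opening
            # the same path again creates a fresh, empty log.
            self.f.close()
            self.f = open(self.path, "a")
            self._reopen = False
        self.f.write(line + "\n")
        self.f.flush()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "node.log")
        log = ReopeningLog(path)
        log.write("before rotation")
        os.rename(path, path + ".1")        # step 1 of a rotation: move the file
        log.write("still goes to .1")       # the old handle follows the renamed file
        signal.raise_signal(signal.SIGHUP)  # step 2: ask the application to reopen
        log.write("goes to the new file")
        log.f.close()
        for name in (path + ".1", path):
            with open(name) as f:
                print(os.path.basename(name), f.read().splitlines())
```

The line written between the rename and the signal shows why a program that ignores SIGHUP keeps filling the rotated file instead of the new one.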

Storagenode, instead, dies on SIGHUP, so this is a no-go. @Alexey’s suggested log truncation is one workaround, but it can lose data.

Another, arguably better, workaround is to have storagenode log to stdout instead of a file, and run it under another utility that redirects that output to a file and properly manages the log file handle in response to SIGHUP. daemon(8) is one such utility:

-H, --sighup Close output_file and re-open it when signal SIGHUP is received, for interoperability with newsyslog(1) and similar log rotation / archival mechanisms. If --output-file is not specified, this flag is ignored.
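The idea behind `daemon -H` with `--output-file` can be sketched in Python as a rough stand-in (a hypothetical `run_with_log` helper, not daemon(8) itself). The child writes to a pipe and never touches the log file; only the wrapper holds the file handle and reopens it when it receives SIGHUP, so the child does not need to handle SIGHUP at all.

```python
import signal
import subprocess
import sys


def run_with_log(cmd: list, log_path: str) -> int:
    """Run cmd, append its stdout/stderr to log_path, reopen log_path on SIGHUP.

    Returns the child's exit code.
    """
    out = open(log_path, "ab")
    reopen = False

    def on_hup(signum, frame):
        nonlocal reopen
        reopen = True  # only set a flag; the copy loop does the actual reopen

    previous = signal.signal(signal.SIGHUP, on_hup)
    try:
        with subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        ) as child:
            for line in child.stdout:
                if reopen:
                    # The rotator has moved the old file; start a new one.
                    out.close()
                    out = open(log_path, "ab")
                    reopen = False
                out.write(line)
                out.flush()
        return child.returncode
    finally:
        out.close()
        signal.signal(signal.SIGHUP, previous)


if __name__ == "__main__":
    # Usage: python logwrap.py LOGFILE COMMAND [ARGS ...]
    sys.exit(run_with_log(sys.argv[2:], sys.argv[1]))
```

logrotate would then signal the wrapper's PID in its postrotate step rather than the node's. Because the Python-level handler is reset to the default when the child is exec'd, the child is unaffected by the wrapper's handler and never receives the SIGHUP sent to the wrapper.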

Yes, solutions involving SIGHUP and SIGUSR1 seem to kill the node, based on quick tests I did.
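This matches how signals work by default: a process that installs no handler for SIGHUP is terminated by it, and Go's `os/signal` documentation states that, by default, SIGHUP causes a Go program to exit. A hypothetical `dies_on_sighup` helper in Python can check any command's behaviour (point it at a test process, not a production node):

```python
import signal
import subprocess
import sys
import time


def dies_on_sighup(cmd: list, settle: float = 0.5) -> bool:
    """Start cmd, send it SIGHUP, and report whether that signal killed it."""
    child = subprocess.Popen(cmd)
    time.sleep(settle)  # give the child time to start and install any handlers
    child.send_signal(signal.SIGHUP)
    try:
        rc = child.wait(timeout=5)
    except subprocess.TimeoutExpired:
        # Still running: it caught or ignored SIGHUP. Clean up and report.
        child.kill()
        child.wait()
        return False
    # Popen reports "terminated by signal N" as a return code of -N.
    return rc == -signal.SIGHUP


if __name__ == "__main__":
    # Usage: python hup_check.py COMMAND [ARGS ...]
    print(dies_on_sighup(sys.argv[1:]))
```

A process that sets SIGHUP to be ignored, or that handles it the way logrotate expects, returns False here.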

When choosing between the available solutions, I’m weighing risk against implementation complexity, along with the remote chance of Murphy’s law causing me to lose the exact lines I need in the event of an unlikely problem. Therefore, I believe I will stick with the “truncation” solution for now.

There is a fourth, even better solution: submit a PR to have the node handle SIGHUP :slight_smile:

The question is: why have the Storj devs not managed to implement such basic functionality? How many years has this problem been around?

Because you still haven't submitted a PR, I would assume?
Priorities. We are still a small team, so help from the Community is very welcome, especially for storagenode improvements (features).