If the problem is that it does not release its file handles on SIGHUP, you can run it under daemon with the logs redirected; daemon does process SIGHUP and otherwise works correctly with logrotate/newsyslog.
You can also configure the rotation to truncate the log instead. Of course, you may lose some lines while it copies the content, but in exchange the node will not need to be restarted.
The downside of daily restarts is a constantly running used-space-filewalker, especially in lazy mode (in that mode it takes days to finish), so your disks will always be 100% busy, unless you have a cache solution for your disk subsystem.
I prefer to have logs in a more granular way, on a daily basis, and stored externally. This makes it easier to search if something happens, as I don’t have to deal with a 500 MB log file and millions of lines.
Considering what was discussed here and the content of the topic mentioned above, I modified the script to copy the log and then truncate the log file, without restarting the node. It seems to be working fine.
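For anyone following along, here is a minimal sketch of the copy-then-truncate idea in Python. It is not the actual script from this thread; the function name `copy_truncate` and both paths are made up for illustration, and it assumes the writer opened the log in append mode (otherwise the file would not shrink cleanly after the truncate).

```python
import os
import shutil


def copy_truncate(log_path, archive_path):
    """Copy the live log to archive_path, then empty the live log in place.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    shutil.copyfile(log_path, archive_path)
    # Anything the writer appends between the copy above and the truncate
    # below is discarded. This is the window in which lines can be lost.
    os.truncate(log_path, 0)
```

The writer keeps its file handle the whole time, which is why no restart is needed, and the gap between the two calls is exactly where the lost lines come from.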
If your logging solution can drop lines, you can no longer trust it, and therefore it’s useless. For example, when looking at an audit failure: was the missing chunk deleted by the satellite before the check (true story, literally my first message on this forum) and your log solution dropped the message, or is it some other failure? You can’t know for sure if your log can skip messages.
Utilities like logrotate and newsyslog move the existing log file and send the HUP signal to the application, in response to which the application is supposed to close and re-open its log files. Thus nothing is lost, and the separated log file can be compressed and saved for later.
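To make the expected behaviour concrete, here is a rough Python sketch of what a well-behaved application does on HUP. It is only an illustration of the close-and-re-open pattern, not storagenode code; the class `ReopenableLog` and the function `install_sighup_reopen` are hypothetical names.

```python
import signal


class ReopenableLog:
    """Append-only log that re-opens its path after a reopen request."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "a", buffering=1)  # line-buffered text mode
        self._reopen_pending = False

    def request_reopen(self, signum=None, frame=None):
        # Safe to call from a signal handler: only set a flag here and do
        # the actual close/open on the next write.
        self._reopen_pending = True

    def write(self, line):
        if self._reopen_pending:
            self._reopen_pending = False
            # The old handle still points at the renamed file. Opening the
            # original path again creates a fresh file for new lines.
            self._fh.close()
            self._fh = open(self.path, "a", buffering=1)
        self._fh.write(line.rstrip("\n") + "\n")

    def close(self):
        self._fh.close()


def install_sighup_reopen(log):
    """Make SIGHUP request a log re-open instead of ending the process (POSIX)."""
    signal.signal(signal.SIGHUP, log.request_reopen)
```

Lines written before the rename end up in the rotated file and lines written after the signal end up in the new one, so nothing falls through the gap.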
Storagenode, instead, dies on SIGHUP, so this is a no-go. @Alexey’s suggested log truncation is one workaround, but it can lose data.
Another, arguably better, workaround is to have storagenode log to stdout instead of a file, and run it under another utility that will handle the logs by redirecting to file, and properly managing log file handles in response to SIGHUP. daemon(8) is one such utility:
-H, --sighup Close output_file and re-open it when signal SIGHUP is received, for interoperability with newsyslog(1) and similar log rotation / archival mechanisms. If --output-file is not specified, this flag is ignored.
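For those not on a system with daemon(8), the same idea can be sketched in a few lines of Python. This is not how daemon(8) is implemented, just a toy wrapper under some assumptions: the function name `supervise` is made up, stderr is merged into stdout, and the rotation tool sends SIGHUP to the wrapper's PID only, never to the child.

```python
import signal
import subprocess


def supervise(cmd, log_path):
    """Run cmd, append its output to log_path, re-open log_path on SIGHUP.

    Hypothetical wrapper for illustration; returns the child's exit code.
    """
    reopen_requested = False

    def on_sighup(signum, frame):
        nonlocal reopen_requested
        reopen_requested = True  # handled before the next line is written

    signal.signal(signal.SIGHUP, on_sighup)
    child = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    out = open(log_path, "ab")
    try:
        for line in child.stdout:
            if reopen_requested:
                # The rotator has renamed the old file; start a new one.
                out.close()
                out = open(log_path, "ab")
                reopen_requested = False
            out.write(line)
            out.flush()
    finally:
        out.close()
    return child.wait()
```

The child never sees the signal and never touches a file, so it keeps running while the wrapper swaps log files underneath it.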
Yes, solutions involving SIGHUP and SIGUSR1 seem to kill the node, based on quick tests I did.
When choosing between the available solutions, I’m considering the balance of risk versus implementation complexity, along with the remote chance of Murphy’s law causing me to lose the exact lines I need in the event of an unlikely problem. Therefore, I believe I will stick with the “truncation” solution for now.
Because you still have not submitted a PR, I would assume?
Priorities. We are still a small team, the help from the Community is very welcome, especially for storagenode improvements (features).