18 h uptime in v1.108.3
and the DB looking like this. It's on a cached NVMe disk.
Do I have to worry?
The only thing to worry about seems to be whether the database implementation is correct and efficient.
There is some discussion on GitHub:
You can also try stopping the node and vacuuming the database. You might want to make a backup copy first, though, just to be safe. This was just a test I did on an old copy of some DB that I had.
More talk about vacuuming in: Vacuum databases in a ramdisk to reduce downtime
Edit: Never mind, I just realized you are talking about the WAL file, not the DB itself, but you can still vacuum if you want…
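If you do decide to try it, here is a minimal Python sketch of that idea, assuming the node is stopped first; the path /mnt/storagenode/storage/piece_expiration.db is just a placeholder for wherever your databases live. It makes a backup copy via the SQLite backup API and then runs VACUUM:

```python
import sqlite3

# Placeholder path; adjust to where your node keeps its databases.
DB_PATH = "/mnt/storagenode/storage/piece_expiration.db"
BACKUP_PATH = DB_PATH + ".bak"

src = sqlite3.connect(DB_PATH)
dst = sqlite3.connect(BACKUP_PATH)
try:
    # Consistent backup via the SQLite backup API (also captures pending -wal content).
    src.backup(dst)
    # Rebuild the main file and reclaim free pages.
    src.execute("VACUUM;")
    # Quick sanity check afterwards; should print "ok".
    print(src.execute("PRAGMA integrity_check;").fetchone()[0])
finally:
    dst.close()
    src.close()
```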
I do not think you need to worry; it will be deleted on restart.
Yes, it will be deleted when the node is restarted. But this is a sign that the TTL collector on this node has serious problems: it cannot keep up with deleting expired pieces, and when the node is restarted it will most likely lose all its progress (work already done) and start over, trying to delete files that have long since been deleted.
Details can be found in the description of the problem on GitHub at the link above.
A fix for this issue is already in the works, but it will most likely only be integrated in v109 at best, or even v110.
So it's better not to restart the node yet: there is a chance that sooner or later it will catch up and clear everything, including the -wal file, when that happens. But it really took almost 6 days of work for one of my nodes (5700 + 3000 MB for the piece_expiration DB files), and on the second one (~8500 MB + 3300 MB) it is still in progress: 7 days and counting.
But if you restart it, it will only get worse, because the process will restart from the beginning.
P.S.
But there's really no need to worry too much about it. This situation does not pose any risk to user data or of node disqualification.
In the worst case (if the situation does not normalize itself after a few days or a week), you will have to delete this database file (it will be re-created and begin to fill up again) and then wait a few more weeks until the expired pieces are deleted by the regular garbage collector instead of the TTL collector.
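If you want to keep an eye on the process in the meantime, here is a rough Python sketch that prints the sizes of piece_expiration.db and its -wal/-shm files once an hour; the storage path is just an assumption, adjust it to your node:

```python
from pathlib import Path
import time

# Assumed storage directory; adjust to your node's configuration.
STORAGE_DIR = Path("/mnt/storagenode/storage")
FILES = ("piece_expiration.db", "piece_expiration.db-wal", "piece_expiration.db-shm")

def sizes_mb() -> dict:
    """Current size in MB of the piece_expiration DB and its WAL/SHM files."""
    return {
        name: round((STORAGE_DIR / name).stat().st_size / 1024**2, 1)
        if (STORAGE_DIR / name).exists() else 0.0
        for name in FILES
    }

# Log the sizes once an hour.
while True:
    print(time.strftime("%Y-%m-%d %H:%M:%S"), sizes_mb())
    time.sleep(3600)
```

A steadily shrinking -wal file means the TTL collector is catching up; a growing one means it is still falling behind.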
I do not think so; actually, this only means that your node has problems adding records to the database online, so it is adding them to the journal, which will be replayed on restart.
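For what it's worth, that same replay can be forced manually with a WAL checkpoint while the node is stopped; a minimal Python sketch, with the database path again just an assumed example:

```python
import sqlite3

# Assumed path; point it at the database whose -wal file has grown.
DB_PATH = "/mnt/storagenode/storage/piece_expiration.db"

con = sqlite3.connect(DB_PATH)
try:
    # Replay the -wal file into the main DB and truncate it.
    busy, wal_frames, checkpointed = con.execute(
        "PRAGMA wal_checkpoint(TRUNCATE);"
    ).fetchone()
    print(f"busy={busy} wal_frames={wal_frames} checkpointed={checkpointed}")
finally:
    con.close()
```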
Did you move all databases to an SSD to reduce latency and "locked" issues?
You are right.
No. They have been there since the node started, a year ago.
However, it was resolved today.