4.2TB of data went missing this morning

Eleos · August 21, 2024, 3:53pm

So yesterday and for the past 2 months, this node has been completely full. All of a sudden this morning, 4.2TB of data went missing and it’s not in the trash. What happened?

I do have 2 nodes, and the 2nd one is full too, but it’s completely fine. Both run in Docker on the same headless Linux system; the OS is on an SSD connected via USB.

Roxor · August 21, 2024, 3:59pm

If the satellite is hovering around a 4TBm average then you probably still have around another 4TB to go: hooray for disk cleanups! For it to disappear that fast, a used-space-filewalker run probably just completed and updated your stats.

Eleos · August 21, 2024, 4:07pm

The average disk space used has been at 4TB for the past month or more at least and that was another issue I was going to bring up. The other node doesn’t have that issue and is sitting at a nice 14TB average usage.

EasyRhino · August 21, 2024, 4:31pm

There have been problems over the last month or two with uncollected garbage: nodes either don’t get a bloom filter to trigger a garbage cleanup, or they fail while processing it.

If this happens, the satellites (left) will report low usage because that figure excludes trash the moment the customer deletes it.

(Ignore the dips in the left graph; those are from days with a missing report.)

But if the disk usage on the right doesn’t show a lot of trash, then the node still thinks it has desirable data, but it doesn’t.

The usual workaround is to restart the node and give it time to finish the used-space filewalker. That usually brings the reported disk usage closer to what the satellite reports.

Eleos · August 21, 2024, 4:51pm

When you say restart the node do you mean stop the docker container for the node and start it again or do you mean restart the whole system? Also, is there a way to monitor the file walker and see how much time it’ll take to complete?

Alexey · August 22, 2024, 4:54am

It could be the TTL collector; search for collector in your logs.
It could also be a used-space-filewalker that finished successfully; search for "\sused-space" and "started|completed".

docker restart storagenode

should do the trick.

To monitor the filewalker under Docker:

docker logs storagenode 2>&1 | grep "\sused-space" | grep -E "started|completed" | tail

If you want to track it in real time, add -f to the docker logs command.
To check for errors:

docker logs storagenode 2>&1 | grep "error" | grep -E "filewalker|database" | tail
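
For the real-time tracking mentioned above, here is a minimal sketch of the used-space command with -f added. It assumes the container is named storagenode, as in the commands above, and that grep is GNU grep (for --line-buffered and \s). The function names filter_used_space and follow_used_space are made up for illustration. The trailing tail is dropped on purpose: tail prints only once its input ends, which never happens while following the log.

```shell
# Hypothetical helper: keep only used-space filewalker start/finish lines from stdin.
# --line-buffered makes each grep pass matches on immediately instead of
# holding them in its pipe buffer.
filter_used_space() {
  grep --line-buffered "\sused-space" | grep --line-buffered -E "started|completed"
}

# Hypothetical helper: follow the log of the given container
# (default: storagenode) and filter it as it arrives.
follow_used_space() {
  docker logs -f "${1:-storagenode}" 2>&1 | filter_used_space
}
```

Calling follow_used_space with no arguments follows the storagenode container; stop it with Ctrl+C.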