When will "Uncollected Garbage" be deleted?

The OS does have to read the inode on unlink, but unfortunately it doesn’t return any of the deleted inode’s info to the unlink caller, so we have to call stat separately. I’m not sure why a subsequent unlink doesn’t read the inode from cache at that point, but some profiling (on ext4 + Linux) suggests that it does not, in at least some cases. Maybe a cached inode isn’t good enough for something like a delete? I haven’t gone into depth here.
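
For illustration, here is a minimal sketch in Go (not the actual storagenode code, and the path is made up) of the pattern described above: because unlink reports nothing about the inode it removes, the size has to be read with a separate stat call first.

package main

import (
	"fmt"
	"log"
	"os"
)

// deleteAndReportSize removes a piece file and returns how many bytes were
// freed. Since unlink(2) returns nothing about the deleted inode, we have to
// stat the file first and accept the extra metadata read.
func deleteAndReportSize(path string) (int64, error) {
	info, err := os.Stat(path) // separate inode read just to learn the size
	if err != nil {
		return 0, err
	}
	if err := os.Remove(path); err != nil { // unlink; gives no size back
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	freed, err := deleteAndReportSize("/tmp/example-piece.sj1") // hypothetical path
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("freed %d bytes\n", freed)
}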

This hadn’t occurred to me. I was under the impression most people ran their nodes with a much higher RLIMIT_NOFILE. Is 1024 even enough to run a fair-sized node in normal operation?
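
If anyone wants to check what their node process actually gets, here is a small illustrative sketch (Go, Linux only; not part of the node) that prints the current soft and hard RLIMIT_NOFILE values:

package main

import (
	"fmt"
	"log"
	"syscall"
)

func main() {
	// Read this process's open-file limits (Linux).
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("soft limit: %d, hard limit: %d open files\n", lim.Cur, lim.Max)
	if lim.Cur <= 1024 {
		fmt.Println("still at the common 1024 default; consider raising it")
	}
}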

2 Likes
3 Likes

Hey everyone, I’m still wrestling with the uncollected data issue, and despite some patience, I’m seeing little improvement. My nodes have been sitting full at around 50% uncollected data for nearly two months now, which is making it impossible to start accumulating new data.

I understand the advice has been to wait it out as the BFs should eventually get smaller and work faster—but so far, that doesn’t seem to be happening. I’ve already filled out @elek’s form for several of my nodes, but it’s been a few weeks, and I haven’t heard back about any (preliminary) results.

For some context, see the graph below, which shows the SLC trash folder size over time: I have been holding on average 150 GB of trash for SLC over the last 30 days, while I have more than 7 TB of uncollected data!

@littleskunk, I totally get your point about wanting a more constructive approach—so I’m here and ready to help! What steps can I take to gather more useful information to help find the root cause? It’s a genuine issue, and I’d love to see it resolved soon so we can all get back to business as usual (and maybe even have some extra room for more paid data).

Looking forward to any suggestions or guidance you can offer. Thanks, everyone!

7 Likes

I’m facing similar challenges with a large number of nodes. Unfortunately, my storage is becoming saturated with accumulated unpaid and unused test data, which is causing me to lose valuable long-term EU/US data.
I’m here to help. Let me know if you need any logs or forms compiled.
PS: I exited SL with half of my nodes.

2 Likes

Did you compare the last fully reported usage from the satellites with the usage reported by the OS?
Please ignore the average; it’s wrong. You may use the usage from the pie chart if it matches the usage reported by the OS (you need to measure in SI units, base 10, to compare).
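
To make that comparison concrete, here is a small illustrative sketch (not node code; the 7 TiB figure is just an example) that converts a value reported in binary units into SI units before comparing:

package main

import "fmt"

func main() {
	// OS tools usually report usage in binary units (TiB), while the dashboard
	// and satellite use SI units (TB, base 10). Convert before comparing.
	const usedTiB = 7.0 // example value as reported by the OS
	usedBytes := usedTiB * 1024 * 1024 * 1024 * 1024
	usedTB := usedBytes / 1e12 // SI terabytes
	fmt.Printf("%.2f TiB reported by the OS = %.2f TB in SI units\n", usedTiB, usedTB)
}
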
For my nodes:
image

Hi @Alexey, I did.

Could you provide us with some insight from the team related to the BF form @elek provided weeks ago? Any information is much appreciated! Thank you.

I do not have any insights, other than that a replacement for the TTL database is being worked on. There is no additional information related to the garbage collector, except for occasional audits hitting the trash; it’s only a few, but still. This is suspicious, because it may mean that the GC and/or BF are working incorrectly, so I wouldn’t push on that too hard; we do not need disqualifications or customer data loss. I would let the team figure out why this is happening and fix it.

If the TTL issue were fixed, all TTL data would be handled as it should be, without needing GC, at least for the skipped TTL data.

1 Like


I haven’t received data on this node for two months now. The satellite report shows only a few GB of data left, but the BF result shows that there are still 20 million files.

2024-08-05T23:46:00Z INFO retain Moved pieces to trash during retain {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 31728, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 25582635, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "7h53m32.145322693s", "Retain Status": "enabled"}
2024-08-08T13:46:51Z INFO retain Moved pieces to trash during retain {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 44093, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 25550907, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "9h30m36.369603234s", "Retain Status": "enabled"}
2024-08-10T09:42:59Z INFO retain Moved pieces to trash during retain {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 33167, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 25506814, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "7h29m42.474423903s", "Retain Status": "enabled"}
2024-08-12T07:48:11Z INFO retain Moved pieces to trash during retain {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 70503, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 25473647, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "4h46m26.861424848s", "Retain Status": "enabled"}
2024-08-14T06:40:06Z INFO retain Moved pieces to trash during retain {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 99756, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 25403144, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "2h9m31.278654266s", "Retain Status": "enabled"}
2024-08-16T14:26:26Z INFO retain Moved pieces to trash during retain {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 40584, "Failed to delete": -40584, "Pieces failed to read": 0, "Pieces count": 10303391, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "6h23m0.284257194s", "Retain Status": "enabled"}
2024-08-18T13:14:08Z INFO retain Moved pieces to trash during retain {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 73246, "Failed to delete": -73246, "Pieces failed to read": 0, "Pieces count": 25222442, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "10h23m40.721871302s", "Retain Status": "enabled"}
2024-08-20T10:31:11Z INFO retain Moved pieces to trash during retain {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 142210, "Failed to delete": -142210, "Pieces failed to read": 0, "Pieces count": 25149196, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "13h15m21.033751074s", "Retain Status": "enabled"}
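
As a quick aid for analysis, a tally like the one below shows how fast retain is actually clearing pieces. This is a minimal sketch that assumes the exact log format above (straight quotes) and reads matching lines from stdin:

package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strconv"
)

func main() {
	// Sum "Deleted pieces" and keep the last "Pieces count" from retain log
	// lines on stdin, e.g.: grep 'Moved pieces to trash' node.log | ./tally
	deletedRe := regexp.MustCompile(`"Deleted pieces": (\d+)`)
	countRe := regexp.MustCompile(`"Pieces count": (\d+)`)

	var totalDeleted, lastCount int64
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 1024*1024), 1024*1024) // retain lines are long
	for scanner.Scan() {
		line := scanner.Text()
		if m := deletedRe.FindStringSubmatch(line); m != nil {
			n, _ := strconv.ParseInt(m[1], 10, 64)
			totalDeleted += n
		}
		if m := countRe.FindStringSubmatch(line); m != nil {
			lastCount, _ = strconv.ParseInt(m[1], 10, 64)
		}
	}
	fmt.Printf("deleted %d pieces across these runs; last reported count: %d\n",
		totalDeleted, lastCount)
}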

Does the usage on the piechart match the usage reported by the OS?

It would be interesting to know why “Failed to delete” is a negative number!

Yes, me too. No answer so far. I guess that the garbage collector was unable to delete pieces which had already been deleted by the TTL collector.
Thus I suspect that the numbers on the pie chart are incorrect. As a result, the node reports itself as full.
This may happen due to failed filewalkers and/or errors related to the databases.
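
Purely to illustrate that guess (this is not the actual storagenode code, just a hypothetical accounting bug), a counter can end up negative if “file already gone” results are subtracted instead of skipped:

package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

func main() {
	// Hypothetical bookkeeping: if every piece the TTL collector already
	// removed is counted as -1 instead of being skipped, "Failed to delete"
	// goes negative, as in the retain log lines above.
	pieces := []string{"/tmp/piece-a.sj1", "/tmp/piece-b.sj1"} // made-up paths
	failed := 0
	for _, p := range pieces {
		err := os.Remove(p)
		switch {
		case err == nil:
			// deleted fine
		case errors.Is(err, fs.ErrNotExist):
			failed-- // buggy accounting: already-deleted pieces decrement the counter
		default:
			failed++
		}
	}
	fmt.Printf("Failed to delete: %d\n", failed)
}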

The pie chart matches the operating system report

1 Like

Thanks for the feedback. So I understand the team is investigating issues with GC and BF, which might also explain why large amounts of uncollected garbage from TTL data are not being deleted.

Can you keep us updated on any progress? Thank you.

Also if there is anything we can do to help analyze issues, let us know. I’m happy to help out!

Why is there overusage? Did you reduce the allocation?
Do you have a “less than requested” message in your logs?

Because at that time the hard drive couldn’t handle the IOPS and the success rate was too low, so I reduced the allocation and stopped receiving data.

I see. So at least the used-space filewalker should be happy and has updated your databases with the actual values.
Now it’s only a matter of waiting for the garbage collector or the TTL collector to remove the excess data.

It seems we missed that window since the rollout for v1.111 has already begun:

{
    "storagenode": {
        "minimum": {
            "version": "1.110.3",
            "url": "https://github.com/storj/storj/releases/download/v1.110.3/storagenode_{os}_{arch}.zip"
        },
        "suggested": {
            "version": "1.111.4",
            "url": "https://github.com/storj/storj/releases/download/v1.111.4/storagenode_{os}_{arch}.zip"
        },
        "rollout": {
            "seed": "45ec21dc18b18307d9805a4c786d9001059b3b807d5adfba61c5aa538c9bbd9c",
            "cursor": "0f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c"
        }
    }
}
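
For reference, the cursor in that JSON can be read as a threshold in the 256-bit hash space; assuming eligibility works by comparing a hash of the node ID against the cursor (an assumption about the mechanism, not something stated here), its value gives roughly the current rollout percentage:

package main

import (
	"fmt"
	"math/big"
)

func main() {
	// Interpret the rollout cursor as a fraction of the full 256-bit space.
	cursorHex := "0f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c28f5c"
	cursor, ok := new(big.Int).SetString(cursorHex, 16)
	if !ok {
		panic("invalid cursor hex")
	}
	space := new(big.Int).Lsh(big.NewInt(1), 256) // 2^256

	frac := new(big.Float).Quo(new(big.Float).SetInt(cursor), new(big.Float).SetInt(space))
	pct, _ := new(big.Float).Mul(frac, big.NewFloat(100)).Float64()
	fmt.Printf("rollout cursor covers about %.1f%% of the hash space\n", pct) // ~6.0%
}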

Looks like the code isn’t quite ready for merging yet. Fingers crossed it’ll be good to go for v1.112! Thanks for all the hard work you all are putting into getting this fixed!

2 Likes

Yeah, thanks for the information. I checked the nodes, and the provided information was very helpful, as it removes the uncertainty of the false reports from the picture.

My current understanding (which may change over time):

  • Thanks to the continuous improvements on the BF side, we have far fewer discrepancies on US1/EU1/AP1.
  • We clearly have some problems on SLC, but they may be related to the time delay between different reports (the satellite may notice TTL expiration faster).
  • I checked the code and it looks good: TTL data is supposed to be deleted (at least if the modification times of the piece files are not screwed up on the file system).
  • There are some strong limits on the expired-piece deletion chore (1000 entries are deleted per hour).
  • This can be configured, but it’s not too bad. Assuming we uploaded 5k segments/sec of test data to the 12k available nodes, with 80 pieces per segment (65 + uploads not cancelled fast enough), it may have generated 120k pieces per node per hour ((5000 * 80) / 12000 * 60 * 60); see the small sketch after this list.
  • We didn’t have continuous testing running all the time, so it’s probably not a problem, but this is exactly what I am testing. (I have access to a node with high discrepancies… I am counting the pieces, bumping the configuration settings, then repeating the count.)
  • But still, those pieces are supposed to be deleted by the GC (and some reported nodes process GC BFs very fast…). I am still investigating this part…
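
To make the arithmetic in the list above concrete, here is a tiny sketch; the rates are the assumed test-load numbers from the list, not measurements:

package main

import "fmt"

func main() {
	// Back-of-the-envelope: how many TTL pieces a single node may accumulate
	// per hour versus the expired-piece deletion chore's limit.
	const (
		segmentsPerSec   = 5000.0  // assumed test upload rate (5k/sec)
		piecesPerSegment = 80.0    // 65 needed + uploads not cancelled fast enough
		nodes            = 12000.0 // available nodes
		deleteLimitPerHr = 1000.0  // expired-piece deletion chore limit
	)
	piecesPerNodePerHr := segmentsPerSec * piecesPerSegment / nodes * 3600
	fmt.Printf("accumulation: ~%.0f pieces/node/hour vs. deletion limit: %.0f/hour\n",
		piecesPerNodePerHr, deleteLimitPerHr)
	fmt.Printf("backlog grows by ~%.0f pieces/node/hour while the test load runs\n",
		piecesPerNodePerHr-deleteLimitPerHr)
}
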
6 Likes

So… nodes can store TTL data faster than they’re allowed to delete it, and any extra starts to get handled by the regular garbage-collection system? It’s great that you found it, and that it slowly fixes itself!