Updates on Test Data

Is the size shown in the log?

It’s not shrinking as much as I would expect. Two weeks with (almost) no ingress should have deleted roughly half of the test data (what was uploaded between 4 and 6 weeks ago), since the test data is supposed to have a 30d TTL. However, the node has only shrunk a little (a few TB).
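For context, a toy sketch of the expiry arithmetic behind that expectation (the 30d TTL is from above; the upload ages are illustrative only, not real node data):

package main

import (
	"fmt"
	"time"
)

func main() {
	const ttl = 30 * 24 * time.Hour // the 30d TTL on the test data

	// Anything whose age exceeds the TTL should already have been
	// collected; this just shows where the 30-day cutoff falls.
	now := time.Now()
	for _, weeksAgo := range []int{3, 4, 5, 6} {
		uploaded := now.Add(-time.Duration(weeksAgo) * 7 * 24 * time.Hour)
		expired := now.Sub(uploaded) > ttl
		fmt.Printf("uploaded %d weeks ago (%d days): expired = %v\n",
			weeksAgo, weeksAgo*7, expired)
	}
}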

I’m sorry, but I would expect that you need to run another round of the used-space filewalker…
We have had several bugs (which are fixed, but perhaps not rolled out yet), so another round of the used-space filewalker should bring the databases back up to date.


One of my nodes is unable to keep up with the TTL deletes and has been running them for 70h already. So yes, it is possible that a node falls behind with the TTL deletes. For some unknown reason it affects only one of my nodes. My suspicion is the new badger cache. It would help if I ran the used space filewalker: one execution and the cache would know the sizes of all the pieces I have on disk. I don’t run the used space filewalker. I should run it to correct the numbers on my dashboard, but as long as I still have free space to fill I see no need to correct the numbers, so I keep it disabled. → TTL delete hits an empty cache and slows down a bit.
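To illustrate what “TTL delete hits an empty cache” means, here is a rough sketch. This is not the actual storagenode code; the cache type and the Stat fallback are simplifications for illustration:

package main

import (
	"fmt"
	"os"
)

// sizeCache stands in for the badger-backed piece size cache; in the real
// node it is only fully populated after a used-space filewalker run.
type sizeCache map[string]int64

// pieceSize returns the cached size if present, otherwise it falls back to
// a Stat() of the blob on disk - the slow path the TTL collector ends up on
// when the cache is empty.
func pieceSize(cache sizeCache, path string) (int64, error) {
	if size, ok := cache[path]; ok {
		return size, nil // fast path: no disk access needed
	}
	info, err := os.Stat(path) // slow path: one extra random read per piece
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	cache := sizeCache{} // empty, as after skipping the used-space filewalker
	if _, err := pieceSize(cache, "/storj/blobs/example.sj1"); err != nil {
		fmt.Println("fell back to Stat and it failed:", err)
	}
}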

My recommendation would be to check /mon/ps output to see how the TTL deletes are doing. You don’t want to see something like this:

[2829538955632910137,8638826813066407777] storj.io/storj/storagenode/collector.(*Service).Collect() (elapsed: 72h25m39.379630284s)
 [434335277620690664,8638826813066407777] storj.io/storj/storagenode/pieces.(*Store).GetExpired() (elapsed: 72h25m39.379624266s)
  [988878055489152159,8638826813066407777] storj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).GetExpired() (elapsed: 72h25m39.333418717s)
   [9058240293391052955,8638826813066407777] storj.io/storj/storagenode/pieces.(*Store).DeleteSkipV0() (elapsed: 164.819823ms)
    [3248952435957555315,8638826813066407777] storj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat() (elapsed: 164.811971ms)
     [6663036615378833482,8638826813066407777] storj.io/storj/storagenode/blobstore/filestore.(*Dir).Stat() (elapsed: 164.809563ms)
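If you want to pull the same trace from your own node, a minimal sketch, assuming the debug endpoint is enabled and listening on 127.0.0.1:6060 (that address is an assumption; use whatever debug address your node is actually configured with):

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Assumed debug address; the node only serves this if a debug
	// listener is configured.
	resp, err := http.Get("http://127.0.0.1:6060/mon/ps")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	// Prints the currently running spans; long-lived collector.Collect /
	// GetExpired entries like the ones above mean TTL deletes are lagging.
	fmt.Print(string(body))
}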

I didn’t try it without the cache yet. That’s on my to-do list for next week.

Search for “Filter Size”.

2024-07-25T07:32:00+02:00       INFO    retain  Prepared to run a Retain request.       {"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Created Before": "2024-07-18T17:59:59Z", "Filter Size": 13862095, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
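If you prefer a small tool over grepping by hand, here is a sketch that pulls those lines out of a log file (the path is a placeholder; point it at wherever your node actually logs):

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// Placeholder path - adjust to your node's log location.
	f, err := os.Open("/var/log/storagenode.log")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // retain lines can be long
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, "Filter Size") {
			fmt.Println(line) // one line per received bloom filter
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}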

I restarted my node to run the used space filewalker; it is still running. My problem is that the used space on disk roughly matches what the dashboard shows, but the “used space” reported by the satellite is decreasing.
Saltlake only:

All satellites:

df output:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        56T   44T   12T  79% /storj

I do not see GetExpired in the /mon/ps output.

2024-07-23T03:54:27Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-07-17T17:59:59Z", "Filter Size": 17000003, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-07-24T23:44:49Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-07-19T17:59:59Z", "Filter Size": 17000003, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}

The filter size has been the same since at least July 13 (I did not check older logs).

Just asking with rookie knowledge. As far as I understand, the data is being deleted to free up space so the node can write new data again. Wouldn’t it be more efficient to just overwrite the trash which is older than 7 days instead of deleting it before writing new data? Like Windows does it: when you click “delete”, the file doesn’t get deleted directly, it is just marked as overwritable when the space is needed. Wouldn’t this speed up the process, or is it maybe already done partially?

So this seems to be the max size they can make. Still way too small for big nodes… :poop:

25MB is the max size that was last reported. I hope they have bumped it up a little, but considering how resource-intensive it is, it could take a while to get bigger BFs.
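For a rough sense of why the cap matters: the usual bloom filter sizing formula is m = -n·ln(p)/(ln 2)² bits for n pieces at false-positive rate p. A sketch of the arithmetic (the 10% false-positive rate here is an assumption for illustration, not a confirmed Storj parameter):

package main

import (
	"fmt"
	"math"
)

// bloomBytes returns the approximate bloom filter size in bytes for n items
// at false-positive rate p, using m = -n*ln(p)/(ln 2)^2 bits.
func bloomBytes(n, p float64) float64 {
	bits := -n * math.Log(p) / (math.Ln2 * math.Ln2)
	return bits / 8
}

func main() {
	p := 0.10 // assumed false-positive rate, illustration only
	for _, n := range []float64{10e6, 20e6, 40e6} {
		fmt.Printf("%2.0f million pieces -> ~%.1f MB filter\n", n/1e6, bloomBytes(n, p)/1e6)
	}
}

Under that assumed rate, a ~25MB filter only covers on the order of 40 million pieces, which lines up with why large nodes feel the size cap.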

That’s what every filesystem does.
That’s the very reason why removing a file was originally called unlink: you’re unlinking an inode from its directory, and that’s all that needs to happen to throw the file away. In the background, of course, the blocks/extents of the file need to be added to the free map. This is also the very reason why you can recover deleted files to a certain extent. Depending on the filesystem, the degree of recoverability differs, of course.
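To make that concrete: on a POSIX system, deleting a file really is just unlinking the directory entry. A minimal sketch (the temp file is created only so there is something to unlink):

package main

import (
	"fmt"
	"os"
)

func main() {
	// Create a throwaway file so there is something to delete.
	f, err := os.CreateTemp("", "unlink-demo-*")
	if err != nil {
		panic(err)
	}
	f.Close()

	// os.Remove on a regular file is the unlink described above: it only
	// detaches the name from the inode; the filesystem reclaims the blocks
	// afterwards.
	if err := os.Remove(f.Name()); err != nil {
		panic(err)
	}
	fmt.Println("unlinked", f.Name())
}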


These are what I got for a 9TB node. The node was restarted 4 days ago and the first one is still in the queue, as the node has not finished the used-space + GC + BF filewalkers…
It is at directory 4d in the 07-24 trash folders. Most of the directories have 7k+ files.
I have a 07-19 folder in the trash for the same sat; there are around 4k+ files in each folder…
And still nothing in the other sat’s trash folder.

And in reality, the drive has 1.5TB free space out of 10TB…

Sorry I may be dense, but are there magic words I can search for in my logs to look for TTL delete info?

I know for regular bloom filter / garbage collection I can look for “retain” and “gc”.

Look for collector in the log at info level. You get more details at debug level.
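A small sketch for tallying those collector lines (the log path is a placeholder; the “unable to delete piece” text is the warning mentioned a couple of posts below):

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// Placeholder path - adjust to your node's log location.
	f, err := os.Open("/var/log/storagenode.log")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	total, missing := 0, 0
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.Contains(line, "collector") {
			continue
		}
		total++
		// Count the case where the piece is already gone by the time the
		// collector gets to it.
		if strings.Contains(line, "unable to delete piece") {
			missing++
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
	fmt.Printf("collector lines: %d (of which %d are missing-piece warnings)\n", total, missing)
}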


Oh wow, literally just “unable to delete piece because the file doesn’t exist” warnings.


How is the current state after 3 months? Is all non-SLC data paid by now?


Incoming data has stopped for me. Anyone else?

Yes, also for me, so it’s all OK.
As 1.110 has been released and the rollout started, it is good to have a break for the update and the filewalkers. It’s on version.storj.io but not on GitHub yet. It will be soon, I think.

Test has been stopped temporarily while they perform some checks.


The scary thing about turning off test ingress… is that we all get reminded of the slow erosion of TTL data beneath the surface that’s usually hidden. You can watch your %-used gradually get nibbled away as the nodes continue their housekeeping… like your disk has sprung a leak. :sweat_drops:


Seems not so fast: When will "Uncollected Garbage" be deleted?
