Quick growth of piece_expiration.db

Hello everyone,

I started a new node only a few months ago and its piece_expiration.db is already ~200 MB. Can somebody explain the high growth rate and how to shrink it? The filewalker is not active.

Thanks and kind regards,

Well, without the filewalker running to purge expired segments, you are going to build them up (and not get paid for storing them), which subsequently increases the database size.
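
If you want a rough sense of how many already-expired entries are still sitting in the database, a quick count works. This is only a sketch: it assumes the standard piece_expirations table layout and timestamps that compare as text, and that you open the file with the sqlite3 CLI while the node is stopped:

-- count entries whose expiration time has already passed
SELECT COUNT(*)
FROM piece_expirations
WHERE piece_expiration < datetime('now');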


(screenshot of piece_expiration.db query results)

Considering the above data, does it mean I have 435 pieces left behind from the year 2022?

I also have much older nodes that have only about 60 MB. I don't know if it is filewalker-related.

Fair enough. I don't know the database retention rules off the top of my head; I will ask an engineer.


Well, my filewalker is running for sure, and I have the same thing: a 250 MB .db on a node with 3 TB of data.

Maybe clients are storing data for only a short time (short expiration dates)?

Try vacuuming it.


Thanks a lot for the vacuum tip.

I have tried it on a 440 MB piece_expiration.db. It only freed about 40 MB, or roughly 10%, bringing it down to 400 MB. I will have to delete the piece_expiration.db, as it is clearly the reason for the Grafana dashboard failures and very likely for the slow read rate.

I also have a very large WAL file for bandwidth.db:

When the databases are moved to an SSD, the problem is solved. Unfortunately, it also happens on a single node on a 20 TB CMR enterprise HDD.

https://www.sqlite.org/wal.html

@Alexey, your opinion on this, please.

I have no idea; I usually respond only when I think I know the answer.


I appreciate and admire that about you, hence the title of Awesome :slight_smile:


I thought it was only me and I was going to ignore it, but I also have that DB being way too big. I remember checking all the DBs a few months ago and the biggest was the bandwidth one; now the piece…db is the biggest. So this is a recent thing. I'll edit this post with all the sizes from my nodes in an hour or so. The FW was run on all nodes last week.
How can I “vacuum” them? Is it dangerous? Do I have to stop the nodes?

You can vacuum them the usual way for SQLite:

  1. Stop and remove the container (or stop the service, in the case of Windows/Linux GUI)
  2. Run the VACUUM command (see the sketch below)
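
For step 2, here is a minimal sketch, assuming the sqlite3 CLI is installed on the host and /path/to/storage is your node's storage location (adjust the path to your setup):

-- open the database with the sqlite3 CLI and run VACUUM, e.g.:
-- sqlite3 /path/to/storage/piece_expiration.db
VACUUM;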

However, I think you should not fix what's not broken.


I don't know, but something fishy is happening. The piece_expiration.db is growing FAST!
These are all my nodes; note that the small ones were started on 30-31 December, so I don't even know if they are fully vetted, yet they also have huge DBs for their age. The FW was run.
Synology NASes, Exos drives, ext4, no SSD.

Node: space occupied / piece_expiration.db

node11: 12.3TB / 527MiB
node12: 431GB / 77MiB

node21: 14.3TB / 512MiB
node22: 426GB / 77MiB

node31: 6.8TB / 160MiB
node32: 6.8TB / 255MiB

node41: 11.1TB / 522MiB
node42: 425GB / 78MiB

node51: 12.8TB / 530MiB
node52: 433GB / 78MiB

node61: 14.4TB / 418MiB

node71: 14.5TB / 324MiB
node72: 654GB / 105MiB

node81: 14.4TB / 516MiB
node82: 427GB / 78MiB

I also have several 400-500 MB piece_expiration DBs. Like I said, the only solution was to move the DBs to an SSD and to change the --mount order, following this guide:

After moving the DBs, the dashboard and the STORJ-Exporter logs are fine.


It certainly seems like something is wrong. The collector service on your node should be continually trying to get rid of any entries older than the current time. My only theory is that it is experiencing some error trying to delete those files. If that’s the case, it would keep those entries around so it could try again later. Do you have any errors in the log relating to a service called “collector”?

Another thing to check would be whether the deletion_failed_at column is set for those entries:

SELECT
    satellite_id, piece_id, piece_expiration, deletion_failed_at
FROM
    piece_expirations
WHERE
    piece_expiration < '2023-01-01 00:00:00'
ORDER BY
    piece_expiration;

The deletion_failed_at column marks the last time the node failed while trying to delete those pieces. If it's blank, something altogether different is going on.
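
To gauge the size of the backlog and how much of it has hit delete failures, a follow-up query like this may help (just a sketch against the same piece_expirations table; it assumes the timestamps compare cleanly as text, as in the query above):

SELECT
    satellite_id,
    COUNT(*) AS expired_entries,
    SUM(CASE WHEN deletion_failed_at IS NOT NULL THEN 1 ELSE 0 END) AS failed_deletions
FROM
    piece_expirations
WHERE
    piece_expiration < datetime('now')
GROUP BY
    satellite_id;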


The semicolon terminated the WHERE clause, so I edited it :nerd_face:

I will work on that and edit this post.

Edit:

2023-12-25T02:17:48Z    ERROR   collector       unable to update piece info     {"process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "2XLSFR7RKZXCZZUBAATNNJRFNZG7DTSXQGKQ3UILLHTHO3AZ3FKQ", "error": "pieceexpirationdb: database is locked", "errorVerbose": "pieceexpirationdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).DeleteFailed:99\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteFailed:597\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:109\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
2023-12-25T02:17:48Z    ERROR   collector       unable to delete piece  {"process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "2XLSFR7RKZXCZZUBAATNNJRFNZG7DTSXQGKQ3UILLHTHO3AZ3FKQ", "error": "pieces error: database is locked", "errorVerbose": "pieces error: database is locked\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteExpired:365\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:344\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}