Quick growth of piece_expiration.db

Hrm. Is it only the one instance of that error or is it recurring?

It is recurring over multiple logs.

Interesting. I wonder if something is trying to open a second handle to that db when it’s already open. Something must have a handle open to it somewhere.

Does anyone else have these errors? That would explain some excessive growth of the db file.


I can’t check my nodes, so don’t take my silence as a NO :). I believe @Ruskiem said something about locked DBs in another thread.

Perhaps it could be a lazy filewalker process? It’s a separate Storagenode instance.

2023-12-23T03:07:19Z    WARN    retain  failed to delete piece  {"process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "YT4EZ34ZRG3V2DW7JFMOGWP7YGYUR6QS7QWVMDOHEYBX6QUUE5WA", "error": "pieces error: pieceexpirationdb:
database is locked", "errorVerbose": "pieces error: pieceexpirationdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:403\n\tstorj.io/storj/storagenode/retain.(*Service)
.trash:364\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:341\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:221\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
2023-12-23T13:02:59Z    WARN    retain  failed to delete piece  {"process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "ZW5JE4F5IRSPSUZQY7PDVYMS57OXMIMFHUWCL6VQ4PASHEXO7YEA", "error": "pieces error: pieceexpirationdb:
database is locked", "errorVerbose": "pieces error: pieceexpirationdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:403\n\tstorj.io/storj/storagenode/retain.(*Service)
.trash:364\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:341\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:221\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
2023-12-23T13:03:10Z    WARN    retain  failed to delete piece  {"process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "ZW5JJITWFSJ3344EA6KB4KMKXZLEAER3QXJ5L6PIVLG4CRY2NJNQ", "error": "pieces error: pieceexpirationdb:
database is locked", "errorVerbose": "pieces error: pieceexpirationdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:403\n\tstorj.io/storj/storagenode/retain.(*Service)
.trash:364\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:341\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:221\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
2023-12-23T13:03:20Z    WARN    retain  failed to delete piece  {"process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "ZW5JVD3G4WDX6MQ2URMAOWDLI5EVGGC4Y3RTJ7OV3ODXCX4RUNDQ", "error": "pieces error: pieceexpirationdb:
database is locked", "errorVerbose": "pieces error: pieceexpirationdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:403\n\tstorj.io/storj/storagenode/retain.(*Service)
.trash:364\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:341\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:221\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
2023-12-23T13:03:33Z    WARN    retain  failed to delete piece  {"process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "ZW5JX4MRYGOXVSHNU6EC5IGP2DNWGHUW6FCLLLXH5LMBK5L6LBPQ", "error": "pieces error: pieceexpirationdb:
database is locked", "errorVerbose": "pieces error: pieceexpirationdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).Trash:112\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:403\n\tstorj.io/storj/storagenode/retain.(*Service)
.trash:364\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:341\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:221\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
2023-12-24T07:49:11Z    ERROR   piecestore      upload failed   {"process": "storagenode", "Piece ID": "GD6NBZ5KONLZWAZOTV2WUR6I546WOITG3WM4YSR6TCMP25IWEBPQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "pieceexpirationdb:
database is locked", "errorVerbose": "pieceexpirationdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).SetExpiration:67\n\tstorj.io/storj/storagenode/pieces.(*Store).SetExpiration:587\n\tstorj.io/storj/storagenode/piecestore.(*End
point).Upload.func6:483\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:541\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/
common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35", "Size": 261632,
"Remote Address": "172.17.0.1:48520"}
2023-12-24T07:49:11Z    ERROR   piecestore      upload failed   {"process": "storagenode", "Piece ID": "FSS34SO6KZZLTFA4ZSRZUBELITHSHAE7L4VHFOAQCZN73KQCJTTA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "pieceexpirationdb:
database is locked", "errorVerbose": "pieceexpirationdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).SetExpiration:67\n\tstorj.io/storj/storagenode/pieces.(*Store).SetExpiration:587\n\tstorj.io/storj/storagenode/piecestore.(*End
point).Upload.func6:483\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:516\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/
common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35", "Size": 11264, "Remote
Address": "172.17.0.1:48484"}
2023-12-25T02:17:48Z    ERROR   collector       unable to update piece info     {"process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "2XLSFR7RKZXCZZUBAATNNJRFNZG7DTSXQGKQ3UILLHTHO3AZ3FKQ", "error": "pieceexpirationdb:
database is locked", "errorVerbose": "pieceexpirationdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).DeleteFailed:99\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteFailed:597\n\tstorj.io/storj/storagenode/collector.(*Servic
e).Collect:109\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tsto
rj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
2023-12-25T02:17:48Z    ERROR   collector       unable to delete piece  {"process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "2XLSFR7RKZXCZZUBAATNNJRFNZG7DTSXQGKQ3UILLHTHO3AZ3FKQ", "error": "pieces error: database is
locked", "errorVerbose": "pieces error: database is locked\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteExpired:365\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:344\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagen
ode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Ru
n.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
2023-12-29T00:54:29Z    ERROR   piecestore      upload failed   {"process": "storagenode", "Piece ID": "HBJVKPMU7BK4EIEDSK3TUFT4VC5RAX6ULI6AXGUOY3MDKKADZ4IQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "pieceexpirationdb:
database is locked", "errorVerbose": "pieceexpirationdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).SetExpiration:67\n\tstorj.io/storj/storagenode/pieces.(*Store).SetExpiration:587\n\tstorj.io/storj/storagenode/piecestore.(*End
point).Upload.func6:483\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:541\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:243\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/
common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:124\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:66\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:114\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35", "Size": 287232,
"Remote Address": "172.17.0.1:42876"}

More log entries if it helps.


Seems you need to move the databases to an SSD…


Also, a USB flash drive seems to work fine! I use this one:
Samsung USB-Stick Typ-A BAR Plus (MUF-128BE4/APC), 128 GB

You gave me an idea… to use flash drives on my Synology units for the DBs. I think there was a discussion about this, and the conclusion was that they can’t handle the I/O; but if it works for you, maybe it’s a good solution.

Reexamining my db sizes on all nodes, I just realised that on the old nodes they are ~530 MB, except one, which is ~330 MB.
What is the difference between them? On this one, I ran the filewalker with lazy mode off. On the others, lazy mode was on. So it seems that the lazy filewalker has something to do with piece_expiration.db getting huge.

You could run SQL queries against the db to see what it holds. Copy the db file to another location, then execute the queries on the copy.

Just take a decent one. (I chose the Samsung because of the 5-year warranty and I had read a test of it.)
I reformatted them to 4K NTFS. (I use Windows.)

Maybe the really old ones, with FAT32 or exFAT?
But a CMR drive can handle it? I doubt it.
In IOPS (at a 4K cluster size), it does around 3-6k, which is 8 to 20 times that of a CMR disk.
Also, the DBs are non-critical, so reliability doesn’t really matter.

What commands? I’m not familiar with SQL queries.

First SQL query:

SELECT
    strftime('%Y', piece_expiration) AS year,
    COUNT(piece_id) AS num_piece_ids
FROM
    piece_expirations
GROUP BY
    year
ORDER BY
    year;

Second SQL query:

SELECT
    satellite_id, piece_id, piece_expiration, deletion_failed_at
FROM
    piece_expirations
WHERE
    piece_expiration < '2023-01-01 00:00:00'
ORDER BY
    piece_expiration;

You can get the binaries from Downloads - DB Browser for SQLite
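
For a quick first look (a sketch only, using the same piece_expirations table as the queries above, run against a copy of the db), this shows how many rows the table holds and the time range the stored expirations span:

-- total number of expiration records and the span they cover
SELECT
    COUNT(*)              AS total_rows,
    MIN(piece_expiration) AS earliest_expiration,
    MAX(piece_expiration) AS latest_expiration
FROM
    piece_expirations;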


I ran these queries, plus VACUUM, on one of my nodes, 14 months old, with a piece_expiration.db of 530 MiB.
Results:

  • VACUUM shrank all 14 DBs by about 9% in total, so piece_expiration.db is still huge.
  • the first query gave me (see the sketch after this list for a closer look at the year-9999 rows):
    2022 46
    2023 420
    2024 2272544
    9999 26369
  • the second query doesn’t show anything in the “reason for failed deletes” (deletion_failed_at) column, just NULL.
    I verified for < 2023, < 2024 and < 2025.
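
As an extra sketch for that year-9999 bucket (same piece_expirations table as the queries above, run against a copy of the db; satellite_id may display as a BLOB in DB Browser), this breaks those rows down per satellite:

-- which satellites the far-future (year 9999) expirations belong to
SELECT
    satellite_id,
    COUNT(piece_id)       AS num_piece_ids,
    MIN(piece_expiration) AS earliest,
    MAX(piece_expiration) AS latest
FROM
    piece_expirations
WHERE
    strftime('%Y', piece_expiration) = '9999'
GROUP BY
    satellite_id
ORDER BY
    num_piece_ids DESC;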

When I did the vacuuming I also got only about 10%; I had expected more as well.


VACUUM only removes empty space. It won’t shrink the file if the size comes from data that is actually there.
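
As a sketch, you can check how much of the file is actually reclaimable free space before running VACUUM (standard SQLite pragmas, run against a copy of the db):

PRAGMA page_count;      -- total pages in the file
PRAGMA freelist_count;  -- unused pages; roughly what VACUUM can reclaim
PRAGMA page_size;       -- bytes per page; freelist_count * page_size ≈ reclaimable bytes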
