Pieces failed to migrate v0 piece. Piece may not be recoverable

I’m getting lots of these warnings on my only node that is on 1.110. Haven’t seen those before. Anything of concern?

2024-08-11T20:50:19Z	WARN	pieces	failed to migrate v0 piece. Piece may not be recoverable	{"Process": "storagenode"}
2024-08-11T20:50:19Z	WARN	retain	failed to trash piece	{"Process": "storagenode", "cachePath": "config/retain", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "74MBENAXPOZNXY2MNI37UFR2FBK3SCJ27SKVWCCFKKP7WMA2PHQQ", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:124\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:340\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).TrashWithStorageFormat:406\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:422\n\tstorj.io/storj/storagenode/retain.(*Service).trash:428\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces.func1:387\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*TrashHandler).processTrashPiece:112\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*TrashHandler).writeLine:99\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*TrashHandler).Write:78\n\tio.copyBuffer:431\n\tio.Copy:388\n\tos.genericWriteTo:269\n\tos.(*File).WriteTo:247\n\tio.copyBuffer:411\n\tio.Copy:388\n\tos/exec.(*Cmd).writerDescriptor.func1:578\n\tos/exec.(*Cmd).Start.func2:728"}
2024-08-11T20:50:19Z	WARN	pieces	failed to migrate v0 piece. Piece may not be recoverable	{"Process": "storagenode"}
2024-08-11T20:50:19Z	WARN	retain	failed to trash piece	{"Process": "storagenode", "cachePath": "config/retain", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "754TS5J5DYBC7G746J4DYIVJ4DRSZOG7C27LQLV4VMAIOSGYUSAA", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:124\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:340\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).TrashWithStorageFormat:406\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:422\n\tstorj.io/storj/storagenode/retain.(*Service).trash:428\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces.func1:387\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*TrashHandler).processTrashPiece:112\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*TrashHandler).writeLine:99\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*TrashHandler).Write:78\n\tio.copyBuffer:431\n\tio.Copy:388\n\tos.genericWriteTo:269\n\tos.(*File).WriteTo:247\n\tio.copyBuffer:411\n\tio.Copy:388\n\tos/exec.(*Cmd).writerDescriptor.func1:578\n\tos/exec.(*Cmd).Start.func2:728"}
2024-08-11T20:50:19Z	INFO	lazyfilewalker.gc-filewalker	subprocess finished successfully	{"Process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-08-11T20:50:19Z	INFO	retain	Moved pieces to trash during retain	{"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 83551, "Failed to delete": -83551, "Pieces failed to read": 0, "Pieces count": 3479638, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "8m0.71244498s", "Retain Status": "enabled"}

That’s a funny total: "Deleted pieces": 83551, "Failed to delete": -83551
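For reference, a rough way to cross-check that number against the warnings themselves (Docker setup and container name are assumptions here; adjust to however you log):

# count the paired "failed to trash piece" warnings for this satellite
docker logs storagenode 2>&1 | grep "failed to trash piece" | grep -c 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S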

These ones - yes, the piece has been corrupted.

These are not. It means that TTL data was moved to the trash by the garbage collector before it expired. The customer can delete TTL data before the expiration date, so it's kind of "normal".

Yes, this is interesting.

I’m having the same logs since upgrade.

@Alexey Here you said that it's OK: Failed to migrate v0 piece. Piece may not be recoverable - #7 by Alexey

If it happens when trying to delete them? Yes, it's OK. However, if you got the same error for GET_AUDIT or GET_REPAIR, it would mean a failed audit.
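For example, you could check whether the same error ever shows up for audits or repairs (the node.log path is just an example, adjust to your setup):

# any hits here would point to real audit/repair failures, not just deletions
grep -E "GET_AUDIT|GET_REPAIR" node.log | grep "file does not exist"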

But you should be worried because this means that there may be damage in other pieces.

My nodes don't have any issues with the HDD filesystem or the SSD databases. Other people are seeing the same type of problem, so I would expect a software bug.

Did you have them before the upgrade to 1.110.x?
Because my nodes (1.109.2) don't have such an error, and they are quite old (from the time when v0 pieces were still uploaded).
Maybe related:

Then this warning could just be a consequence of an attempt to remove a piece that was already moved to the trash by the garbage collector.

Yes. Had them here in Nov 23: Piecedeleter could not send delete piece to trash .. "error": "pieces error: v0pieceinfodb: sql: no rows in result set" - #27 by flo82

Node is still healthy.

Good to know. Then it is related to some corrupted v0 pieces, and the good thing is that they appear only on deletions.
I had them on one of my nodes for the Saltlake satellite, where that node was eventually disqualified (the auditor found them). No errors have occurred since I moved that node back to Windows; before that, v0 pieces got corrupted due to kernel panics while using NTFS under Linux.

You monster! :astonished:

:stuck_out_tongue_winking_eye:

Thanks, I did check that my audit is still at 100% :sweat_smile:. They consistently come in pairs with the failed to trash piece warnings and stop exactly at the end of the retain/gc-filewalker job. So it doesn't sound like a random corruption problem?

Still getting the same entries this morning in the logs, on a different satellite. I count 30000 pairs of warnings, with a total of 31989 in the last entry:

2024-08-12T11:17:16Z	INFO	retain	Moved pieces to trash during retain	{"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 31989, "Failed to delete": -31989, "Pieces failed to read": 0, "Pieces count": 16548404, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "31m55.699918659s", "Retain Status": "enabled"}

Edit: Did an e2fsck, all good. Since they're warnings I won't worry too much.
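(For anyone wanting to do the same, the rough sequence is: stop the node, unmount the data volume, then run the check. The device path below is only a placeholder, and the container name may differ:)

docker stop storagenode
umount /mnt/storj
e2fsck -f /dev/sdX1    # forces a full check; use -n instead to only report problems without fixing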

Otherwise how would I know that using NTFS under Linux is a horrible idea?
But actually I tried to migrate to Linux and convert the NTFS disk with the data in-place:

Unfortunately my setup has a hardware issue that is incompatible with Linux, so it works normally only under Windows (and only after two or three patches of every new Windows version).

I confused these warnings from retain with the warnings from the TTL collector. The ones from retain are new.
So, definitely an interesting issue. I need to wait until my nodes are upgraded to 1.110.3; maybe it's related to that version.

I see them on a 1.110.3 node too:

2024-08-13T02:11:06Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:06Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:06Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:07Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:07Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:07Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:07Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:07Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:07Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:07Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:07Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:07Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:08Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:08Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:08Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:08Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:08Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:09Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:09Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:09Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:09Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:09Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:09Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:09Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:09Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:09Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:09Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:09Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:09Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:09Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}
2024-08-13T02:11:09Z    WARN    pieces  failed to migrate v0 piece. Piece may not be recoverable        {"Process": "storagenode"}

and there are many of them:

docker logs storagenode | grep migrate | wc -l
253785
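
If they line up with retain, the matching "failed to trash piece" warnings should carry the actual error text, e.g.:

docker logs storagenode 2>&1 | grep "failed to trash piece" | tail -n 1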

I noticed the same errors for the first time on one of my nodes running v1.110.3. The node's audit score also went down a fraction of a percent on a few satellites, then went back up to 100%.

@mtone @jammerdan @TWH
Could you please share the OS and platform?

The node is running in a Debian 11.10 VM using docker.

Ubuntu 24.04, docker 27.1.x, ext4 on lvm cache, DBs on SSD, no badger, lazy enabled. Also a VM on Windows hyper-v. 100% audit & 99.x% online on all nodes atm.

I can't remember on which node I saw it. Some Debian Linux (10, 11, or 12).

I just checked and I also had over 300K entries for that error message, and also the "failed to delete" with the negative number, but only between 2024-08-11 and 2024-08-16.
I only see this on my node that stores data for the US1 satellite.

pi@Four:/seagate4/storj2us/storagenode $ grep -i "failed to delete" node.log
2024-07-25T03:05:21Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 232093, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 944055, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "7m52.93481316s", "Retain Status": "enabled"}
2024-07-27T18:29:12Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 68683, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 1741430, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "16m40.792855444s", "Retain Status": "enabled"}
2024-07-29T17:32:15Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 37983, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 1892800, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "14m18.100645327s", "Retain Status": "enabled"}
2024-07-31T17:44:51Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 41527, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 2260114, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "23m35.78412322s", "Retain Status": "enabled"}
2024-08-03T03:34:13Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 132689, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 2598168, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "22m52.403881233s", "Retain Status": "enabled"}
2024-08-04T08:09:37Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 60524, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 2714337, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "22m21.1962619s", "Retain Status": "enabled"}
2024-08-06T13:33:45Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 150743, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 2909073, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "31m34.026949571s", "Retain Status": "enabled"}
2024-08-08T14:08:24Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 114531, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 3027871, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "22m46.246403346s", "Retain Status": "enabled"}
2024-08-10T03:31:44Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 101697, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 3090103, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "1h44m30.828066249s", "Retain Status": "enabled"}
2024-08-11T22:52:00Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 93382, "Failed to delete": -93382, "Pieces failed to read": 0, "Pieces count": 3319424, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "41m31.783135804s", "Retain Status": "enabled"}
2024-08-13T14:43:00Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 144342, "Failed to delete": -144342, "Pieces failed to read": 0, "Pieces count": 3480229, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "39m18.865287763s", "Retain Status": "enabled"}
2024-08-16T02:12:24Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 102729, "Failed to delete": -102729, "Pieces failed to read": 0, "Pieces count": 3737899, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "57m57.583758772s", "Retain Status": "enabled"}
2024-08-17T20:52:59Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 148267, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 3907217, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "32m8.716430777s", "Retain Status": "enabled"}
2024-08-20T07:10:14Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 256161, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 3970789, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "29m50.056435378s", "Retain Status": "enabled"}
2024-08-22T11:45:56Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 119612, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 3785692, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "26m33.822417085s", "Retain Status": "enabled"}
2024-08-24T19:17:06Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 114100, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 3662110, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "26m43.135920421s", "Retain Status": "enabled"}
pi@Four:/seagate4/storj2us/storagenode $ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

Could you please check what the reason for the failed deletions was?
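For example, something like this should summarize the distinct error reasons (using the node.log path from your grep above):

grep "failed to trash piece" node.log | grep -o '"error": "[^"]*"' | sort | uniq -c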