Retain process is not running on my us1.storj.io node

Greetings,

The retain process is not currently running on my US Storage node.

I have been monitoring the process run cycle on my nodes, and I noted that the process runs for the other three satellites, but not the US one.

Upon reviewing the logs, it looks like the 3/1/2024 run hit errors: it moved pieces to trash but failed to delete six of them, and the retain process has not run since.

I restarted the node on 3/10/2024, but retain did not run as expected on 3/15/2024. See the attached screenshot summarizing the previous two weeks of retain process runs. Below are the relevant errors from the 3/1/2024 run; there are no retain errors for the US node dated after 3/1/2024.

The node runs on Windows, with storage on an iSCSI drive backed by a NAS; the data on the NAS is in Synology's SHR-2 RAID configuration.

I would appreciate your advice on debugging this issue.

Thanks in advance.

2024-03-01T11:23:11-06:00 INFO retain Prepared to run a Retain request. {“Created Before”: “2024-02-21T17:59:59Z”, “Filter Size”: 4100003, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
2024-03-01T11:23:11-06:00 INFO lazyfilewalker.gc-filewalker starting subprocess {“satelliteID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
2024-03-01T11:23:11-06:00 INFO lazyfilewalker.gc-filewalker subprocess started {“satelliteID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
2024-03-01T11:23:11-06:00 INFO piecestore download started {“Piece ID”: “7X7Y5MSVCUJNRHKRAPR5ICOZDOM47BCI3ZJVMLVOSYFDV4ISCNHQ”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Action”: “GET”, “Offset”: 0, “Size”: 217600, “Remote Address”: “79.127.223.129:52630”}
2024-03-01T11:23:12-06:00 INFO lazyfilewalker.gc-filewalker.subprocess Database started {“satelliteID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “process”: “storagenode”}
2024-03-01T11:23:12-06:00 INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker started {“satelliteID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “process”: “storagenode”, “createdBefore”: “2024-02-21T17:59:59Z”, “bloomFilterSize”: 4100003}
2024-03-01T12:02:28-06:00 INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {“satelliteID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “piecesCount”: 34948349, “piecesSkippedCount”: 0, “process”: “storagenode”}
2024-03-01T12:02:29-06:00 INFO lazyfilewalker.gc-filewalker subprocess finished successfully {“satelliteID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
2024-03-01T12:10:25-06:00 WARN pieces failed to migrate v0 piece. Piece may not be recoverable
2024-03-01T12:10:25-06:00 WARN retain failed to delete piece {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “4DK27CWN232HFKYML3OVV76OAAPLI545KDQB46DDFXJBLA3YD5GA”, “error”: “pieces error: filestore error: file does not exist”, “errorVerbose”: “pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:110\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:245\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Trash:290\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:404\n\tstorj.io/storj/storagenode/retain.(*Service).trash:373\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:341\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:221\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2024-03-01T12:43:24-06:00 WARN pieces failed to migrate v0 piece. Piece may not be recoverable
2024-03-01T12:43:24-06:00 WARN retain failed to delete piece {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “HFLK3WGVOCC7TTXNRCC3CICYP2MFXQG3XUCY7A5KTWJFZ334Q5ZA”, “error”: “pieces error: filestore error: file does not exist”, “errorVerbose”: “pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:110\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:245\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Trash:290\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:404\n\tstorj.io/storj/storagenode/retain.(*Service).trash:373\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:341\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:221\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2024-03-01T12:53:48-06:00 WARN pieces failed to migrate v0 piece. Piece may not be recoverable
2024-03-01T12:53:48-06:00 WARN retain failed to delete piece {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “KTKX4KVTA6TWM2RXPPEDEVH33DM6UR2JONH3H232P7MDU5HWEY3A”, “error”: “pieces error: filestore error: file does not exist”, “errorVerbose”: “pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:110\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:245\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Trash:290\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:404\n\tstorj.io/storj/storagenode/retain.(*Service).trash:373\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:341\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:221\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2024-03-01T12:54:07-06:00 WARN pieces failed to migrate v0 piece. Piece may not be recoverable
2024-03-01T12:54:07-06:00 WARN retain failed to delete piece {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “KWSRNMTO4VIZ7OC2R2NDWMGYVRP3BBH5N625T3SZPLKHXERDVAUA”, “error”: “pieces error: filestore error: file does not exist”, “errorVerbose”: “pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:110\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:245\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Trash:290\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:404\n\tstorj.io/storj/storagenode/retain.(*Service).trash:373\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:341\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:221\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2024-03-01T13:03:31-06:00 WARN pieces failed to migrate v0 piece. Piece may not be recoverable
2024-03-01T13:03:31-06:00 WARN retain failed to delete piece {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “NP33MQY6MYXGWNWVJFXXFW2ZH5SZUW3DEWU4RR5YE2WGPOR22DYA”, “error”: “pieces error: filestore error: file does not exist”, “errorVerbose”: “pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:110\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:245\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Trash:290\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:404\n\tstorj.io/storj/storagenode/retain.(*Service).trash:373\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:341\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:221\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2024-03-01T13:20:04-06:00 WARN pieces failed to migrate v0 piece. Piece may not be recoverable
2024-03-01T13:20:04-06:00 WARN retain failed to delete piece {“Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “SX2PXXJHZBJ3DDKAEOLH6JJYBTIXQCDPPPQNZCWJVCLNM66MF7UQ”, “error”: “pieces error: filestore error: file does not exist”, “errorVerbose”: “pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:110\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:245\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Trash:290\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:404\n\tstorj.io/storj/storagenode/retain.(*Service).trash:373\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:341\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:221\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2024-03-01T13:42:37-06:00 INFO retain Moved pieces to trash during retain {“Deleted pieces”: 1736436, “Failed to delete”: 6, “Pieces failed to read”: 0, “Pieces count”: 34948349, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Duration”: “2h19m25.3197935s”, “Retain Status”: “enabled”}


It seems these pieces were already removed by the collector process earlier. Could you please search for those piece IDs in your logs to see their lifecycle?
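On a Linux node, tracing one piece ID through the log can be sketched like this (the sample log lines and piece ID below are placeholders for illustration only; in real use, point the search at your actual storagenode.log — on Windows, `sls`/`Select-String` serves the same purpose):

```shell
# Build a tiny placeholder log to demonstrate the search.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2024-03-01T11:55:27-06:00 INFO collector deleted expired piece {Piece ID: 4DK27CWN232HFKYML3OVV76OAAPLI545KDQB46DDFXJBLA3YD5GA}
2024-03-01T12:10:25-06:00 WARN retain failed to delete piece {Piece ID: 4DK27CWN232HFKYML3OVV76OAAPLI545KDQB46DDFXJBLA3YD5GA}
EOF

# Every event for this piece, oldest first: here the collector deleted it
# before retain tried to trash it, which explains "file does not exist".
lifecycle=$(grep "4DK27CWN232HFKYML3OVV76OAAPLI545KDQB46DDFXJBLA3YD5GA" "$LOG")
echo "$lifecycle"
rm -f "$LOG"
```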

Here is the log information for those six pieces:

2024-03-01T11:55:27-06:00 INFO collector deleted expired piece {Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Piece ID: 4DK27CWN232HFKYML3OVV76OAAPLI545KDQB46DDFXJBLA3YD5GA}
2024-03-01T11:55:27-06:00 INFO collector deleted expired piece {Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Piece ID: HFLK3WGVOCC7TTXNRCC3CICYP2MFXQG3XUCY7A5KTWJFZ334Q5ZA}
2024-03-01T11:55:27-06:00 INFO collector deleted expired piece {Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Piece ID: KTKX4KVTA6TWM2RXPPEDEVH33DM6UR2JONH3H232P7MDU5HWEY3A}
2024-03-01T11:55:27-06:00 INFO collector deleted expired piece {Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Piece ID: KWSRNMTO4VIZ7OC2R2NDWMGYVRP3BBH5N625T3SZPLKHXERDVAUA}
2024-03-01T11:55:27-06:00 INFO collector deleted expired piece {Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Piece ID: NP33MQY6MYXGWNWVJFXXFW2ZH5SZUW3DEWU4RR5YE2WGPOR22DYA}
2024-03-01T11:55:27-06:00 INFO collector deleted expired piece {Satellite ID: 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S, Piece ID: SX2PXXJHZBJ3DDKAEOLH6JJYBTIXQCDPPPQNZCWJVCLNM66MF7UQ}

So they were already deleted automatically due to expiration. Did retain finish for this satellite?

sls retain "$env:ProgramFiles\Storj\Storage Node\storagenode.log" | sls "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S" | sls "started|finished" | select -last 4

It looks like it finished:

2024-03-01T13:42:37-06:00 INFO retain Moved pieces to trash during retain {“Deleted pieces”: 1736436, “Failed to delete”: 6, “Pieces failed to read”: 0, “Pieces count”: 34948349, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Duration”: “2h19m25.3197935s”, “Retain Status”: “enabled”}


My apologies, I missed your code. I will run it.

It will give the same result :slight_smile:

I ran it for all four nodes to see if there was any difference in the results.
I didn’t see any new retain commands in the log.

Oh, you are correct. For retain you need to search not for started|finished, but for Prepare|Move:

sls retain "$env:ProgramFiles\Storj\Storage Node\storagenode.log" | sls "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S" | sls "Prepare|Move" | select -last 4
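For operators running on Linux/docker instead of Windows, the equivalent check can be sketched with a grep pipeline (the sample log lines below are placeholders; point it at your own storagenode.log path):

```shell
# Placeholder log with one Prepare/Move pair, for illustration.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2024-03-01T11:23:11-06:00 INFO retain Prepared to run a Retain request. {"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-03-01T13:42:37-06:00 INFO retain Moved pieces to trash during retain {"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
EOF

# Last four Prepare/Move retain events for the US1 satellite.
result=$(grep retain "$LOG" \
  | grep 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S \
  | grep -E "Prepare|Move" \
  | tail -4)
echo "$result"
rm -f "$LOG"
```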

That generated a result. The output below looks identical to the 3/1/2024 data I snipped previously:

C:\Program Files\Storj\Storage Node\storagenode.log:3708959:2024-03-01T11:23:11-06:00 INFO retain Prepared to run a Retain request. {“Created Before”: “2024-02-21T17:59:59Z”, “Filter Size”: 4100003, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
C:\Program Files\Storj\Storage Node\storagenode.log:3784584:2024-03-01T13:42:37-06:00 INFO retain Moved pieces to trash during retain {“Deleted pieces”: 1736436, “Failed to delete”: 6, “Pieces failed to read”: 0, “Pieces count”: 34948349, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Duration”: “2h19m25.3197935s”, “Retain Status”: “enabled”}


@DanielTB80 has your retain issue been fixed?

I am experiencing a similar issue. I have three large nodes of about 10 TB each, while being paid for around 7 TB. Some of my smaller nodes have received the filters from US1 without issue.

I might not have received a bloom filter on Mar 17 (I forgot to check), but I am likely in the same boat as you: Mar 1 was the last time I got one from US1. The week of Mar 24 was an instance where garbage collection did not run on any nodes. The week of Mar 31 these nodes did not receive a filter, and now it is repeating again for the week of Apr 7.

Sample logs from one of my nodes:

2024-03-30T23:48:14-07:00	INFO	lazyfilewalker.gc-filewalker.subprocess	gc-filewalker started	{"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "bloomFilterSize": 1078086, "process": "storagenode", "createdBefore": "2024-03-26T17:59:59Z"}
2024-04-04T12:55:35-07:00	INFO	lazyfilewalker.gc-filewalker.subprocess	gc-filewalker started	{"process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "process": "storagenode", "createdBefore": "2024-02-17T13:02:55Z", "bloomFilterSize": 30330}
2024-04-05T05:39:13-07:00	INFO	lazyfilewalker.gc-filewalker.subprocess	gc-filewalker started	{"process": "storagenode", "satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "process": "storagenode", "createdBefore": "2024-04-01T17:59:59Z", "bloomFilterSize": 283992}
2024-04-06T23:00:59-07:00	INFO	lazyfilewalker.gc-filewalker.subprocess	gc-filewalker started	{"process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "process": "storagenode", "createdBefore": "2024-04-02T17:59:59Z", "bloomFilterSize": 1092462}

It receives bloom filters from all satellites but US1. I will note that all three of these nodes have been online since last weekend, due to a node update.

No change; I have not seen a us1.storj.io retain run since March 1st.

The other satellites continue to show retain runs completing successfully.

root@datengrab:~# grep retain /home/storagenode/logs/storagenode1.log.1 | grep 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S | grep -v 'About to move piece to trash'
2024-04-06T06:14:24+02:00       INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "/mnt/sn1/storagenode1/storagenode/retain", "Created Before": "2024-03-27T17:59:59Z", "Filter Size": 2277668, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-04-06T06:31:55+02:00       INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "/mnt/sn1/storagenode1/storagenode/retain", "Deleted pieces": 1353374, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 8400999, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "17m31.453934572s", "Retain Status": "enabled"}
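To see at a glance which satellites' retain runs have completed, a per-satellite summary can be sketched like this (the sample log lines, timestamps, and log path below are placeholders for illustration; point LOG at your real storagenode log):

```shell
# Placeholder log with completed retain runs for two satellites.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2024-04-06T06:31:55+02:00 INFO retain Moved pieces to trash during retain {"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-04-05T05:40:02-07:00 INFO retain Moved pieces to trash during retain {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-03-30T23:59:01-07:00 INFO retain Moved pieces to trash during retain {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
EOF

# One line per satellite with its count of completed retain runs;
# a satellite missing from this list has had no retain run in the log.
summary=$(grep 'Moved pieces to trash' "$LOG" \
  | grep -o '"Satellite ID": "[^"]*"' \
  | sort | uniq -c)
echo "$summary"
rm -f "$LOG"
```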