Disqualification during Graceful Exit


I’ve just realized that a few days after I started Graceful Exit in July, my node was disqualified on the Saltlake satellite, and I don’t know why. Could anyone help me?

Besides, I would like to finalize the Graceful Exit as soon as possible and, apart from that satellite, it’s still pending for europe-north-1. Does anyone know why it’s still pending?

Thanks and cheers!

The disqualification reason will be shown in the node logs. You need to look for ERROR entries.

Yes, the satellite has been decommissioned and any held amount returned.


From the guide:

So the reason is likely that your node failed more than 10% of transfers.
As @Stob suggested, you need to search for transfer errors and errors like “file not found” (search for “graceful”/“piecetransfer” and “ERROR”/“failed”).
To calculate the number of failed transfers, filter the logs by “piecetransfer” and group the entries by PieceID and “ERROR”/“transferred”. Please note: the node will attempt to transfer each piece at least 5 times before it is considered a failed transfer, so you should count only failed transfers that have 5 attempts for the same PieceID.
You may also give me a NodeID and I can provide you with the exact numbers; however, I cannot provide the reason, as it’s stated only in your logs.
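As a rough sketch, the counting procedure described above could look like this. The log file name and the sample log lines are made up for illustration (your real log will have full JSON entries like the ones later in this thread); only pieces whose error count reaches 5 are counted as truly failed transfers:

```shell
# Stand-in log file: piece AAA failed all 5 attempts, piece BBB
# failed once and then succeeded. Replace with your real node log.
cat > node.log <<'EOF'
2023-10-11T23:30:00Z ERROR piecetransfer failed to put piece {"Piece ID": "AAA"}
2023-10-11T23:31:00Z ERROR piecetransfer failed to put piece {"Piece ID": "AAA"}
2023-10-11T23:32:00Z ERROR piecetransfer failed to put piece {"Piece ID": "AAA"}
2023-10-11T23:33:00Z ERROR piecetransfer failed to put piece {"Piece ID": "AAA"}
2023-10-11T23:34:00Z ERROR piecetransfer failed to put piece {"Piece ID": "AAA"}
2023-10-11T23:35:00Z ERROR piecetransfer failed to put piece {"Piece ID": "BBB"}
2023-10-11T23:36:00Z INFO  piecetransfer piece transferred  {"Piece ID": "BBB"}
EOF

# Group piecetransfer ERROR lines by Piece ID; a piece counts as a
# failed transfer only once it has accumulated 5 failed attempts.
grep piecetransfer node.log \
  | grep ERROR \
  | grep -o '"Piece ID": "[^"]*"' \
  | sort | uniq -c \
  | awk '$1 >= 5 { failed++ } END { print failed+0 " piece(s) failed all attempts" }'
# → 1 piece(s) failed all attempts
```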


Thanks, @Stob, for your answer. I can confirm that the satellite returned the held amount to me.

Thanks, @Alexey, for your answer as well. Since I’m not an expert, here is the NodeID: 12oV5bdzCtFrxTequKDkkTBjH8Pd4Cf62sEfDNBPAJUUDuTT5zo. With the information you provided, I’ll try to search the logs.
In any case, it’s clear to me that I’ve lost the held amount on that satellite. Well, it’s a pity!

It transferred 10,820 pieces and failed to transfer 2,235 pieces, which is a pretty high failure rate I would say. Either your node has corrupted data, or your connection was terribly bad.
For comparison, on US1 it transferred 494,039 pieces with 0 failures.
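Plugging the Saltlake numbers above into the failure-rate formula, failed / (transferred + failed), shows how far past the 10% threshold this node was:

```shell
# Failure rate on Saltlake from the numbers quoted above:
# 2,235 failed out of 10,820 + 2,235 total transfer attempts.
awk 'BEGIN { printf "%.1f%%\n", 2235 / (10820 + 2235) * 100 }'
# → 17.1%
```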


Thanks for the info. The problem was with my connection: I’ve been suffering issues since my telco provider changed my public IP. Thanks for the support. Cheers!


I’m facing the same issue on a small node (ID: 1G7CA8T8NwUYLibFqR85TUpXEXKKbwFDLz93srFMPgRDpcvzFj) that I’m currently graceful-exiting: it got disqualified a few days ago on Saltlake.

There are a few things I don’t get:

  • It got disqualified although all scores are perfect on the dashboard, as shown above
  • I thought that with the new graceful-exit mechanism, the node was no longer supposed to transfer files and simply had to stay online for 30 days?

If I check the exit-status on this node, it tells me the following:

# docker exec -it storj_node_5 /app/storagenode exit-status --config-dir /app/config
2023-10-14T21:42:11Z	INFO	Configuration loaded	{"process": "storagenode", "Location": "/app/config/config.yaml"}
2023-10-14T21:42:11Z	INFO	Anonymized tracing enabled	{"process": "storagenode"}
2023-10-14T21:42:11Z	INFO	Identity loaded.	{"process": "storagenode", "Node ID": "1G7CA8T8NwUYLibFqR85TUpXEXKKbwFDLz93srFMPgRDpcvzFj"}

Domain Name                  Node ID                                              Percent Complete  Successful  Completion Receipt
saltlake.tardigrade.io:7777  1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE   15.83%            N           0a483046022100e35f852b2beadd3696ff4aa206dde97524c09e88470d75f9c47c9db9d48a3feb022100cf4d9d875d07320a50724f8fa6156f8d4cf1036e0727958ee70327ccdc50ff1710021a207b2de9d72c2e935f1918c058caaf8ed00f0581639008707317ff1bd0000000002220224d400222d8bb705507ddebe3bb8381a8d34edf85600ca604011670000000002a0c08b9e29ca906108baab0d203  
ap1.storj.io:7777            121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6  100.00%           Y           0a473045022100979aa4f52bdad116144a4d1e4ef160a99af8c8608065a5464237040ca6a94ef002204d0b5d2d07fdbc69049001aea5512413225f8fd2f0c394f7d97515441696d404122084a74c2cd43c5ba76535e1f42f5df7c287ed68d33522782f4afabfdb400000001a20224d400222d8bb705507ddebe3bb8381a8d34edf85600ca60401167000000000220c08e78785a90610ec87f08a01        
us1.storj.io:7777            12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S  100.00%           Y           0a473045022036280c2dd40a2647ab86cc7722deb8f551dbe6d43fa88fa29b72eff0ab3d8b06022100efa22d0fac30af1280d853684df3c049f4c19ca9c6995aa5d0ab7db3c96b9ef81220a28b4f04e10bae85d67f4c6cb82bf8d4c0f0f47a8ea72627524deb6ec00000001a20224d400222d8bb705507ddebe3bb8381a8d34edf85600ca60401167000000000220c08c8b9fea80610ee9deff501        
eu1.storj.io:7777            12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs  100.00%           Y           0a47304502202d2a568f6a9f5e02aae4eb51444bc109cb5e5b1521373dfbd1f0a4e3f3f87ee7022100a14894756d0e0ac1a5f69259fc14c512eb65a44e38e50be4f738561eef370ba01220af2c42003efc826ab4361f73f9d890942146fe0ebe806786f8e71908000000001a20224d400222d8bb705507ddebe3bb8381a8d34edf85600ca60401167000000000220c08f6acfea80610aec9dbf001        

I see some percentages above although I thought they were not meant to be used anymore.

I had a held amount of $1.30 on Saltlake on this small node so, you know… it doesn’t really matter but the behavior is somewhat confusing.

In my logs, I see millions of the following errors so I guess my node was not working properly:

2023-10-11T23:32:47Z	ERROR	piecetransfer	failed to put piece	{"process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Piece ID": "3M5STYLAPU4QEIIZMHXSBICTOL2QRP42PYOAFZGYQTIBH5I3NCBA", "Storagenode ID": "12A2ntyfgmDMmGDvpxwCy2ZDWgxB38MYNedn6rGYkYrPJYHMDBD", "error": "ecclient: upload failed (node:12A2ntyfgmDMmGDvpxwCy2ZDWgxB38MYNedn6rGYkYrPJYHMDBD, address: protocol: write tcp> use of closed network connection; write tcp> use of closed network connection; piecestore: piecestore close: write tcp> use of closed network connection", "errorVerbose": "ecclient: upload failed (node:12A2ntyfgmDMmGDvpxwCy2ZDWgxB38MYNedn6rGYkYrPJYHMDBD, address: protocol: write tcp> use of closed network connection; write tcp> use of closed network connection; piecestore: piecestore close: write tcp> use of closed network connection\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:244\n\tstorj.io/storj/storagenode/piecetransfer.(*service).TransferPiece:148\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func3:100\n\tstorj.io/common/sync2.(*Limiter).Go.func1:49"}
2023-10-11T23:32:47Z	ERROR	gracefulexit:chore.1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE@saltlake.tardigrade.io:7777	failed to send notification about piece transfer.	{"process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "error": "EOF", "errorVerbose": "EOF\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func3:105\n\tstorj.io/common/sync2.(*Limiter).Go.func1:49"}
2023-10-11T23:32:47Z	ERROR	gracefulexit:chore	worker failed	{"process": "storagenode", "error": "gracefulexit: context canceled while waiting to receive message from storagenode", "errorVerbose": "gracefulexit: context canceled while waiting to receive message from storagenode\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run:90\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing.func1:82\n\tstorj.io/common/sync2.(*Limiter).Go.func1:49"}

Taking care of a node is hard! :sweat_smile:

Disqualification during Graceful Exit happened because your node failed to transfer more than 10% of pieces (the node attempts to transfer each piece to different nodes up to 5 times before the transfer is considered failed).

It’s not deployed yet; it should be deployed soon.

These are communication errors, so your router likely wasn’t able to handle that many parallel transfers.
This is another reason why we want to replace the complicated Graceful Exit.


Oh damn :frowning:

Another node got disqualified this morning, which feels unfair if it’s because my router can’t handle the load, especially as I’m trying to help the network here by gracefully exiting. Is there anything I can do to slow down those transfers and stop exiting nodes from being disqualified?

I guess I could change these parameters:

# number of concurrent transfers per graceful exit worker
# graceful-exit.num-concurrent-transfers: 5

# number of workers to handle satellite exits
# graceful-exit.num-workers: 4

What about these new values:

graceful-exit.num-concurrent-transfers: 2
graceful-exit.num-workers: 2


Yes, you may try reducing these parameters, then save the config and restart the node.
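A minimal sketch of applying that change. The `config.yaml` here is a local stand-in created for illustration (your real file would be e.g. `/app/config/config.yaml` inside the container), and the container name is taken from the `docker exec` command earlier in the thread:

```shell
# Stand-in for the real config file with the two commented defaults.
cat > config.yaml <<'EOF'
# graceful-exit.num-concurrent-transfers: 5
# graceful-exit.num-workers: 4
EOF

# Uncomment both settings and lower them to 2.
sed -i \
  -e 's|^# *graceful-exit\.num-concurrent-transfers:.*|graceful-exit.num-concurrent-transfers: 2|' \
  -e 's|^# *graceful-exit\.num-workers:.*|graceful-exit.num-workers: 2|' \
  config.yaml

cat config.yaml
# The new values only take effect after a restart, e.g.:
#   docker restart storj_node_5
```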


Thanks for your unfailing help, as always @Alexey.

I tried with 2 and 2. Still had quite a lot of errors.

I’m now down to 1 and 1, and although the logs look better than with the default values, there are still many errors. Raspbian OS is pretty chill right now (load average: 0.36, 0.26, 0.28 and a constant average upload of 2 MiB/s). I don’t know what else I can do…

This graceful-exit mechanism seems pretty unreliable, I’m glad we’re moving to the shiny new one soon! :slight_smile:

Too bad I didn’t realize it wasn’t live yet… that’s my bad.
