Graceful Exit Guide (new procedure as of 2023-10-?)

thepaul · September 29, 2023, 3:41pm

Warning: Read this carefully before you start

Please read the following information carefully and ask any questions you might have before executing graceful exit. Once graceful exit is started, there is currently no way to stop or cancel it. Think twice about the consequences before you start it.

Requirements:

Storage node joined the network more than 15 months ago (requirement temporarily reduced to 6 months).
Storage node is healthy and hasn’t lost any significant amount of data. Disqualification during graceful exit is possible.
Storage node will have no huge downtime during the graceful exit period (30 days). The uptime score requirement is higher during graceful exit (0.8) than it is normally (0.6). If your node has too much downtime during the graceful exit period, the graceful exit will fail and you will not get back your held amount.
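
As a rough pre-flight check, the age and uptime requirements above can be sketched as follows. This is only an illustration: the function and constant names are made up for this example, the way whole months are counted is an assumption (the satellite makes the real decision), and the uptime score has to be read from your own node.

```python
from datetime import datetime

MIN_AGE_MONTHS = 6            # temporarily reduced from 15, per the guide
EXIT_UPTIME_THRESHOLD = 0.8   # required during graceful exit (normally 0.6)


def months_between(start: datetime, end: datetime) -> int:
    """Whole calendar months elapsed from start to end (assumed counting rule)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # The current month only counts once the same day and time is reached.
    if (end.day, end.time()) < (start.day, start.time()):
        months -= 1
    return months


def looks_eligible(joined: datetime, now: datetime, uptime_score: float) -> bool:
    """Hypothetical helper: old enough, and uptime already above the exit threshold."""
    old_enough = months_between(joined, now) >= MIN_AGE_MONTHS
    return old_enough and uptime_score >= EXIT_UPTIME_THRESHOLD
```

A node that passes this check can still fail later if its uptime score drops below 0.8 during the 30 days.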

Start Graceful Exit

Are you sure you want to start graceful exit? Did you read the information above? Do you understand that graceful exit can’t be canceled?

Here is how you initiate it:

root@kali:~# storagenode exit-satellite
By starting a graceful exit from a satellite, you will no longer receive new uploads from that satellite.
This action can not be undone.
Are you sure you want to continue? [y/n]
 :y
Domain Name                       Node ID                                              Space Used
us2.storj.io:7777                  12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo   534.38 MB
saltlake.tardigrade.io:7777        1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE   629.72 GB
ap1.storj.io:7777                  121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6  68.04 GB
us1.storj.io:7777                  12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S  153.83 GB
eu1.storj.io:7777                  12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs  131.80 GB
europe-north-1.tardigrade.io:7777  12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB  13.79 GB
Please enter a space delimited list of satellite domain names you would like to gracefully exit. Press enter to continue:

us2.storj.io:7777 saltlake.tardigrade.io:7777 ap1.storj.io:7777 us1.storj.io:7777 eu1.storj.io:7777 europe-north-1.tardigrade.io:7777

You can exit the satellites one by one or all at the same time. The affected satellites will repair data off of your node as they determine appropriate.
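
If you want to exit all satellites at once, the space-delimited list can be assembled from the table the command prints. The sketch below assumes the listing looks like the example above (domain name with port in the first column); `satellite_domains` is a hypothetical helper, not part of the storagenode binary.

```python
def satellite_domains(listing: str) -> list[str]:
    """Collect the first column (host:port) of every data row in the listing."""
    domains = []
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Data rows start with "host:port"; header and prompt lines do not.
        host, _, port = fields[0].rpartition(":")
        if host and port.isdigit():
            domains.append(fields[0])
    return domains


def exit_prompt_answer(listing: str) -> str:
    """Space-delimited answer for the 'which satellites' prompt."""
    return " ".join(satellite_domains(listing))
```

Check the resulting list by eye before pasting it, since the action cannot be undone.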

During Graceful Exit

You can watch the status of graceful exit like this: (example output from a test satellite)

root@kali:~# storagenode exit-status 

Domain Name      Node ID                                              Percent Complete  Successful  Completion Receipt
127.0.0.1:10000  12fbck97kqEGbWPu673CpeyrXavtqgVriyv9pCfL3mpw3yz2zN9  0.00%             N           N/A

The “Percent Complete” field is a relic of the old graceful exit and is no longer meaningful. It will likely be removed at some point.
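
If you want to check the status from a script, the table can be split into records. This is a minimal sketch that assumes the output keeps the layout shown above, with columns separated by two or more spaces; `parse_exit_status` is a name invented for this example.

```python
import re


def parse_exit_status(output: str) -> list[dict[str, str]]:
    """Turn the exit-status table into one dict per satellite row."""
    header = None
    records = []
    for line in output.splitlines():
        # Column names contain single spaces, so split on runs of 2+ spaces.
        cells = re.split(r"\s{2,}", line.strip())
        if cells[0] == "Domain Name":
            header = cells
        elif header and len(cells) == len(header):
            records.append(dict(zip(header, cells)))
    return records
```

Each record then exposes the "Successful" and "Completion Receipt" columns by name, and an output without a table (such as a plain message) yields an empty list.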

If you get the output No graceful exit in progress, that means graceful exit didn’t start because of the node age requirement. In the storage node logs you will find additional information such as: node is not yet eligible for graceful exit: will be eligible after 2020-04-02 01:18:23.910919 +0000 UTC.
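
To pull the eligibility date out of that log message, something like the following could work. It assumes the message keeps the exact wording and timestamp layout quoted above; `eligible_after` is a hypothetical helper name.

```python
import re
from datetime import datetime, timezone

# Matches e.g. "will be eligible after 2020-04-02 01:18:23.910919 +0000 UTC"
ELIGIBLE_RE = re.compile(
    r"will be eligible after "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.(\d+))? ([+-]\d{4})"
)


def eligible_after(log_line: str):
    """Return the eligibility time as an aware datetime, or None if absent."""
    match = ELIGIBLE_RE.search(log_line)
    if match is None:
        return None
    stamp, fraction, offset = match.groups()
    moment = datetime.strptime(f"{stamp} {offset}", "%Y-%m-%d %H:%M:%S %z")
    # Keep at most microsecond precision from the fractional seconds.
    micros = int((fraction or "0")[:6].ljust(6, "0"))
    return moment.replace(microsecond=micros)
```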

In case of a crash, power failure, or other outage, please get your storage node back online. Graceful exit will continue.

Finish Graceful Exit

After the graceful exit period (currently 30 days) is complete, your node will no longer be in a graceful exit. Either it will succeed (if your uptime has been high enough) or it will fail.

At the end you will get output like this:

root@kali:~# storagenode exit-status

Domain Name      Node ID                                              Percent Complete  Successful  Completion Receipt
127.0.0.1:10000  12fbck97kqEGbWPu673CpeyrXavtqgVriyv9pCfL3mpw3yz2zN9  0.00%             Y           0a473045022100da86329cfb4f5bb16f0702c1d073c3a8b54787311b54855bcf01a8e245250040022003ef911b3b2b2bea86ba34cd4927223f2718cd35c3b7de7cc030cd3a8ce4959a1220db55bd9fa76e8938be5a7a25c970d48bde19936e269dcf69a3ab9fa41b5486001a207508f9a6138cdc4089ea075f1553736d472cb1d3afa4397496a8eb948d121200220c08abe5dcf0051086e6fefe01

Your node should automatically delete any remaining data for the satellite(s) it exited from when graceful exit is complete.

As long as graceful exit was successful you will get back your held amount with the next regular payout. “Completion Receipt” contains a signature from the satellite and is your ticket to get the payback. Please keep your storagenodeID, each satelliteID and each completion receipt. With this information you can open a support ticket if needed.
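
One simple way to keep those three values together is a small JSON file per node. The sketch below is only a suggestion; the `ExitRecord` type and its field names are invented for this example, and you would fill in the real IDs and receipts from your own exit-status output.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExitRecord:
    storagenode_id: str       # your node's ID
    satellite_id: str         # ID of the satellite you exited
    completion_receipt: str   # hex string from the "Completion Receipt" column


def dump_records(records: list[ExitRecord]) -> str:
    """Serialize the records as JSON text that can be saved or attached to a ticket."""
    return json.dumps([asdict(record) for record in records], indent=2)
```

Store the resulting file somewhere other than the node's own disk, since the node data will be removed.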

If Graceful Exit was initiated under the old system

If your node was already undergoing a graceful exit when we deploy this new procedure, it will continue with its graceful exit. If the 30-day period has already elapsed since you started graceful exit, your node’s graceful exit will automatically and immediately end, and you will pass or fail graceful exit based on your node’s uptime score. If the 30-day period has not yet elapsed, you only need to wait until it does.

In short, your time spent gracefully exiting under the old system will still count toward your graceful exit period under the new system.
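
The rule can be written down as a small calculation. This is a sketch of the description above, not the satellite's actual code; `time_remaining` is a hypothetical name and the start time is whatever moment you initiated graceful exit.

```python
from datetime import datetime, timedelta, timezone

EXIT_PERIOD = timedelta(days=30)  # graceful exit period stated in the guide


def time_remaining(started: datetime, now: datetime) -> timedelta:
    """Time left in the exit period; zero once 30 days have already elapsed."""
    return max(EXIT_PERIOD - (now - started), timedelta(0))


# Example: an exit started on 1 October 2023, checked on 17 October 2023.
left = time_remaining(
    datetime(2023, 10, 1, tzinfo=timezone.utc),
    datetime(2023, 10, 17, tzinfo=timezone.utc),
)
print(left.days)  # 14
```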

snorkel · September 29, 2023, 6:56pm

I don’t get the uptime requirement. As I understand the new system, the other nodes don’t actually need your node to repair all the data stored on it to other nodes. They fetch the pieces from other nodes. Why should your node stay up during this time at all, and with a higher uptime than normal? What did I miss?

Alexey · September 30, 2023, 2:46am

@thepaul I can turn the original post into a wiki, and we can update it so that we don’t have a duplicate.

Alexey · September 30, 2023, 2:49am

The repair job will take pieces from your node too, if their segments need to be repaired.
Your node stores one piece per segment, and many segments could be affected, so we want to have a source of pieces for repair before your node is gone.

odarriba · September 30, 2023, 1:51pm

Is this new procedure already live?

If not, and I start GE in the old system, will it end once all pieces are transferred, or will it wait for 30 days too?

I have some small nodes (~1TB) to GE, and it would take less than 30 days on the old system.

Alexey · September 30, 2023, 2:43pm

I believe it will count the whole 30-day online period.

Pac · September 30, 2023, 9:30pm

What’s the exact node version that does implement this new GE system?

thepaul · September 30, 2023, 11:36pm

It’s generally not necessary for your node to be online for a segment to be repaired, you are right. That’s why the system still works when nodes go offline, whether permanently or temporarily. When your node is still online, though, the data is a little bit safer and a little bit easier to get (say, somewhere between 1/80 and 1/29 safer). That’s not much, but if an operator wants to decommission hundreds of nodes at once, it could add up. Letting us know about your plan to take a node offline in advance is the “graceful” part of graceful exit.
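
For a sense of scale, here are those two fractions as percentages. The snippet only does the arithmetic; it makes no assumption about where the denominators come from, and `as_percent` is a throwaway helper for this example.

```python
from fractions import Fraction


def as_percent(fraction: Fraction) -> float:
    """Convert a fraction to a percentage rounded to two decimals."""
    return round(float(fraction) * 100, 2)


print(as_percent(Fraction(1, 80)))  # 1.25
print(as_percent(Fraction(1, 29)))  # 3.45
```

So a single online node improves things by roughly 1% to 3.5%.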

This change is actually entirely on the satellite side, starting with the current release candidate for v1.89. The node already knew how to “wait for further instructions”; the new satellite-side code simply keeps telling the node to wait for further instructions until graceful exit is complete. We will eventually take out the storagenode side of the graceful exit code, but we’d want to make sure all satellites are running the new code first.

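As for whether the new procedure is already live: not yet. It’s in QA right now.

As for a graceful exit that was started under the old system and finishes transferring its pieces early: I’m afraid it will still wait for 30 days. Checking for that case is a potential improvement we could make.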

Note, though, that you might be overestimating how fast graceful exit worked under the old system. People have usually been surprised at how much time it can take to transfer pieces one at a time when you can’t choose the fastest N nodes out of M. With graceful exit you have to try to transfer each piece to the node it is assigned to, and if that node is overburdened and slow, then your node just has to wait for it. For example, the nodes that are currently in GE on us1 have been working on it for 32 days so far on average. And I’d have to dig harder for this statistic, but I think 1TB is a pretty average node size.

Alexey · October 1, 2023, 1:04am

@thepaul I converted the existing Graceful Exit Guide into a wiki, so you can update it.

littleskunk · October 13, 2023, 4:23pm

QA finished. We tested what happens if a node starts a graceful exit while the old code is active and is currently transferring pieces the old way. With the deployment, such nodes will stop transferring any pieces, and if they started graceful exit more than 30 days ago, it will finish right away.

It will be enabled with the next satellite deployment early next week.

littleskunk · October 17, 2023, 9:51pm

Looks good in production as well. The first nodes have finished graceful exit with the new code.

Pac · October 18, 2023, 11:54am

So, it is live now?

I checked an exiting node, and it seems the constant egress I had has indeed stopped. It has very low activity and far fewer errors in the logs.

thepaul · November 8, 2023, 10:22pm

I’m sure you know now, but for posterity: Yes, it is live now.

Kalachnikos · February 13, 2024, 3:47pm

Hello,
My node under Windows 10 has been disqualified for many months.
Do I have to do a graceful exit?
If not, do I lose anything?
If yes, is there a specific tutorial for Windows?
In any case, how do I delete all the Storj files on the hard disk?
Thanks in advance for your answers.

Knowledge · February 13, 2024, 3:51pm

If your node is disqualified, you don’t need to graceful exit. Any escrow that was tied to your node went to pay for the repair of the data that was lost when your node was dq’d.

I believe you can uninstall Storj and then wipe the data folder. It all depends on what location you specified during setup.

Kalachnikos · February 13, 2024, 3:55pm

Thank you for your quick answer
Storj was installed on a laptop with an external hard drive.

ItsHass · March 22, 2024, 1:47pm

Is the requirement still 6 months?

Alexey · March 23, 2024, 4:03pm

yes, and it’s still better than planned:

snorkel · April 28, 2024, 7:46pm

I still get log entries about Saltlake, even though I did a GE from it last year. This entry appears for all 4 satellites in the log, once every 4 hours. Is that normal? Shouldn’t the GE’d satellite be ignored for future audits, etc.?
I didn’t do the forget-satellite procedure; I did the classic GE call before the new implementations.

2024-04-26T05:13:51Z    INFO    reputation:service      node scores updated ....

nerdatwork · April 28, 2024, 8:10pm

What’s the entire log line?