Graceful Exit Guide (new procedure as of 2023-10-?)

2024-04-28T21:11:56Z    INFO    reputation:service      node scores updated     {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Total Audits": 523134, "Successful Audits": 519418, "Audit Score": 1, "Online Score": 0.9991681623901493, "Suspension Score": 1, "Audit Score Delta": 0, "Online Score Delta": 0, "Suspension Score Delta": 0}
2024-04-28T21:11:57Z    INFO    reputation:service      node scores updated     {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Total Audits": 1301477, "Successful Audits": 1290617, "Audit Score": 1, "Online Score": 0.997728891381732, "Suspension Score": 1, "Audit Score Delta": 0, "Online Score Delta": 0, "Suspension Score Delta": 0}
2024-04-28T21:11:57Z    INFO    reputation:service      node scores updated     {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Total Audits": 2129063, "Successful Audits": 2104703, "Audit Score": 1, "Online Score": 0.9980742554063785, "Suspension Score": 1, "Audit Score Delta": 0, "Online Score Delta": 0, "Suspension Score Delta": 0}
2024-04-28T21:11:58Z    INFO    reputation:service      node scores updated     {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Total Audits": 2127844, "Successful Audits": 2108114, "Audit Score": 1, "Online Score": 0.9971667868874082, "Suspension Score": 1, "Audit Score Delta": 0, "Online Score Delta": 0, "Suspension Score Delta": 0}

Do you see successful GET_AUDIT for Saltlake in your log?

I grepped the entire log and found no GET_AUDIT entries from any satellite.
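For anyone repeating this check, here is a minimal sketch that counts audit entries per satellite instead of eyeballing grep output. It assumes that audit lines contain the literal string GET_AUDIT and end with a JSON payload carrying a "Satellite ID" field, like the score lines quoted above; the function name is made up for illustration.

```python
import json
from collections import Counter


def count_audits(lines, action="GET_AUDIT"):
    """Count log lines mentioning `action`, grouped by satellite ID.

    Assumption: each matching line ends with a JSON object that has a
    "Satellite ID" key. Lines without a parsable payload are counted
    under "unknown".
    """
    counts = Counter()
    for line in lines:
        if action not in line:
            continue
        satellite = "unknown"
        start = line.find("{")
        if start != -1:
            try:
                satellite = json.loads(line[start:]).get("Satellite ID", "unknown")
            except json.JSONDecodeError:
                pass  # keep "unknown" when the payload is not valid JSON
        counts[satellite] += 1
    return counts
```

To run it on a real log, pass an open file, for example `count_audits(open("node.log"))`, where `node.log` is a placeholder for wherever your logs are redirected. An empty result matches the "no GET_AUDIT at all" observation.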

It may be returning the reputation as of now. I would consider this a low-priority bug, but will let the team decide. Thanks for reporting!

Just exited Saltlake on a single node for now (considering more) but bandwidth used by the node hasn’t changed. Should the satellite stop uploading data to my node once GE starts?

In the previous version of the Graceful Exit it should. But I think not in the current one. I would ask the team.

Yes, it should stop selecting your node for new uploads. Two possible explanations for your continued bandwidth:

  1. You didn’t mention what type of bandwidth it is. Possibly you are measuring the total of outbound and inbound traffic, and it is the outbound traffic that is continuing?

  2. There have been some major changes to node selection since this change. Possibly this broke along the way. I will look into it.

OK, I looked into it a little more. It appears that this particular node had the majority of its traffic on the customer satellites (at the time) and not much on Saltlake, which is not the behavior I expected, since pretty much all the other nodes are exactly the opposite. I picked a smaller node at random to exit Saltlake, just for the sake of doing it, and assumed it would behave like the others. When the overall bandwidth usage didn’t change, I got curious whether it was working, as this is the first time I’ve ever performed a GE. Checking the dashboard after rolling over to a new day, though, it clearly shows no bandwidth usage for Saltlake, so it is working as expected. Apologies for wasting your time.

Awesome! Glad to hear it.

When GE is finished, how long does it take GC to clean up the data? Does it follow the “7 days in trash” way, or is it instant deletion?
Should I let GC do its thing, or should I run the forget-satellite procedure just to speed up the process?
What’s the maximum downtime allowed for a successful GE?

I do not think that GC is used. It should remove the data right away. If that fails, you may either remove it by following the How To Forget Untrusted Satellites instructions, or delete it manually.

There isn’t one. But your online score should stay above 80% the whole time.

There is no known dependency between these two processes. But perhaps you will figure it out and share your experience.
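To keep an eye on the 80% online score mentioned above, the "node scores updated" lines quoted at the top of the thread can be parsed directly. This is only a sketch: it assumes the log keeps that line format, and the function names and the constant are made up for illustration.

```python
import json

ONLINE_SCORE_MIN = 0.80  # the 80% figure quoted in the thread


def parse_scores(line):
    """Return (satellite_id, online_score) for a 'node scores updated' line.

    Returns None for any other line or when the JSON payload is unusable.
    """
    if "node scores updated" not in line:
        return None
    start = line.find("{")
    if start == -1:
        return None
    try:
        fields = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    if "Satellite ID" not in fields or "Online Score" not in fields:
        return None
    return fields["Satellite ID"], float(fields["Online Score"])


def satellites_at_risk(lines, minimum=ONLINE_SCORE_MIN):
    """Map satellite ID to its latest online score when it is not above `minimum`."""
    latest = {}
    for line in lines:
        parsed = parse_scores(line)
        if parsed is not None:
            latest[parsed[0]] = parsed[1]  # later lines overwrite earlier ones
    return {sat: score for sat, score in latest.items() if score <= minimum}
```

With the four log lines quoted at the top of this thread, `satellites_at_risk` returns an empty dict, since every online score there is above 0.99.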

So I understand that when the receipt for a successfully finished GE is received, deletion of the satellite’s data is triggered. Is there a log entry I should look for, like “… started, … finished”?
The “… finished” part should also coincide with the disappearance of the satellite’s directory, but that was not always the case. :sweat_smile:

I guess it should be available at the info log level; try searching for grace.
But since it’s not progressing now, you may miss it if you didn’t redirect logs to a file or if the log has already been deleted.

I mean the log entries for the deletion of the GE satellite’s data…

Look for gracefulexit:chore in the log.
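If the logs are redirected to a file, a small filter like the sketch below pulls out just those entries. The only string taken from this thread is gracefulexit:chore (and the looser "grace" search suggested earlier); the function name is hypothetical and nothing is assumed about the wording of the messages themselves.

```python
def graceful_exit_lines(lines, needle="gracefulexit:chore"):
    """Yield log lines containing `needle` (case-insensitive), without the trailing newline."""
    wanted = needle.lower()
    for line in lines:
        if wanted in line.lower():
            yield line.rstrip("\n")
```

For example, `for entry in graceful_exit_lines(open("node.log")): print(entry)`, where `node.log` stands in for your own log path. Passing `needle="grace"` gives the broader search.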

See the new command with updated path:
https://forum.storj.io/t/how-to-forget-untrusted-satellites/23821/150?u=snorkel

The new GE version doesn’t show any progress in the columns inherited from the old version, and that’s confusing for new SNOs. There could be a counter counting down the remaining days, or the progress column could display the elapsed time as a percentage, from 0 to 100, to mimic the old way.

I would not bother improving the experience for people leaving.
Also, to do GE a node needs to be 6 months old, so hardly a “new SNO”.

And lastly, since the process takes a fixed amount of time, a progress bar would just be counting time. Calendars already do that.
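For anyone who still wants the 0 to 100 figure suggested above, elapsed time against a fixed duration is a short calculation you can do yourself. This is only a sketch: the function name is made up, and the duration is left as a parameter because the thread does not state how many days the process takes.

```python
from datetime import datetime, timedelta, timezone


def ge_progress_percent(started, now, duration_days):
    """Return elapsed time as a percentage of a fixed duration, clamped to 0..100.

    `duration_days` must be supplied by the caller; it is not a value
    taken from this thread.
    """
    if duration_days <= 0:
        raise ValueError("duration_days must be positive")
    fraction = (now - started) / timedelta(days=duration_days)
    return max(0.0, min(100.0, fraction * 100.0))


# Example with a purely hypothetical 30-day duration:
start = datetime(2024, 4, 1, tzinfo=timezone.utc)
halfway = start + timedelta(days=15)
print(ge_progress_percent(start, halfway, duration_days=30))  # 50.0
```

Both timestamps should be either timezone-aware or naive; mixing the two raises a TypeError on subtraction.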

A GE is not always a SNO leaving for good. Maybe some of their nodes are starting to fail and they can’t move the data, or they GE a test satellite to make room for customer data.
The simple fact that someone chooses the graceful way is to be appreciated.
