Graceful exit onto another node

It would be useful to be able to gracefully exit a node while specifying that all the data should be sent to a specific other node (presumably owned by the same SNO).

This would allow me to decommission old nodes that are no longer viable (old hardware, or too expensive to run because of high W/TB) while keeping the data instead of starting over, and without resorting to rsync, which of course works but needs careful sysadmin work.

In the generic case it would be something like merging two nodes I suppose.

The satellites would have to change all the piece ownerships in the database when the transfer is complete and perhaps audit a little bit more.

I think it’s a nice idea to be able to move a node using Storj’s software. A GE from one node to one other node, only it’s free.

There is an edge case if the two nodes store two pieces of the same stripe.

Some similar discussion here: Join or split nodes

If you use VMs you could convert/copy the old nodes to the new ones using dd which would be much faster than rsync.

If you are using physical/virtual machines, you can convert/copy old nodes to new ones using “udpcast”, which would be much faster than rsync.

Directed GE could send only those pieces to other nodes. Besides, those won’t exist to begin with if the nodes ran on the same IP before.

Yes, but this will not reduce the number of nodes, and it has a higher risk of failure due to admin error. While this is probably what I will end up doing, there may be cases when I would just prefer for the old node to go away completely, without incurring cost for the network and without actually losing the data/income.

VMs are indeed very simple to work with, and I agree that this feature would apply more to dedicated hardware nodes.

This case can show up in another way too, if the SNO moves a node from one /24 to another, where another node already exists. Then two pieces would be within the same /24, albeit on different nodes.

That’s probably not the end of the world though. Perhaps the satellites can scan their databases for cases like that and eventually issue a repair for the segment, then remove the piece from one of the nodes.
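To make the "scan the database" idea concrete, here is a minimal sketch, assuming the satellite can list pieces as (segment, node, IP) records. The record layout and the function names are hypothetical and are not taken from the Storj codebase; only the standard-library `ipaddress` module is used.

```python
import ipaddress
from collections import defaultdict


def subnet24(ip: str) -> str:
    """Return the /24 network that contains an IPv4 address."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def find_subnet_conflicts(pieces):
    """Find segments that have more than one piece inside the same /24.

    pieces: iterable of (segment_id, node_id, node_ip) tuples (hypothetical layout).
    Returns {(segment_id, subnet): [node_id, ...]} for every conflict found.
    """
    holders = defaultdict(list)
    for segment_id, node_id, node_ip in pieces:
        holders[(segment_id, subnet24(node_ip))].append(node_id)
    return {key: nodes for key, nodes in holders.items() if len(nodes) > 1}


# Made-up example data using documentation IP ranges.
pieces = [
    ("seg-1", "node-a", "203.0.113.10"),
    ("seg-1", "node-b", "203.0.113.77"),  # same /24 as node-a
    ("seg-1", "node-c", "198.51.100.5"),
    ("seg-2", "node-a", "203.0.113.10"),
]
conflicts = find_subnet_conflicts(pieces)
print(conflicts)
```

Each conflict found this way would be a candidate for the repair-then-remove step described above.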

Too complicated. What’s the benefit for the network?
By the way, Graceful Exit may fail and your node will be disqualified, so rsync looks safer.

The benefit is reduced costs for SNOs. Rsync is not compliant with storage node T&C.

Rsync is not compliant with storage node T&C.

Wait, why?! I thought everybody was doing it since the beginning of days…

You mean, everyone was rsyncing to consolidate storage onto a single medium (banned by the T&C requirement that you have a dedicated HDD per node), or everyone was ignoring the T&C because it’s outdated?

Because, you know… both of these statements are true.

Well, usage of rsync doesn’t necessarily imply consolidating nodes. It may mean moving to another disk to better balance the load, or to another host residing in a different /24 network.

But you’re right, both statements are true :smiley: and the /24 rule essentially admits that
“4.1.4.1. Have a minimum of one (1) hard drive and one (1) processor core dedicated to each Storage Node;”
guarantees nothing extra in terms of storage node reliability.

This was what the original question was about.

My original meaning was perhaps a little obscured, but I was thinking about consolidating a few nodes (like 5 at a time max?) into one larger node, in order to decommission old hardware and simplify maintenance.

This would allow me to keep the data and not cause any additional load on the rest of the network. Of course, pieces from the same segment should be sent elsewhere, but they must be a very small fraction, so it wouldn’t matter much.

This will be difficult: you could combine nodes from different /24 networks, and it could happen that pieces from the same file end up on one node, which is forbidden.

Yes, these pieces will need to be sent to some other random node.
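The split described here (most pieces go to the chosen node, conflicting ones go elsewhere) can be sketched as below. This is only an illustration under the assumption that the set of segments already held by the target is known; `plan_directed_exit` and its inputs are hypothetical names, not an existing Storj API.

```python
def plan_directed_exit(old_node_pieces, target_node_segments):
    """Split pieces of exiting node(s) into 'send to target' and 'send elsewhere'.

    old_node_pieces: iterable of (segment_id, piece_id) held by the exiting node(s).
    target_node_segments: set of segment IDs the target node already stores.
    Returns (to_target, to_elsewhere) as two lists of (segment_id, piece_id).
    """
    held = set(target_node_segments)  # copy so the caller's set is not modified
    to_target, to_elsewhere = [], []
    for segment_id, piece_id in old_node_pieces:
        if segment_id in held:
            # The target already has (or is about to get) a piece of this segment.
            to_elsewhere.append((segment_id, piece_id))
        else:
            to_target.append((segment_id, piece_id))
            held.add(segment_id)  # matters when several old nodes are merged
    return to_target, to_elsewhere


# Made-up example: two old nodes both hold a piece of segment "s3".
old_pieces = [("s1", "p1"), ("s2", "p2"), ("s3", "p3"), ("s3", "p4")]
to_target, to_elsewhere = plan_directed_exit(old_pieces, {"s2"})
print(len(to_target), "pieces to the target,", len(to_elsewhere), "to other nodes")
```

Tracking `held` while iterating covers the case of consolidating several old nodes at once, where two of them may each hold a piece of the same segment.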

:thinking:

Given how long it takes to ramp up a node and start making any money, it would benefit node operators to be able to decommission a node without taking a financial hit waiting for a new node to fill.

You’d guarantee a graceful transition because Storj itself would replicate and verify the content, eliminating user error and getting a 100% “graceful” success rate as long as the node operator leaves the old and new nodes online until it completes. And by eliminating the potential for graceful exit failure, you don’t burden other nodes with having to replicate lost content.

Anything that makes life easier and more reliable for node operators should improve reliability and stability of the network as a whole.

As an outsider with a bunch of wasted disk space, a 1 Gb symmetric internet connection, cheap and reliable electricity, and ~20 years of experience running public-facing servers, I’m thinking Storj could be fun. And in the winter I pay for heat, so electricity is effectively free (the heat the server generates comes right off my heating bill), while in the summer it is rarely hot enough to pay for A/C in my “server room”, so I don’t amplify my electricity bill with cooling costs. I have some advantages that might make me a good node operator.

I’m in the “curious” phase more than the “Yeah, this is a good idea” phase, but every time I see Graceful Exit I always see reminders that the process doesn’t work very well and it seems like getting disqualified is likely/normal. Maybe that’s true, maybe not, but that’s how it is presented.

Let’s game-theory it. If I were a node operator planning to shut down a node, I’d weigh the odds of a Graceful Exit failing (so I lose out) against running the node normally to the last possible minute (getting a bit of extra income) and just pulling the plug (guaranteeing the failure that I’m expecting to see anyway).

I’m not actually that much of a jerk about it and I’d do it properly if I could, but it might not be to my advantage to do so.

Does encouraging node operators to just pull the plug benefit the network?

Does rewarding node operators for maintaining network reliability and stability benefit the network? Maybe not enough to justify the development costs and that’s a fair answer.

Graceful Exit may fail if your node is missing some pieces, or if they are corrupted and this has not yet been detected by an audit. It is also possible that your connection is not good enough (for example, the router cannot handle hundreds of parallel transfers and drops some of them). Under normal circumstances this might not even be noticed, but under the stress of Graceful Exit, when (almost) every piece has to be transferred and checked, these problems can surface and disqualify your node.

The reward for a successful GE is the return of the remaining held amount.
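The "game theory" trade-off raised earlier in the thread can be put into rough numbers. The sketch below is a deliberately simplified model with assumptions of its own: a failed GE returns nothing, pulling the plug forfeits the held amount, and the only gain from pulling the plug is the extra income earned by running until the last minute. All names and figures are hypothetical.

```python
def expected_payoffs(p_success, held_amount, extra_income):
    """Return (expected value of attempting GE, value of pulling the plug).

    p_success: assumed probability that Graceful Exit completes.
    held_amount: remaining held amount returned on a successful GE.
    extra_income: income earned by running normally instead of exiting.
    """
    graceful_exit = p_success * held_amount
    pull_the_plug = extra_income
    return graceful_exit, pull_the_plug


def break_even_success_rate(held_amount, extra_income):
    """Success probability at which both options pay the same in this model."""
    return extra_income / held_amount


# Made-up figures purely for illustration.
ge, plug = expected_payoffs(p_success=0.5, held_amount=40.0, extra_income=10.0)
print("GE:", ge, "pull the plug:", plug)
print("break-even success rate:", break_even_success_rate(40.0, 10.0))
```

In this model, GE is the better choice whenever the operator believes the success rate is above `extra_income / held_amount`, which is why a reputation for failed exits weakens the incentive to exit properly.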

The suggested feature (GE to a specific node) is complicated to implement. We still want the pieces of one segment to sit in different /24 subnets of public IPs, so preparing the list with a preference for the selected node would be more resource-consuming than the current process, with no benefit for the customers.
I believe that here we will likely need help from the Community to make a pull request for this feature.
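As a starting point for anyone considering such a pull request, the per-piece decision could look roughly like the sketch below: try the operator's preferred node first, and fall back to other candidates if its /24 is already used by another piece of the same segment. This is an assumption-laden illustration in Python, not the satellite's actual node selection; `choose_destination` and its parameters are hypothetical.

```python
import ipaddress


def subnet24(ip: str) -> str:
    """Return the /24 network that contains an IPv4 address."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def choose_destination(other_holder_ips, preferred, candidates):
    """Pick a destination node for one piece of a segment.

    other_holder_ips: IPs of nodes holding the segment's other pieces
                      (excluding the exiting node).
    preferred: (node_id, ip) of the node the operator wants to exit onto.
    candidates: list of (node_id, ip) fallback nodes, in selection order.
    Returns a node_id, or None if every option would share a /24.
    """
    used = {subnet24(ip) for ip in other_holder_ips}
    for node_id, ip in [preferred, *candidates]:
        if subnet24(ip) not in used:
            return node_id
    return None


preferred = ("target", "203.0.113.10")
fallbacks = [("other", "192.0.2.1")]
print(choose_destination(["198.51.100.5"], preferred, fallbacks))
print(choose_destination(["203.0.113.99"], preferred, fallbacks))
```

The extra cost mentioned above shows up here as the per-piece lookup of the other holders' subnets, which has to be done for every piece on the exiting node.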

However, official software capable of copying/moving a node that is about to fail would be an advantage for the whole network. My first node went down, I didn’t have the skills to save it and Storj paid