If I need to move a node to another server and the only way is over WAN, how could I do that?
This scenario could also apply to selling a node, so would this be allowed by Storj/Inveniam? Is there an official point of view?
A bunch of rsync passes. The same way you would move it anywhere else.
Storj has no control over node ownership. If a node passes audits and is online, it is a good node. If it does not, it is not. Who owns the node is irrelevant.
You can read more in the whitepaper. The network is designed to treat every node as potentially Byzantine.
I wouldn’t recommend doing this with a bunch of parallel rsync processes, as that is extremely inefficient and unnecessarily heavy on the disks.
When I need to move a Storj node to another drive, I use tar instead. While the node is still running, I create a tar stream to benefit from strictly sequential reads and to avoid excessive random I/O and metadata churn. After stopping the node, I perform a final transfer using ionice to prioritize disk access.
The same approach works well over WAN.
Example workflow:
STEP 1: Initial transfer while the node is still running
Purpose:
- Sequential read access
- Minimal seek pressure
- No rsync metadata overhead
- Safe because Storj tolerates this (final sync later)
cd /media/lolleg
tar -cf - STORJX123 | ssh targethost "tar -xf - -C /media/lolleg"
STEP 2: Stop the Storj node to freeze the dataset
docker stop storagenode123
STEP 3: Final transfer with high I/O priority
Purpose:
- Copy remaining deltas
- Ensure data consistency
- ionice prevents disk starvation under load
ionice -c2 -n0 tar -cf - STORJX123 | ssh targethost "ionice -c2 -n0 tar -xf - -C /media/lolleg"
STEP 4: Start the node on the target system
docker start storagenode123
Why this approach is preferable to parallel rsync:
- strictly sequential disk access (far lower seek amplification)
- dramatically less filesystem metadata overhead
- predictable and controllable I/O behavior
- scales well with many nodes and large datasets
- works equally well locally and over WAN
- avoids I/O wait explosions caused by parallel rsync jobs
Using multiple concurrent rsync processes for this type of workload is usually counterproductive and often results in higher total migration time and degraded host performance.
If there is no official prohibition on selling nodes, maybe a dedicated thread would be helpful, where sellers could post their announcements.
I imagine this would benefit Storj Inc too, because they wouldn’t have to pay for repair of nodes that would otherwise do a Graceful Exit (GE).
So win win.
I imagine you have to open the SSH port in ufw and on the router.
This would work for linux to linux transfers.
How about Windows to Windows or between Win-Linux?
Open the SSH port (22) on the node, and forward an incoming port on the router to port 22 on the node’s internal IP - just like the port forward/firewall setup for the storage node.
I would recommend using a different incoming port than 22 on the router - you will soon be hammered by people/bots trying to brute-force a connection - and ideally set up fail2ban on the node to block brute-force attempts.
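A hypothetical setup, assuming Ubuntu with ufw and OpenSSH (the port number and the 203.0.113.10 peer address are placeholders): on the router, forward an unusual external port (e.g. 2222) to port 22 on the node’s LAN IP, then on the node:

```shell
sudo ufw allow 22/tcp                       # or restrict to the source host only:
# sudo ufw allow from 203.0.113.10 to any port 22 proto tcp
sudo apt install fail2ban                   # the default jail already watches sshd
sudo systemctl enable --now fail2ban
# From the source machine, connect through the forwarded port:
# ssh -p 2222 user@<public-ip-of-target>
```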
I haven’t seen people run parallel rsyncs on the same node. Typically they’re in series: like an initial sync (that may take a long time) to get the bulk of the data. Then stop the node and run a second sync that only has to deal with final changes. But that may have been what you meant.
And…
…there aren’t sequential reads from that initial tar (especially for piecestore)… and seeks are going to be all-over-the-disk depending on where the filesystem put things. If you imaged a disk/partition (like using dd/ddrescue) then that’s a sequential read: but tar takes what it can get :). However if you made that initial tar to a temp location… that would be a sequential write. (And obviously rsync works fine when a node is running too)
That certainly works, however it doesn’t do the same thing as a final rsync. Tar will not handle files that have been-deleted-from-source but that still exist-on-destination (but rsync will with the --delete flag). For a large Storj node… where migration could take days over WAN… that usually means the destination ends up with a lot of trash (but that improves with hashstore). And obviously tar has no concept of resuming a transfer if a connection gets lost: it starts over.
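The `--delete` behaviour can be seen in a minimal local demo (hypothetical file names):

```shell
mkdir -p src dst
touch src/kept src/trashed
rsync -a src/ dst/                 # both files reach the destination
rm src/trashed                     # piece deleted on the source mid-migration
rsync -a src/ dst/                 # plain rsync leaves dst/trashed behind
rsync -a --delete src/ dst/        # --delete removes it from the destination too
```

This is exactly the garbage that accumulates on the destination when a tar-only migration takes days.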
But I get the idea: I also use tar for small nodes because you don’t even need to care about downtime. Just shut down, tar, move, untar, and start! I understand why people move large nodes over slow links with rsync though - having the ability to easily resume/repair is a killer feature!
Selling nodes in-the-open would turn into a Terms-of-Service argument. And Storj wouldn’t want to be involved in any way with buyers (potentially) getting scammed. I think what normally happens is someone mentions they’re going to GE some/many nodes… and then they get messaged privately from interested buyers.
And yeah moving nodes is easy enough over the Internet. @GfTmbH is correct that tar is simple: but for large nodes… that may take a while… the ability to resume with something like rsync is very nice.
If you ask Storj, about things they would prefer not to happen: they will point you to ToS sections saying that they prefer those things not to happen. That is true. Absolutely.
However, at the same time…
If a SNO posts that for some reason they can’t/don’t-want-to run their nodes anymore, and are maybe considering GE… those SNOs may receive DMs discussing potential alternatives. Like pets: it’s better to put them up for adoption if you can’t take care of them anymore.
The main problem with selling is the identity. The seller could screw you by starting a new node with the same identity, either maliciously or by mistake. We’ve seen many cases on the forum where the SNO ran the wrong identity.
Or they could sell it to more than one buyer, and again, the buyers end up with dead nodes.
So, yeah, like in the post linked by jammerdan, selling is too big a risk. In theory nothing and no one prevents it, but it’s risky.
I believe we’ve brought this point up in the past a few times, but my memory is tricky sometimes.
Thinking about it… the only safe way I can see is some sort of confirmation through a second channel that you changed your node’s IP, and until you confirm it, the node can’t transmit or receive pieces. This way, an identity is linked to a particular IP and can be used only with it, until you confirm a new IP. That prevents running an identity in 2 locations. But it creates more problems than it solves for all the SNOs that have to change their IP frequently or don’t use that second channel of confirmation, like the email.
Yeah that’s a risk you could only really manage with reputation: a buyer would probably want to see the seller was a regular in this forum or something.
Like if I saw you were selling a node or two to reclaim space, I would trust that you would delete your copy of the node after it was transferred (and I made sure it ran). But if it was someone who just made their first forum post? No way.
See me edit above…
Nah, people can change in an instant. You can’t trust anyone on a public forum.
All this for $10/month?! Why even bother in the first place? And how much would one be expected to pay for the privilege of having this chore? Can’t be more than a year’s worth of “earnings”. Likely less than a quarter’s worth, accounting for risks.
Economics of selling nodes don’t make sense.
It depends on the size/age of the node: usually under 2TB/1 year it’s not worth it (since a buyer could just “grow their own”). But smaller nodes usually go for 3-6 months of used-space payout. Larger nodes can get up around 9+ months. (Like an 8TB-used-space node could be 8TB x $1.5/TB x 9 = $108 in stablecoins.) And then the seller can still sell the HDD separately. It can make sense to them if their alternative is GE.
But it depends on a lot of things: it’s more if you’re buying the physical drive, or less if it’s lots-of-small-nodes. Lots of flexibility!
Since the buyer is taking all the risk, there’s definitely a limit to what they’ll pay: the seller could DQ the node at any time.
You made a mistake there. What about the held amount? You have to include that in the price too!
It is like the projects where you had to buy the hardware and the “licence” to be able to make a buck a month, and people went for it.
Maybe an inspiration for Storj - lets start selling licences to be able to run nodes.
People are apparently starting to get desperate.
A single rsync with --delete will probably max out your connection if the node is on hashstore.
Do this a few times, then stop the node and do a final rsync.
This forum is not intended to handle sales, so such topics will be deleted - ads are not allowed.
You may find a lot of online platforms with arbitration to do so.
And of course, @jammerdan is correct
I think there is a bit of a misunderstanding here about what I meant by sequential.
I’m not talking about physical block contiguity on disk. Fragmentation obviously exists and applies to any filesystem. What I’m referring to is the access pattern.
A tar stream performs a single forward traversal of the directory tree and then reads file contents in a continuous, predictable order. From the kernel’s point of view this results in sustained forward reads, good readahead behavior, and very little metadata churn once traversal starts.
rsync, even when copying whole files, still performs extensive metadata operations: repeated stat() calls, directory comparisons, file opens/closes, and bookkeeping. This creates a much more fragmented I/O pattern, even on a single disk.
This difference is not about scale or “many nodes”. It is observable even on a single multi-TB disk, especially when the system is not completely idle. Over WAN, the effect is often amplified because a steady stream tends to behave better than many small bursts.
To be clear: I’m not saying rsync is wrong or unusable. It works perfectly fine for many setups. My point was simply that for large datasets, a tar stream is an I/O-friendly alternative with a more predictable access pattern. That’s an optimization choice, not a requirement.
The method with tar has several drawbacks:
- No resume of the transfer operation
- Deleted data in the source will remain in the destination
- The last transfer will be a full transfer, not a diff, which defeats the purpose of the earlier transfer passes.
- You need to run rsync --delete anyway after the last transfer, otherwise your databases could be corrupted due to temporary database files/binlogs that were not deleted.
But you can replace the last transfer with rsync --delete - since you need it anyway, this will reduce the downtime and remove the deleted data.
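The hybrid workflow can be sketched locally (hypothetical names; over WAN the receiving tar and the final rsync would run through ssh, as in the steps above):

```shell
mkdir -p src/STORJX123 dst
echo blob > src/STORJX123/piece
# Bulk pass with tar while the node is still running:
tar -C src -cf - STORJX123 | tar -xf - -C dst
# (stop the node here)
# Final pass with rsync --delete catches deltas and removes deleted files:
rsync -a --delete src/STORJX123/ dst/STORJX123/
```

This keeps the sequential-stream benefit for the bulk of the data while still getting a consistent, trash-free final state.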