I have 2 nodes on 2 different servers, and both have been in the process of graceful exit from their last satellite (us1.storj.io:7777) for about 24 hours.
However, only 5% has been transferred so far, and the data left to transfer is 470 GB and 660 GB respectively.
Both of my servers can transfer at 100 Mbps, but they are only using around 3-4 Mbps, and CPU usage is below 3-5%.
I am wondering if I can change the graceful exit settings while the process is running, so that I can use 80-100 Mbps and transfer all the data within 48 hours.
My dedicated servers' specs are: 2-core / 4-thread CPU, 4-8 GB of RAM, 100 Mbps connection.
These are my config.yaml settings; please let me know what to change and which settings I need to remove the ‘#’ symbol from.
# in-memory buffer for uploads
# filestore.write-buffer-size: 128.0 KiB
# how often to run the chore to check for satellites for the node to exit.
# graceful-exit.chore-interval: 1m0s
# the minimum acceptable bytes that an exiting node can transfer per second to the new node
# graceful-exit.min-bytes-per-second: 5.00 KB
# the minimum duration for downloading a piece from storage nodes before timing out
# graceful-exit.min-download-timeout: 2m0s
# number of concurrent transfers per graceful exit worker
# graceful-exit.num-concurrent-transfers: 5
# number of workers to handle satellite exits
# graceful-exit.num-workers: 4
Sorry for the long post and thank you very much for your help.
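On the ‘#’ question: a line that starts with ‘#’ is a comment, so the setting on it only takes effect once the ‘#’ is removed and the file is saved. Below is a minimal Python sketch of that edit, applied to a copy of the lines quoted above. The helper name set_options and the value 10 are made up for illustration; they are not part of the storagenode software and not a recommended setting.

```python
import re

def set_options(config_text, overrides):
    """Uncomment (or overwrite) the given keys in config.yaml-style text."""
    lines = config_text.splitlines()
    remaining = dict(overrides)
    for i, line in enumerate(lines):
        for key in list(remaining):
            # Matches "# key: old" as well as an already active "key: old".
            # Description comments do not match because they lack "key:".
            if re.match(r"^\s*#?\s*" + re.escape(key) + r"\s*:", line):
                lines[i] = f"{key}: {remaining.pop(key)}"
                break
    # Keys not present anywhere in the text are appended at the end.
    lines.extend(f"{k}: {v}" for k, v in remaining.items())
    return "\n".join(lines) + "\n"

sample = """# number of concurrent transfers per graceful exit worker
# graceful-exit.num-concurrent-transfers: 5
# number of workers to handle satellite exits
# graceful-exit.num-workers: 4
"""

# The value 10 is an arbitrary illustration, not a recommendation.
updated = set_options(sample, {"graceful-exit.num-concurrent-transfers": 10})
print(updated)
```

Only the line for the overridden key loses its ‘#’; the description comments and the untouched settings stay commented, so they keep their defaults.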
AFAIK you cannot influence the speed; it is managed by the satellites, so you have to wait until it is finished.
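For a rough idea of how long that wait could be, here is a back-of-the-envelope calculation in Python using the numbers from the first post (470 GB and 660 GB left, about 4 Mbps observed, 100 Mbps available). It assumes decimal units, a fully used link, and no protocol overhead, so real times would be longer.

```python
def hours_to_transfer(size_gb, speed_mbps):
    """Hours needed to send size_gb (decimal gigabytes) at speed_mbps (megabits per second)."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (speed_mbps * 1e6)
    return seconds / 3600

for size_gb in (470, 660):
    for speed_mbps in (4, 100):
        hours = hours_to_transfer(size_gb, speed_mbps)
        print(f"{size_gb} GB at {speed_mbps} Mbps: {hours:.1f} h")
```

Under these assumptions the remaining data needs roughly 11 to 15 days at 4 Mbps, versus roughly 10 to 15 hours at a full 100 Mbps.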
I think I can increase the number of files transferred at the same time which will result in higher transfer speed.
You can try to increase the number of parallel transfers and the number of workers. However, this increases the probability of being disqualified: if your router cannot handle hundreds of parallel transfers, it can start to drop connections, and transfers will fail. With 10% failed transfers your node will be disqualified.
So if you are going to increase these numbers (you also need to save your changes and restart your node), please monitor the number of failed transfers.
If you notice errors like “no route to host” or a large number of “i/o timeout”, you need to decrease what you have increased, restart the node, and perhaps reboot your router.
I managed to test some settings and increased the transfer speed. My only issue is that it sends files in bursts: around 40 Mbps for a few seconds, then idle for a minute, then more files. How can I push files at a sustained speed, without delays?
You can try to decrease this parameter (but there is a chance that the satellites will rate limit (ban) your node if it is too small):
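One way to keep an eye on those two error messages is to count them in the node's log. The Python sketch below only does substring matching on the two strings quoted above; the sample log lines are invented placeholders, since the real log format is not shown in this thread.

```python
def count_network_errors(log_lines, patterns=("no route to host", "i/o timeout")):
    """Count how many log lines contain each error string (case-insensitive)."""
    counts = {p: 0 for p in patterns}
    for line in log_lines:
        lower = line.lower()
        for p in patterns:
            if p in lower:
                counts[p] += 1
    return counts

# Hypothetical lines for illustration only; real storagenode logs look different.
sample_log = [
    "hypothetical line: transfer failed: dial tcp: no route to host",
    "hypothetical line: transfer failed: read tcp: i/o timeout",
    "hypothetical line: transfer ok",
    "hypothetical line: transfer failed: write tcp: i/o timeout",
]
print(count_network_errors(sample_log))
```

For a real node you would pass the lines of the actual log file instead of sample_log, and compare the counts before and after changing the settings.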
--graceful-exit.chore-interval duration how often to run the chore to check for satellites for the node to exit. (default 1m0s)
Looks like there are way too many situations where a node trying to help the network by gracefully exiting ends up disqualified…
Feels like satellites should be more flexible with how a graceful exit goes: even if there are some failed transfers, so what? Every transfer that went well is one piece fewer to repair. It seems like satellites shoot themselves in the foot by disqualifying exiting nodes.
Besides, why is the graceful exit process not auto-adjusting its speed to whatever is available (bandwidth and number-of-connection wise)? What is the point of waiting (graceful-exit.chore-interval duration) between transfers?
I see why one would want to lower the speed of a graceful exit procedure if they need to have some bandwidth or connections available on their router if it gets saturated, but right now it seems like it’s the other way around: the graceful exit is slow, unless fine-tuned which could disqualify the node…
The reported reason cannot be trusted. The real cause could be “piece not found” or “piece corrupted”, but the node could report something like “remote node is overloaded”.
Each failed transfer will be retried with another node four more times before it is considered failed.
So if the transfer of the same piece still fails after 5 attempts, there is something wrong with the piece, with the exiting node, or with your network.
The GE will be considered failed if 10% of the transfers fail.
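The rule described above (five attempts per piece, 10% failed transfers fails the exit) can be sketched in Python like this. It is only an illustration of the rule as stated in this thread, not the satellite's actual code. The function names are made up, and two details are assumptions: that the percentage is counted per piece, and that exactly 10% already counts as failed.

```python
MAX_ATTEMPTS = 5          # one initial try plus four retries, as described above
FAILURE_THRESHOLD = 0.10  # 10% failed transfers, as described above

def piece_failed(attempt_results):
    """attempt_results: list of booleans, one per attempt (True = success)."""
    exhausted = len(attempt_results) >= MAX_ATTEMPTS
    return exhausted and not any(attempt_results[:MAX_ATTEMPTS])

def exit_failed(pieces):
    """pieces: list of attempt lists, one per piece. Assumes a per-piece ratio."""
    failed = sum(1 for attempts in pieces if piece_failed(attempts))
    # Assumption: reaching the threshold exactly counts as a failed exit.
    return failed / len(pieces) >= FAILURE_THRESHOLD

ok = [True]                       # succeeded on the first attempt
retried = [False, False, True]    # succeeded on the third attempt
dead = [False] * 5                # failed all five attempts

print(exit_failed([ok] * 18 + [retried, dead]))   # 1 of 20 pieces failed
print(exit_failed([ok] * 18 + [dead, dead]))      # 2 of 20 pieces failed
```

A piece that eventually succeeds on a retry does not count against the node here, which matches the description that only transfers still failing after all attempts are considered failed.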
You could try to implement such logic. I believe this adjustment should not be automatic: only the owner of the node knows the capability of their network and hardware. Are you sure that everyone wants to be without internet and with high resource usage on the host for weeks?
The wait is there to accept signed reports from destination nodes that the previous batch has been transferred, and to give the satellite time to prepare the next list of nodes.
Thanks for the detailed answer @Alexey.
You do bring up many valid points.
My 2 older servers expired before completing the transfer. However, I started graceful exit for another node about a week ago and noticed something interesting.
My node had about 1.7 TB of data stored, and when I started graceful exit on all satellites, the exit from all satellites except one finished within 1 day. The us1.storj.io satellite didn't, and it is transferring data at a slow rate.
Another strange thing: according to vnstat, only 150 GB of outbound bandwidth was consumed, yet around 1.3-1.4 TB of the node's data was deleted after those graceful exits finished!
Even the us1.storj.io satellite had about 0.6-0.7 TB of data before graceful exit, and when the process reached about 5% completion I found only maybe 350 GB left, so where did the rest of the data go?
Perhaps these pieces already exist in the network, so the graceful exit finished earlier.
Pieces are transferred by your node, not the satellite, so the satellite isn't involved in the piece transfers; it only gives your node a list of destination nodes.