I need help with graceful exit

So, I am trying to gracefully exit a few nodes running under Docker on Ubuntu.

The graceful-exit HOWTO suggests that I use the storage node command but I am unable to find this command on my system.

So what do you suggest I try? Can I gracefully exit my nodes using the web interface?

Thanks.

Hi @mauricio
Presumably you are using the guide here - Graceful Exit Guide – Storj

As you’re running docker on Ubuntu, you should execute the ‘exit-satellite’ command with the same parameters as your current docker run command. Then enter the satellite domain names one at a time, not all of them at once.
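For a docker setup, that typically looks like the sketch below; the container name `storagenode` and the in-container paths are assumptions based on the standard docker run command from the setup guide, so adjust them to match your own:

```shell
# Start graceful exit inside the running container.
# 'storagenode' is the assumed container name; /app/config and
# /app/identity are the paths used by the standard docker setup.
docker exec -it storagenode /app/storagenode exit-satellite \
    --config-dir /app/config \
    --identity-dir /app/identity
# You will then be prompted for the satellite domain names,
# entered one per line (for example: eu1.storj.io:7777)
```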


You can exit the satellites one by one or all at the same time. The satellites you requested graceful exit for will need a few hours to create the list of pieces that need to be transferred. Don’t expect high traffic immediately.
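To check how far along each satellite is, you can query the node itself. A sketch, assuming the same container name and paths as your docker run command:

```shell
# Show graceful exit progress per satellite (percent complete).
docker exec -it storagenode /app/storagenode exit-status \
    --config-dir /app/config \
    --identity-dir /app/identity
```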

After 12hrs 0.00% and no transfers

Also, setting “graceful-exit.num-concurrent-transfers: 100” (default value 5) does not seem to help.

How did you come to this conclusion?
Which satellite is it, how much space is used by its customers, and what errors are in the logs?

There are almost no errors, and those that do appear are on the receiving nodes’ side.

ERROR piecetransfer failed to put piece {"Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Piece ID": "7YAZQFSXKLLJ46G2DGEJR6QJRMNMGZHHEJVOXZ25WZJJJLZSKGSA", "Storagenode ID": "124QceJhGidrbtAEnJ8zW4UDoKkngbqGXCZu9GABWn8h99Y7bTP", "error": "piecestore: rpc: context deadline exceeded", "errorVerbose": "piecestore: rpc: context deadline exceeded\n\tstorj.io/common/rpc.TCPConnector.DialContext:92\n\tstorj.io/common/rpc.Dialer.dialEncryptedConn:186\n\tstorj.io/common/rpc.Dialer.DialNodeURL.func1:107\n\tstorj.io/common/rpc/rpcpool.(*Pool).get:90\n\tstorj.io/common/rpc/rpcpool.(*Pool).Get:110\n\tstorj.io/common/rpc.Dialer.dialPool:152\n\tstorj.io/common/rpc.Dialer.DialNodeURL:106\n\tstorj.io/uplink/private/piecestore.DialNodeURL:49\n\tstorj.io/uplink/private/ecclient.(*ecClient).dialPiecestore:65\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:219\n\tstorj.io/storj/storagenode/piecetransfer.(*service).TransferPiece:149\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func3:97\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}

But the process itself is very slow: eu1.storj.io is still at 0.00% after 24h, saltlake.tardigrade.io at ~2-3% (500-600 GiB of used storage per node).
My server: 2× E5-2689, 2× Intel i350 (8 rx / 8 tx queues), 1 Gbit/s guaranteed channel (colocation).

It takes time.
My first one took several days.

In general, the average upload speed should be at least 50% (500 Mbit/s) of my channel’s bandwidth. If the speed is lower, that means either the satellites are slow or the destination nodes cannot receive at normal speed.

So, I gave your suggestion a try and I got the following error message.

“You are not allowed to initiate graceful exit on satellite for next amount of months:
Error: You are not allowed to graceful exit on some of provided satellites”

I find this message quite a surprise as all my nodes were created around February 2020.

Any suggestions?

This is likely US2 satellite, it’s 4 months old, so no one can exit from it gracefully at the moment.
You can take a look into your logs; there will be the exact date when your node will become eligible to exit from that satellite.
You can also add --log.output=stderr option to the graceful exit command and see these messages right on your screen.
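For example, assuming your container is named `storagenode`, filtering the logs like this should surface the graceful exit messages, including the eligibility date:

```shell
# Filter the node's logs for graceful-exit-related messages.
docker logs storagenode 2>&1 | grep -i "graceful"
```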

Alexey, why impede exit with an artificially low speed? This is pretty bad! (not about us2)

I have no information about any such limits.
I pinged the team though.

The transfer is happening between your node and the destination nodes; the satellite is not involved in data transfers. So, if there are limits, they are likely on the destination nodes’ side. Or your router can’t handle the load.

I had a ticket where an operator was unable to exit successfully with a Mikrotik router. I did not dig into the details of the Mikrotik configuration, but when the operator connected their PC directly, the GE finished in a few hours.
So I would suggest you try the same.

The errors were exactly like yours, plus “no route to host”.

Thanks Alexey.

The error message goes away if I avoid exiting from US2.
I’m off to repeat this procedure for my other nodes.

Thanks

If the limits depend only on the receiving nodes, then the design is shoddy. My server can send data much faster and in 100+ parallel transfers. Also, I don’t use a Mikrotik.

You are welcome!
I would also suggest using zkSync (L2) to receive your payout if you see that the owed sum would be less than 4x the ERC20 transaction fee, because payouts on L1 (Ethereum) are subject to the Minimum Payout Threshold. There is an emergency payout, though, when you have exited from all satellites (or been disqualified on them) and don’t have other nodes with the same wallet.
In general, your node will be disqualified on US2 after two months offline, but this means your payout on L1 could be held until then.

I have been farming Storj since the v2 days. I have more than 200 TB of available space, speedy fibre up and down links, and almost perfect uptime.
Over a year of farming v3 has yielded 7 TB of data.

For me, farming Storj simply does not make economic sense since v3.

I conclude that Storj implements a centralized system with many arbitrary restrictions and limitations that uses blockchain technology for payment.

All the best. If I get my exit-fees great if not, so be it.


I suppose that is all in one location. We treat all nodes in a /24 subnet of public IPs as one node, because we want to be as decentralized as possible.
So that amount of space is unlikely to be filled for a long time.
The Community has shown that the maximum used space for one location is unlikely to be greater than 20 TB:


After speaking to some of the engineers at the company, we suspect that you might be having bufferbloat problems.

Hi Alexey,

Exiting is progressing very slowly in my case as well. (One satellite is at about 5%, and I started exiting days ago.)
Is there anything I can do to speed this up?

Thanks

Yes, there are several options, but changing them is a very dangerous operation:

After changing the config, you should save it and restart the storagenode.
If you increased the concurrency, please monitor your logs for failing transfers; if the number of failed transfers is increasing, you need to reduce the concurrency.
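As a sketch, the relevant entry in `config.yaml` (inside the config directory you mapped into the container) would look like this; 100 is just the value tried earlier in this thread, not a recommendation:

```yaml
# config.yaml — graceful exit tuning
# default is 5; raise gradually and watch the logs for failed transfers
graceful-exit.num-concurrent-transfers: 100
```

After saving, restart the node (for a docker setup, something like `docker restart storagenode`) for the change to take effect.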