Speed of graceful exit

littleskunk · January 22, 2020, 5:16pm

Please try graceful-exit.min-bytes-per-second: 17000. It should cancel most of the connections within 2 minutes (depending on the piece size). Don’t try higher values because we don’t want to risk a high error rate. This is only a test to see if the timeout works or if we have 2 bugs here.
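
As a rough check of the 2 minute figure, here is a minimal sketch. It assumes the transfer deadline is simply the piece size divided by graceful-exit.min-bytes-per-second; that formula and the helper name `transfer_timeout_seconds` are assumptions for illustration and are not taken from the storagenode code. The 2 MB piece size is the typical value mentioned later in this thread.

```python
def transfer_timeout_seconds(piece_size_bytes: int, min_bytes_per_second: int) -> float:
    """Longest a transfer may run before it drops below the minimum speed (assumed model)."""
    if min_bytes_per_second <= 0:
        raise ValueError("min_bytes_per_second must be positive")
    return piece_size_bytes / min_bytes_per_second


PIECE_SIZE = 2_000_000  # assumed typical piece size: 2 MB in decimal units

# 128 is the old default, 1024 and 17000 are values tried in this thread.
for setting in (128, 1024, 17000):
    seconds = transfer_timeout_seconds(PIECE_SIZE, setting)
    print(f"{setting:>6} B/s -> {seconds:8.0f} s ({seconds / 60:.1f} min)")
```

Under these assumptions 17000 bytes per second gives about 118 seconds for a 2 MB piece, which matches the "within 2 minutes" estimate; smaller pieces would be cancelled sooner.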

littleskunk · January 22, 2020, 9:33pm

@beli something is going wrong now. In the database I can see a 5% error rate for your storage node. That is still OK to finish graceful exit, but it concerns me. Have you lost any data, or is it because we are too aggressive with the config changes now? In your logfile I noticed a lot of download traffic. I didn’t take that into account. How fast is your internet connection? Did you notice a higher error rate recently?

Vadim · January 22, 2020, 9:38pm

@littleskunk

If you give only one possible IP for upload, that is a bottleneck: if that node is overloaded, or something else is wrong, you can’t transfer the piece to it.

littleskunk · January 22, 2020, 9:41pm

Sorry, I have no idea how I should respond. First of all, I posted the test plan in this channel. Some of these tests should give you a good hint about what is going on. Second, right at the beginning of this thread we have some additional information about a change that was deployed on Stefan’s satellite. I have to assume you skipped all that. I don’t know a polite answer, so I had better not even try. Maybe someone else can jump in, because I clearly have no idea at which point in the conversation I lost you.

Vadim · January 22, 2020, 9:52pm

I have been monitoring this thread from the beginning. To me it looks like GE is working slowly because the satellite gives only one possible endpoint to upload a piece to. And if that node is slow, then everything is slow.
In my opinion it should work like Uplink: for each piece, the satellite sends 100 possible nodes, and whoever is fastest wins the race. Slow nodes will just lose the race. Then it will work fast. The node just sends back a confirmation and the address to the satellite.

littleskunk · January 22, 2020, 9:56pm

Sorry, I am unable to join that part of the conversation and will leave that open for someone else. Your assumption is wrong, but this is also the wrong thread to talk about that. Let me focus on the question of how we can speed up the transfer without having to explain the design. You can find the design document on GitHub if you are interested.

beli · January 23, 2020, 4:30pm

vdsl 100/40
no data loss

I only notice errors I’m not responsible for (just the ones I see in the live log).

littleskunk · January 23, 2020, 4:33pm

Ok that means we have to decrease the satellite batch size back to something closer to 100. The increase to 1000 seems to be too much.

littleskunk · January 24, 2020, 10:15pm

With the next release I am changing a bunch of default values for graceful exit. I am struggling with the minimum speed. The current default is 128 bytes per second. That is way too low. What would be a better value? I don’t want to push it too high. I have no idea what I should put in there.

Vadim · January 24, 2020, 10:23pm

It could be half of the minimum required speed. As written on Storj.io, when we register a node the minimum is 5 Mbit upload, so would 2.5 Mbit be OK?

littleskunk · January 24, 2020, 10:40pm

In the logs from one of the graceful exit nodes there is a lot of download traffic in parallel. I don’t think we should target 50% of the bandwidth. But I like the idea of calculating it that way. How about 1%? That would be 50 Kbit per second. In other words, roughly 5 KByte/s. A typical piece is 2 MB. It should finish in about 400 seconds. That sounds like a good starting value even for slow nodes.
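
To make the estimate concrete, here is a small sketch of the arithmetic. It assumes 8 bits per byte and decimal units (1 Mbit = 1,000,000 bits, 2 MB = 2,000,000 bytes); the function name `min_speed_bytes_per_second` is made up for illustration. The exact value of 1% of 5 Mbit/s is 6.25 KByte/s, and the post rounds that down to 5 KByte/s, which is where the 400 seconds comes from.

```python
MIN_UPLOAD_BITS_PER_SECOND = 5_000_000  # 5 Mbit/s minimum upload requirement
SHARE_PERCENT = 1                       # proposed share of that bandwidth
PIECE_SIZE_BYTES = 2_000_000            # typical piece size, 2 MB


def min_speed_bytes_per_second(upload_bits_per_second: int, share_percent: int) -> float:
    """Convert a percentage of the upload bandwidth (bits/s) into bytes/s."""
    return upload_bits_per_second * share_percent / 100 / 8


exact = min_speed_bytes_per_second(MIN_UPLOAD_BITS_PER_SECOND, SHARE_PERCENT)
rounded = 5_000  # the rounded 5 KByte/s used in the post

print(f"exact minimum speed:   {exact:.0f} bytes/s")
print(f"time at exact speed:   {PIECE_SIZE_BYTES / exact:.0f} s")
print(f"time at rounded speed: {PIECE_SIZE_BYTES / rounded:.0f} s")
```

With the exact figure a 2 MB piece would need 320 seconds; with the rounded 5 KByte/s it needs 400 seconds, so the rounded value is slightly more forgiving for slow nodes.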

BrightSilence · January 24, 2020, 11:04pm

I like that way of calculating it. I was thinking 1% as well. It will also depend on the other traffic the receiving node has going on. I know of at least a few nodes which may not necessarily meet the minimum download speed requirements, which could slow down transfer when they receive pieces from a GE node. That should not reflect badly on the node trying to GE.

beli · January 25, 2020, 6:53am

Mhh…
I migrated the storagenode to another QNAP, so I have more time for GE.

Interesting thing:
During the phases with no transfer to a new node, I have > 70% CPU load and I/O load?!

Could this be a performance issue? The current machine is a TS431x2-8g with (imho) enough power?!

Vadim · January 25, 2020, 9:48am

I hope it will be configurable. For example, I have a 300/300 Mbit connection, and that value is too slow for me.

littleskunk · January 25, 2020, 9:38pm

At the moment your node is looking good. No additional errors and I see good progress.

beli · January 26, 2020, 5:53pm

Good progress is relative

I now have the following config:
graceful-exit.num-concurrent-transfers: 20
graceful-exit.chore-interval: 15m0s
#graceful-exit.num-workers: 2
graceful-exit.min-bytes-per-second: 1024

The CPU is fully loaded, and the hard disk is acoustically very active. It seems that GE is stressing the hardware very much.

Domain Name                       Node ID                                              Percent Complete  Successful  Completion Receipt
satellite.stefan-benten.de:7777   118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW   3.09%             N           N/A
asia-east-1.tardigrade.io:7777    121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6  14.18%            N           N/A
us-central-1.tardigrade.io:7777   12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S  13.03%            N           N/A
europe-west-1.tardigrade.io:7777  12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs  2.01%             N           N/A

littleskunk · January 27, 2020, 12:03pm

Let’s try to find out what is using all the CPU cycles:

curl 'localhost:7777/mon/trace/svg?regex=WalkSatellitePieces' > WalkSatellitePieces.svg

beli · January 27, 2020, 12:21pm

https://alpha.transfer.sh/6lA8F/WalkSatellitePieces.svg

littleskunk · January 27, 2020, 12:25pm

It looks like most of the graceful exit functions are not instrumented with monkit. We can only see that this transfer took 30s, but no additional details.

littleskunk · January 31, 2020, 4:59pm

How fast is graceful exit with the latest release?