graceful-exit.min-bytes-per-second: 17000 it should cancel most of the connections within 2 minutes (depends on the piece size). Don’t try higher values because we don’t want to risk a high error rate. This is only a test to see if the timeout works or if we have 2 bugs here.
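To get a feel for why 17000 lands at roughly 2 minutes, here is a rough sketch of a size-dependent timeout. It assumes the timeout is simply the piece size divided by graceful-exit.min-bytes-per-second. The function name and the piece sizes are only illustrative and are not taken from the storage node code, which may apply extra limits on top of this.

```python
def transfer_timeout_seconds(piece_size_bytes: int, min_bytes_per_second: int) -> float:
    """Longest time a transfer may take before it drops below the minimum rate.

    Assumption: timeout = piece size / minimum rate. The real storage node
    may add other limits (for example a minimum timeout) on top of this.
    """
    if min_bytes_per_second <= 0:
        raise ValueError("min_bytes_per_second must be positive")
    return piece_size_bytes / min_bytes_per_second


# Illustrative piece sizes in bytes; 2 MB is assumed to be a typical piece.
for size in (500_000, 1_000_000, 2_000_000):
    timeout = transfer_timeout_seconds(size, 17_000)
    print(f"{size:>9} bytes -> cancelled after about {timeout:.0f} s")
```

Under that assumption a 2 MB piece gets a bit under 2 minutes, and smaller pieces are cancelled sooner.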
@beli something is going wrong now. In the database I can see a 5% error rate for your storage node. That is still OK to finish graceful exit but it concerns me. Have you lost any data or is it because we are too aggressive with the config changes now? In your logfile I noticed a lot of download traffic. I didn’t take that into account. How fast is your internet connection? Did you notice a higher error rate recently?
If you give only 1 possible IP for upload, it is a bottleneck: if that node is overloaded, or something else is wrong, you can’t transfer the piece to it.
Sorry, I have no idea what I should respond. First of all, I did post the test plan in this channel. Some of these tests should give you a good hint about what is going on. Second, right at the beginning of this thread we have some additional information about a change that was deployed on Stefan’s satellite. I have to assume you skipped all that. I don’t know a polite answer so I’d better not even try. Maybe someone else can jump in because I clearly have no idea at which point in the conversation I lost you.
I have been monitoring this thread from the beginning. To me it looks like GE is working slowly because the satellite gives only one possible endpoint to upload a piece to. And if that node is slow then everything is slow.
In my opinion it should work like the uplink: for each piece, the satellite sends 100 possible nodes, and whoever is fastest wins the race. Slow nodes will just lose the race. Then it will work fast. The node just sends the confirmation and address back to the satellite.
Sorry I am unable to join that part of the conversation and will leave that open for someone else. Your assumption is wrong but this is also the wrong thread to talk about that. Let me focus on the question how we can speed up the transfer without having to explain the design. You can find the design document on github if you are interested.
no data loss
I only notice errors that I’m not responsible for (just the ones I see in the live log).
Ok that means we have to decrease the satellite batch size back to something closer to 100. The increase to 1000 seems to be too much.
With the next release I am changing a bunch of default values for graceful exit. I am struggling with the minimum speed. The current default is 128 bytes per second. That is way too low. What would be a better value? I don’t want to push it too high. I have no idea what I should put in there.
It could be half of the minimum required speed. As written on Storj.io, when we register a node the minimum is 5 Mbit upload, so would 2.5 Mbit be OK?
In the logs from one of the graceful exit nodes there is a lot of download traffic in parallel. I don’t think we should target 50% of the bandwidth. But I like the idea of calculating it that way. How about 1%? That would be 50 kbit per second. In other words roughly 5 KByte/s. A typical piece is 2 MB. It should finish in about 400 seconds. That sounds like a good starting value even for slow nodes.
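The arithmetic behind the 1% idea can be checked quickly. This is only a sketch: the helper names are made up for illustration, the 5 Mbit minimum upload and the 2 MB piece size come from the posts above, and 1 Mbit is taken as 1,000,000 bits.

```python
def min_rate_bytes_per_second(link_mbit: int, percent: int) -> float:
    """Bytes per second for a given percentage of a link speed in Mbit/s."""
    bits_per_second = link_mbit * 1_000_000 * percent / 100
    return bits_per_second / 8


def seconds_for_piece(piece_bytes: int, rate_bytes_per_second: float) -> float:
    """Time to move one piece at a constant rate."""
    return piece_bytes / rate_bytes_per_second


piece = 2_000_000                             # assumed typical piece, 2 MB
exact_rate = min_rate_bytes_per_second(5, 1)  # 1% of 5 Mbit/s
rounded_rate = 5_000                          # rounded down to 5 KByte/s

print(exact_rate)                              # 6250.0
print(seconds_for_piece(piece, exact_rate))    # 320.0
print(seconds_for_piece(piece, rounded_rate))  # 400.0
```

So 1% of 5 Mbit is 6250 bytes per second before rounding. Rounding down to 5 KByte/s is the more forgiving choice for slow nodes and gives the 400 seconds mentioned above.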
I like that way of calculating it. I was thinking 1% as well. It will also depend on the other traffic the receiving node has going on. I know of at least a few nodes which may not necessarily meet the minimum download speed requirements, which could slow down transfer when they receive pieces from a GE node. That should not reflect badly on the node trying to GE.
I migrated the storagenode to another QNAP, so I have more time for GE.
During phases with no transfer to new nodes I have > 70% CPU load and I/O load?!
Could this be a performance issue? The current machine is a TS431x2-8g with (imho) enough power?!
I hope it will be configurable. For example, I have a 300/300 Mbit network, and that would be too slow for me.
At the moment your node is looking good. No additional errors and I see good progress.
Good progress is relative
I now have the following config
The CPU is fully loaded, and the hard disk is acoustically very active. It seems that GE is stressing the hardware very much.
Domain Name                       Node ID                                              Percent Complete  Successful  Completion Receipt
satellite.stefan-benten.de:7777   118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW   3.09%             N           N/A
asia-east-1.tardigrade.io:7777    121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6  14.18%            N           N/A
us-central-1.tardigrade.io:7777   12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S  13.03%            N           N/A
europe-west-1.tardigrade.io:7777  12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs  2.01%             N           N/A
Let’s try to find out what is using all the CPU cycles
curl 'localhost:7777/mon/trace/svg?regex=WalkSatellitePieces' > WalkSatellitePieces.svg
It looks like most of the graceful exit functions are not instrumented with monkit. We can only see that this transfer took 30s but no additional details
How fast is graceful exit with the latest release?