Two new blueprints/design drafts seeking feedback: Replacing TLS with Noise and TCP_FASTOPEN

Hello everyone!

I have two new design drafts I wanted to share with you and collect feedback on. They are related in that they both eliminate one of our connection handshakes and thus both significantly improve performance.

The first one is the longer, more involved design doc. It describes how we can improve performance by replacing TLS with the secure channel protocol that WireGuard uses (Noise). The implementation has a lot of details, but once it lands it should be extremely simple for storage node operators - it should just work in all cases that TCP and current connections do, so there's nothing to do! Nodes will simply gain this functionality through normal upgrades.

The second one is shorter, simpler, and much easier to implement. Unfortunately, it does raise some configuration and deployment questions for storage node operators. It requires an opt-in kernel setting on the storage node (at least on Linux, maybe not on Windows - it's the sysctl command in the design doc) that should be completely harmless in almost all cases. Storage nodes that don't set it will still work just fine, but if most of the network starts enabling it, nodes that don't may start to lose more upload and download races, since nodes that do will complete operations faster.
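(For reference, the Linux setting in question boils down to a single command:

sysctl -w net.ipv4.tcp_fastopen=3

where bit 0 of the value enables client-side TCP_FASTOPEN and bit 1 enables server-side, so 3 enables both.)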

Please let me know what you think about these changes! We've actually implemented both of these already in a test environment, and small file transfers completed in only 35% of the time with these changes - roughly a 2.8x speedup! So we're fairly excited about rolling one or both of these out.

Thanks!

6 Likes

Not much time to read now, but have you thought about editing both of these into technical blog posts? They're written in a style which is very easy to read, yet full of information. They deserve more than being hidden deep inside a code repository ;-), especially the background/context sections and notes on benchmarks. It would be nice to link to them on social media.

3 Likes

Oh, that's very kind of you to say! Thank you. I wonder if perhaps the social media posts that would deliver the most "oomph" would be the ones that end with "and so we did this and now Storj is faster!", so I suppose I've been most interested in getting this shipped first.

But you’re right, this is probably valuable material. I’ll keep that in mind!

4 Likes

Does this apply only to native installs or docker as well? Does this require settings changes on the host machine for docker installs or can this be done inside the container?

1 Like

For docker installs you would use sysctl opts and likely privileged mode for the container; the host should be configured to allow this.
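(Untested sketch rather than a recommendation: on newer kernels where net.ipv4.tcp_fastopen is namespaced, adding a flag along the lines of

docker run --sysctl net.ipv4.tcp_fastopen=3 ...

to the usual run command may be enough without full privileged mode; on older kernels, or when using --network host, the setting would need to be made on the host instead.)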

2 Likes

I still need to do some more testing with containers to be honest. One thing I'm not sure of is how often the storage nodes in our network run as root inside their containers. I haven't tried yet, but how do people feel about the storage node software attempting to set the sysctl flag automatically? I'm not sure how often it would succeed, but if it failed it would just move on, and the operator would have to set the flag on the host kernel manually.
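Roughly, the best-effort logic I have in mind is something like this (hypothetical sketch in Python purely to illustrate, not actual node code):

    # Try to enable TCP_FASTOPEN; if the kernel says no (not root,
    # read-only /proc/sys, etc.), just move on and leave it to the operator.
    def try_enable_tcp_fastopen():
        try:
            with open("/proc/sys/net/ipv4/tcp_fastopen", "w") as f:
                f.write("3")  # bit 0 = client support, bit 1 = server support
            return True
        except OSError:
            return False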

2 Likes

Yeah, fair point. I was just asking because I run on Synology and any such changes tend to get overwritten by OS updates. Then again, I already have this in a script running at boot time to solve an issue that came up with QUIC.

sysctl -w net.core.rmem_max=2500000

So no big deal, I can just add the setting for TCP_FASTOPEN.
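i.e. the boot script would presumably just become:

sysctl -w net.core.rmem_max=2500000
sysctl -w net.ipv4.tcp_fastopen=3

(assuming the net.ipv4.tcp_fastopen=3 value from the design doc).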

2 Likes

I went, quite quickly, through the TCP Fastopen blueprint. You mention that the network topology has to support it. I assume most of the SNOs are behind NAT, and I wonder if the router might be a problem here, considering such connections invalid and closing them prematurely.
Would it be possible to implement some indicator in the storagenode software, as you did for QUIC, that would help us understand whether it actually works for us when enabled?
I would say sysctl isn’t a problem as we are doing that for QUIC already.

2 Likes

This is a really good question.

So, if the router the storage node is behind doesn't support TCP_FASTOPEN, my understanding of the problem (though this would be good to test more rigorously with a variety of different nonconforming routers) is that the router will simply drop the TCP_FASTOPEN connection establishment packets, which is pretty much the same behavior as a packet hitting a firewall. So, as far as the client is concerned, it will look like it's talking and no one is responding.

If a storage node is behind such a router and some percentage of clients on the network are attempting TCP_FASTOPEN, then when the storage node flags support for TCP_FASTOPEN, I imagine the number of new connections the node receives would drop by roughly that same percentage, because the router is just dropping the connection establishment packets. Unfortunately, the clients' operating systems remember which peers support TCP_FASTOPEN, so disabling TCP_FASTOPEN wouldn't necessarily fix the problem for the node immediately. The node would have to wait until the clients' cached TCP_FASTOPEN state flushed out (which does happen over time).

If the node's network topology handles TCP_FASTOPEN well, then upon enabling TCP_FASTOPEN the node would suddenly start seeing some percentage of successful TCP_FASTOPEN requests.

The hard case is when the node's network topology has partial support (e.g., maybe some routes support it well and some don't). I don't know how to even detect this case. If there is a slight drop in new connections after enabling, the node operator may assume that TCP_FASTOPEN was stalling for some routes, but it may also just be a natural lull in load. What's more, the node operator might even prefer to leave TCP_FASTOPEN on because, of the connections it does receive, it is winning more of the races.

So, given all of this, I'm not sure what would go on the dashboard. A graph of the rate of new non-TCP_FASTOPEN connections and a graph of new TCP_FASTOPEN connections? I actually haven't checked whether an unprivileged storage node can even ask the kernel for these values, so I'm not sure this dashboard is possible on an unprivileged node.

QUIC is easy - it worked from the Satellite or it didn't. I suppose we could extend the TCP_FASTOPEN design to work the same way - the Satellite does the first check, and clients do not try TCP_FASTOPEN with a node unless the Satellite had a successful TCP_FASTOPEN with it?

Open problems with that idea I need to think through:

  • Maybe TCP_FASTOPEN works for most Uplinks even if it didn't work for the Satellite! Maybe that's enough for the SNO to win more races and earn more.
  • This would mean the Satellite would have to double dial (the first dial is how the OS learns that TCP_FASTOPEN works between the peers, and the second dial would confirm that it actually worked), and so this means more resource usage for check-ins.
  • This certainly expands the scope of work for this task quite a bit (the Satellite checking, the database keeping track of which nodes the Satellite was successful with, the dashboard feature of showing whether the Satellite was successful). On the other hand, the graph-based approach is also quite a lift for the dashboard.

I feel like the graph-based approach (or even just a counter-based approach counting successful connections) leaves more control with the SNO than having the Satellite keep track of what worked, but yeah, I'm definitely open to feedback about what folks prefer.

2 Likes

Maybe just exporting the percentage via an API would be sufficient. We would be able to graph that with Grafana, for example, and over time would be able to tell whether this value is rising, falling, etc.
Regarding the detection, if it won't be possible to get it from the kernel, maybe the Uplink library could be modified to indicate in the payload to the storagenode whether the connection was meant to be fastopen or not.

1 Like

I mean, I’ll look at whatever is presented and make sure it works. But I think the biggest issue with having a setting that needs adjusting is just that a massive swath of (especially existing) node operators will just never know or bother to change it. Isn’t that a big part of what made QUIC not succeed (yet) as well?

If you then also expect SNOs to interpret numbers in order to know whether they should have it on or off, I think you're simply asking too much and making the hurdle even bigger. I'd say there needs to be a simple check that tells node operators whether it works (and crucially links to docs to make it work if it doesn't). Additional info is always appreciated of course. Especially if there are real scenarios where it would partially work. Is there a way we could easily test whether it would work on our network right now?

4 Likes

To your point about hurdles, thankfully I think the Noise blueprint will be a piece of cake - no knobs, nothing to change, it will just work for everyone.

For the TCP_FASTOPEN blueprint I think there are three separate phases. The first phase is internal testing on test networks. We're in the middle of that, and it's going really well.

The second phase is external testing on the real network. During this phase we'd want to engage with as many SNOs as are interested, and yeah, having the data visible in Grafana is probably acceptable for motivated SNOs to be able to test and see how often and how well it works. We would not expect it to be widely enabled on the network at this point, though. We'd probably avoid enabling TCP_FASTOPEN in real client software during this phase so that SNOs could enable TCP_FASTOPEN support for testing without fear of missing out on real traffic.

What we learn from that second phase would probably dictate what we need to do for the third phase, which is figuring out how to make it as easy as possible for SNOs to enable it in cases where it makes sense, and not in cases where it doesn't. I agree that reducing hurdles here is the top priority.

I guess my point is it might be useful to separate the tasks we need to be able to start trying this out on the real network from tasks that we need to be able to achieve wide adoption.

It’s definitely worth pointing out that this project has some important differences from the QUIC effort:

  • The QUIC protocol has no fallback support - either the node accepts the UDP packets and responds, or it doesn't. Both the client and node need to support QUIC, in addition to all of the middleboxes along the way. To deal with this, QUIC-supporting clients still dial TCP in parallel for every connection, wasting resources, so that there's a fallback connection to use if QUIC times out or fails.
  • TCP_FASTOPEN is useful in that it requires no new port configuration and no new firewall management, and it gracefully falls back if either endpoint is missing support for TCP_FASTOPEN. If either the client or the node indicates no support for TCP_FASTOPEN, that's okay, and things will still work even if the other peer tries it. Both have to opt in, but neither needs to. The only similarity here is that middlebox support needs to exist, and we don't really know the prevalence of broken middlebox support in our network. :grimacing:

I think we need to get to phase two to even understand that last issue.

4 Likes

I'll add that this might require running the container with added capabilities (CAP_ADD for SYS_ADMIN/NET_ADMIN), a.k.a. --privileged mode. Some operators and environments may not allow this. Giving additional permissions to the container could potentially break trust with the host, in the event those extra permissions could be used maliciously on the host - kind of breaking the micro-segmentation barrier.

Hopefully it can keep the same permissions if desired, with only manual sysctl changes required.

Does anyone have a tool to check this? Just like it is possible to check whether a port is open, etc.
Also, if/when it is decided to use this, I'd like instructions on how to make my docker node support it. I guess it is not enough to just set the sysctl parameter.

1 Like

This is a design document. I would say it's a bit too early to update your node.

My overall feeling about this, IMHO, is that the node software should be as simple to set up as possible.

So I’d tend to prefer the Noise over TCP thing.

Unless your aim is to target more knowledgeable and specialized SNOs.
Right now, running a Storj node already requires some knowledge of networking, the command line, docker (unless using a standalone node), notions of static mounts, and techniques for basic monitoring.

I feel like each bit of added complexity would shrink the number of potential SNOs.

1 Like

As I read the blueprints, neither of them is mandatory. So less tech-savvy SNOs will still be able to set up a node. But later, if they want to improve their node's usage, they would likely learn how to do it.

2 Likes

Sure, but for network performance reasons it would be best if as many nodes as possible enable this. Otherwise the upgrades will have limited impact for customers.

4 Likes

Okay, three new updates:

  • Noise changes are starting to be merged. As this thread seems to have confirmed, there really should be no impact to SNOs, but if you want to follow along, the Gerrit review topic is https://review.dev.storj.io/q/topic:noise-g2g

  • The first TCP_FASTOPEN change, in line with feedback in this thread, should hopefully go into an upcoming release soon. You can follow along in Gerrit here: https://review.dev.storj.io/q/topic:tcp-fastopen, and this is the first change: https://review.dev.storj.io/c/storj/storj/+/9251. We're adding support for TCP_FASTOPEN server-side. We'll also need to add client-side support, but as discussed we aren't planning to enable it for customers until SNOs are happy with the tooling and we understand when it should and shouldn't be enabled.

  • Good news and bad news about the tooling, bad news first. It appears that client-side code can't tell whether a specific TCP connection actually used FASTOPEN, as per this Stack Overflow answer: "How to know if sendto() with TCP Fast Open actually used Fast Open?". That post is from 2014, so it's possible new functionality has been added since, but I only did a cursory look and didn't see it. If it's still true, it means the Satellite won't be able to tell the storage node whether the Satellite succeeded at using TCP_FASTOPEN. On the good news side, though, it turns out it's actually really easy to find out if TCP_FASTOPEN is working on the server side, which is what we want anyway! Perhaps we can add these results to the node dashboard. Details below:

SNOs can find out if TCP_FASTOPEN is working by running:

netstat -s | grep FastOpen

The output may look like this:

    TCPFastOpenActive: 3
    TCPFastOpenPassive: 142
    TCPFastOpenCookieReqd: 1

TCPFastOpenActive is the number of times the kernel participated successfully as a TCP_FASTOPEN client, whereas TCPFastOpenPassive is the number of times the kernel participated successfully as a TCP_FASTOPEN server. TCPFastOpenCookieReqd is the number of times a client tried to ask for a TCP_FASTOPEN cookie for future use. If you get non-zero TCPFastOpenPassive, then everything is good to go.
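For what it's worth, these counters come from /proc/net/netstat (the file netstat -s parses), which is normally world-readable, so even an unprivileged node process should be able to read them for a dashboard. A rough, hypothetical sketch in Python:

    # /proc/net/netstat comes in pairs of lines: a header line of field
    # names followed by a line of values. Pull out the Fast Open counters.
    def tfo_counters(path="/proc/net/netstat"):
        with open(path) as f:
            lines = f.read().splitlines()
        counters = {}
        for header, values in zip(lines[::2], lines[1::2]):
            names, vals = header.split(), values.split()
            for name, val in zip(names[1:], vals[1:]):
                if name.startswith("TCPFastOpen"):
                    counters[name] = int(val)
        return counters

    print(tfo_counters())  # e.g. {'TCPFastOpenActive': 3, 'TCPFastOpenPassive': 142, ...}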

Recall that you may need to run sysctl -w net.ipv4.tcp_fastopen=3

You may want to do some debugging here, so I made a small Python TCP_FASTOPEN test utility that will function as both a client and a server. For a correctly set up server, ./fastopen.py will increase the client and server counters in the netstat output.
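In case it helps to picture what such a utility does, here's a minimal sketch of the same idea - not the actual fastopen.py; the port number and structure are made up for illustration, and it's Linux-only:

    #!/usr/bin/env python3
    # Run "sketch.py server" in one terminal, "sketch.py client <host>" in
    # another, then check netstat -s | grep FastOpen on both machines.
    import socket, sys

    PORT = 9999  # arbitrary port, for illustration only

    # Fall back to the raw Linux values if this Python build doesn't expose them.
    TCP_FASTOPEN = getattr(socket, "TCP_FASTOPEN", 23)
    MSG_FASTOPEN = getattr(socket, "MSG_FASTOPEN", 0x20000000)

    def server():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", PORT))
        # Enable TFO on the listener; the value is the pending-TFO queue length.
        s.setsockopt(socket.IPPROTO_TCP, TCP_FASTOPEN, 16)
        s.listen(5)
        while True:
            conn, addr = s.accept()
            print("got", conn.recv(1024), "from", addr)
            conn.close()

    def client(host):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # sendto() with MSG_FASTOPEN connects and sends data in the SYN
        # (or falls back to a normal handshake if no cookie is cached yet).
        s.sendto(b"hello", MSG_FASTOPEN, (host, PORT))
        s.close()

    if __name__ == "__main__":
        server() if sys.argv[1] == "server" else client(sys.argv[2])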

Help needed! I haven’t tested any of this on Windows yet. How does this work on Windows?

4 Likes

Seems simple enough. Will definitely give that a try later. Just to check: for complete testing we ideally want to set up port forwarding and test this with the client running from outside our network, correct?