Please enable TCP fastopen on your storage nodes

Same here, sorry to say. But some nodes show this counter: TCPFastOpenCookieReqd: 16

When I add --network host, I get this error:
WARNING: Published ports are discarded when using host network mode
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: sysctl "net.ipv4.tcp_fastopen" not allowed in host network namespace: unknown.

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967/tcp \
    -p 28967:28967/udp \
    -p 14002:14002 \
    -p 5999:5999 \
    -e WALLET="0xf82835c11e78a44C0F5176991b624D8635B8702C" \
    -e EMAIL="@gmail.com" \
    -e ADDRESS="*:28967" \
    -e STORAGE="18.5TB" \
    --mount type=bind,source="/mnt/storj2/storagenode-new",destination=/app/identity \
    --mount type=bind,source="/mnt/storj2/storagenode-new",destination=/app/config \
    --log-opt max-size=10m \
    --log-opt max-file=5 \
    --sysctl net.ipv4.tcp_fastopen=3 \
    --network host \
    --name storagenode storjlabs/storagenode:latest \
    --server.address=":28967" \
    --console.address=":14002" \
    --debug.addr=":5999" \
    --log.level=info \
    --filestore.write-buffer-size 4MiB \
    --pieces.write-prealloc-size 4MiB \
    --storage2.piece-scan-on-startup=true \
    --operator.wallet-features=zksync

You don't need the --sysctl option. Just use host networking and make sure TCP Fast Open is enabled on the host itself.
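With --network host the container shares the host's network namespace, which is why Docker rejects --sysctl here and why the value that counts is the host's. A minimal sketch for checking (and optionally setting) it, assuming a Linux host with the standard sysctl tool. The "apply" argument and the /etc/sysctl.d file name are illustrative choices of mine, not Storj tooling, and Synology DSM may persist sysctl settings somewhere else.

```shell
#!/bin/sh
# Check (and optionally enable) TCP Fast Open on the Docker host.
# Per the Linux ip-sysctl documentation: bit 0x1 = client side,
# bit 0x2 = server side, so 3 enables both.

# Exit status 0 if the given tcp_fastopen value has the server bit (0x2) set.
tfo_server_enabled() {
    [ $(( $1 & 2 )) -ne 0 ]
}

# Only change system settings when explicitly asked to (needs root).
# Assumption: /etc/sysctl.d is honoured on this host; DSM may differ.
if [ "${1:-}" = "apply" ]; then
    sysctl -w net.ipv4.tcp_fastopen=3
    echo 'net.ipv4.tcp_fastopen=3' > /etc/sysctl.d/99-tcp-fastopen.conf
fi

# Report the current host value, if the kernel exposes it.
if [ -r /proc/sys/net/ipv4/tcp_fastopen ]; then
    current=$(cat /proc/sys/net/ipv4/tcp_fastopen)
    if tfo_server_enabled "$current"; then
        echo "tcp_fastopen=$current: server-side TFO is enabled"
    else
        echo "tcp_fastopen=$current: server-side TFO is NOT enabled"
    fi
fi
```

Saved under a hypothetical name like tfo-host.sh, `sudo sh tfo-host.sh apply` would set and persist the value, while running it without arguments only reports it.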


So, like this, after removing the line --sysctl net.ipv4.tcp_fastopen=3:

$ netstat -s | grep FastOpen
TCPFastOpenPassive: 15
TCPFastOpenPassiveFail: 18
finally a result!
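If you want to track these numbers over time, a single counter can be pulled out of the `netstat -s` text. According to the kernel's SNMP counter documentation, TCPFastOpenPassive counts accepted incoming TFO requests, TCPFastOpenPassiveFail counts rejected ones (invalid cookie or an error while creating the socket), and TCPFastOpenCookieReqd counts clients asking for a cookie. A small sketch; the function name is hypothetical, and printing 0 for a missing counter is my assumption about counters that never fired.

```shell
#!/bin/sh
# Print the value of one counter from `netstat -s` output read on stdin.
# Lines look like "    TCPFastOpenPassive: 15"; prints 0 if the counter is absent.
tfo_counter() {
    awk -v name="$1" '
        $1 == (name ":") { print $2; found = 1; exit }
        END { if (!found) print 0 }
    '
}
```

Usage: `netstat -s | tfo_counter TCPFastOpenPassive` and `netstat -s | tfo_counter TCPFastOpenPassiveFail`.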


You don't need to publish those ports (-p) if you use --network host.
Look at the last post in this topic:
https://forum.storj.io/t/my-docker-run-commands-for-multinodes-on-synology-nas/22034

Edit:

You could link to the exact post instead; that way future readers won't have to guess which post was the last one when you posted the link :slight_smile:


I tried on iPhone/Safari, but it just shows the link to the forum. The only way to link something is to long-press on the topic’s title and link the entire topic. I saw others linking the exact post, but maybe it works only on PC…

Click on the share (link) icon under the comment you want to link to, and copy the URL from the popup.

Then you can modify the URL manually if needed:

https://forum.storj.io/t/please-enable-tcp-fastopen-on-your-storage-nodes/22886/138?u=arrogantrabbit
                        └─── irrelevant text, can be edited or removed ──┘  │    │         │
                                                          topic ID ─────────┘    │         │
                                                          comment index ─────────┘         │
                             you, the fella who linked, for tracking and achievements ─────┘

The final /138 is the comment index.
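The URL anatomy above can also be handled mechanically. A small sketch with two hypothetical helper names: one drops the "?u=..." tracking suffix from a copied link, the other builds a slug-less link from a topic ID and comment index, relying on the slug being removable as the diagram says.

```shell
#!/bin/sh
# Drop the "?u=<username>" suffix, keeping slug, topic ID and comment index.
strip_tracking() {
    printf '%s\n' "${1%%\?*}"
}

# Build a link from topic ID and comment index, omitting the slug.
short_post_url() {
    printf 'https://forum.storj.io/t/%s/%s\n' "$1" "$2"
}
```

For example, `short_post_url 22886 138` gives https://forum.storj.io/t/22886/138.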


I know about that button; I don't know what I was doing wrong, but now, after this state-of-the-art explanation, it works flawlessly. :sunglasses:
Thank you for +1 to my knowledge level.


After 20 days with TCP fastopen enabled, I don't see any difference in traffic.
Synology1, TCPf off, 7.09TB - 314GB egress, 634GB ingress.
Synology2, TCPf on, 6.68TB - 303GB egress, 635GB ingress.


You should compare success rates before and after enabling TCPFO instead, since enabling TCPFO helps you win more races. That means comparing uploads (successful, failed and canceled) with and without TCPFO.
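The comparison boils down to one ratio per direction. A minimal sketch, assuming the rate is defined as successful transfers divided by started transfers (this reproduces the percentages posted further down in the thread); the function name is hypothetical, and the counts would come from your logs or the node's debug endpoint.

```shell
#!/bin/sh
# success_rate SUCCESS STARTED -> percentage with two decimals.
success_rate() {
    awk -v ok="$1" -v started="$2" 'BEGIN {
        if (started > 0) printf "%.2f\n", 100 * ok / started
        else print "n/a"
    }'
}
```

For instance, `success_rate 127502 130564` prints 97.65.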
Look here :point_down:


Yes, you're right. I forgot that those stats include lost races too.
I will compare the success rates of the two machines over the same period, because a before/after comparison is not very precise. Traffic varies over time, but for the same period it's very similar for both machines: they are identical and their internet connections are very similar. They are also in the same location, but on two different ISPs.

Is there a way to track the success rate without changing the log level to info and without Prometheus/Grafana?
Will the debug endpoints show this?

Prometheus just gets the information from the storage node's monkit endpoint. Since you didn't exclude that from your list, you can use it directly. The logical next step would be to set up Prometheus and Grafana, but that's up to you.

I don't want to set up Prometheus and co., because I think they create logs and take resources away from the node. I don't want anything besides the node to use the HDD; even log.level is set to fatal.
Maybe I'm mistaken about how Prometheus and Grafana work, I really don't know.
Anyway, I enabled the debug port, restarted the node (with stop and rm), and now I can open these URLs from a PC:

http://192.168.1.201:5999/mon/stats
http://192.168.1.201:5999/mon/ps
http://192.168.1.201:5999/mon/func

I can't understand any of those outputs.
Do they show success rates for uploads and downloads?
How can I display just that info?
Since when are the stats counted: since the node's installation, its recreation (with rm), or its last restart (without rm)?

You'd usually look at the source code to understand the specific entries. They're not documented anywhere else, and I assume they can change from release to release. The entries in the mon/stats endpoint that I've personally found useful are:

  • audit_success_count: a quick peek at the number of audits, for node vetting progress,
  • upload_started_count: the number of uploads attempted since the last node restart (excluding attempts rejected due to the concurrent requests limit), to compare rates of change across several nodes,
  • upload_cancel_count/upload_failure_count/upload_success_count could probably replace log parsing in the success rate script…
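To avoid reading the whole /mon/stats page, the output can be narrowed to just the upload/download outcome counters. A sketch, assuming each metric line in /mon/stats begins with the metric name (as in the excerpts posted in this thread); the function name is hypothetical.

```shell
#!/bin/sh
# Keep only upload/download started/cancel/failure/success counter lines
# from /mon/stats text read on stdin.
outcome_counters() {
    grep -E '^(upload|download)_(started|cancel|failure|success)_count'
}
```

Usage, with the debug address from the post above: `curl -s http://192.168.1.201:5999/mon/stats | outcome_counters`.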

I believe that, in the end, all that matters is the amount of data stored over a given time (a month or so) and the egress paid for in that time.
I'll wait until the end of the month for a comparison.

Here are the results after 19 hours. The uploads seem pretty similar between the two nodes, but with downloads things get very different… and strange. Note the huge number of downloads started on NODE 2 vs NODE 1, and the similar number of successful downloads. This results in a very low download success rate, yet the earnings are quite similar. Can anyone explain this?

STRINGS:
========
upload_started_count
upload_cancel_count
upload_failure_count
upload_success_count

download_started_count
download_cancel_count
download_failure_count
download_success_count


RESULTS:
========
NODE 1 - TCPfast on:
upload_started_count=130564
upload_cancel_count=64
upload_failure_count=2997
upload_success_count=127502
upload_success_rate=97.65% /19h

download_started_count,action=GET=73097
download_cancel_count,action=GET=2997
download_failure_count,action=GET=144
download_success_count,action=GET=69956
download_success_rate=95.70% /19h

NODE 2 - TCPfast off:
upload_started_count=131193
upload_cancel_count=56
upload_failure_count=3368
upload_success_count=127766
upload_success_rate=97.39% /19h

download_started_count,action=GET=107658
download_cancel_count,action=GET=34173
download_failure_count,action=GET=18
download_success_count,action=GET=73466
download_success_rate=68.24% /19h
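One way to see where NODE 2's extra started downloads went is to split the started count into its outcomes. A minimal sketch, assuming started is roughly canceled + failed + succeeded, with any small remainder presumably being transfers still in flight when the counters were read; the function name is hypothetical.

```shell
#!/bin/sh
# breakdown STARTED CANCELED FAILED SUCCEEDED
# Prints each outcome as a share of started, plus the unaccounted remainder.
breakdown() {
    awk -v st="$1" -v c="$2" -v f="$3" -v s="$4" 'BEGIN {
        if (st <= 0) { print "n/a"; exit }
        printf "canceled=%.2f%% failed=%.2f%% succeeded=%.2f%% unaccounted=%d\n",
            100 * c / st, 100 * f / st, 100 * s / st, st - c - f - s
    }'
}
```

With NODE 2's download counters, `breakdown 107658 34173 18 73466` prints `canceled=31.74% failed=0.02% succeeded=68.24% unaccounted=1`, so nearly all of the gap is cancellations rather than failures.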

After 24h, I checked all nodes and there is no difference between TCP_fastopen enabled and disabled.
The discrepancy in the results above also shows up on nodes with TCP_fastopen enabled. The official info is that a large share of clients haven't enabled TCP_fastopen yet, so maybe it's not used much?..
Here are the results; I can't say why some nodes perform better for downloads than others, but the uploads are pretty similar. Maybe the routers play some part? The best performer is Router5, which costs $250; the others are $50-100. Each node has its own router and subnet; the router number just identifies a different model. (USR/DSR = upload/download success rate.)

NODE1, TCPF ON, ISP1, ROUTER1, USR 95.9%, DSR 72.72%.
NODE2, TCPF ON, ISP1, ROUTER2, USR 98.28%, DSR 85.04%.
NODE3, TCPF ON, ISP1, ROUTER3, USR 98.48%, DSR 91.31%.
NODE4, TCPF ON, ISP2, ROUTER4, USR 97.37%, DSR 74.39%.
NODE5, TCPF ON, ISP2, ROUTER5, USR 97.94%, DSR 95.57%.
NODE6, TCPF ON, ISP1, ROUTER1, USR 98.2%, DSR 89.43%.
NODE7, TCPF ON, ISP1, ROUTER1, USR 98.63%, DSR 71.49%.
NODE8, TCPF OFF, ISP2, ROUTER4, USR 97.63%, DSR 72.67%.