I am setting up outbound firewall rules for my Storj node and found blocked traffic to agent.tracing.datasci.storj.io:5775. I cannot find any information about this on the forum.
What is the purpose of this communication please?
From config.yaml:
#address for jaeger agent
#tracing.agent-addr: agent.tracing.datasci.storj.io:5775
#application name for tracing identification
#tracing.app: storagenode
You should not set outbound rules at all. Either disable them or add a permissive rule that allows connections from any port of your node to any host and any port on the internet.
This agent sends anonymous usage statistics from your node. You may disable it if you want.
Considering the current climate on the internet, I would argue that everybody absolutely should control all traffic coming in and out of the networks they manage.
A Storj node operator should block all unknown traffic, allowing only established and related connections plus new traffic that is recognized. This matters especially because the Storj software lives a life of its own and updates itself without any intervention from the node operator.
It could be argued that IP addresses change over time, so it is up to the node operator to watch for changes, or to allow connections by port without a destination IP.
Over the last couple of days I have seen no other blocked connections, and the Storj node reports all good, so this would be a complete list for now.
Could you please point me to any documentation that roughly describes what data is collected by collectora.storj.io:9000 and agent.tracing.datasci.storj.io:5775? I have no problem supporting the project by sending such data if it helps node operation and software development go more smoothly.
It sounds like you have mixed up inbound connections, which must be protected, with the connections you make yourself (outbound). Since the node is P2P software, it cannot work normally if you block or filter outbound connections, or block inbound connections to the node’s port. If you want your node to work normally, please disable all outbound filters and blocks; they are meaningless for P2P.
The best documentation we have for the statistics agents is our open-source code on GitHub.
But here are also the design document and the readme:
You can disable this tracing in the configuration (or point it at localhost, for example); the node would then not send any statistics about its behavior.
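As a rough sketch of the localhost variant, the commented-out line quoted from config.yaml earlier in the thread could be uncommented and pointed at the loopback address. The key name comes from that excerpt, and the port is simply kept from the default. The `tracing.enabled` line is an assumption: it does not appear in the excerpt, so check your own config.yaml for it before relying on it.

```yaml
# address for jaeger agent
# Pointed at loopback so traces never leave the machine.
# Port 5775 is kept from the default shown in the excerpt.
tracing.agent-addr: 127.0.0.1:5775

# ASSUMPTION: an on/off switch like the one below is not shown in the
# excerpt above; use it only if your config.yaml actually lists it.
# tracing.enabled: false
```

The node normally has to be restarted before a change to config.yaml takes effect.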
I have not mixed up inbound and outbound. I am discussing new traffic coming out of the storage node. As long as I properly allow related and established incoming traffic, the node is fine. Today marks a week of testing. I see no more unrecognized traffic, and the node behaves and operates as expected.
Thank you for pointing me to the documentation. I will consider leaving telemetry and error collection enabled.
Customers initiate connections to your node. It would not have worked if you did not allow new inbound connections.
Telemetry makes it possible to improve the product experience, including for you personally: a vendor can only address issues they know about. There is no reason to disable telemetry other than the performance burden of a poorly implemented one (looking at Windows). I’d argue that disabling telemetry, not enabling it, is what should require consideration.
Please note that I am discussing only new outbound traffic initiated by the Storj node software.
It is certainly true that such telemetry enables rapid product improvements, but being transparent about it builds trust. A simple table of network ports on the prerequisites page, with a short description of each, would surely satisfy not only the curious but advanced operators as well.