I’m checking for an “all healthy” flag returned in JSON when contacting each node over HTTP from a single monitoring cloud host, every 30 seconds.
Recently some nodes started returning 503. They work fine otherwise, including returning `allhealthy: true` when checked from another instance.
Has the storagenode implemented some sort of abuse blocking that could block my “abusive” monitoring host?
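For context, a minimal sketch of the kind of check I’m running. The dashboard port (14002), the endpoint path `/api/sno`, and the JSON flag name are from memory and may differ on your setup:

```python
import json
import urllib.error
import urllib.request

# Hypothetical health poll: fetch the node's dashboard JSON and inspect the
# health flag. Port, path, and flag name ("allHealthy") are assumptions.
def check_node(host: str, port: int = 14002, timeout: float = 10.0):
    url = f"http://{host}:{port}/api/sno"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.loads(resp.read())
            return bool(data.get("allHealthy")), "ok"
    except urllib.error.HTTPError as exc:  # a 503 lands here, not in resp
        return False, f"HTTP {exc.code}"
    except OSError as exc:  # connection refused, timeout, DNS failure, ...
        return False, str(exc)
```

Run from cron (or a loop with a 30-second sleep) and alert when the first element of the tuple is `False`.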
Haven’t tried. I’ll increase the interval. But it worked for years with these settings. I also have TCP port probes from the same host — and those succeed.
It does not seem to track to the tunnel: some nodes are directly connected, some via VPN (iptables masquerading to work around CGNAT).
Do you also have errors in the logs when this happens?
I suppose not. Usually a 503 happens when a reverse proxy is unable to contact the proxied service (so the client-facing part is working, but the needed backend service is not). I do not think the node has the same structure: it is a service itself and does not include a proxy; otherwise you would also see the same error in the node’s logs.
It could also be a web-server-to-backend chain, but for the node I doubt it. The only web server is the web dashboard.
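If it comes back, one way to tell whether a proxy or the node itself produced the 503 is to inspect the response headers. A sketch (`node.example`, the port, and the path are placeholders, and the exact headers vary by proxy):

```shell
# Dump only the response headers (-D -) and discard the body. A reverse
# proxy such as nginx usually stamps its own "Server:" header on an error
# page it generated itself; a response that made it through to the backend
# carries the backend's headers instead.
curl -s -o /dev/null -D - http://node.example:14002/api/sno
```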
You are right, there were no issues reported in the logs. `grep ERROR /var/log/storj.log | grep -v "download failed" | grep -v "manager closed" | grep -v "context deadline exceeded"` produced nothing.
Increasing the interval did not help.
The issue resolved itself over a few days without me doing anything: I did not restart either of the services. The only exception is any node update that could have happened in the meantime, but then all my nodes would have updated at the same time (I’m not using the official updater yet), and instead the HTTP checks started working again one by one over the course of a week or so.
I’m out of ideas, and since the issue is not present anymore, I won’t think about it until it comes back; then I’ll do more digging with tcpdump.
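For the record, the kind of capture I’d start with next time. The port (14002) and the `<monitoring-host>` placeholder are specific to my setup, and the command needs root:

```shell
# Print packet payloads as ASCII (-A) so an "HTTP/1.1 503" status line is
# visible directly in the capture; -n skips DNS lookups. Filter on the
# dashboard port and the monitoring host to keep the output readable.
sudo tcpdump -ni any -A 'tcp port 14002 and host <monitoring-host>'
```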
My guess is that a tunnel/proxy/VPN in front of the node caused it.
Because otherwise we would see a lot of complaints here if it were a storagenode issue. Furthermore, the satellite is unlikely to be pleased to receive a 503 error from the node. However, the latter statement may be incorrect, as the satellite doesn’t use HTTP but DRPC, and I don’t know whether it has an equivalent of the 503 error.