Too many errors after 1.47.3 update

My nodes are running fine (Windows and NAS Docker), with no Internet or power drops, and QUIC is OK. All nodes have always shown 100% suspension and audit scores, and online scores of 98-99% or higher.
I always had:

ERROR	piecestore	upload failed

around 20-30 per day for 2.5 TB of stored data, and some

ERROR	piecestore	download failed

a few per day.
After the update to 1.47.3, the "upload failed" errors increased 5x. One node logged 167 errors on 29.01, up from 35 on 27.01, and the other node shows the same 4.5-5x increase. The two nodes are on different subnets. What is the problem? Is this normal?
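If you only look at the log files, a small script can produce these per-day counts instead of counting by hand. A minimal Python sketch, assuming each log line starts with an ISO-8601 timestamp and uses tab-separated fields like the excerpts above; the script name and log file name in the comment are hypothetical. For a Docker node you would first redirect the docker logs output to a file.

```python
import sys
from collections import Counter


def daily_counts(lines, message="upload failed"):
    """Count ERROR log lines per day that contain `message`.

    Assumption: each line starts with an ISO-8601 timestamp such as
    2022-01-29T10:15:00.000Z, so its first 10 characters are the date,
    and the level is a tab-separated field ("\tERROR\t").
    """
    counts = Counter()
    for line in lines:
        if "\tERROR\t" in line and message in line:
            counts[line[:10]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_errors.py storagenode.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for day, n in sorted(daily_counts(log).items()):
            print(day, n)
```

Passing message="download failed" gives the same per-day view for downloads.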

What’s the reason for failing uploads and downloads?
If they are “context canceled” errors or similar, then it’s simply a coincidence.
You can see these failure numbers rise on any version: just restart your node and it will start a filewalker. If your disk is slow (I suspect it is), your node will start failing uploads and downloads because the disk is too busy.
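To check whether the extra failures are mostly “context canceled”, you can group them by their error text. A rough sketch, assuming the last tab-separated field of each log line is a JSON object with an "error" key; the helper name failure_reasons is hypothetical.

```python
import json
from collections import Counter


def failure_reasons(lines):
    """Group 'upload failed' and 'download failed' lines by error text.

    Assumption: the last tab-separated field of each line is a JSON object
    with an "error" key, e.g. {"error": "context canceled", ...}.
    """
    reasons = Counter()
    for line in lines:
        if "\tupload failed\t" not in line and "\tdownload failed\t" not in line:
            continue
        payload = line.rstrip("\n").rsplit("\t", 1)[-1]
        try:
            fields = json.loads(payload)
        except json.JSONDecodeError:
            fields = {}
        error = str(fields.get("error") or "") if isinstance(fields, dict) else ""
        # Fold every message that mentions a cancelled context into one bucket.
        if "context canceled" in error:
            reasons["context canceled"] += 1
        else:
            reasons[error or "unknown"] += 1
    return reasons
```

If nearly all failures land in the "context canceled" bucket, that fits the explanation above better than a fault on the node itself.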

That does not look like an abnormal amount for one hour. These types of errors have always existed, and I don’t really see cause for concern here. It could simply reflect different usage patterns, and the number may rise and fall with customer behavior.


Before the update there were fewer…
I checked my third node, which is just 1 week old, and it has the same errors from the same satellite.
Why do all the errors come from the same satellite on all 3 nodes?

Probably a customer on that satellite uncleanly interrupting connections before they are done. Or a customer with high latency to your node.
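To see how strongly the failures concentrate on one satellite, they can be counted per satellite. A sketch under the same log-format assumption as before (tab-separated fields ending in a JSON object), additionally assuming that object carries a "Satellite ID" key; failures_by_satellite is a hypothetical helper name.

```python
import json
from collections import Counter


def failures_by_satellite(lines, message="upload failed"):
    """Count failed transfers per satellite.

    Assumption: the last tab-separated field is a JSON object with a
    "Satellite ID" key; lines without parseable JSON are skipped.
    """
    counts = Counter()
    for line in lines:
        if f"\t{message}\t" not in line:
            continue
        try:
            fields = json.loads(line.rstrip("\n").rsplit("\t", 1)[-1])
        except json.JSONDecodeError:
            continue
        if isinstance(fields, dict):
            counts[fields.get("Satellite ID", "unknown")] += 1
    return counts
```

If one satellite dominates on every node, including a brand-new one, that points toward that satellite's customers rather than toward any single node.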


Do you use any kind of monitoring? If you enable the debug endpoint, you can check the actual metrics, either with Prometheus or manually. It can be more useful to check the ratio of started to finished downloads than the number of log events.

For me this ratio is always close to 100% (above 99%), running a node from a datacenter:

download_success_count{field="value"} / download_started_count{field="value"}
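Without setting up Prometheus, the same ratio can be computed from the raw text the debug endpoint returns. A minimal sketch, assuming the endpoint serves the Prometheus text format and that each of the two metrics above appears as a single series carrying the field="value" label; fetching the text (for example with urllib.request from the debug address configured on your node) depends on your setup, and the helper names are hypothetical.

```python
import re

# One Prometheus sample line: name, optional {labels}, value (timestamp ignored).
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def read_metric(text, name, label='field="value"'):
    """Return the first sample of `name` whose labels contain `label`, else None."""
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        m = SAMPLE_RE.match(line)
        if m and m.group(1) == name and label in (m.group(2) or ""):
            return float(m.group(3))
    return None


def success_ratio(text):
    """download_success_count / download_started_count, or None if unavailable."""
    started = read_metric(text, "download_started_count")
    success = read_metric(text, "download_success_count")
    if not started or success is None:
        return None  # missing metric, or zero started downloads
    return success / started
```

A result such as 0.99 would correspond to the >99% mentioned above; None means one of the metrics was missing or no downloads had started yet.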

I don’t know what monitoring is or how to use it; I just look at the log files. So I don’t use monitoring, and I don’t know what Prometheus is. I can follow step-by-step guides, if any are available.
