I just saw a “File does not exist” log entry for a regular GET request on one of my nodes, and I was asking myself what happens with that information.
Will that be reported back to the satellite? If not, wouldn’t it make sense for that information to be reported to the satellite, either by the uplink or by the node? The satellite could then use it for repair and/or auditing, in addition to what it is already doing.
As far as I know, only GET_AUDIT and GET_REPAIR can affect the audit score. The uplink would not see the error at the info log level, but perhaps it can see it at the debug log level.
What I mean is not necessarily to affect the score, but to tell the satellite that there is an issue with a piece: the download failed with the error “File not found”, so the satellite can act on it. The action could be to audit the piece directly or to put it in the audit queue, or to mark the piece as needing repair and not select that node for that piece again.
It would be really tough to do this well. Why would the node tell on itself? That just seems like a bad idea. And if the uplink reports it, then how do you prevent people from using that to damage the reputation and take down nodes?
All valid arguments. However, if such a report only triggers the satellite to verify the piece through an audit by itself, I think the risk of a node’s reputation being falsely damaged by a malicious actor is relatively low.
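To make that concrete, here is a minimal Python sketch of the idea, not how the satellite actually works. A report never touches a score by itself; it only queues the piece for an audit that the satellite performs on its own. All names (`Satellite`, `report_missing_piece`, `audit_piece`, `run_audits`) are hypothetical, and the audit is simulated with an in-memory set.

```python
from collections import deque


class Satellite:
    """Toy model: reports only schedule audits; only audits count against a node."""

    def __init__(self, stored_pieces):
        # Assumption: set of (node_id, piece_id) pairs the nodes really hold.
        self.stored_pieces = stored_pieces
        self.audit_queue = deque()
        self.queued = set()      # avoids queueing the same pair twice
        self.failed_audits = {}  # node_id -> number of failed audits

    def report_missing_piece(self, node_id, piece_id):
        """Called by an uplink (or node). Never changes a score directly."""
        key = (node_id, piece_id)
        if key not in self.queued:
            self.queued.add(key)
            self.audit_queue.append(key)

    def audit_piece(self, node_id, piece_id):
        """Stand-in for a real audit: checks the simulated storage."""
        return (node_id, piece_id) in self.stored_pieces

    def run_audits(self):
        """Drain the queue; only a failed audit is recorded against the node."""
        while self.audit_queue:
            node_id, piece_id = self.audit_queue.popleft()
            self.queued.discard((node_id, piece_id))
            if not self.audit_piece(node_id, piece_id):
                self.failed_audits[node_id] = self.failed_audits.get(node_id, 0) + 1
```

In this model a false report against a healthy node costs the satellite one extra audit and nothing else, because the audit passes and nothing is recorded against the node.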
Right now my logs are being flooded with “unable to delete piece” messages, which you could probably argue could be skipped for this check. But if a node had actually lost data, it would start flooding the satellites with these messages on normal GET requests, significantly impacting their performance. That is a lot of additional load that isn’t necessary, as the current audit/repair system has already proven to be plenty reliable in ensuring data security without it.
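One way the load concern could be bounded, purely as an illustration, is to cap how many reports per node are accepted in a given time window and drop the rest. The sketch below is a hypothetical Python helper (`ReportThrottle` is not an existing component); whether the cap would sit on the sender or on the satellite is left open.

```python
class ReportThrottle:
    """Accept at most `limit` reports per node in each fixed time window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.windows = {}  # node_id -> (window_start, accepted_count)

    def allow(self, node_id, now):
        """Return True if a report about `node_id` at time `now` is accepted."""
        start, count = self.windows.get(node_id, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # window expired, start a new one
        if count >= self.limit:
            self.windows[node_id] = (start, count)
            return False  # over the cap: drop the report
        self.windows[node_id] = (start, count + 1)
        return True
```

With a cap like this, a node that lost all of its data would produce at most `limit` reports per window instead of one per failed GET.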
Maybe it could provide an additional channel for even better piece integrity and availability, and for faster resolution. Instead of relying on what are, AFAIK, more or less random checks to detect issues, the satellite would be directed to specific problematic pieces. This approach could also prevent clients from attempting to access non-existent pieces.
For instance, I recently had a lucky piece that was being downloaded 100 times per minute or more. If that piece became unavailable, the satellite would continue to send hundreds of clients to the node, resulting in unnecessary and unsuccessful requests. If the satellite learned quickly that the piece does not exist, it could start a repair or provide other IPs to the requesting clients, serving them better.
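As a rough sketch of the "provide other IPs" part, assuming the satellite keeps a set of nodes it has confirmed are missing their piece of a segment, selection could simply skip them. The function and parameter names here are hypothetical, and the sketch ignores how pieces and erasure coding are really handled.

```python
def select_nodes(holders, confirmed_missing, needed):
    """Return up to `needed` nodes for a download.

    holders: list of node ids expected to hold a piece of the segment.
    confirmed_missing: set of node ids known to have lost their piece.
    """
    candidates = [node for node in holders if node not in confirmed_missing]
    return candidates[:needed]
```

For example, `select_nodes(["a", "b", "c", "d"], {"b"}, 2)` returns `["a", "c"]`, so clients are no longer sent to the node that cannot serve the piece.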
You are right, I don’t mean these types of messages. That’s why I also considered having the uplink report back to the satellite. I am specifically talking about the scenario where a client attempts to download a piece and the node cannot provide it because it does not have it.
If there were a sudden surge of “file does not exist” messages from a particular node on GET requests, it would likely indicate a problem with that node and the availability of its pieces, which the satellite should be notified about and could act on. This could provide an additional layer in which piece problems are actually reported, complementing the existing audit mechanism, which finds piece problems randomly.
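A surge like that could be detected with a simple sliding window over error timestamps. The following is only a sketch under assumed numbers: `SurgeDetector`, the threshold, and the window length are all hypothetical and would need tuning against real traffic.

```python
from collections import deque


class SurgeDetector:
    """Flag a node that produces too many 'file does not exist' errors
    within a sliding time window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.events = {}  # node_id -> deque of error timestamps

    def record(self, node_id, now):
        """Record one error; return True if the node is at or over the threshold."""
        times = self.events.setdefault(node_id, deque())
        times.append(now)
        # Drop timestamps that have fallen out of the window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        return len(times) >= self.threshold
```

Only when `record` returns `True` would the satellite need to hear about it, for example to audit that node sooner, rather than receiving every single failed GET.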
This is a good idea and we’ve talked about it before. Any data we get about a node being unreliable can be useful. But the downsides that have been mentioned do make it more complicated to implement. Hopefully we’ll be able to implement that at some point.