Truth be told, the storage node isn’t in a position to know whether an upload to it succeeded. The peer that actually knows is the Satellite (and the Uplink temporarily by proxy, but it has no persistent memory of it). A storage node may think an upload to it was successful, but only the Satellite keeps track of which nodes were actually part of the fastest set. Even if a node never gets a cancelation and the upload looks successful by all appearances, unless the Satellite agrees, that data is considered garbage. The Uplink tries, in a variety of ways, to alert the storage node when its upload was unsuccessful, and whenever the Uplink can tell the storage node it lost the race, that’s good: the storage node can preemptively clean up that data instead of leaving it around and waiting for garbage collection.
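To make the race concrete, here is a minimal Go sketch of the idea (not the actual Uplink code; the node count, success threshold, and timings are placeholder assumptions): start an upload to every selected node, and as soon as enough pieces land, cancel the rest so the slower nodes know to drop the data.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

// uploadPiece stands in for a real piece upload: it sleeps for a random
// duration and reports whether it finished before the context was canceled.
func uploadPiece(ctx context.Context) bool {
	select {
	case <-time.After(time.Duration(rand.Intn(100)) * time.Millisecond):
		return true // this node believes its upload succeeded
	case <-ctx.Done():
		return false // canceled: this piece lost the race
	}
}

func main() {
	const started = 70 // piece uploads started per segment (numbers from below)
	const needed = 49  // fastest pieces the Satellite actually keeps

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	results := make(chan bool, started)
	for i := 0; i < started; i++ {
		go func() { results <- uploadPiece(ctx) }()
	}

	// As soon as enough pieces land, cancel the long tail so the slower nodes
	// can throw their data away instead of waiting for garbage collection.
	finished := 0
	for i := 0; i < started; i++ {
		if <-results {
			finished++
			if finished == needed {
				cancel()
			}
		}
	}
	fmt.Printf("%d of %d pieces finished; only the fastest %d count as successful\n",
		finished, started, needed)
}
```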
So! We have a possible hypothesis (though we still need to collect data to figure out if it’s right). Hashstore’s critical section that waits on disk activity is much smaller than piecestore’s. Perhaps hashstore is better at receiving cancelation requests than piecestore is? Perhaps piecestore has an unreasonably high success rate because a higher percentage of its uploads are false successes? In this scenario, a lower success rate may simply be a more accurate success rate.
So here is what we need to gather and check (and perhaps forum readers can help):
- What percentage of unsuccessful uploads do we actually expect in practice due to long tail cancelation? In theory, this is as high as 30% (!), because each segment upload starts 70 piece uploads and waits only for the fastest 49. So 21/70 pieces are theoretically canceled, but do we sometimes keep more than 49? How often?
- How often is very-recently-uploaded data immediately eligible for garbage collection? For hashstore? For piecestore?
- What percent of pieces are considered successful across all nodes? Hashstore nodes, piecestore nodes? It should match the Satellite, right? If it’s higher than what the Satellite tracks as successful, then we have “false” successes.
Basically, a network-wide average success rate of 100% is actually bad. We always upload more pieces than we need for a segment to be successful, so, theoretically, across all nodes (some nodes will be “successful” less often than others, i.e. win fewer races) we should see about 70% piece success across all pieces (49/70). If the network as a whole is reporting average success rates higher than 70%, that means the network is gaining garbage at a faster rate than we expect. About 30% of uploads should be considered failed by the nodes, so the nodes can clean that data up instead of letting it sit around unpaid until garbage collection.
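As a back-of-the-envelope check (a sketch assuming the 70-piece / 49-piece numbers above; the 95% “observed” figure is purely a made-up example), the expected rates work out like this:

```go
package main

import "fmt"

func main() {
	const started = 70.0 // piece uploads started per segment
	const kept = 49.0    // fastest pieces the Satellite waits for

	expectedSuccess := kept / started // ≈ 0.70
	expectedCancel := 1 - expectedSuccess

	fmt.Printf("expected network-wide piece success rate: %.0f%%\n", expectedSuccess*100)
	fmt.Printf("expected long-tail cancelation rate:      %.0f%%\n", expectedCancel*100)

	// Hypothetical example: if nodes report a 95% average success rate, the
	// gap is uploads the nodes kept but the Satellite never counted (garbage).
	observed := 0.95
	fmt.Printf("a reported %.0f%% average leaves a %.0f%% gap of kept-but-uncounted uploads\n",
		observed*100, (observed-expectedSuccess)*100)
}
```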
Suffice it to say, one thing we’ve been noticing on the hashstore-based Select network is less garbage. Perhaps this is why, and perhaps we shouldn’t be afraid of lower-than-piecestore success rates? But this is just a hypothesis; we’ll try to disconfirm it.
Edited to add: an individual storage node can certainly be more successful than average and try to target a 100% success rate. Any success rate above the network average means the node is more likely to win races than the average node; any success rate below it means the node is less likely to win races. But the network average should very definitely not be 100%, or something has gone wrong and all the nodes are storing way more data than they are getting paid for (garbage). So I would expect that even the average node here on the forum is losing races by double-digit percentages. This should show up in both hashstore and piecestore. If hashstore says this and piecestore doesn’t, then I’m actually more suspicious of piecestore.