Downloads only of freshly uploaded data?

I’ve set up some monitoring for my node. In particular, I plot the number of uploaded and downloaded pieces per minute and the amount of data written to and read from my physical disk. I see that the amount written more or less correlates with the number of uploaded pieces.
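For anyone wanting to reproduce the disk side of this kind of monitoring, here is a minimal sketch, assuming a Linux host (such as a Raspberry Pi OS install) where `/proc/diskstats` is available. The function names and the device name `sda` are hypothetical; this is not necessarily how the original poster collected the numbers.

```python
SECTOR_BYTES = 512  # /proc/diskstats always counts in 512-byte sectors


def parse_diskstats(text, device):
    """Return cumulative (bytes_read, bytes_written) for `device`.

    `text` is the content of /proc/diskstats. After splitting a line on
    whitespace, field 2 is the device name, field 5 is sectors read and
    field 9 is sectors written.
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[5]) * SECTOR_BYTES, int(fields[9]) * SECTOR_BYTES
    raise KeyError(device)


def delta(before, after):
    """Bytes read and written between two samples of the counters."""
    return after[0] - before[0], after[1] - before[1]


def sample(device="sda"):
    """Read the live counters (Linux only; device name is an assumption)."""
    with open("/proc/diskstats") as f:
        return parse_diskstats(f.read(), device)
```

Calling `sample()` once a minute and feeding consecutive results to `delta` gives per-minute read and write volumes that can be plotted next to the piece counts.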

I observe a solid 20 pieces/minute downloaded, which should translate to 40+ MB of data read per minute. However, the amount of data read is next to zero, which makes me think that either most of the data sits in the OS cache (my RPi 4 has 4 GB of RAM) or the read requests are for incomplete pieces (no idea if this is a thing).
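The arithmetic above can be sketched as a quick sanity check. The 2 MB piece size is only what the figures in this post imply (40 MB over 20 pieces), not a confirmed constant, and the function names are made up for illustration. The result is an upper bound on the share served from cache, since reads of partial pieces would also reduce the disk volume.

```python
def expected_read_mb(pieces_per_min, piece_mb=2.0):
    """Data that would be read per minute if every download were a full piece.

    piece_mb=2.0 is an assumption derived from the post's own numbers.
    """
    return pieces_per_min * piece_mb


def cache_hit_fraction(expected_mb, disk_read_mb):
    """Upper-bound estimate of the share of reads not hitting the disk.

    Clamped to [0, 1]; returns 0.0 when no reads were expected.
    """
    if expected_mb <= 0:
        return 0.0
    return max(0.0, min(1.0, 1.0 - disk_read_mb / expected_mb))


if __name__ == "__main__":
    expected = expected_read_mb(20)  # 20 pieces/minute from the post
    observed = 1.0                   # hypothetical "next to zero" disk read, in MB
    print(expected, cache_hit_fraction(expected, observed))
```

With those numbers nearly all of the expected read volume never reaches the disk, which fits either the OS cache explanation or requests for less than a full piece.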

If it’s just test traffic, then I feel it’s not testing the node closely enough to what production traffic would look like (I’m sure that will read cold data almost always).

This is often the test pattern, though there are also ongoing tests that read old data, just not at the same frequency. If you ask me, a lot of the test traffic is just to keep SNOs happy until customer load takes over. But retrievability tests on existing files are being done constantly as well.

It is indeed possible to download smaller parts of a piece. Audits do this for example. They are called stripes.

Thanks for the explanation. While I understand the “make happy” reasoning, I feel it’s doing its share of harm. For more geeky SNOs it’s important to set up (or optimize) their nodes in a way that handles real traffic reasonably well. If the current test traffic only reads hot data, it gives those SNOs a wrong idea of how good their setup is.

And then, once real traffic comes in, it might come as a surprise when, e.g., disks suddenly become a bottleneck and cause 100% context cancelled errors, which in turn will make SNOs unhappy.

I see what you are saying. I think the context cancelled issues are much more common on uploads though, which wouldn’t be impacted by a change in download behavior. Additionally, if downloads are saturating your disks, you’re making the big bucks :slight_smile: That’s a problem I’d like to have.

Other than SSD cache, I don’t really see what you could be doing to improve read performance. And you really shouldn’t be wasting money on SSD cache unless it’s something you already have in place for other uses. But that’s just my opinion.