Same piece being downloaded multiple times

I’ve occasionally noticed in my logs that the same piece appears to be downloaded multiple times over the course of several minutes. Is this a bug somewhere, or expected behavior? If it’s expected, what’s actually happening?

2019-09-03T00:19:04.527Z        INFO    piecestore      download started        {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
2019-09-03T00:19:08.282Z        INFO    piecestore      downloaded      {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
2019-09-03T00:19:55.749Z        INFO    piecestore      download started        {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
2019-09-03T00:19:57.548Z        INFO    piecestore      downloaded      {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
2019-09-03T00:20:40.079Z        INFO    piecestore      download started        {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
2019-09-03T00:20:50.778Z        INFO    piecestore      downloaded      {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
2019-09-03T00:21:46.699Z        INFO    piecestore      download started        {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
2019-09-03T00:21:48.181Z        INFO    piecestore      downloaded      {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
2019-09-03T00:22:44.830Z        INFO    piecestore      download started        {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
2019-09-03T00:22:46.159Z        INFO    piecestore      downloaded      {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
2019-09-03T00:23:59.591Z        INFO    piecestore      download started        {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
2019-09-03T00:24:02.300Z        INFO    piecestore      downloaded      {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
2019-09-03T00:24:54.654Z        INFO    piecestore      download started        {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
2019-09-03T00:24:57.330Z        INFO    piecestore      downloaded      {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
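
To see how often a piece is fetched repeatedly, the log can be tallied per piece ID. The following is a minimal sketch that assumes every line has the layout shown above (timestamp, level, component, message, then a JSON object). `parse_line` and `count_completed_downloads` are hypothetical helper names, not part of the storagenode software.

```python
import json
from collections import Counter


def parse_line(line):
    """Split one log line into (timestamp, level, message, fields).

    Returns None when the line does not match the assumed layout.
    """
    head, brace, tail = line.partition("{")
    if not brace:
        return None
    parts = head.split()
    if len(parts) < 4:
        return None
    timestamp, level = parts[0], parts[1]
    # parts[2] is the component ("piecestore"); the rest is the message.
    message = " ".join(parts[3:])
    try:
        fields = json.loads(brace + tail)
    except json.JSONDecodeError:
        return None
    return timestamp, level, message, fields


def count_completed_downloads(lines):
    """Count finished GET transfers ("downloaded") per piece ID."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _, _, message, fields = parsed
        if message == "downloaded" and fields.get("Action") == "GET":
            counts[fields.get("Piece ID")] += 1
    return counts


# Two of the entries from the excerpt above, plus one "download started" line.
SAMPLE_LOG = [
    '2019-09-03T00:19:04.527Z        INFO    piecestore      download started        {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}',
    '2019-09-03T00:19:08.282Z        INFO    piecestore      downloaded      {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}',
    '2019-09-03T00:19:57.548Z        INFO    piecestore      downloaded      {"Piece ID": "25NTUH7DC7ZYLFMMTIJSDXOTDVTLMY7HBYV5D5YEH3GFRDIC2IMA", "SatelliteID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}',
]

if __name__ == "__main__":
    for piece_id, n in count_completed_downloads(SAMPLE_LOG).most_common():
        print(piece_id, n)
```

Feeding the whole log file through `count_completed_downloads` and looking at the largest counts would show whether repeated fetches are limited to a few pieces or are widespread.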

At the risk of sounding obvious: a customer actually downloading the same piece multiple times…?

Hmm, my understanding was that upload/download was from the perspective of my node, e.g. “download” is my node acquiring content and “upload” is my node uploading the content at the request of the user. At least, based on my observations, the amount of storage seems to go up when “download” happens and my egress traffic goes up when “upload” happens (at least some time later, when the roll-up occurs).

I also have many more failed upload requests than download requests, which would make sense given my understanding of what these terms mean because I have much more downstream bandwidth than upstream, so it would make sense to me that my node would acquire content better than it delivers content. I would expect to see a lot more errors where my node is unable to deliver content faster than the required number of other nodes.

Storgeez is right.

You are the HDD.
So an upload is uploaded to your HDD, and a download is downloaded from your HDD.

Interesting, this is a bit confusing as I’d expect the terms in the log on my node to be from the perspective of my node. Thanks for the clarification.

This makes me wonder why so many upload requests are failing when I have 400Mbps downstream bandwidth…

I thought that at first too, but it’s the other way around 😀.

You might just be too far from the satellite, and the file is downloaded from other nodes faster than yours.

Now you’re getting the terms backwards. Don’t you mean uploaded to other nodes faster than mine? 🙂

I wonder if it might be worthwhile to change the log entries to “storage” and “retrieval” to avoid this confusion?

Haha, you are right. “Uploaded” was the right word 😀.
I’m just tired. I’m on my first cup of coffee this morning.

You could propose it at https://ideas.storj.io/

From my understanding, the terms were chosen to ALWAYS be specified from the point of view of the customer to avoid confusion. Since they are unified, once you establish that they are used universally, it is always clear what you are talking about. Different points of view are confusing.
They used to call writing software onto microcontrollers “downloading onto”, which would imply the controller initiates the transfer, which is impossible. Now I see some use the proper “upload” instead.

Also the log has PUT and GET at the end, which is STORE and RETRIEVE basically.
The only thing missing is deletes, they don’t say DELETE.
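
As a quick reference, the convention described in this thread can be written down as a small lookup table. This is only a sketch of what has been said here: the `LOG_TERMS` table and the `describe` helper are made up for illustration, and only the two actions mentioned in this thread are covered.

```python
# Log terms as explained in this thread: they are written from the
# customer's point of view, not the node's. Only PUT and GET are listed
# because those are the only actions discussed here.
LOG_TERMS = {
    "PUT": ("upload", "customer stores a piece on the node", "ingress"),
    "GET": ("download", "customer retrieves a piece from the node", "egress"),
}


def describe(action):
    """Translate the "Action" field of a log entry into node-side terms."""
    entry = LOG_TERMS.get(action)
    if entry is None:
        return f"{action}: not covered by this sketch"
    term, meaning, traffic = entry
    return f"{action} ({term}): {meaning}; {traffic} for the node"


if __name__ == "__main__":
    for action in ("PUT", "GET"):
        print(describe(action))
```

Read this way, a “download” entry with `"Action": "GET"` means data leaving the node, which matches the observation that egress goes up when customers download.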

Right. It might be helpful to spell this out somewhere in the documentation. There are a lot of terms in the log and dashboard that don’t appear to be documented (at least anywhere that turned up in a Google search), so it’s up to the operator to either ask or try to figure it out.

I assume these are the HTTP verbs. In this case I would assume that the satellite initiates the call… but if the satellite gives the node some kind of directive and the node makes the HTTP call then the terms would again reverse their meaning.

That’s the problem I was running into… it’s all a bit ambiguous unless you know exactly what’s going on.

Ideally the log would eventually only be something you look at if something is wrong. The terms are meant for debugging much more than to be an actual user interface. Documentation would be nice, but it’s a matter of picking priorities.

Most of the traffic happens directly from the uplink (customer). The uplink initiates the transfers, the satellite merely tells the uplink what storagenodes to upload to.
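
The failed uploads discussed earlier in the thread fit this picture: the uplink sends pieces to several nodes at once, and a node that finishes after enough other nodes have finished loses the race. Below is a toy model of that idea, not Storj’s actual selection logic. The node names, the timings, and the `needed` parameter are all invented for illustration.

```python
def pick_winners(transfer_seconds, needed):
    """Split nodes into the `needed` fastest finishers and the rest.

    transfer_seconds maps a node name to how long its transfer took,
    as seen by the uplink. Nodes in the second list are the ones whose
    transfer would be cut off in this toy model.
    """
    ranked = sorted(transfer_seconds, key=transfer_seconds.get)
    return ranked[:needed], ranked[needed:]


if __name__ == "__main__":
    # Hypothetical timings in seconds for one piece upload.
    timings = {"node-a": 1.2, "node-b": 0.8, "node-c": 3.5, "node-d": 0.9}
    winners, losers = pick_winners(timings, needed=2)
    print("kept:", winners)
    print("cut off:", losers)
```

In this model what matters is the time measured by the uplink, so a node with plenty of bandwidth can still lose if it is slower to respond to that particular customer than the other nodes are.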