Lots of "uploads cancelled" after 12h of downtime

Just a question because I’m not sure if what’s happening is normal or might be a sign of an underlying problem.
So I started a node about two months ago with a 1 TB HDD on an old PC, then switched to an Odroid HC2 to minimize power consumption (electricity is pretty expensive here in Switzerland). Once the node got vetted I had a fairly good upload rate filling up my node (30-40 GB per day, with spikes to 120 GB).
Yesterday I got a new Seagate IronWolf 8 TB HDD and migrated all my data to that drive. Because the Odroid only has one SATA port, I copied the data with rsync on another computer (the node was offline during that time). When I tried to restart the node on the Odroid it wouldn’t boot, and after a little research I ended up re-flashing the SD card and reinstalling Docker. Now my node is running, but I get very few successful uploads (less than 5% looking at the logs) when I used to have a 70% success rate before the migration.
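For reference, a migration like that usually boils down to something along these lines; the paths, container name, and the two-pass approach are only illustrative, not necessarily what was run here:

```bash
# First pass can run while the node is still online; the second pass with --delete
# runs after stopping the node, so only the small delta is copied and downtime stays short.
rsync -aH --info=progress2 /mnt/old-1tb/storagenode/ /mnt/new-8tb/storagenode/

docker stop -t 300 storagenode   # give the node time to finish open transfers
rsync -aH --delete --info=progress2 /mnt/old-1tb/storagenode/ /mnt/new-8tb/storagenode/
```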
So my question is: could that be linked to the fact that my node has been offline for roughly 12 hours, or is it an indication that something is up? I also checked the logs and I didn’t get any errors.
Sorry about the long post but I tried to be as specific as possible.

Cheers,
Gab

Upload canceled messages are normal.

Then you are fine.

Don’t be sorry for that. Most of us here prefer :arrow_down:


Nice, however is there an explanation for that drop in accepted uploads? If I understand the Storj system correctly, it means I “lose the race” way more often than before switching hard drives. The network configuration is exactly the same, and I used to run an old and crusty 1 TB HDD I got with the computer for 50 bucks, so the Seagate one should be way better.

There is currently a bug that shows accepted uploads as “upload canceled”; an issue is already open for it.

Interesting, strange that I didn’t have that issue before, or at least I didn’t notice it.
I just copied my log to a different location and started a new one so I get “fresher” averages. I’ll keep track of bandwidth use and successful uploads for the next few days to get an average piece size. If successful uploads indeed appear in the log as failed uploads, I should get an unrealistically high average piece size. Thanks for the quick reply, and I’ll post an update once I have a significant sample size.

EDIT: My upload success rate is about 4% right now, which is lower than I’ve ever had it.
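For anyone wanting to check the same thing, here is a rough sketch of the kind of counting successrate.sh does, assuming the default container name storagenode and the usual log message texts (“uploaded”, “upload canceled”, “upload failed”):

```bash
# Dump the node's log once (storagenode writes to stderr, hence the 2>&1),
# then count the three upload outcomes.
docker logs storagenode > /tmp/node.log 2>&1

ok=$(grep -c 'uploaded'        /tmp/node.log)
canceled=$(grep -c 'upload canceled' /tmp/node.log)
failed=$(grep -c 'upload failed'     /tmp/node.log)

echo "successful: $ok   canceled: $canceled   failed: $failed"
awk -v a="$ok" -v b="$canceled" -v c="$failed" \
    'BEGIN { if (a + b + c > 0) printf "upload success rate: %.1f%%\n", 100 * a / (a + b + c) }'
```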

As of now I have 60 GB of new data on the node for roughly 1,700 pieces logged as successful uploads. That gives me an average piece size of about 34 MB. Any thoughts on that?

Might be a vetting thing on a new node… usually the most common piece size is something close to 2 MB, but piece sizes are not fixed. It may simply be the Storjlings using the vetting of new nodes to test the performance of different piece sizes across many different systems.

Basically you shouldn’t need to worry about piece sizes; to an SNO they are almost irrelevant.
Larger piece sizes might help with IO, though. My node has something like 6 million files, and once you get to those numbers some filesystems can run into trouble, at least when it goes into the tens of millions. For every 1 million files I add to my pools it takes about an hour longer to scrub the pool to verify its consistency.
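If you’re curious how many files your own node holds, something like this gives a quick count; the path is just an example of the usual docker mount layout:

```bash
# Count the stored piece files under the node's blobs directory.
# /mnt/storj/data is whatever you mounted to /app/config in the container.
find /mnt/storj/data/storage/blobs -type f | wc -l
```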

It would be interesting to know whether disk latency actually affects the logged upload success rates… moving the docker log off the storage drive might be a fun little experiment.
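One way to try that experiment, sketched under the assumption that the documented log.output parameter is used; paths, container name, image tag and the extra mount are only examples, and the usual port/wallet/e-mail flags are left out:

```bash
# Redirect the node's log to a file on a different disk instead of going through
# Docker's json-file driver on the OS drive. --log.output is a storagenode
# parameter, so it goes after the image name. Ports, wallet, e-mail etc. omitted.
docker run -d --name storagenode \
    --mount type=bind,source=/mnt/storj/identity,destination=/app/identity \
    --mount type=bind,source=/mnt/storj/data,destination=/app/config \
    --mount type=bind,source=/mnt/fastdisk/storj-logs,destination=/app/logs \
    storjlabs/storagenode:beta \
    --log.output=/app/logs/node.log
```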

As you can see when we compared bandwidth in the bandwidth utilization thread,
people with very low success rates end up with nearly the same bandwidth utilization as somebody with high upload success rates.

So that strongly suggests the logged rates don’t matter much… and some SNOs have also tracked the pieces, and in almost all cases the cancelled pieces are actually stored, and thus paid.

My conclusion is that many of the cancelled uploads were in fact successfully uploaded to my node, as @nerdatwork said. Interesting that I didn’t have that bug before reinstalling the node, or at least it wasn’t as apparent.
Thank you guys!

When you physically check the pieces they are ~2.3 MB each.

That’s a maximum, they can be a lot smaller, but not any larger than that.
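For the curious, that maximum falls out of the erasure coding: assuming the 64 MiB maximum segment size and the 29-piece Reed-Solomon minimum, each piece is segment/29, which is roughly 2.3 MB:

```bash
# Back-of-the-envelope for the ~2.3 MB maximum, assuming a 64 MiB max segment
# and the 29-share Reed-Solomon minimum (each piece is segment_size / 29,
# regardless of how many pieces end up stored in total).
awk 'BEGIN {
    segment = 64 * 1024 * 1024          # max segment size in bytes
    piece   = segment / 29              # size of one erasure-coded piece
    printf "max piece size: %.2f MB (%.2f MiB)\n", piece / 1e6, piece / 2^20
}'
```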

They are. I looked into this when my rates started dropping. Mine are around 25% success now, yet every piece I checked that was supposedly canceled ended up on my node anyway. You can ignore the upload canceled messages for now; they’re wrong. It’s merely a logging issue, not an actual issue with your node.

May I ask, how did you acquire the numbers you provided? As the result of your calculation is an impossibility, I’m wondering where the original numbers came from.

What I did is track the amount of data on my hard drive overnight with the dashboard, which came out to around 60 GB. I also tracked how many pieces were successfully uploaded (according to the log) with successrate.sh and found that approximately 1,750 pieces were reported as “successful uploads”. Dividing 60 GB or so by 1,750 gave me 34.27 MB. I figured it was an easy way to see whether pieces logged as “cancelled uploads” were in fact ending up on my hard drive. Turns out there are lots of them, given that the maximum size of a piece is 2.3 MB.
According to my rudimentary calculation the number of accepted pieces is more like 26,000 if we assume that all the pieces are 2.3 MB (which they are not).
So roughly 90% of my successful uploads appear as cancelled uploads in the log…
I guess that undermines the utility of successrate.sh, at least the part about uploads.

EDIT: I just passed 1 TB of data stored on my node, so I guess things are going well haha
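Putting those numbers together (values taken straight from the post above, with 2.3 MB used as the per-piece maximum):

```bash
# Re-running the arithmetic from the post above.
awk 'BEGIN {
    ingress_gb   = 60        # new data reported by the dashboard overnight
    logged_ok    = 1750      # pieces counted as "uploaded" by successrate.sh
    max_piece_mb = 2.3       # maximum piece size

    implied_avg  = ingress_gb * 1000 / logged_ok        # MB per logged success
    min_pieces   = ingress_gb * 1000 / max_piece_mb     # lower bound on stored pieces
    hidden_share = 1 - logged_ok / min_pieces           # fraction logged as canceled

    printf "implied average piece size: %.1f MB\n", implied_avg
    printf "minimum pieces actually stored: %d\n", min_pieces
    printf "share of stored pieces logged as canceled: >= %.0f%%\n", hidden_share * 100
}'
```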

OK, that explains it. Both numbers are actually not accurate. The used space on the dashboard is only updated once or twice a day, so unless you average over a longer period of time it depends a lot on when you look at it. And as previously mentioned, most if not all of the canceled uploads end up on your node anyway.

I am new to this (my node is around 10 days old), so this might be the wrong answer.
I think you are mixing up uploads and downloads.
Downloads/ingress are what puts data on your node.
Uploads/egress are what goes from your node to users.
And my hunch is that you switched to a bigger HDD, but you also switched to a slower processor (less power consumption), which might not be powerful enough to serve data in a timely manner when uploads are requested.
That’s how I would interpret the info you gave here.

I would suggest trying to move the new drive back to the previous computer and checking whether that improves results.
But as I said, it’s just speculation, since I am new to this.
Regards!

It’s the other way around. All the wording is from the customer’s point of view: downloads are from the network, i.e. egress from your node; uploads are to the network, i.e. ingress to your node.


[Screenshot: Storj dashboard]
I was referring to this “used space”. It’s not very accurate (I don’t know if trash is taken into account), but I can update it by refreshing the page.

My bad, I thought you were referring to the other graph. I guess this one does update closer to real time.

I guess - yes :slight_smile:


I think my graph will show the truth

The used space does not include trash: 998.81 + 0.87054 + 0.31849 = 999.99903 GB ≈ 1 TB


True, I noticed that earlier today, but I guess my brain didn’t completely process the information haha.
By the way, cheers to the Storj team, the new dashboard looks amazing!


Just wanted to post an update here. It seems that after a newly uploaded piece is committed on a storagenode, the uplink (or satellite) often cancels the connection immediately afterwards, so even when the piece has been successfully written to disk, the storagenode logs “upload canceled”. This is purely a bug with logging/metrics, and we have a fix for it here: https://review.dev.storj.io/c/storj/storj/+/2234
