Same piece downloaded 90,000 times in 1 hour

Yesterday, I noticed my download count spiked higher than I’ve ever seen. The logs show it coming from one piece. I suppose one user could be downloading a file over and over. Seems pretty unusual though. Anyone else see this before?

2020-06-25T07:33:58.766Z INFO piecestore downloaded {"Piece ID": "4WWRGTUKPVMJ7Y7XCSZB3RE4RXHAGOC6NG3WIC7BN2IIJJG7AJNQ", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET"}
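In case anyone wants to check their own logs, a quick way to count downloads per piece is something like this (adjust the log file/source for your own setup):

# count GET downloads per piece ID and show the busiest pieces
grep '"Action": "GET"' node.log | grep -o '"Piece ID": "[^"]*"' | sort | uniq -c | sort -rn | head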


We need more customers like this. :joy:


wow I wish I had customers like that too :smiley: Gives some nice egress. Hopefully the file was big :smiley:
Sadly I don’t have anything set up to keep statistics on the number of operations.


so 90k downloads of ~2MB pieces would be about 180GB worth of egress in an hour, so 3GB a minute, being like 50MB/s

seems very high unless you’ve got 500-1000Mbit internet, and even then it’s still kinda high
but i guess the pieces could be smaller than 2MB… 90k times tho…
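rough numbers behind that, assuming every download was a full ~2MB piece:

# 90,000 downloads x 2MB over one hour
echo "90000 * 2 / 1000" | bc   # ~180 GB total egress
echo "90000 * 2 / 60" | bc     # ~3000 MB per minute, i.e. ~3GB/min
echo "90000 * 2 / 3600" | bc   # ~50 MB/s sustained, roughly 400Mbit/s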

well it could be some sysadmin getting ready to present his arguments for why his business should move to tardigrade… we are bound to see much more weird and unique traffic like this now that we are getting closer to the 6-month-from-launch point… this will most likely be when corporations start to look at the data from the limited use they have been testing tardigrade with…

seeing what works and what doesn’t and what stuff costs… and now some are ready to scale up, while others are ready to move on and test something else if the system didn’t meet their parameters…

exciting times for storj … soon we might see some proper customer data


Files can be much smaller than that. They only end up being 2.3MB if the original segment was the max of 64MB. Any file smaller than that, or a segment at the end of a file smaller than that, results in much smaller pieces, down to something like a 4KB minimum. Below that point the file is small enough to be stored inline on the satellite, since it would be smaller than the metadata for that file would have been.
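A rough sanity check on that 2.3MB figure, assuming the default erasure coding where 29 pieces are enough to rebuild a segment (the 29 is an assumption on my part):

# a 64MiB segment split into 29 required pieces
echo "64 * 1024 * 1024 / 29" | bc   # ~2314098 bytes, i.e. roughly 2.3MB per piece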


Did you see a corresponding spike in your dashboard’s bandwidth graph?

I hope the customer doesn’t have a bug in their system that results in a large bill.

The talk about file sizes got me curious, so I made a histogram of file sizes in my blobs folder.

[chart: file size histogram]

Raw data. The few 4 and 16MB files seem strange. A customer running a custom client perhaps?
File size   Number of files
  1k              127,381
  2k               86,534
  4k               58,043
  8k               42,255
 16k               37,384
 32k               36,052
 64k               35,668
128k               46,914
256k               32,325
512k               45,465
  1M               30,853
  2M              250,789
  4M                    3
 16M                    2


Hello! This was my small test :slight_smile:


I guess most 2M files are from testing satellites. It would be interesting to see what real customer data looks like.

Very nice graph! What did you use to generate it? Some script or is there existing software for it?

I used this Linux command:

find . -type f -print0 | xargs -0 ls -l | awk '{ n=int(log($5)/log(2)); if (n<10) { n=10; } size[n]++ } END { for (i in size) printf("%d %d\n", 2^i, size[i]) }' | sort -n | awk 'function human(x) { x[1]/=1024; if (x[1]>=1024) { x[2]++; human(x) } } { a[1]=$1; a[2]=0; human(a); printf("%3d%s: %6d\n", a[1],substr("kMGTEPYZ",a[2]+1,1),$2) }'

that I found at

I copied/pasted the output of that command into LibreOffice Calc (spreadsheet program) and used it to make a chart.
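For anyone who finds the one-liner hard to read, here is the same pipeline split out with comments (a sketch; it should behave the same as the one-liner above):

find . -type f -print0 | xargs -0 ls -l \
  | awk '{
      n = int(log($5)/log(2))      # $5 is the size column of ls -l, take log base 2
      if (n < 10) { n = 10 }       # clamp everything below 1KiB into the 1k bucket
      size[n]++                    # count files per power-of-two bucket
    }
    END { for (i in size) printf("%d %d\n", 2^i, size[i]) }' \
  | sort -n \
  | awk 'function human(x) { x[1] /= 1024; if (x[1] >= 1024) { x[2]++; human(x) } }
         { a[1] = $1; a[2] = 0; human(a)
           printf("%3d%s: %6d\n", a[1], substr("kMGTEPYZ", a[2]+1, 1), $2) }'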


speaking of files, if you feel like testing just how bad your system is at dealing with, say, 1 million files…

the link below is a great way to generate them; i find copying 1mil files helps me get an idea of how well the system does at processing lots and lots of files…

like say in the case of zfs… i can literally see my scrub time go up by an hour or two every time i add 1 million files to the pool… also copying them can be a bit of a challenge… with my present setup i can manage it in less than 3 minutes,

tho i am running a 512k recordsize now, my record i think was with a 32k recordsize, taking less than 1 minute to copy the 1mil files on the same pool, sync=standard on all of them… have tried sync=always… but throughput kills that for me.

anyways i figured some might be interested in just how much of an effect working with big numbers of files has on a system, and they are so easy to generate… it will take a bit tho

right, have to remember to add the link

https://www.heatware.net/linux-unix/create-large-many-number-files-thousands-millions/

oh yeah and the author’s math is kinda off… he forgets a 0 here and there… i usually make 1mil files so i get a more accurate result… ofc you can manage with 100k files… that will most likely put a good bit of strain on your system anyway
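if you just want something quick to play with, a sketch along these lines works (not the exact command from the linked article, adjust count and size to taste):

# create 100,000 small numbered files of random data in ./manyfiles (slow but simple, one dd per file)
mkdir -p manyfiles
for i in $(seq 1 100000); do
  dd if=/dev/urandom of="manyfiles/file_$i" bs=4k count=1 status=none
done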


It’s strange, I do not see an increase in network activity during this spike, neither on the storj dashboard nor in my system metrics.

i doubt everybody would be affected… i also checked mine and couldn’t see anything… but it may just drown in the average when i view daily or weekly… i did try to scan through at max resolution, but also no real big spike; there was a slight increase in activity, but it’s always jumping up and down… so that doesn’t really mean much.

if it was just one file / object then at most it would be like 29 or 90 SNOs that would be hit…

Check the creation dates of these files.
They are probably “leftovers” from the previous network version that were not properly deleted during the last network wipe.
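Something like this should show when those odd blobs were written (GNU find; the blobs path is just an example):

# list files larger than ~3MB in the blobs folder with modification date and size
find /storage/blobs -type f -size +3M -printf '%TY-%Tm-%Td %s %p\n' | sort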

Thought it was external software, thanks!