Satellite Successful Download Rate at 50%

Until today I monitored my logs manually through docker logs --tail 20 storagenode.
What I've seen is that my satellite uploads work just fine (uploads start and finish properly).
Only my satellite downloads seem to have a problem, because every download I've seen has failed.
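Instead of eyeballing the tail of the log, the GET success ratio can be counted directly. A minimal sketch in shell, assuming the usual storagenode log phrasing ("downloaded" on success, "download failed" on failure, each entry tagged with "Action": "GET", which are the same patterns successrate.sh matches):

```shell
#!/bin/sh
# get_success_rate: summarize GET successes vs. failures from log text
# on stdin. Assumes the standard storagenode log phrasing: successful
# downloads log "downloaded", failures log "download failed", and both
# carry "Action": "GET". Adjust the patterns if your log format differs.
get_success_rate() {
    log=$(cat)
    ok=$(printf '%s\n' "$log"  | grep '"Action": "GET"' | grep -c 'downloaded')
    bad=$(printf '%s\n' "$log" | grep '"Action": "GET"' | grep -c 'download failed')
    total=$((ok + bad))
    if [ "$total" -gt 0 ]; then
        # Integer percentage is enough for a quick sanity check.
        echo "GET success: $ok / $total ($((ok * 100 / total))%)"
    else
        echo "no GET entries found"
    fi
}
```

Usage would be something like `docker logs storagenode 2>&1 | get_success_rate` (note that docker writes the log to stderr, hence the redirect).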

Summary

download failed {"Piece ID": "HWPBRJZJQSSHKTHCUZ4ET3U5ICE5FYQGZ36L65JISIT5TVEJ7F6Q", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET", "error": "piecestore: piecestore protocol: write tcp 172.17.0.4:28967->100.70.131.26:42072: use of closed network connection", "errorVerbose": "piecestore: piecestore protocol: write tcp 172.17.0.4:28967->100.70.131.26:42072: use of closed network connection\n\tstorj.io/drpc/drpcstream.(*Stream).pollWrite:189\n\tstorj.io/drpc/drpcwire.SplitN:25\n\tstorj.io/drpc/drpcstream.(*Stream).RawWrite:233\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:266\n\tstorj.io/common/pb.(*drpcPiecestoreDownloadStream).Send:1168\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload.func3:643\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

So I found the successrate.sh script, ran it, and was a bit shocked:

(screenshot: successrate.sh results)

About my system:
Host is an Unraid system.
Ubuntu 19.10 runs in a VM with a 4 TB HDD dedicated solely to Storj (nothing else runs or is stored on that HDD).
UPS for the entire system (server, switch, router, modem).
Internet:
Fiber optic with 400 Mbit down / 200 Mbit up and a ping of 12 ms (which is what I actually get directly in my Ubuntu VM).
Static IP.

I then thought that my HDD might be stressed, so I tested it:

It reached 120 MB/s read with an access time of 1.76 ms (I didn't test write because I would have had to unmount the drive for that).
Seems pretty good so far.

Edit 1:
I just checked my logs again and found this:


Error contact chore.

Can anybody relate?

Hi Cross,

Those numbers look just fine. The downloads fail because others beat your node in the race to upload to the client. My download success rate hovers around 60%. There are many factors that affect this, but it is not something to worry about. It is part of the normal operation of the system.

You can check out this thread for some comparisons, although older posts may not be a good reflection of the current behaviour of the network.


Hi baker,

thanks for your feedback. You're right, it seems I lose the race very often. But that's not acceptable to me, given the hardware and ISP I use, especially since I only get paid for successful uploads to the clients.
So I optimized my server settings and network connections so that my Ubuntu VM is prioritized in my setup, and that seems to help:

I have been seeing all of those same error messages for failed downloads from the same satellite, although my download success rate has been hovering between 20-30% since the big uptick in download traffic this month.

I have almost 90% on Download (egress) and 99% on Upload (ingress).

Just within the last day, I've noticed my download success rate has jumped back up to ~50%, but it's still not great. Upload success rate is hovering around 90-95%.

My stats

Moscow
--- A L L  S A T E L L I T E S---
 
118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW     : 46,64% bandwidth, 88,43% efficiency
12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs    : 16,53% bandwidth, 99,59% efficiency
12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S    : 18,83% bandwidth, 99,6% efficiency
121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6    : 18,00% bandwidth, 99,64% efficiency
 
Name          Value                                                       
----          -----                                                       
1. DOWNLOAD   [--------------------------    ] 86,672  	617565 of 712532  
2. UPLOAD     [----------------------------- ] 98,2502 	1400738 of 1425687
3. GET_AUDIT  [------------------------------] 99,9936 	31406 of 31408    
4. GET        [--------------------------    ] 86,0256 	561099 of 652248  
5. GET_REPAIR [--------------------------    ] 86,7849 	25060 of 28876    
6. PUT        [----------------------------- ] 98,2497 	1393415 of 1418241
7. PUT_REPAIR [------------------------------] 98,3481 	7323 of 7446      
9. EFFICIENCY [----------------------------  ] 94,392  	2018303 of 2138219

France

--- A L L  S A T E L L I T E S---
 
118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW     : 38,79% bandwidth, 99,43% efficiency
12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs    : 20,26% bandwidth, 99,94% efficiency
12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S    : 24,55% bandwidth, 99,77% efficiency
121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6    : 16,40% bandwidth, 99,96% efficiency
 
Name          Value                                                       
----          -----                                                       
1. DOWNLOAD   [------------------------------] 99,6591 	411881 of 413290  
2. UPLOAD     [------------------------------] 99,7256 	855775 of 858130  
3. GET_AUDIT  [------------------------------] 100     	8338 of 8338      
4. GET        [------------------------------] 99,6521 	403541 of 404950  
5. GET_REPAIR [------------------------------] 100     	2 of 2            
6. PUT        [------------------------------] 99,7408 	850262 of 852472  
7. PUT_REPAIR [----------------------------- ] 97,4373 	5513 of 5658      
9. EFFICIENCY [------------------------------] 99,704  	1267656 of 1271420

Estonia

Network 300/300 Mbit fiber.

So I copied an 800 MB log file to my PC, ran the success rate script, and realized how much RAM it uses just to process the file: 7 GB. I had about 1 GB of RAM left, out of 16 GB in total.

I have now set up a Task Scheduler job on my NAS server to execute a batch file that deletes the logs on the 1st of every month, so I can avoid this next time.

I am now just waiting for the script to finish. I think it's 2-3 months of data.
Here are my results; looking pretty good.

PS D:\Users\User\Desktop\SuccessRate> .\successrate.ps1 storagenode.log
========== AUDIT =============
Successful: 22635
Recoverable failed: 0
Unrecoverable failed: 0
Success Min: 100%
Success Max: 100%
========== DOWNLOAD ==========
Successful: 215780
Failed: 114317
Success Rate: 65.3686643622935
========== UPLOAD ============
Successful: 1402299
Rejected: 0
Failed: 134345
Acceptance Rate: 100
Success Rate: 91.2572463107916
========== REPAIR DOWNLOAD ===
Successful: 90
Failed: 0
Success Rate: 100
========== REPAIR UPLOAD =====
Successful: 90
Failed: 64
Success Rate: 99.2915651981404
PS D:\Users\User\Desktop\SuccessRate>

How is it using 7.4 GB of RAM for only an 800 MB file…

I don't know. It still seems to be processing. So far it has displayed the output below since I posted.

PS D:\Users\User\Desktop\SuccessRate> .\successrate.ps1 storagenode.log
========== AUDIT =============
Successful: 22635
Recoverable failed: 0
Unrecoverable failed: 0
Success Min: 100%
Success Max: 100%
========== DOWNLOAD ==========
Successful: 215780

I would recommend keeping the old log files and running the script on them on another PC instead. You never know when your node might get into trouble, and log files can help diagnose any issue.
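The memory blow-up likely comes from the script loading the whole file before matching. A memory-light alternative is to stream the file with grep, which reads it line by line, so RAM use stays flat regardless of log size. A sketch, assuming the usual storagenode log phrasing for the four outcomes (adjust the patterns if your log format differs):

```shell
#!/bin/sh
# count_outcomes: stream-count download/upload outcomes in a log file.
# grep reads the file line by line, so even an 800 MB log needs only a
# few MB of RAM, unlike slurping the whole file into memory first.
# Patterns assume the standard storagenode phrasing: "downloaded" /
# "download failed" and "uploaded" / "upload failed".
count_outcomes() {
    f="$1"
    printf 'downloads ok: %s\n'     "$(grep -c 'downloaded'      "$f")"
    printf 'downloads failed: %s\n' "$(grep -c 'download failed' "$f")"
    printf 'uploads ok: %s\n'       "$(grep -c 'uploaded'        "$f")"
    printf 'uploads failed: %s\n'   "$(grep -c 'upload failed'   "$f")"
}
```

Note that "download failed" lines do not contain the word "downloaded", so the success and failure counts do not overlap.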


I realized that in my batch file, after it stops the service, I could rename the log instead of deleting it. I changed it from
(del /q "path-to-log") to

ren "%ProgramFiles%\Storj\Storage Node\storagenode.log" "storagenode_%time:~0,2%%time:~3,2%%time:~6,2%_%date:~-10,2%%date:~-7,2%%date:~-4,4%.log"

So it comes out looking like this:
storagenode_104651_01232020.log

So I guess this will work for renaming the logs.
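One caveat with slicing %time% and %date%: the result depends on the machine's locale, and MMDDYYYY names don't sort chronologically. For nodes running on Linux, a hedged shell equivalent that produces sortable ISO-style names (the log path is just an example; point it at your actual file, and stop the node first so the log isn't being written while it moves):

```shell
#!/bin/sh
# rotate_log: rename a log file with a sortable ISO-style timestamp,
# e.g. storagenode.log -> storagenode_2020-01-23_104651.log.
# The node should be stopped before calling this, so the file is not
# in use while it is renamed.
rotate_log() {
    log="$1"
    stamp=$(date +%Y-%m-%d_%H%M%S)
    # Strip the .log suffix, append the timestamp, re-add the suffix.
    mv "$log" "${log%.log}_${stamp}.log"
}
```

Unlike the %date% slicing, the resulting names sort in chronological order with a plain `ls`.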


Mine used 4.2 GB on a 400 MB file. Mystery.