Upload/Download Dynamic Histogram (and geographic distribution visualization)

I was curious what all that egress is lately, so I asked Gemini to create a tool to see uploads/downloads in real time. After a few iterations, I got what I wanted. The code can definitely be improved, but it works, and it's an absolutely phenomenal return on an hour of guidance.

Example usage:

  1. Dynamic real-time histogram of the last 100 uploads/downloads:
     python3 log_monitor.py /var/log/storj.log --log-x-scale --log-y-scale --x-labels 16 --lines 100

  2. The entire log, also updated dynamically:
     python3 log_monitor.py /var/log/storj.log --log-x-scale --log-y-scale --x-labels 16 --ingest-all

  3. Or, to see what's going on relative to the entire histogram, highlight the bins the last 10 items went into:
     python3 log_monitor.py /var/log/storj.log --log-x-scale --log-y-scale --x-labels 16 --ingest-all --highlight 10
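The internals of log_monitor.py aren't shown here. As a rough illustration of what a log-x histogram with a --highlight-style marker involves, here is a minimal sketch that is not the tool's actual code. It assumes each transfer line ends with a JSON object carrying a "Size" field in bytes; check that against your own storagenode log. The function names are hypothetical.

```python
import json
import math
from collections import Counter, deque

# ASSUMPTION: transfer lines end with a JSON object that has a "Size" field in
# bytes, e.g. '... downloaded {"Action": "GET", "Size": 2319360}'.
# Verify against your own storagenode log before relying on this.

def parse_size(line):
    """Return the transfer size in bytes, or None if the line has no usable JSON."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        fields = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    size = fields.get("Size")
    return size if isinstance(size, int) and size > 0 else None

def log2_bin(size):
    """Map a size to a power-of-two bin: 1 -> 0, 2..3 -> 1, 4..7 -> 2, ..."""
    return size.bit_length() - 1

def build_histogram(lines, highlight=10):
    """Count sizes per log2 bin and remember the bins the last `highlight` items hit."""
    counts = Counter()
    recent = deque(maxlen=highlight)  # holds bins of the most recent transfers only
    for line in lines:
        size = parse_size(line)
        if size is None:
            continue
        b = log2_bin(size)
        counts[b] += 1
        recent.append(b)
    return counts, set(recent)

def render(counts, highlighted, width=40):
    """Print one text bar per bin; bar length is log-scaled, like a log y axis."""
    if not counts:
        return
    top = math.log1p(max(counts.values()))
    for b in sorted(counts):
        bar = "#" * max(1, round(width * math.log1p(counts[b]) / top))
        mark = "*" if b in highlighted else " "
        print(f"{mark} >= {2 ** b:>10} B  {bar} {counts[b]}")
```

For example, `render(*build_histogram(open("/var/log/storj.log"), highlight=10))` would print one bar per size bucket and star the buckets the last 10 transfers landed in. Power-of-two bins are what make the x axis logarithmic without any floating-point bucketing.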

For screenshots and screencasts, see the linked gist; for some reason I can't attach webm/webp here.

Great script - I’m surprised what the tools can crank out in so little time!

Someone is certainly pulling a lot of data out. Th3Van's graphs show more egress than ingress for 48h+!

I'm curious who. My 50 Mbit upstream has been saturated for days on 2 IP addresses. At first it was a lot of repair: 1/4 normal usage and 3/4 repair. Now it has flipped around.
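A normal-vs-repair split like this can be measured from the node log by summing bytes per action. A minimal sketch, assuming transfer lines carry a JSON tail with "Action" values such as "GET", "GET_REPAIR", "PUT" and "PUT_REPAIR" plus "Size" in bytes; the field names and function names here are assumptions, not a documented interface.

```python
import json
from collections import Counter

# ASSUMPTION: transfer lines end with JSON containing "Action" (e.g. "GET",
# "GET_REPAIR", "PUT", "PUT_REPAIR") and "Size" in bytes.

def bytes_by_action(lines):
    """Sum transferred bytes per Action value, skipping lines without usable JSON."""
    totals = Counter()
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue
        try:
            fields = json.loads(line[start:])
        except json.JSONDecodeError:
            continue
        action, size = fields.get("Action"), fields.get("Size")
        if isinstance(action, str) and isinstance(size, int):
            totals[action] += size
    return totals

def repair_share(totals, direction="GET"):
    """Fraction of bytes in one direction (GET = egress, PUT = ingress) that is repair."""
    normal = totals.get(direction, 0)
    repair = totals.get(f"{direction}_REPAIR", 0)
    total = normal + repair
    return repair / total if total else 0.0
```

With `repair_share(bytes_by_action(open("/var/log/storj.log")))`, a value near 0.75 would correspond to the "3/4 repair" egress mix described above.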

It’s quite good! The key is to talk to it like you would to an intern – capable, but not very experienced. And have a lot of patience. Then you get great results.

Well, why wouldn’t we find out? (30 min of furious typing later):

For those playing along at home – here is the source: arrogantrabbit/storj-activity-viewer on GitHub, an AI generated activity visualizer from a storagenode log file.

Looks like most uploads are done by the "west coast". Are these all uploads and downloads from your nodes? It looks like just a few "spots" are making all the traffic.
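One way to check how concentrated the traffic is, without any geolocation, is to tally bytes per remote host. A minimal sketch, assuming transfer lines carry a JSON tail with "Remote Address" ("host:port") and "Size" in bytes; those field names may differ between node versions, and the helper names are hypothetical.

```python
import json
from collections import Counter

# ASSUMPTION: transfer lines end with JSON containing "Remote Address"
# ("host:port") and "Size" in bytes; field names may differ by node version.

def remote_host(addr):
    """Strip the port (and IPv6 brackets) from a 'host:port' string."""
    return addr.rsplit(":", 1)[0].strip("[]")

def top_remotes(lines, n=5):
    """Return up to n (host, bytes, share_of_total) tuples, largest first."""
    per_host = Counter()
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue
        try:
            fields = json.loads(line[start:])
        except json.JSONDecodeError:
            continue
        addr, size = fields.get("Remote Address"), fields.get("Size")
        if isinstance(addr, str) and isinstance(size, int):
            per_host[remote_host(addr)] += size
    total = sum(per_host.values())
    # most_common() is empty when nothing matched, so no division by zero occurs
    return [(host, size, size / total) for host, size in per_host.most_common(n)]
```

If the first few entries cover most of the total share, the traffic really is coming from a handful of sources.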

Can you filter out satellite gateway uploads?

How could I do that? I would need to know the gateway IP addresses, which are probably dynamic.

But nothing prevents feeding the tool a filtered log to monitor:

tail -f /var/log/storj.log | grep --line-buffered -vFf gateway-ips.txt > filtered.log
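`grep -F` matches fixed strings anywhere in the line, so a listed 1.2.3.4 would also drop lines from 11.2.3.45. If that matters, a small Python filter can compare the host part exactly. A minimal sketch, assuming the client address sits in a "Remote Address" JSON field ("host:port") and that gateway-ips.txt holds one IP per line; the function names are hypothetical.

```python
import json

# ASSUMPTION: the client address is in a "Remote Address" JSON field ("host:port"),
# and the IP list file has one IP per line (blank lines and '#' comments skipped).

def load_ips(path):
    """Read the set of IPs whose lines should be dropped."""
    ips = set()
    with open(path) as f:
        for raw in f:
            ip = raw.strip()
            if ip and not ip.startswith("#"):
                ips.add(ip)
    return ips

def line_remote_host(line):
    """Return the host part of "Remote Address", or None if the line has none."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        addr = json.loads(line[start:]).get("Remote Address")
    except json.JSONDecodeError:
        return None
    if not isinstance(addr, str):
        return None
    return addr.rsplit(":", 1)[0].strip("[]")

def keep(line, blocked):
    """Keep the line unless its remote host exactly matches a blocked IP."""
    host = line_remote_host(line)
    return host is None or host not in blocked

def filter_stream(src, dst, blocked):
    """Copy kept lines from src to dst, flushing each so a follower sees them at once."""
    for line in src:
        if keep(line, blocked):
            dst.write(line)
            dst.flush()
```

Saved as a hypothetical filter_gateways.py with a main that calls `filter_stream(sys.stdin, sys.stdout, load_ips(sys.argv[1]))`, it could take grep's place in the same `tail -f ... > filtered.log` pipeline.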

It's from one node. I guess it's not unexpected: the node is in the San Francisco Bay Area, so it's expected to get more traffic from nearby. And the concentration is either large clients or the S3 gateway, as @toyoo pointed out above.