If you have any ideas for improvements, I can surely ask robot-friend to implement those.
Updates:
Aug 22: handle log rotation.
Aug 23: fix bandwidth plot stability
Aug 24: fixes to work with iCloud Private Relay (reconnect websocket as needed)
Aug 25: table to show history of success rates
Sep 11: A bunch of forum goers suggested improvements: heatmap, concurrency, success rate by satellite, various metrics on plots.
Sep 15: Multinode support. Provide name and path to the log file on the command line: --node "Node1:/mnt/pool1/.../storagenode.log"
Sep 17: Overhaul the Live Traffic Heatmap with Anthropic Opus.
Sep 18: Overhaul Data Transfer Size Distribution – turn it into bar graph and add failed pieces on top in red with Anthropic Sonnet thinking.
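For anyone curious how a repeatable --node "Name:/path" option like the Sep 15 one could be parsed, here is a minimal sketch. It is not the code from the gist; parse_node_spec, build_parser and the example path are made-up names for illustration.

```python
import argparse


def parse_node_spec(spec):
    """Split 'Name:/path/to/log' on the first colon only.

    Splitting on the first colon keeps any later colons in the path intact.
    """
    name, sep, path = spec.partition(":")
    if not sep or not name or not path:
        raise argparse.ArgumentTypeError(f"expected NAME:PATH, got {spec!r}")
    return name, path


def build_parser():
    parser = argparse.ArgumentParser(description="log dashboard (sketch)")
    # action="append" lets the option be given once per node.
    parser.add_argument(
        "--node",
        action="append",
        type=parse_node_spec,
        default=[],
        metavar="NAME:PATH",
        help="node name and path to its log file; repeat for each node",
    )
    return parser


# Hypothetical paths, only to show the shape of the result.
example_args = build_parser().parse_args(
    ["--node", "Node1:/var/log/node1.log", "--node", "Node2:/var/log/node2.log"]
)
# example_args.node is a list of (name, path) tuples, one per --node.
```

Each (name, path) pair can then get its own log reader, which is roughly what multinode support needs.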
I found Opus to be vastly superior to Sonnet for complex architecture tasks (expected), and even to Gemini Pro. I’ll ask it to redo the rest of the charts and plots at some point in the future; I’m trying to moderate my spending on API costs.
Multinode link (no authentication for now, accessible by everyone):
You can add history for every hour here, maybe as a graph. Also, multinode support would be great, e.g. if you have a folder with files storagenode1.log, storagenode2.log, etc.
Also you can divide the whole world map into square cells and count the pieces in each of them. Then there will be a heatmap of your uploads )
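A rough sketch of that grid idea, assuming each piece already has a latitude/longitude (for example from a GeoIP lookup). The cell size and the function names are my own picks, and the cells are square in degrees, not in kilometers.

```python
from collections import Counter


def cell_for(lat, lon, cell_deg=5.0):
    """Return the (row, col) of the grid cell containing a lat/lon point."""
    rows = int(180 / cell_deg)
    cols = int(360 / cell_deg)
    # Clamp so lat=90 and lon=180 land in the last cell, not one past it.
    row = min(int((lat + 90.0) // cell_deg), rows - 1)
    col = min(int((lon + 180.0) // cell_deg), cols - 1)
    return row, col


def count_by_cell(points, cell_deg=5.0):
    """Count how many (lat, lon) points fall into each grid cell."""
    counts = Counter()
    for lat, lon in points:
        counts[cell_for(lat, lon, cell_deg)] += 1
    return counts
```

The resulting counts map straight onto cell colors; smaller cells give a finer heatmap at the cost of more cells to draw.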
There are a couple of Grafana/Prometheus dashboards out there that I’ve heard are used for larger installs. Once node metrics are in Grafana you can do pretty much anything you want with them.
These are aggregated sizes reported in the “Size” field in the logs.
For sure. And not be limited by whatever can be streamed. I’ll see how well robot friend can deal with Grafana. Or at least try to get stats from the API instead of parsing logs. After I figure out why the cloudflare daemon gets concussed there.
It does – I’m an absolute dumbass when it comes to web development, all those html/javascript shenanigans are black magic to me. And after reviewing it anyway – I’m even more convinced that I would never want to write that type of code by hand.
AI brought it into the realm of possibility.
Put app.py and index.html from the gist into the same folder
Edit path to the log file in app.py
Run it. The easiest to run would be with uv – it will take care of all python dependencies:
uv run app.py
If you don’t have uv, install it: apt-get install uv, pkg install uv, brew install uv, or get it from the uv website.
If you don’t want to install uv – that’s fine, but you’ll need to manage the Python environment yourself, something like:
I’ll incorporate suggestions from this thread shortly and update the gist.
P.S. I do realize that Prometheus/Grafana probably does all that, but:
I don’t have the mental capacity to learn all that in the evenings
Those are massive overkill solutions, and I’m still on the quest for power efficiency. I’m considering trying something API based – either REST, or perhaps DRPC – the same way the multinode dashboard works. Or, in fact, asking Robot-friend to fix the usability issues in the multinode dashboard. But that’s after I’m done playing with this toy.
Asked Gemini to fix the issue where the page stops updating after a while. It seems to happen after the log file gets rotated. Updated the code to watch for inode changes. New version is on the gist.
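For reference, watching for inode changes can look roughly like the sketch below. This is my own minimal version, not what is in the gist. It assumes a POSIX-style rotation where the old log is renamed and a new file is created at the same path; RotatingLogReader is a made-up name.

```python
import os


class RotatingLogReader:
    """Read new lines from a log file, reopening it when its inode changes."""

    def __init__(self, path):
        self.path = path
        self._file = None
        self._inode = None

    def _open(self, from_start):
        self._file = open(self.path, "r", errors="replace")
        self._inode = os.fstat(self._file.fileno()).st_ino
        if not from_start:
            # First open: skip existing content and only follow new lines.
            self._file.seek(0, os.SEEK_END)

    def read_new_lines(self):
        """Return lines appended since the last call (may be empty)."""
        if self._file is None:
            try:
                self._open(from_start=False)
            except FileNotFoundError:
                return []
        # Drain whatever is left in the file we already have open.
        lines = self._file.readlines()
        try:
            current_inode = os.stat(self.path).st_ino
        except FileNotFoundError:
            return lines  # rotated away, replacement not created yet
        if current_inode != self._inode:
            # The path now points at a different file: reopen from the top.
            self._file.close()
            self._open(from_start=True)
            lines += self._file.readlines()
        return lines
```

Calling read_new_lines() on a timer is enough for a dashboard like this. Note that the last line returned may be incomplete if the writer is mid-line.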
Imma give you the ‘Extremely Groovy’ award in the category of best Storage node monitor. Thanks for your efforts, I’m sure everyone is appreciative - AI is coming a long way, and fast.
2 cents,
Julio
P.S. I might just set this up for Windows, if I may.
Pretty simple set-up. I have python39 & node.js installed on most of my Windows boxes, so I only needed to ‘pip install aiohttp geoip2’ dependencies. ‘python websies.py’ and away she goes…sweet. Imma stare at it for a while now. Again thanks AR.
It kind of still does: a 99.99% success rate is clearly fake, and it means maybe everything is right; however, if you do see a success rate lower than 95% – then something is definitely horribly wrong.
So there is still some value in it. And with hashstore the values appear to be closer to real – so once these nodes complete migration the usefulness of this will increase.
Observing this myself. My hypothesis is that this is the adaptation of the power of two choices algorithm. As the node reaches peak, more concurrent operations slow down all your operations, so the success rate as seen from the satellite PoV drops. And then your node regains it over the next minute as your operations are not bottlenecked by concurrency.
It would probably average out better if the satellite used a longer history for estimating the success rate, but there’s likely value in keeping it short.
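To make the hypothesis concrete, here is a toy model of power-of-two-choices selection with a short-memory success estimate. It is only an illustration, not the satellite’s actual code; pick_node, update_rate and the alpha value are assumptions of mine.

```python
import random


def pick_node(success_rates, rng):
    """Power of two choices: sample two distinct nodes, keep the better one."""
    a, b = rng.sample(list(success_rates), 2)
    return a if success_rates[a] >= success_rates[b] else b


def update_rate(old_rate, succeeded, alpha=0.2):
    """Exponential moving average of successes.

    A larger alpha forgets old results faster (shorter history); a smaller
    alpha averages over a longer history.
    """
    return (1 - alpha) * old_rate + alpha * (1.0 if succeeded else 0.0)
```

In this model a node whose tracked rate dips loses most pairwise comparisons and receives less traffic until its estimate recovers, and a short history (large alpha) makes both the dip and the recovery quick.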