Another AI-generated web dashboard

:white_check_mark:

Awesome work! The dashboard looks great and multi-node support is a huge win. Maybe add simple alerts (email/Discord) for downtime or low success rates — that’d make it even more useful.

Small heat map update (using Anthropic Opus. Sonnet and Gemini could not figure things out and got stuck in a loop. Total cost: $9.40)

eyecandy

The events appear on the screen at a cadence matching the actual timestamps of the log entries!

3 Likes

How large should the log file be to get good measurements for the success rate? I capped mine to around 10Mb since it filled my small OS drive. I would like to try this dashboard soon. Am I understanding this right, it’s “just” a Python script? So installing Python on my system and running it with “nohup” should work? Are you planning to run it via Docker? That would be great.

It measures within the last hour. It’s configurable.

M? Or G? You can move log file to your storage drive.

Install uv instead, it will manage python, virtual environment, and application dependencies for you.

uv run websies.py --node "name:/path/to/log/file" …

I mount the log files over nfs from multiple nodes to the monitoring host that runs the script.

No, why?? Python virtual environment, including the one managed by uv, is enough.

Yes. Or with disown (e.g. if you use zsh, add &! at the end – it will both background and disown the process. If you use bash, you’ll have to call disown explicitly. But come on, migrate to zsh already – much better shell with a much better license.). Or screen, or tmux. Better yet – create a systemd daemon.
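For reference, the variants above look like this (a sketch only — the script name and --node flag follow the `uv run websies.py` example earlier in the thread):

```shell
# nohup: works in any POSIX shell, the process survives logout
nohup uv run websies.py --node "name:/path/to/log/file" > dashboard.out 2>&1 &

# zsh: &! backgrounds and disowns in one step
uv run websies.py --node "name:/path/to/log/file" &!

# bash: background first, then disown explicitly
uv run websies.py --node "name:/path/to/log/file" &
disown
```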

You would want to redirect its logging somewhere – it keeps local sqlite databases with node stats and logs various maintenance events there. So if you are running it on a Raspberry Pi – connect a real SSD.

For those playing along at home, running this behind Cloudflare Zero Trust, and getting frustrated about why this heliocentric salad on a stick fails when iCloud Private Relay is enabled – this is the answer right here: go to your Cloudflare zone → Speed → Settings and turn this shit off:

In the meantime, I asked Robot-Friend to implement a few more improvements:

  • Add Hashstore stats
  • Make plots more persistent on server restarts and page reloads
  • Fix map dragging
  • Improve error reporting (by aggregating errors that only differ by data – such as IP addresses or sizes)

You might need to “Reload the page from Origin” to update the browser cache for this to work. Enjoy :slight_smile:
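The error-aggregation idea from the list above can be sketched roughly like this (hypothetical names and patterns, not the dashboard’s actual code): normalize away the variable parts of a message so that errors differing only in data collapse into one group.

```python
import re

# Illustrative normalization rules: replace variable data with placeholders
# so messages that differ only in IPs, sizes, or counts share one key.
# Order matters: specific patterns (IP, size) run before the bare-number one.
PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<ip>"),
    (re.compile(r"\b\d+(?:\.\d+)?\s*(?:B|KB|KiB|MB|MiB|GB|GiB)\b"), "<size>"),
    (re.compile(r"\b\d+(?:\.\d+)?(?:ms|s)\b"), "<duration>"),
    (re.compile(r"\b\d+\b"), "<n>"),  # any remaining bare number
]

def error_key(message: str) -> str:
    """Collapse a log message into a grouping key for aggregation."""
    for pattern, placeholder in PATTERNS:
        message = pattern.sub(placeholder, message)
    return message

# Two errors that differ only by peer address and piece size map to one key:
a = error_key("upload failed to 203.0.113.7:28967 after sending 2.3 MiB")
b = error_key("upload failed to 198.51.100.9:28967 after sending 640 KiB")
assert a == b
```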

Hashstore Stats

Better error grouping

Better traffic plot

The data on the dashboard led me to an interesting insight: on large pieces (over 1MB in size) my success rate is absolutely horrific on all nodes except one. Which one? The one on the gigabit symmetric line (Node 4 on my link above). The other nodes are connected via cable internet with 35Mbps upstream bandwidth. Evidently, this matters.

The fact that for smaller transfers the success rate is fine across all nodes indicates the problem is not latency, but throughput. Then again, maybe the success rate is lying more on small pieces and it’s equally horrendous there, I don’t know. Either way, nothing I can do about it.

2 Likes

Yep. I’m looking at success rates for small and large pieces separately from time to time. In my case I’m usually losing small piece races, potentially suggesting I’m rather far from usual customers; can’t really do much with this. But large pieces go pretty well.

@arrogantrabbit Just fired up the “Storagenode Pro Monitor” on one of my hosts with a few nodes - great work!! It’s very useful, with good and creative thinking on the types of statistics.

A few thoughts on new features:

  • Sorting in the lists (especially the HS Compaction stats for node name AND time consumed, but preferably all headers)
  • Multiselect for which nodes to view data for
  • Possibility to hide / collapse the different sections
  • Popup info on traffic section (text is black on dark background in my browser)
  • Traffic section, when viewing size, could be nicer with GB/TB etc. Perhaps as an overlay on the graphs?

..if you really want to go all in..

History on the hashstore compactions with some graphs for development in runtimes and sizes? Maybe with multiselect on s0/s1 for each sat?

This is actually better for some parts of the system stats than my grafana/prometheus that I’ve been tweaking and optimizing for a long time!

1 Like

Sure! That was the point — I did not want to hammer nails with a microscope (the whole grafana thing is a bit overkill) and since I was going to ask AI to create the dashboard there anyway, because I have neither interest nor time to learn it — why not have it create a small app instead.

I can’t take credit for it though — it’s all Gemini/Opus. My contribution here is limited to writing the system prompt, then stating what I want, and pestering the AI until it fixes all the bugs or loses its mind. Rinse, repeat.

I started by giving Gemini a storagenode log and asking it to surface useful information. It picked Python+javascript, created the first version, and we have iterated on it since, in separate clean chats. I make a point of not looking at the generated code. I treat it as a black box.

The key is once one agent gets stuck — ask another to solve the problem.

I’ve tried using local models (whatever fits on my Mac mini m4 pro 64GB) but Gemini and Opus are faster and better. Local models that fit on my Mac are not “smart” enough. Or I did not find a good one.

I also noticed Sonnet gets lost quite quickly. And Opus needs a lot of detailed guidance. Gemini is hit or miss — so you need to retry a few times until it gets what you want. So it’s chaotic but ultimately useful.

So I’m going to tell Gemini to incorporate your suggestions one by one (I found it’s best to compartmentalize the work, focus on one specific thing at a time, stress to the model multiple times not to mess up existing work, and keep clearing the context often, otherwise they go off their rocker quickly) and, after a few iterations of complaints and convincing the model to “think harder”, upload new versions.

Hashstore table improvement was already on my list, I was waiting for some data to get collected first to get an idea of what we are looking at and what we might want to see. The rest — great ideas. Will get them done shortly.

Btw you may want to git pull — there are some very recent (an hour back?) fixes for the live data (live data was missing repair traffic while historical data had it)

3 Likes

:check_box_with_check:

compaction

Nodes and visibility

4 Likes

Looks really nice, I will have a look later :smiling_face_with_sunglasses:

I did notice that my log rotation with copytruncate breaks the flow, and it halts getting new data when it happens. Also, in this state you can’t interrupt the process with e.g. CTRL-C.

2 Likes

Huh, interesting. I think copytruncate would indeed hang. Newsyslog renames the current file and creates an empty one. This is tracked by watching the inode.

I’ll ask robot friend to also handle copytruncate. This will probably also result in some performance improvement, because I think right now it’s polling the data.

2 Likes

:white_check_mark:

Separate question — why are you doing copytruncate in the first place, as opposed to rename and signal? I know storagenode ignores SIGHUP, so I tell it to log to stdout and run it with the daemon utility that captures stdout and does handle SIGHUP, along with the other responsibilities a well-behaved daemon should. freebsd_storj_installer/overlay/usr/local/etc/rc.d/storagenode at bfa27a9c74198d7737e092de583d86a145fa6fab · arrogantrabbit/freebsd_storj_installer · GitHub

At the time of joining StorJ a few years back, mounting a log path into each docker container seemed like an easy way to get logs out of docker into the host filesystem for management. I’m running them in a compose stack.

Not pretty, and probably better solutions out there :wink:

I’ll have a look at other options, maybe stopping/starting your dashboard at logrotation could be a fix as well.

I don’t have too much spare time for this just now, so it’s going on the todo list :slight_smile:

It shall work with copytruncate now.

2 Likes

Cool! I have updated and am looking forward to seeing it still running tomorrow morning :smiling_face_with_sunglasses:

And, I had a look at all the changes and additions, looks really good so far!

Great work

1 Like

"Node 1 name:/var/log/node1.log
Where are the logs located on Linux?

If you use docker, then docker logs storagenode.
Docker usually stores these logs here:

/var/lib/docker/containers/<container_id>/<container-id>-json.log

However, it’s better to configure storagenode logs to be stored in your expected location, see
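For reference, the storagenode config.yaml has a `log.output` setting for this; the path below is illustrative, assuming the official docker container where the storage directory is mounted at /app/config:

```yaml
# config.yaml — write logs to a file on the mounted volume instead of
# relying on docker's json-file log (path is illustrative):
log.output: "/app/config/node.log"
```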

1 Like

I realized the timestamp resolution in the log file is 1s, so the animation was naturally clamped to a 1s cadence. Ugly.

Now the monitor recovers timing from the actual log entry arrival time.

This of course does not work if the log file is on a network filesystem. For that case there is now an option to use a log forwarder.
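The trick can be sketched like this (hypothetical names, not the monitor’s actual code): instead of replaying events at their coarse 1-second logged timestamps, record the monotonic arrival time of each line as it is read and animate with those inter-arrival gaps.

```python
import time

class ArrivalClock:
    """Illustrative: derive sub-second event cadence from when each log
    line *arrives*, rather than from the coarse timestamp inside it."""

    def __init__(self):
        self.events = []  # (arrival_time, line) pairs

    def record(self, line):
        # Use the monotonic clock at read time, not the logged timestamp.
        self.events.append((time.monotonic(), line))

    def gaps(self):
        """Inter-arrival delays to use as the animation cadence."""
        times = [t for t, _ in self.events]
        return [b - a for a, b in zip(times, times[1:])]
```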

Animation is now smooth and accurate :heart_eyes:

1 Like