Wish I was a go(lang?) programmer - diagnostic tools for SNOs

At the moment I have 5 storage nodes: 2 in BSD jails on TrueNAS, which seem slow but very reliable, though I’ve had to do some customisation over the years for updates etc.; 2 on Raspberry Pis with standard Docker installs, using USB adapters for 3.5" drives; and 1 as a Docker-based LXC guest on my Proxmox server, using a similar USB adapter.
That last node is still vetting.
The HD on the Proxmox guest exploded and needed recovery, and after transferring to a new HD I ended up with files in lost+found. Using a Python script I found on these forums that recovers piece filenames from the file contents, I restored those files once the filesystem was fixed. In total I managed to restore 2877 missing files.
Using more information from these forums, I’ve since managed to hack the Python script to extract the hash values of the pieces, so I can now validate pieces.
Out of the 2877 pieces I recovered, 2 of them ended up being corrupt.
I’m now running a scan of all the pieces on that node and so far have found 33 pieces with issues - 3 of which were deleted while the scan was running.
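A minimal sketch of this kind of scan, in Python. It is not the script mentioned above. It assumes each piece file starts with a fixed-size header followed by the payload, that the payload is SHA-256 hashed, and that the expected hash can be looked up per file. `HEADER_SIZE` and the `expected_hash_for` callback are hypothetical placeholders; the real values have to come from the piece format itself.

```python
import hashlib
from pathlib import Path

HEADER_SIZE = 512   # assumption: bytes reserved for the header before the payload
CHUNK_SIZE = 1 << 20


def payload_sha256(path, header_size=HEADER_SIZE):
    """Return the SHA-256 hex digest of everything after the header."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        f.seek(header_size)
        while chunk := f.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root, expected_hash_for, header_size=HEADER_SIZE):
    """Walk root and sort files into (ok, corrupt, vanished) lists.

    expected_hash_for(path) is a hypothetical callback that returns the
    expected hex digest for a file, e.g. parsed from its header.
    """
    ok, corrupt, vanished = [], [], []
    for path in sorted(Path(root).rglob("*")):
        try:
            if not path.is_file():
                continue
            expected = expected_hash_for(path)
            actual = payload_sha256(path, header_size)
        except FileNotFoundError:
            # The file was deleted while the scan was running.
            vanished.append(path)
            continue
        (ok if actual == expected else corrupt).append(path)
    return ok, corrupt, vanished
```

Files that disappear mid-scan are reported separately rather than counted as corrupt, since a live node keeps deleting pieces while the scan runs.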
As the topic says, I wish I was a golang programmer who could provide diagnostic tools for storage node operators to help health-check our systems when we have issues.
To be honest, I’d throw away a lot of the gear I’m using if I was paid enough to replace it, but we’re not, so we keep junk going for as long as we can. We really need more diagnostic tools to help.
I’ll try to provide my Python/shell scripts once I’ve had a proper chance to test them, rather than in a panic.
I also wasted a lot of time trying to work out how to calculate BLAKE3 sums on a Debian 11 system. I even tried to build the relevant packages, only to find there is a compiled golang binary available. I haven’t found a piece hashed with BLAKE3 yet, though; so far they all seem to be checked with SHA-256.
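For anyone doing this from Python instead, here is a hedged sketch of picking the hasher by name. SHA-256 comes from the standard library's `hashlib`; BLAKE3 is not in the standard library, so this assumes the third-party `blake3` package from PyPI is installed and fails with a clear message when it is not. The `new_hasher` and `file_digest` names and the `skip` parameter are made up for illustration.

```python
import hashlib

try:
    import blake3  # third-party package (pip install blake3), not in the stdlib
except ImportError:
    blake3 = None


def new_hasher(algorithm):
    """Return a hasher object with update() and hexdigest() for the given name."""
    name = algorithm.lower()
    if name == "sha256":
        return hashlib.sha256()
    if name == "blake3":
        if blake3 is None:
            raise RuntimeError("BLAKE3 requested but the 'blake3' package is not installed")
        return blake3.blake3()
    raise ValueError(f"unsupported hash algorithm: {algorithm!r}")


def file_digest(path, algorithm="sha256", skip=0, chunk_size=1 << 20):
    """Hash a file in chunks, optionally skipping a leading header of `skip` bytes."""
    hasher = new_hasher(algorithm)
    with open(path, "rb") as f:
        f.seek(skip)
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Keeping the algorithm a parameter means the same scan can handle both kinds of piece once a BLAKE3-hashed one does turn up.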
Tooling is important after an issue, to determine whether it’s worth keeping a node going or tossing it - especially before you expose it to clients or to satellites.

Community contributions are very welcome! That would be quicker than waiting for our very busy developers to find time for such a tool, and in the meantime your tooling can help someone else too.
So please do not hesitate to publish it!

I hope you make a GitHub or GitLab project.
