Script to run the used space filewalker limited to one node at a time

I am running 8 nodes on 8 HDDs connected to a Pi5. As you can imagine this is kind of borderline, but it works surprisingly well. (I am still testing and will share the details in a separate thread in a few days.) One of the challenges is to stretch out maintenance jobs to make sure ongoing uploads are not impacted. CPU time is scarce, so let's use it as efficiently as possible.

Here is my script to run the filewalker on all 8 nodes, but only one at a time. You might want to copy and adapt it for your needs. It does require access to the debug endpoint of the storage node.

# Run the used space filewalker on nodes 8 down to 1, one node at a time.
for i in {8..1}
do
        echo $i
        # Enable the scan in the node's config and restart so it takes effect.
        sed -i 's|storage2.piece-scan-on-startup: false|storage2.piece-scan-on-startup: true|g' /mnt/sn$i/storagenode$i/storagenode/config.yaml
        sudo systemctl restart storagenode$i
        sleep 30
        # Disable it again right away: the running scan continues, but an
        # unplanned restart later on won't trigger another one.
        sed -i 's|storage2.piece-scan-on-startup: true|storage2.piece-scan-on-startup: false|g' /mnt/sn$i/storagenode$i/storagenode/config.yaml

        # Poll the debug endpoint (ports 16001-16008) until the filewalker
        # no longer shows up in the list of running functions.
        while :
        do
                if (( $(curl -s 127.0.0.1:$((16000+$i))/mon/ps | grep SpaceUsedTotalAndBySatellite | wc -c) <= 0 ))
                then
                        break
                else
                        sleep 30
                fi
        done
done
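
A single scan can take hours per node, so the whole loop may run for a day or more. I would launch it in a way that survives a dropped SSH session, for example (the script and log file names are just placeholders):

nohup bash serial-filewalker.sh > serial-filewalker.log 2>&1 &
tail -f serial-filewalker.log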

Now there is a chance that one of the storage nodes failed to run it for whatever reason, maybe because of a restart. The following two code snippets will print out more details.

# Print the success/error counters for the filewalker on each node.
for i in {8..1}
do
	echo $i
	curl -s 127.0.0.1:$((16000+$i))/mon/funcs | grep -A 2 SpaceUsedTotalAndBySatellite
done

Example Output:
8
[5437966273498512387] storj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite
  parents: 631927965255748052
  current: 0, highwater: 1, success: 1, errors: 0, panics: 0
7
[488868047661742421] storj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite
  parents: 2052289714535893614
  current: 0, highwater: 1, success: 1, errors: 0, panics: 0

# Same query, but including the full timing statistics for each node.
for i in {8..1}
do
	echo $i
	curl -s 127.0.0.1:$((16000+$i))/mon/funcs | grep -A 29 SpaceUsedTotalAndBySatellite
done

Example Output:
8
[5437966273498512387] storj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite
  parents: 631927965255748052
  current: 0, highwater: 1, success: 1, errors: 0, panics: 0
  success times:
    0.00: 22m49.347719168s
    0.10: 22m49.347719168s
    0.25: 22m49.347719168s
    0.50: 22m49.347719168s
    0.75: 22m49.347719168s
    0.90: 22m49.347719168s
    0.95: 22m49.347719168s
    1.00: 22m49.347719168s
    avg: 22m49.347710786s
    ravg: 22m49.347719168s
    recent: 22m49.347710786s
    sum: 22m49.347710786s
  failure times:
    0.00: 0s
    0.10: 0s
    0.25: 0s
    0.50: 0s
    0.75: 0s
    0.90: 0s
    0.95: 0s
    1.00: 0s
    avg: 0s
    ravg: 0s
    recent: 0s
    sum: 0s


So the 1 node per core doesn’t apply anymore?


That is unrelated to this topic. The Pi5 has 4 cores. To be safe I would still recommend running just 4 nodes on it and not 8 as I am doing right now. It doesn't change much about the script above. I would still run just a single used space filewalker at a time.

Should we start a new topic on who can “bend” the supplier terms then? I would love to know more about this topic.


You might have a wrong impression of why I am running this Pi5 experiment. I am not recommending 8 nodes on a Pi5. I am sharing my script with you and it will work even with just 2 nodes.

It looks like this script assumes use of the native Linux client, configured for management with systemctl. Could something similar be done for docker installs by replacing the systemctl entry with a docker restart of the container? I'm unsure whether the docker container will re-read the yaml on restart or if it would take a full stop/run to do that.

I guess the for i in would also need to be swapped with some logic that would get the container list from docker ps -a -q as well…

Thanks for sharing your script! I’ll be watching for your RPi thread as I’ll want to jump in with some of my recent observations of my RPi node.

EDIT: hmm… the curl line throws a twist in there for docker as well. I am interested in figuring this out for docker nodes and you’ve given me a great start to investigate!

MORE EDIT: I also realized I am not yet making the debug endpoint available on my nodes. For those like me who need this info, here's littleskunk's previous post on enabling debug, at least for docker nodes:
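
From what I can tell, the debug server listens on whatever address you pass via the debug.addr option, so for a docker node the idea is to add that flag after the image name and publish the port. A rough sketch (the port numbers and container name are just examples, ... stands for your usual run parameters, and the linked post is the authoritative source):

docker run -d ... --name storagenode1 \
        -p 127.0.0.1:16001:5999 \
        storjlabs/storagenode:latest \
        --debug.addr=:5999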


You already nailed it. Yes, this should work with docker as well. Instead of obtaining the list of storagenodes, I just named them storagenode1 - storagenode8. It should work the same way with docker. Just replace the number in the name with $i.
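
A rough sketch of how the docker variant could look, assuming containers named storagenode1 through storagenode8, each config.yaml directly under /mnt/sn$i/storagenode$i/, and debug ports published on 16001-16008 like in my script (adjust all of that for your setup; whether docker restart is enough for the node to re-read the yaml, or a full stop/run is needed, is the open question above):

for i in {8..1}
do
        echo $i
        # Enable the scan and restart the container so it picks up the config.
        sed -i 's|storage2.piece-scan-on-startup: false|storage2.piece-scan-on-startup: true|g' /mnt/sn$i/storagenode$i/config.yaml
        docker restart storagenode$i
        sleep 30
        # Disable it again so the next unplanned restart doesn't scan.
        sed -i 's|storage2.piece-scan-on-startup: true|storage2.piece-scan-on-startup: false|g' /mnt/sn$i/storagenode$i/config.yaml

        # Wait until the filewalker disappears from the debug process list.
        while (( $(curl -s 127.0.0.1:$((16000+$i))/mon/ps | grep SpaceUsedTotalAndBySatellite | wc -c) > 0 ))
        do
                sleep 30
        done
done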


Sigh… and today I learned that you can manage docker containers by name. This entire time I've been managing them by the container ID. I swear at one point I tried to do something via the docker container name and it wasn't allowed, forcing me to use the container ID instead. Well, this is going to make my life a lot easier, including tweaking this script for docker systems.


Awesome, thank you! Now I will think again about my 48-node cluster :thinking:

Seems like this might also be a good idea if more than one node is sharing a disk or array.

(of course you’re not supposed to be running multiple nodes on one disk)

Seems like a lot of work. I’ve just wrapped the file walker with a shared flock(2).

On a Raspberry Pi? :stuck_out_tongue:


Well, I trust the Pi to handle a lot, but 48… could become quite a challenge :stuck_out_tongue_closed_eyes:

So this time I'll go with an HP server :wink:


Would you mind expanding on this? I looked at the man page for flock, and it seems flock(2) is a C API aimed at a higher level of programming than a shell script. Maybe you meant to link to flock(1)? I've never used this on Linux, but from knowledge of object locks on other platforms (I'm an IBM i (AS/400) admin/engineer by trade) I can definitely see how the logic could work. Thanks!
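
For reference, what I have in mind with flock(1) is something like this (purely hypothetical; the lock file path is made up):

# Every scan wrapper blocks here until no other process holds the lock.
flock /var/lock/used-space-filewalker.lock -c 'echo the scan would run here'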


No, I’ve literally wrapped the filewalker itself with flock(2) by patching the source code.


That is against the terms:

You will not modify or attempt to modify the Storage Node Software for any purpose

https://www.storj.io/legal/supplier-terms-conditions

This script is not modifying the software in any way; all it's doing is coordinating the enabling/disabling of the piece-scan-on-startup flag in the configuration file. Changing values in a configuration file is not modification of the software. Modifying the software would be changing the actual code to alter specific behaviours.


The script is fine.

With “source code” I thought he was referring to the Storj source code. If it is not the Storj source code, it is fine, and I'm sorry.


However, I would prefer to have a PR for the team’s review. It sounds like a useful addition…
@Toyoo FYI

Sorry, I don't have time to create a PR out of it. Besides, I have only implemented a Linux version of it; I don't know how to do this on Windows. However, it's simple enough that I'm pretty sure any Storj developer could reimplement it pretty quickly.

:heart:
