[Tutorial] How to run your own satellite (part 1)

Hi there,

I’ve spent a few days playing with the satellite. It’s nowhere near ready, but I think I should share my experience while I’m at it: the more people can host a satellite, the better for the Storj ecosystem.

One thing I noticed: the official Storj implementation is not fixed. It is subject to change, so you need to stay on top of it. I learned a few things about Storj’s direction from reading comments on Gerrit (Storj’s review platform); if you read them, you will be better prepared for the future.

Another cool trick I discovered: you can download the Storj source as a zip file, https://github.com/storj/storj/archive/refs/heads/main.zip , upload it to Claude and start asking questions. No other AI can do that (let me know if you know of one that can).

Part 1. Preparing your foundational infrastructure:

Running a satellite is a long game, so you need to be well prepared:

  • a domain (maybe?).
  • a place to host/mirror source code.
  • a CI/CD system.
  • and a sound backup plan for your infrastructure.

Domain

While searching for your ideal domain, be aware of domain name front running (see the Wikipedia article of that name). Search via a reputable provider, or use the whois command line tool.

Source Code Hosting

Any provider would be fine, but if you want to use one that is friendlier to open source (e.g. Codeberg), be aware that they have storage limits (100MB for a private repo). I ran into this when mirroring projects from GitHub.

Personally, nowadays I use Forgejo, which is very simple to self-host:

docker volume create forgejo_data

docker run -d \
  --log-driver none \
  --name=forgejo \
  -p 6157:3000 \
  -p 6122:22 \
  -v forgejo_data:/data \
  --restart=always \
  codeberg.org/forgejo/forgejo:15.0.2

Point DNS at your server and have Caddy serve it:

forgejo.yourdomain.com {
    reverse_proxy 127.0.0.1:6157
}

It’s THAT simple! And if you are on a dynamic IP (most of us are), use ddclient to keep the DNS record up to date.

CI/CD

Forgejo has an integrated CI named Actions, but I prefer Jenkins, where you can have freestyle jobs that are not tied to any repo, plus a ton of built-in features and plugins (with, of course, a steeper learning curve). To install and maintain it:

docker volume create jenkins_home

# The docker.sock and /usr/bin/docker mounts let Jenkins jobs run Docker
# commands against the host's Docker daemon.
docker run -d \
  --log-driver none \
  --restart=always \
  --name jenkins \
  -p 7514:8080 \
  -p 50000:50000 \
  -v jenkins_home:/var/jenkins_home \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /usr/bin/docker:/usr/bin/docker \
  jenkins/jenkins:latest

# Needs to be rerun every time you upgrade Jenkins, because the container is recreated
DOCKER_GID=$(stat -c '%g' /var/run/docker.sock)
docker exec -u root jenkins groupadd -for -g "$DOCKER_GID" dockerhost
docker exec -u root jenkins usermod -aG dockerhost jenkins
docker restart jenkins

# Install custom software; also needs to be rerun every time you upgrade Jenkins
docker exec -u 0 jenkins apt update
docker exec -u 0 jenkins apt upgrade -y
docker exec -u 0 jenkins apt install golang -y
docker exec -u 0 jenkins apt install python3 -y
docker exec -u 0 jenkins apt install python3-venv -y

Backup Plan

If your file system has snapshot functionality, use that. Otherwise, this is what I do: create a freestyle job that runs the backup periodically.

# jenkins
# -z gzips the archive so the content matches the .tar.gz name
tar --ignore-failed-read --warning=no-file-changed --warning=no-file-removed -czvf "jenkins_home_backup_${JENKINS_HOME_BACKUP_TIMESTAMP}.tar.gz" --exclude='*/.cache' --exclude='*/workspace/*' --exclude='*/.rustup' --exclude='*/go' --exclude='*/.cargo' -C "$JENKINS_HOME" .

# forgejo
docker exec -u git forgejo forgejo dump -c /data/gitea/conf/app.ini -f /tmp/forgejo_dump_$FORGEJO_DUMP_TIMESTAMP.tar.zst --type tar.zst

Then send the archives somewhere else (another machine or the cloud).
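The backup commands above use timestamp variables that the post does not define, and they say nothing about checking or rotating the archives. A minimal sketch of how that could look follows. The helper names (backup_timestamp, verify_archive, prune_backups), the timestamp format and the retention count are all assumptions for illustration, not something the original setup prescribes.

```shell
#!/bin/sh
# Hypothetical helpers for a periodic backup job; all names are assumptions.

# Print a sortable UTC timestamp such as 20240131_235959.
backup_timestamp() {
    date -u +%Y%m%d_%H%M%S
}

# Succeed only if the file is non-empty and tar can list its contents.
# GNU tar detects the compression on read, so no -z flag is needed here.
verify_archive() {
    [ -s "$1" ] && tar -tf "$1" >/dev/null 2>&1
}

# Usage: prune_backups DIR PREFIX KEEP
# Deletes all but the KEEP newest files in DIR whose names start with PREFIX.
# "Newest" relies on the timestamp in the name sorting lexicographically.
prune_backups() {
    dir=$1; prefix=$2; keep=$3
    for path in "$dir/$prefix"*; do
        [ -e "$path" ] && printf '%s\n' "${path##*/}"
    done | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r name; do
        rm -f -- "$dir/$name"
    done
}
```

In a Jenkins freestyle job this might be used as JENKINS_HOME_BACKUP_TIMESTAMP=$(backup_timestamp) before the tar command, then verify_archive on the result, and finally something like prune_backups /path/to/backups jenkins_home_backup_ 7 (the directory and the count of 7 are placeholders).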

Maintaining Jenkins/Forgejo

To upgrade Jenkins/Forgejo (each will tell you when a newer version is available), simply do this:

docker pull jenkins/jenkins:latest
docker rm jenkins -f

# Forgejo requires you to specify the exact version
docker pull codeberg.org/forgejo/forgejo:15.0.2
docker rm forgejo -f

# Then rerun the matching docker run command from the sections above;
# the named volumes keep the data across the recreate.
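The pull, remove and run steps could be wrapped in one small function so the same sequence is reused for both services. This is only a sketch: recreate_container, run and the DRY_RUN switch are hypothetical names introduced here, and the docker run options still have to be the ones from the earlier sections.

```shell
#!/bin/sh
# Sketch only: run, recreate_container and DRY_RUN are hypothetical names.

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: recreate_container NAME IMAGE [docker run options...]
# Pulls first and stops if the pull fails, so the old container keeps running.
recreate_container() {
    name=$1; image=$2; shift 2
    run docker pull "$image" || return 1
    run docker rm -f "$name"
    run docker run -d --name "$name" --restart=always "$@" "$image"
}
```

For example, DRY_RUN=1 recreate_container forgejo codeberg.org/forgejo/forgejo:15.0.2 -p 6157:3000 -p 6122:22 -v forgejo_data:/data prints the three docker commands without executing them, which is a cheap way to check the options before a real upgrade.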

The preparation goes much further; see you in part 2.

Part 2: [Tutorial] Run your own satellite (part 2) - Set up cockroachdb
Part 3: [Tutorial] Run your own satellite (part 3) - Generate certificates
Part 4: [Tutorial] Run your own satellite (part 4) - Building storj binaries
Part 5: [Tutorial] Run your own satellite (part 5) - Familiarize with jargon
Part 6: [Tutorial] Run your own satellite (part 6) - Database migration
Part 7: [Tutorial] Run your own satellite (part 7) - Satellite api
Part 8: [Tutorial] Run your own satellite (part 8) - Jobq

Eh? How?

This is a nice write-up about jumping through hoops for no good reason, and it has absolutely nothing to do with running a satellite in the first place.

The useful tutorial would be about how to operate this as a storage business: pricing models, billing disputes, customer support, SLA enforcement, legal liability, abuse complaints, data retention, operational staffing, incident response, recovery planning, and whether there is even a viable business here in the first place.

That’s what would be useful. Running a satellite is an uninteresting, mundane implementation detail that will change anyway by the time the above is figured out.

Right now the only people for whom running a satellite is interesting are Storj themselves, and they are not exactly overwhelmed with customer demand.

Then on your points — you made three, outside of how to run generic containers and that you need source control.

  • Domain “maybe”? How do you imagine a scenario where you don’t need a domain?
  • Backup by tarring raw files from disk? Seriously?
  • THAT simple — if you want your server to die under abuse from the great internet, sure, go right ahead.

The rest reads like an ad for your favorite (in the best case, huge benefit of the doubt extended here) CI solution, which nobody has heard of and nobody should be using.

Dude. If you are writing tutorials, have someone who has actually run public infrastructure review them first. Preferably someone who has dealt with outages, abuse reports, corrupted backups, legal complaints, and customers screaming at 3am because “simple” turned out not to be simple at all.

Sorry what? You are not in a position to write tutorials.

Most of these aren’t necessary for cases where a company uses its own hardware and just wants a local installation. I was looking into this kind of setup for one of my customers: they already have on-prem hardware and just wanted a fast object storage layer on top of it. But I agree that OP’s post only scratches the surface of even the technical complexities.

How is that feasible on a small scale? I.e. what’s the point? What’s the benefit over minio (or ceph)? You have to have hundreds of nodes to make the math work. Otherwise it’s no better than replication. If you own, control, and trust the hardware, you don’t need 90% of what Storj does. Storj’s useful property is untrusted, geographically distributed storage operated by unrelated untrusted parties. An on-premise cluster is a different problem.

Minio was never able to scale up enough, and with the recent moves to close it up, it is difficult to trust. Ceph requires carefully constructed networking, which is not easy to do if a company is not willing to rewire servers. Storj’s advantage is exactly that it was designed for the complexity of WAN networking, so it will excel on any LAN too, even a badly configured one. Granted, 90% of what Storj does would not be used, but the same is true of Minio or Ceph. They also have tons of features that would not be used.

Interesting. I guess then why not!

@Toyoo is correct. We have had multiple customers who replaced Ceph (and several similar systems) with Storj. In conjunction with Object Mount they get fast, almost infinite network storage. It’s also very easy to scale, and you are not limited to the LAN: you can also join WAN nodes without any issues.