Hi, guys! I want to thank you for the overwhelming amount of feedback. I didn’t expect it, and it’s really delightful and motivating! I continue to publish translated posts from my blog.
How to destroy 100 TB of data
After I finally managed to move off the first server, I started actively setting up hardware at Hetzner, and in my opinion the most successful series for storage is the SX13x. It has almost everything I need: a large amount of RAM, two fast NVMe drives for the system and metadata, and a whopping 10 hard drives of 16 terabytes each. From the very beginning I got hooked on Proxmox, even though I had no idea what that system was before I got the storj server.
By the time the Chia hard-drive mining boom started, I had accumulated around 120 terabytes of data on the storj network, and I was in the middle of migrating data from one of the old servers to the new one. This year was marked by two major failures for me:
I managed to corrupt 100 terabytes of storage data during the migration.
Missed opportunities during the Chia frenzy.
How I corrupted 100 terabytes:
Nothing foreshadowed trouble. As usual, I was migrating nodes in a semi-manual mode, experimenting with different ZFS send/receive parameters. I quickly noticed that with the default parameters, zfs send generated much more traffic than the amount of data actually written; in some cases the transmitted volume was three to four times larger than the data on disk. That didn’t suit me, because monitoring progress against such numbers was simply impossible. After reading the documentation, I found that the combination of zfs send parameters “-Lec dataset@snapshot” produces a stream whose size practically matches the dataset’s “written” property, so the progress of sending a snapshot now reflected reality quite accurately.

Sending the first snapshot takes a considerable amount of time, during which a significant delta accumulates. In my case, transferring the first snapshot took 3-4 days, depending on the size of the node; smaller nodes, of course, moved faster. Here’s what I did: I launched a batch of 3-4 nodes to fully saturate the gigabit link, waited for the initial transfer to complete successfully, stopped the nodes, sent the incremental delta, and once it finished, started the nodes on the new host. The downtime usually lasted a couple of hours, give or take.
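In command form, the workflow looked roughly like this (the host, pool, dataset and snapshot names are placeholders, and the sketch below is an outline of the procedure rather than a transcript of the exact commands I ran):

```
# Snapshot and (optionally) estimate the stream size first; with -Lec the
# estimate tracks the dataset's "written" property far more closely than a
# default send does:
zfs snapshot tank/node01@migr1
zfs send -nvP -Lec tank/node01@migr1

# Initial full transfer while the node keeps running:
zfs send -Lec tank/node01@migr1 | ssh newhost zfs receive -u tank/node01

# Once the full stream has landed: stop the node, snapshot again,
# and send only the accumulated delta:
zfs snapshot tank/node01@migr2
zfs send -Lec -i @migr1 tank/node01@migr2 | ssh newhost zfs receive -uF tank/node01

# Then start the node on the new host.
```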
After verifying through the logs that the nodes had started successfully and there were no errors, I initiated the migration of the next batch. This went on for about two weeks, and nothing foreshadowed trouble… until one evening I received a bunch of disqualification emails. I couldn’t believe my eyes. I started investigating: on the first batch of nodes audits were failing catastrophically - some nodes still had successful audits, while on others the scores were right on the edge. It was clear that this was just the beginning of an avalanche. An apocalypse awaited me. Every day I received more and more disqualifications. Everything I had been working on for the past year and a half was crumbling before my eyes, and I couldn’t do anything about it. It was a complete disaster.
I posted on the forum, reached out to @Alexey, and went through the logs to check for failed audits - everything was clean, there were no failed audits. I started talking to the guys in the ZFS channel on Telegram (there are many amazingly helpful people there, I recommend it), but I couldn’t figure out what the problem was: there were no data transfer errors, the snapshots had been transmitted without issues, and scrubbing the pools also showed no problems. I started reproducing the migration once again to identify the issue and came across a reproducible bug when sending an incremental snapshot with those infamous “-Lec” parameters.
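The check that finally exposed it can be sketched on a single host with a scratch pool; the pool, dataset and snapshot names below are placeholders, and whether the corruption actually triggers depends on the pool layout and OpenZFS version, so treat this as an outline of the verification rather than a guaranteed reproducer. The key point is that a scrub only verifies each block against the checksum stored when that block was written, so data that zfs receive wrote incorrectly (but checksummed consistently) looks perfectly healthy to it; the only reliable test is comparing actual file contents:

```
# Full send of the first snapshot into a second dataset:
zfs snapshot tank/src@one
zfs send -Lec tank/src@one | zfs receive tank/copy

# Let files keep changing for a while, then snapshot again and send the delta:
zfs snapshot tank/src@two
zfs send -Lec -i @one tank/src@two | zfs receive -F tank/copy

# The @two snapshot must be bit-identical on both sides, so hash it through
# the hidden .zfs/snapshot directory; any mismatch was introduced by
# send/receive even though both datasets scrub clean:
( cd /tank/src/.zfs/snapshot/two  && find . -type f -print0 | sort -z | xargs -0 sha256sum ) > /tmp/src.sums
( cd /tank/copy/.zfs/snapshot/two && find . -type f -print0 | sort -z | xargs -0 sha256sum ) > /tmp/copy.sums
diff /tmp/src.sums /tmp/copy.sums && echo identical
```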
Here’s what happened in the end: since the migrating batch consisted of nodes of different sizes, the smaller nodes finished their full transfer first and kept running while the larger ones caught up, so they accumulated a proportionally larger delta, and after the incremental send that delta contained more corrupted data. It looked like this: two servers with identical datasets and snapshots; on the source, all files both in the live filesystem and under the snapshots were readable and undamaged, but on the receiving server, after the incremental send, all the files that had changed during the accumulated delta were corrupted (while they were still intact under the first snapshot). The nodes that migrated faster and then sat waiting for the larger ones to finish were disqualified first, followed by the larger ones, and so on. By the time the disqualifications began, it was impossible to do anything, because the source had already been removed. Two weeks had passed from the completion of the first migration to the first disqualification.
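Had I run a content-level comparison between the two hosts before destroying the source, this would have shown up immediately. One way to do it (host names, dataset and snapshot names are again placeholders, and this is only a sketch) is a checksum-based dry run with rsync against the same snapshot on both machines, browsable through the hidden .zfs/snapshot directory:

```
# Run on the new host: list every file whose content differs from the old
# host's copy of the same snapshot (-r recursive, -c compare by checksum,
# -n dry run, so nothing is actually copied):
rsync -rcn --out-format='%n' \
    oldhost:/tank/node01/.zfs/snapshot/migr2/ \
    /tank/node01/.zfs/snapshot/migr2/
# Any file listed here is corrupt on the receiving side, even though the
# transfer logs and the scrubs reported no errors.
```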
Special thanks to everyone who supported me during this difficult time. Some even shared resources from their backup fund. I want to say a huge thank you, guys, you really supported me. I wouldn’t say I was devastated, but it was still very sad. Out of 120 terabytes, only 20 remained after the migration.
The Chia frenzy was ahead… It was an exhilarating time, and I was extremely motivated. I made several new acquaintances and gained very interesting experience. The storj fiasco quickly lost its relevance, and I fully immersed myself in Chia…
The next post on my blog describes my adventures in the realm of Chia. I’m not sure whether it’s worth publishing the translation here, since it’s not directly related to storj. However, many of us, if not all, are enthusiasts in one way or another, and perhaps many would be interested in reading about petabyte-scale plotting and everything related to it. If you’re interested, let me know with a reaction. Thank you!