Node Churn - Why did you shut down your node?

I’m curious: if you’ve had a node that contained more than 1 GB of data at any point, and you ended up shutting it down for some reason, why was that? I can envision the following…

  1. Unrecoverable errors
  2. Earnings lower than expected
  3. Technical issues that kept the node from staying online
  4. Got bored and went on to other things…

I imagine there are other reasons. And I am curious how many people fall into these categories: not just SNOs quitting, but also SNOs who are still here but had one or more nodes die for some reason and then carried on with a new node. Why did your node die, and how much data was on it?

I think with #2 above, the company could change the “Average Earnings” statement on the sign-up page. While not exactly wrong, it isn’t an average, and it can make someone think they might make $50 a month running a node. We should probably show true averages with some examples, so that people have a better understanding of what their earnings are likely to be. That would likely reduce churn.

Note - This isn’t an official question from Storj Labs. I am asking for my own curiosity but this info may be used to provide examples to the rest of the company.

In my 3 years working with Storj, I have had several nodes die. Very rarely was it a hardware error.
The main problem was the layer between the chair and the keyboard.

13 Likes

I almost got disqualified for high IOwait. The internal file system check has no timeout, so the storage node doesn’t notice that it is about to fail all audits. I would call that a bug and not a feature.
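To illustrate what I mean by a missing timeout, here is a minimal sketch of a bounded disk health probe. This is not how the storage node actually implements its check (that code is Go), and the path and the 10-second limit are just assumptions; it only shows the idea that a check which can hang on IOwait should give up, so the node can stop accepting work instead of silently failing audits:

```python
#!/usr/bin/env python3
"""Illustration only: a storage health check bounded by a timeout.

Not the storage node's real check; STORAGE_DIR and CHECK_TIMEOUT are
assumed values.
"""
import os
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

STORAGE_DIR = "/mnt/storagenode/storage"  # assumed path
CHECK_TIMEOUT = 10.0                      # seconds; assumed limit

def storage_responds(path: str, timeout: float) -> bool:
    """Return True only if a simple directory listing finishes within `timeout`."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(os.listdir, path)
    try:
        future.result(timeout=timeout)
        return True
    except FutureTimeout:
        # The listing is still stuck on IOwait; treat the disk as unhealthy.
        return False
    except OSError:
        # Missing mount, permission error, dying disk, ...
        return False
    finally:
        # Don't wait around for a hung listing to finish.
        pool.shutdown(wait=False)

if __name__ == "__main__":
    if not storage_responds(STORAGE_DIR, CHECK_TIMEOUT):
        print("Storage not responding in time; stop accepting work before audits start failing.")
```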

Another issue I had was my network connection: I had packet drops. I got suspended for low uptime at some point, fixed the packet drops, and managed to recover. The point is that there is no warning; on my side I was unable to detect the issue. I had to get suspended first. Not a great experience. That issue was fixed in the meantime: Clement added the uptime and audit scores to the metrics endpoint, and I now have them on my Grafana dashboard. The next time this happens, I would get an email warning from my Grafana dashboard. I would assume not everyone is able to set up Grafana, though, so other operators might still run into that problem.
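For operators who can’t or don’t want to run a full Grafana stack, even a small script run from cron can give a basic early warning. The sketch below is only an example under a few assumptions: that the node’s local dashboard API is reachable on the default port 14002 at /api/sno/satellites, and that the response contains score fields whose names mention audit, suspension, or online. Because the exact field names differ between node versions, it walks the whole JSON response rather than relying on an exact schema:

```python
#!/usr/bin/env python3
"""Rough reputation check against a storage node's local dashboard API.

Assumptions: the dashboard listens on localhost:14002 and
/api/sno/satellites returns JSON with per-satellite score values whose
key names mention "audit", "suspension" or "online".
"""
import json
import urllib.request

DASHBOARD_URL = "http://localhost:14002/api/sno/satellites"  # assumed default port
THRESHOLD = 0.95  # warn well before the scores get anywhere near disqualification

def find_scores(obj, path=""):
    """Recursively yield (path, value) for any numeric field that looks like a score."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_scores(value, f"{path}.{key}" if path else key)
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            yield from find_scores(value, f"{path}[{i}]")
    elif isinstance(obj, (int, float)) and not isinstance(obj, bool):
        name = path.lower()
        if "score" in name and any(w in name for w in ("audit", "suspension", "online")):
            yield path, float(obj)

def main():
    with urllib.request.urlopen(DASHBOARD_URL, timeout=10) as resp:
        data = json.load(resp)
    low = [(p, v) for p, v in find_scores(data) if v < THRESHOLD]
    for path, value in low:
        print(f"WARNING: {path} = {value:.3f} (below {THRESHOLD})")
    if not low:
        print("All reputation scores look healthy.")

if __name__ == "__main__":
    main()
```

Run something like this from cron and pipe any output to mail (or whatever can notify you), and you get a rough warning before the scores drop far enough for suspension.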

I got disqualified because of that garbage collection issue a few months ago. Communication was terrible. If I didn’t work for Storj, I would have terminated the node that was disqualified. Only because I had access to more information did I keep my node online, and the disqualification was removed at some point. Next time I hope this gets communicated in time. Or let’s hope that it gets communicated at all, because even that is still missing.

Too many tiny pieces on my hard drive: for hours my hard drive was just busy running the used space calculation on startup, and also garbage collection. I found a way to bypass that limitation, but again, I would assume the average operator still has to deal with this issue.

7 Likes
  1. Needed the storage capacity for something else (I’ve had a few TBs of nodes released (~16 TB) as I needed the space back, and there is no easy way to reduce the amount of storage allocated, so they just got switched off… I did try a graceful exit, but with multiple TBs it was literally at 0-3% after a week, as it was stuck uploading 1 KB files to other nodes! That could do with some work, as it is limited by concurrency and not piece size.)

CP

1 Like

I started with three 3 TB nodes, which grew to five within the space of 3 months; the last two were 14 TB and 6 TB. They were all built from 3 TB disks, but in RAID 5 or RAID 6. I bought a batch of thirty 3 TB disks at a bargain price, so I could afford it, and I also have several HP ProLiant G8 servers that I don’t use. Unfortunately, through my own carelessness, I did some maintenance work on the server after about 6 months and swapped various disks between the RAID 5 and RAID 6 arrays, which destroyed the two large nodes.

Consider that I have had Storj nodes for about 38 months. After losing those two nodes I rebuilt and restarted them, and the oldest node is now 32 months old, running on a RAID 6 of eight 3 TB disks with a capacity of about 16 TB, of which 13 are occupied. Over the years I have added other nodes, and now there are 10 nodes of various sizes scattered here and there, at a friend’s house or at my grandmother’s. At home in my server room I have the 6 largest nodes, for a total of 40 TB, and the other 4 are 3 TB each. So today I manage about 52 TB of space, about 75% full. I can confirm from personal experience that HGST disks are the best in the world: in the last 4 years I have had only 2 failures. They are all SAS disks run by cached HP controllers.

1 Like

One node, a long time ago, failed because the networked storage had occasional latency hiccups, leading to IOwait and the other standard issues (high memory consumption, connections piling up, etc.). Now I have enough experience to debug and deal with these kinds of problems; at the time I didn’t.

1 Like

I bought a couple of used drives, and two of them started to fail. The first one I wasn’t able to save; the second one I noticed in time and was able to copy the data off to another drive.

1 Like

I am not surprised at all by the statistics. I AM the statistics.

Here is what actually happens:

When I first set up a storage node, I expect my 4 TB to be fully utilized within days and to earn $1.5 × 4 per month in no time. A few days later, I see that the actual rate is 5 GB/day. At that rate (4,000 GB ÷ 5 GB/day ≈ 800 days), it’s going to take years to fill. My disk could die before then. So I pull the plug.

Later, I join the forum and learn that after the vetting period the rate could go up to 1 TB/month and that egress can be a big part of the payout. Now that’s better. So I start a new node. Why do I start a new node instead of using the old one? I want a clean slate. I want to make sure that all past history/reputation has zero effect on this thing. Besides, it’s just a few GBs.

Now I start to think about the most efficient way to do this thing long-term and plan for future expansion. So I experiment with things like LVM, mergerfs, NFS, USB drives, etc. I also try running it in Docker, on Windows, in Docker on Windows, in Docker in a Windows VM, etc. You know, just to test the boundaries and to see if there is a better setup. For each experiment I create a new node, to make sure past history/reputation has no effect on the result. I let it run for a day or two and then discard it.

So many nodes die by my hands. I am sorry that confuses you.

2 Likes

Just curious: how do you know the SNOs are here? Can you somehow match my IP with the IP of my node? That’s some serious breach of privacy, no?

I had a similar issue caused by my router early on, before we even had a dashboard to report these issues. I got disqualified, but luckily at that time disqualification wasn’t permanent yet, and I was able to recover after the router issue was resolved.

1 Like

You’re taking some massive leaps there. I’m pretty sure @Knowledge was just asking people here whether they lost nodes and why. No IP matching needed.

5 Likes

@BrightSilence

@Knowledge was just asking… → correct
No IP matching needed → agreed

But that doesn’t answer my question: how does @Knowledge know they are here?

I’m sure there is a good explanation… right? @Knowledge ?

Because there are lots of node operators here, many of whom may have lost nodes… I know they are here. I don’t need any inside info for that.

2 Likes

Obviously, over the years I have been here, users on the forum have talked about having their nodes fail and/or starting over. How extensive and how common that is, is why I ask. I have restarted nodes several times over the years for one reason or another.

I don’t have your IP address. I’m not an engineer. I’m a Community Admin. I don’t have access to any of the data that Storj Labs has.

Even if I had that data, which I don’t, there would be no reason to link users to nodes to ask the question I asked. If you’ve been on the forum for any length of time, you would know that there are SNOs who have shut down their nodes, exited, and/or replaced them.

Lastly, if you are concerned about someone linking your forum account to your node address, I would suggest maybe using a VPN, as I realize that for some people, no amount of my telling you I don’t have access to such data would be reassuring. It would be best, then, for you to manage your own security rather than ask me to explain the dots you are trying to connect.

6 Likes

I have lost nodes due to drive failures and my own incompetence.

1 Like

I lost an 8 TB node due to the COVID restrictions…

I run a lot of nodes in different locations. One of them failed because of a power outage, but the office was closed because of lockdown, and when I tried to drive there, the police forced me to go home :face_with_symbols_over_mouth:

I watched this node get suspended and then disqualified, day by day, without being able to do anything, not even a node exit.

I think a lot of operators have faced problems like this in the past year.

2 Likes

My first node got disqualified after 9 months. It was a 5 TB node, 100% filled with data.
In my opinion it was caused by a bug that, under special circumstances, made nodes not write any failed audits into the log file. I remember that this bug got discussed in another topic shortly after I described my problems here: 100% Disk workload without any traffic - #22 by jorma

I did provide the log files in my forum topic, but to my dissatisfaction the answer I got from Alexey was that it’s not possible that I have log files after reinstalling the node. He didn’t even look at my log files (the link had no clicks at the time of his answer); he must have been really sure that I’m an idiot (which I am sometimes, but I’m also a fully certified Linux administrator), haha.
He must have completely underestimated my ability to make a copy of the log files before deleting the disqualified node.

I then started from scratch and expanded to more nodes with more space. I now have 6 nodes and a total of 80 TB of available space, which fills up very slowly.
The average income after roughly 2 years of running them is $50/month, and the electricity costs about $20 a month. Money-wise, this only makes sense if I have to deal with it for less than 30 minutes per month. Just by reading the payment model topic this morning I already spent more time than that.

My total investment in hardware was about $6,000, and I haven’t even managed to set up a RAID. I planned to secure my nodes with RAID once they had earned enough to buy the missing hardware for it, but I guess that will never happen.

At this moment I am considering stopping my nodes and selling the hardware, because I will sell my apartment and start travelling. I would have the option of running my nodes in another place, about 2,000 km from where I live, but I kind of doubt it’s worth the trouble to move them.

1 Like

Relax… I know you are good. Appreciate what you have done and I’m grateful.

2 Likes

I lost my first node, created in April 2021, after 7 months due to a disk failure, and due to inexperience/incompetence I was unable to recover it. I now have 2 nodes.

1 Like

I have lost 3 nodes: 2 from disk failure and 1 due to an operator (me…) mistake.