I wish we had a way to completely recover nodes from database corruption or a minor disk failure. Maybe a process to restore the DB from the satellites (why do we even need it on the nodes?), hash-check all the stored pieces, and send the results to the satellite?
Then apply some non-lethal penalty proportional to the amount of data lost and be done with it.
I feel like everyone would benefit from it: satellites would not have to rebuild the data and pay for repair bandwidth, SNOs would not get their nodes disqualified after an unfortunate accident, and SNOs would have an incentive to report data loss right away, so the network is never late to reconstruct and redistribute the data.
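For illustration, a minimal sketch of what such a local integrity scan could look like, assuming pieces sit on disk as individual files and the expected SHA-256 hashes can be obtained from somewhere trusted (say, restored metadata). The directory layout and hash source here are hypothetical and do not reflect the actual storagenode implementation:

```python
# Hypothetical local integrity scan after an incident. Assumes one file per piece
# and a mapping of piece ID -> expected SHA-256 hex digest obtained out of band;
# neither reflects the real storagenode on-disk format.
import hashlib
from pathlib import Path

def scan_pieces(blobs_dir: str, expected: dict[str, str]) -> list[str]:
    """Return IDs of pieces that are missing or whose content no longer matches."""
    damaged = []
    for piece_id, want in expected.items():
        path = Path(blobs_dir) / piece_id
        if not path.is_file():
            damaged.append(piece_id)          # piece lost entirely
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest != want:
            damaged.append(piece_id)          # piece present but corrupted
    return damaged

# The resulting list could then be reported to the satellite, so only those pieces
# get repaired instead of disqualifying the node and repairing everything it held.
```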
I've suggested that a few times. I guess implementing it would be too difficult or not worth it, which makes the official recommendation against using RAID even stranger.
You can use RAID; however, the simple setup of one storagenode per HDD is much simpler for everyone. You do not need to learn anything new or spend extra money.
But if you want to, you can use RAID. We just can't recommend it, because it's more expensive to set up and support.
I guess I am used to different types of recommendations and requirements.
In this case, however, we have:
No way to back up the data.
Pretty much no tolerance for data loss.
No “partial graceful exit” - one bad sector in a 10TB drive during the exit and you’re done, even though normally one failed audit would not disqualify the node.
Hard drives that do occasionally develop bad sectors even if they do not just fail completely.
So, the combination of the recommendations and the requirements is setting up SNOs for failure. The single-drive setups will either be disqualified during audits, or, if they somehow manage to last 16 months or whatever the requirement is, fail during graceful exit.
This is a rather controversial statement I haven’t seen before, but it seems to me that it’s most profitable for Storj if we follow those recommendations. It also happens to be the easiest and cheapest setup for SNOs to follow, as long as the network performs up to par. The network does the repairs, and Storj reaps the benefit of collecting escrow from prematurely failed nodes.
It’s a tough balance to deter cheaters and encourage good behavior, though I don’t really know what a better answer would be.
Unfortunately, Storj does not benefit from failed nodes; the escrow is not enough to cover the cost of repair.
The reason for the suggestion to have one node per HDD is different: we want to have a lot of SNOs, and it’s easy for most users to just set up one storagenode per HDD.
And yet, if there is a problem and I lose 100MB out of 5TB, I cannot do a graceful exit and upload the remaining data back to the network. I have to just shut down the node and allow the network to repair the whole 5TB. If I don’t shut down the node, I may just be disqualified later with the same result: the network repairing the whole 5TB.
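To put rough numbers on that 100MB-out-of-5TB example (a back-of-the-envelope sketch that assumes audits sample stored pieces uniformly at random, which is a simplification of the real audit process):

```python
# Chance of a single audit landing on lost data, and of at least one failed
# audit over many audits. Uniform sampling is an assumption, not the real scheme.
lost_bytes = 100 * 10**6          # 100 MB lost
stored_bytes = 5 * 10**12         # 5 TB stored
p_bad = lost_bytes / stored_bytes # chance a single audit hits a lost piece

n_audits = 10_000
p_any_failure = 1 - (1 - p_bad) ** n_audits
print(f"fraction of data lost:      {p_bad:.4%}")         # 0.0020%
print(f"P(>=1 failed audit, n=10k): {p_any_failure:.1%}") # ~18.1%
```

Even with such a tiny fraction lost, enough audits will eventually hit it, so the end state is the same: the whole 5TB gets repaired.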
It is the same with backups. Let’s say I run a node on a single drive, but back it up daily, and have 5TB of data. The drive fails, and I cannot just upload the data from my day-old backup; instead, the network repairs the whole 5TB rather than just the files uploaded since the last backup.
I agree with wanting to have more SNOs, but it looks weird to me that the recommendation is for a simple setup while the requirements are for datacenter-level reliability, which the simple setup is pretty much guaranteed not to have.
This has been discussed many times already. An HDD can work without issues for about two to three years, and usually longer. The timeframe is 15 months, so most HDDs should survive.
However, it’s up to the SNO to decide: run one node per HDD, or deal with the headache of RAID and HA setups.
We can help with both, but we can’t demand this from SNOs.
I guess that’s where my understanding of recommendations is different. The recommendation should include RAID, a UPS, etc., but if people want to cut some corners, they can. If they do not get lucky and the drive fails, well, it’s on them.
Now if, say, I didn’t know any better and followed the official recommendations, then the drive failed (or some problem wiped out 100MB of files), I would feel like I was being punished for doing what I was told by the people who told me to do it.
Of course, anyone who knows any better would read the requirements and see that the recommended setup is not going to be enough, so they would prepare accordingly.
I tried to calculate the monetary consequences of early node disqualifications, but it depends on too many variables: time to DQ, vetting period, amount of data stored and added every month, egress traffic activity, the way other nodes holding the erasure-coded pieces left the network, etc. It varies a lot between extremely positive and negative numbers, so in the end I would say the network neither gains nor loses a significant amount of money.
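For illustration, this is the kind of back-of-the-envelope comparison I mean; every figure below (payout rate, withheld share, repair cost) is a made-up placeholder, not Storj's actual numbers, and egress earnings are ignored entirely:

```python
# Toy model: escrow withheld from a failed node vs. the cost of repairing its data.
# All parameters are hypothetical placeholders and vary case by case.
stored_tb = 5.0                  # data held when the node is disqualified
pay_per_tb_month = 1.5           # $ paid per TB-month of storage (assumed)
months_active = 8                # months before disqualification
withheld_fraction = 0.5          # average share of earnings held back (assumed)
repair_cost_per_tb = 10.0        # $ the network spends repairing 1 TB (assumed)

escrow = stored_tb * pay_per_tb_month * months_active * withheld_fraction
repair = stored_tb * repair_cost_per_tb
print(f"escrow withheld: ${escrow:.2f}")          # $30.00 with these placeholders
print(f"repair cost:     ${repair:.2f}")          # $50.00
print(f"net for network: ${escrow - repair:.2f}")
```

Shift any of these placeholders (more egress, a longer or shorter node lifetime, cheaper repair) and the sign flips easily, which is why I don't think the network reliably profits from disqualifications.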
But if the network does not benefit from taking escrow and wants as many users as possible, why is there no simple way to protect nodes from disqualification?
One could run ZFS or Ceph, something that will verify data integrity upon read and has redundancy available to correct the error if needed. But this is way above a user with a Raspberry Pi in a closet, especially if people are not encouraged to use RAID to keep things simple.
This applies to uptime DQ as well: if you’re required to keep downtime below 5 h/month, a single prolonged outage can kill your node. And if storage nodes are expected to run on residential connections in a non-datacenter environment, managed by untrained people rather than a team of professionals, such downtimes will inevitably happen.
The disqualification for downtime is currently disabled: Design draft: New way to measure SN uptimes
You can run ZFS, Ceph, a cluster, or RAID, if you know how and accept the support costs.
We can’t suggest such setups to everyone. Even RAID is not a simple setup.
DQ for uptime is currently disabled, but it’s in the specs, so I expect it to be enabled when the monitoring works properly. And 5 hours is a bit on the low side if people go to work and leave the node unattended for 9 or so hours (not every workplace allows you to take some time off to fix the node), or for days when going on vacation.
The fact that graceful exit fails on a single damaged piece (so it is even stricter than regular audits) does not match the statement that repairing pieces costs more than the escrow. In that case, wouldn’t you want to get as much data as possible from the damaged node?
All in all, the gap between the recommended setup and the reliability expected from that setup is the main point here. With the recommended setup, the expectations should be lower.
Yes, expecting RAID and more advanced solutions from new non-professionals is way too much to ask.
But let’s be reasonable: most of us are somewhat professionally involved in tech… This project isn’t aimed at non-pros, is it?
I don’t think an RPi with a USB-attached drive is a good idea anyway…
Has anyone ever tried to back up a node and restore it?
I mean an image-based, agentless backup for a small node. Restoring should not take that long, and if the machine goes online again, will the network accept its contents?
There needs to be a reintegration routine… Is that planned for the final release?
It’s not enabled, but it’s present in the requirements. It depends on the window size, but still, 5 hours a month sounds terrible and basically guarantees that a storage node will be lost during any prolonged outage. 60 hours a year is a lot better, but might not be enough if the idea is to keep nodes around as long as possible.
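Just to make the window-size point concrete, plain arithmetic on those two figures:

```python
# Both allowances amount to roughly the same ~99.3% uptime; the difference is how
# much of the budget a single prolonged outage can eat within one window.
hours_per_month = 730   # average month
hours_per_year = 8766   # average year

for label, allowed, window in [("5 h / month", 5, hours_per_month),
                               ("60 h / year", 60, hours_per_year)]:
    availability = 1 - allowed / window
    print(f"{label}: {availability:.3%} required uptime")
```

Either way the percentage is datacenter-grade; only the yearly window leaves room to survive one long outage.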
But the main issue lies with data storage. Graceful exit is impossible if even a single bit of data is corrupted. A node failing audits will be disqualified. So a node is essentially doomed if the HDD storing its data develops even a few bad blocks, and the SNO can’t do anything about it even if there’s a spare drive.
The Windows NT kernel supports mounting volumes on paths so drive letters are not an issue.
However, if you want unlimited expansion, I would suggest looking into the ODROID-HC2, which is what I’m using. It’s effectively ~$70 for a single-bay NAS (cheaper per bay than any NAS I can find) and it’s capable of running the storagenode software directly. It’s expandable as far as your power and Internet connection will allow – just buy more units and attach them to your network.
Wouldn’t the recovery of a node also be considered downtime? Let’s assume I have a node that goes kaput, but I have a backup in place that automatically detects said failure and virtually boots the node from the backup files (Veeam, Storagecraft and others can do this, and it’s fairly quick).
The node would go online again in, let’s say, 3 hours (rebuild time?). In this scenario nothing would have happened to the files on the node, and it would rejoin the network?
Downtime = no penalty at the moment, but in production the node operator gets punished when the node is offline for more than 5 h/month?
Data corruption = node operator gets punished… restart from scratch…
Then again… hard drive failures are more common than other hardware or VM malfunctions.
The restored node will not have the files uploaded in step 3 and will be disqualified for failing audits.
Yep, both requirements imply datacenter-level reliability (with no “scheduled downtime” or “our server blew up, here are last night’s backups” options either).