Is a backup of a node (and its contents) required?

It took me a lot of time to get it. During the tests, old nodes were getting much more egress than new ones; now it is all flat.

My node has been around since the last network wipe (and long before) and I have 4.6TB as well, just like @Pentium100, whose node I’m pretty sure has also been around since long before the last network wipe. So it seems you clearly got more data.

At some point, for about a month, I used proxies in front of my nodes; it is possible that I got a significant amount of data that way.
But I ended up well in the red, because it is an expensive configuration, so I stopped using it.

False alarm everyone. I verified in the code that node selection is done in the right way and every /24 subnet receives the same amount of data. Differences must have come from the temporary proxy setup.

More info in this post:

You lose some files (all of them recent) if the current data folder is replaced by an outdated backup: everything uploaded after the backup was made is gone.
But what if you merge the backup with the current folder? The simplest way is just to write the backup over it.
That overwrites files that exist both in the backup and in the current (possibly damaged) data, adds files that are missing from the current data, and keeps files created after the backup was made. It also adds some unnecessary data (such as files deleted after the last backup), but that is not a big deal, just some harmless junk.
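
For illustration, here is a minimal Python sketch of that "write the backup over the current folder" merge. It is only a sketch under assumptions: the folder and file names are made up for the demo, and it assumes Python 3.8+ for `dirs_exist_ok`. It says nothing about how the node software itself lays out or validates its data.

```python
import shutil
import tempfile
from pathlib import Path


def merge_backup_over_current(backup_dir: Path, current_dir: Path) -> None:
    """Copy everything from backup_dir into current_dir.

    Files present in both are overwritten by the backup copy, files only
    in the backup are added, and files only in current_dir are untouched.
    """
    shutil.copytree(backup_dir, current_dir, dirs_exist_ok=True)


def demo() -> dict:
    """Build two throwaway folders (hypothetical names) and merge them."""
    with tempfile.TemporaryDirectory() as tmp:
        backup = Path(tmp) / "backup"
        current = Path(tmp) / "current"
        backup.mkdir()
        current.mkdir()
        # In both: the current copy is damaged, the backup copy is good.
        (backup / "old_piece").write_text("good copy")
        (current / "old_piece").write_text("corrupted")
        # Only in the backup: deleted on the node after the backup was made.
        (backup / "deleted_piece").write_text("junk")
        # Only in current: uploaded after the backup was made.
        (current / "new_piece").write_text("uploaded after backup")

        merge_backup_over_current(backup, current)
        return {p.name: p.read_text() for p in sorted(current.iterdir())}
```

After `demo()` the merged folder holds the repaired old file, the new file that the backup never saw, and the leftover "junk" file, which matches the three cases described above.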

It will not help in case of total data loss (a complete HDD failure due to a dead motor, for example, or a burnt-out controller), as the backup is all you have in such a case. But it can help with minor data damage or corruption, such as bad sectors on the HDD or transient software/hardware glitches.

That's not true. Doubling or halving HDD space does NOT double or halve your income potential, because most of the income is supposed to come from traffic/bandwidth used, not from HDD space rented (compare 1.5 USD per TB of HDD space used per month with 10-20 USD per TB of egress traffic served).
Halving the available space certainly affects potential income, but by much less than half. And it only starts to affect income after the node is already full (has used all allocated space), which takes a lot of time if we are talking about large volumes of data. It will take many months from node creation to fill even a small RAID array if the SNO chooses to bundle all disks in one machine into RAID instead of running a separate node per HDD.
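
To put rough numbers on this, here is a small Python sketch. The two rates come from the figures above (1.5 USD per TB stored per month, and the upper end of the 10-20 USD per TB egress range); the ingress rate and the egress volume in the example are purely hypothetical, and the linear fill model is an assumption, not how the network actually behaves.

```python
STORAGE_USD_PER_TB_MONTH = 1.5  # figure quoted in the post
EGRESS_USD_PER_TB = 20.0        # upper end of the 10-20 USD range in the post


def stored_after(months: int, ingress_tb_per_month: float, allocated_tb: float) -> float:
    """Data held after `months`, assuming constant ingress (a simplification)
    and a hard cap at the allocated space."""
    return min(months * ingress_tb_per_month, allocated_tb)


def monthly_income(stored_tb: float, egress_tb: float) -> float:
    """Storage payment plus egress payment for one month, in USD."""
    return stored_tb * STORAGE_USD_PER_TB_MONTH + egress_tb * EGRESS_USD_PER_TB


# Hypothetical node: 0.5 TB/month ingress, 8 TB vs 4 TB allocated.
# After 6 months both hold 3 TB, so the smaller allocation costs nothing yet.
# After 12 months the 8 TB node holds 6 TB and the 4 TB node is capped at 4 TB.
```

How much the cap hurts once it is reached depends on how egress scales with stored data, which this sketch deliberately leaves as an input.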

Also, only the most primitive scheme, RAID 1, halves the available space. RAID 5 with 3 or 4 disks reduces space by only 1/3 or 1/4. E.g. 3x4 TB disks in RAID 5 give 2x4 = 8 TB of usable space instead of 12 TB without RAID. So what we should actually compare is a 25%-33% space “loss”, and maybe a 10-15% loss of total income (payment for space plus bandwidth) because of it AFTER the node is already full, against the risk of losing all accumulated escrow.
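
The capacity arithmetic can be checked with a few lines of Python. This is just a sketch of the standard formulas for equal-sized disks (RAID 1 treated as one mirrored set, RAID 5 as one disk's worth of parity); the function names are made up for the example.

```python
def usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity for equal-sized disks under a few simple layouts."""
    if level == "none":
        return disks * disk_tb          # separate disks, nothing reserved
    if level == "raid1":
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_tb                  # every disk holds the same data
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb    # one disk's worth goes to parity
    raise ValueError(f"unsupported level: {level}")


def space_loss_fraction(level: str, disks: int, disk_tb: float) -> float:
    """Fraction of raw capacity given up compared with no RAID at all."""
    raw = disks * disk_tb
    return 1 - usable_tb(level, disks, disk_tb) / raw
```

For example, `usable_tb("raid5", 3, 4)` gives 8 TB out of 12 TB raw, a loss of one third, while four disks in RAID 5 lose one quarter and a two-disk RAID 1 loses half.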

Actually, you can still call GE in such a situation. The satellite will mark it as failed because of errors in the data or some missing data, but it still gets all the data remaining on your node transferred to other nodes.
So it's a win-win situation for the network, as it gets both at the same time: all the data is still usable (not lost/damaged), and your escrow.

The network will lose data, and may need significant repair, only if you know for sure in advance that you have already lost or damaged some of the stored data, so that you have no chance of finishing GE successfully, and because of that you just shut the node down and wipe the stored data instead of starting GE, without wasting time and bandwidth on it.
But if you are a “good fellow” and nonetheless still want to support the network, you can still call GE, and the satellite will gladly collect all undamaged data from your node.

You can’t be so certain either way. It’s fairly reasonable to think that if you hold twice as much data, that data serves twice as much egress traffic as well, and since you’re not paid for ingress, which would not be correlated with storage, the assumption of it scaling 1:1 is actually pretty fair. Unless it turns out recently uploaded data is downloaded more often (like it was during some of the tests, which were in no way representative of actual customer usage).

Well, if I didn’t think the data was damaged, I probably would not start GE. If I start failing audits or notice bad sectors in the hard drive, I would start GE, but, since the requirement for GE is even more strict than normal (you can fail at least a few audits and not be disqualified, but a single missing file and GE fails), my options are to keep the node running and hope that it does not get disqualified (not enough failed audits) or just shut it down and create a new node. There is no reason to start GE, I might as well start the new node right now so it can start the vetting process immediately.

What if, let’s say, the OS drive fails or the identity files get lost for some reason? Would a backup of the identity files be enough to reinstall the OS, set up Storj on a new drive, and transfer the identity files over from the backup?

You can also store the identity files on the same drive as the node data.

That’s what I’m doing currently. From my understanding, if the drive with the data fails, there’s no point in having a backup of the identity files anyways.

Drive failures come in many varieties. Some make all data unavailable, but some only hit a small number of files, leaving the rest recoverable. In the latter case, if the failure hits the identity files, the node is lost even though there’s enough data to recover the node.

Identity files are small. It doesn’t hurt to back them up.
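
As a rough illustration of how little work that is, here is a Python sketch that copies an identity folder to a second location and checks each copy against the original with SHA-256. The paths and the file names in the demo are placeholders, not a statement about where a given install keeps its identity, and Python 3.8+ is assumed for `dirs_exist_ok`.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a (small) file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def backup_identity(identity_dir: Path, dest_dir: Path) -> list:
    """Copy identity_dir to dest_dir and verify every file by checksum.

    Returns the relative paths that were copied and verified.
    """
    shutil.copytree(identity_dir, dest_dir, dirs_exist_ok=True)
    verified = []
    for src in sorted(identity_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(identity_dir)
        if sha256_of(src) != sha256_of(dest_dir / rel):
            raise OSError(f"backup copy of {rel} does not match the original")
        verified.append(rel.as_posix())
    return verified


def demo() -> list:
    """Run the backup on throwaway files with placeholder names."""
    with tempfile.TemporaryDirectory() as tmp:
        identity = Path(tmp) / "identity"
        identity.mkdir()
        (identity / "identity.cert").write_text("placeholder certificate")
        (identity / "identity.key").write_text("placeholder key")
        return backup_identity(identity, Path(tmp) / "identity_backup")
```

The destination should of course be on a different drive than the original for the backup to be of any use.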

How would you be able to tell if enough data is recoverable?

Standard data recovery practices apply. You just try to copy data using a data recovery tool like ddrescue, or whatever tool you are familiar with.
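
If the filesystem still mounts, one crude way to get a feel for the damage is to try reading every file and count the ones that fail. The Python sketch below does only that; it is an assumption-laden illustration (hypothetical folder names, file-level reads only) and not a substitute for a block-level tool like ddrescue.

```python
import tempfile
from pathlib import Path


def scan_readable(root: Path, chunk_size: int = 1 << 20):
    """Try to read every file under root in chunks.

    Returns (number of fully readable files, list of (path, error) pairs
    for files that raised an OSError while being read).
    """
    readable = 0
    unreadable = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            with path.open("rb") as fh:
                while fh.read(chunk_size):
                    pass  # contents are discarded; only readability matters
            readable += 1
        except OSError as exc:
            unreadable.append((path, exc))
    return readable, unreadable


def demo():
    """Scan a throwaway folder with two healthy files."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "blobs").mkdir()
        (root / "blobs" / "piece_a").write_bytes(b"a" * 10)
        (root / "blobs" / "piece_b").write_bytes(b"b" * 10)
        readable, unreadable = scan_readable(root)
        return readable, len(unreadable)
```

The ratio of unreadable to readable files gives a first impression of the damage, though how much loss a node can actually tolerate is a separate question.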