General beginner questions about the Node

Hi! I’ve been running a Storj node for a short time and have a few questions about it. I hope I haven’t overlooked the answers here in the community… Maybe you guys can help me.

  • When I restart my computer (Windows 10), do I have to start the node manually, or is it back online automatically?

  • Can I move the Node to another computer or expand the storage space by adding another hard drive?

  • What happens if I have the computer turned off for 2-3 days due to a move to another apartment? Am I then disqualified?

  • What happens if a hard disk is defective? Can I then recover the data?

Greetings, David

Hi David,

Hopefully I can answer your questions here, as a Windows Node Operator myself.

When I restart my computer (Windows 10), do I have to start the node manually, or is it back online automatically?

If you have installed the node as a Windows service, you can set the service’s Startup type to Automatic. This ensures the node comes back online after a reboot. The documentation for this can be found in the Windows Installation Instructions document.
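As a quick sketch (assuming the default service name storagenode used by the Windows installer; adjust it if yours differs), you can check and change this from an elevated PowerShell prompt:

```powershell
# Show the current status and startup type of the node service
Get-Service -Name storagenode | Select-Object Name, Status, StartType

# Make the service start automatically after every reboot
Set-Service -Name storagenode -StartupType Automatic
```

The same setting is also available in the services.msc GUI under the service’s Properties > Startup type.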

Can I move the Node to another computer or expand the storage space by adding another hard drive?

Yes and yes! We have some excellent documentation here on how to migrate your node. You can expand the storage space by either migrating the data to a new disk with more storage space, or spanning the existing drive to an additional disk. You will then be able to specify your new storage space by editing the node’s config.yaml located at "C:\Program Files\Storj\Storage Node\". The value to modify is storage.allocated-disk-space, and if you have changed the drive path you will need to change the value for storage.path.
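For illustration only, here is a minimal excerpt showing what those two entries look like in config.yaml (the size and path below are placeholders, not recommendations):

```yaml
# C:\Program Files\Storj\Storage Node\config.yaml (excerpt)

# total disk space the node is allowed to use (placeholder value)
storage.allocated-disk-space: 2.0 TB

# where the node stores its data; only change this if you moved the data
storage.path: D:\storagenode\storage
```

After editing, restart the node service (for example with Restart-Service storagenode) so the new values take effect.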

What happens if I have the computer turned off for 2-3 days due to a move to another apartment. Am I then disqualified?

The disqualification criteria are outlined in this document here, and you can find more information regarding the audit system here.

What happens if a hard disk is defective? Can I then recover the data?

You could attempt to repair the drive itself; however, if the node data is lost, you would need to create a new node using a different authorization token.

Hopefully this reply answers your questions, and please see our FAQ for more details!

Cheers,
Josh

Hello @dsimon24cx ,
Welcome to the forum!

Please never do that (span one node’s storage across multiple disks). With one disk failure the whole node is lost. Run two (3, 4, 5, …) nodes instead.
RAID5 is not safe with today’s disks either:

See also RAID vs No RAID choice

and a new identity: Step 5. Create an Identity - Storj Docs
Do not copy the existing one - it will be disqualified in seconds!
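As a rough sketch of that flow (binary name and arguments quoted from memory of the linked docs, so please verify them there), creating and authorizing a fresh identity on Windows looks roughly like this:

```powershell
# Generate a completely new identity for the new node (this can take many hours of CPU time)
.\identity.exe create storagenode

# Sign it with a NEW authorization token - never reuse the old identity or token
.\identity.exe authorize storagenode <your-email>:<new-authorization-token>
```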


@Alexey
Thanks for the explanation, but there are a couple of things confusing me, as I’m also new to the platform but very excited about the idea.
First of all, let me describe my assumptions. According to the documentation there is no need for any backups/RAID, since redundancy is handled at the application level of Storj (erasure coding). That makes perfect sense, but from it we can conclude that an individual HDD failure is expected behavior.
Please correct me if I’m wrong, but doesn’t disqualification of the identity mean that the Node Operator:

  • Will not be able to receive payment for work that has already been done
  • Will not be able to perform a Graceful Exit (obviously not, but I just want to emphasize the point)
  • Will lose all node reputation
  • Basically will have to start from scratch, and for the next 9 months will have payouts partially withheld

As a result we have a Node Operator with the same capabilities (replacing a hard drive is not a big deal and can take from a couple of minutes to a couple of days), the same storage amount, the same bandwidth, and the same online score, but starting from scratch with a ~9-month onboarding period.

It’s only my point of view, but such an approach seems bad both for the Node Operator, who is no longer able to provide storage/bandwidth, and for the consumer, who is not able to utilize that storage/bandwidth. Of course the disqualification of one Operator will not have any impact on a particular consumer, but HDDs will always break and will affect more and more Operators, which at some point may affect the whole platform. It may also demoralize operators and lead to operators leaving the platform.

Does it make sense to track reputation, vetting, and payout cuts at a higher level, such as an IP or domain name? Otherwise, setting up RAID seems to be the only available solution, which would obviously negate the whole idea of the platform and hurt the economics for storage operators.

Yes, an individual HDD and the related node can fail with a 2% probability according to global stats, see Simple adding of more Drives to a node - #9 by SGC
If the node is disqualified on one satellite, it will not receive any further payments from it, except postponed ones, if the node is eligible to receive a payout according to the Minimum Payout Threshold on L1 or if it has zkSync enabled. The held amount on that satellite will be used to recover the lost data. The disqualified node will be blacklisted by that satellite, so Graceful Exit is impossible for that satellite. However, you can call GE on the others if your node is eligible to do so (it should be older than 6 months on the satellite).
Each satellite pays independently, so if your node is not disqualified on the rest, it will continue to receive payments for its services from the remaining satellites.
If the node is disqualified on all satellites, you cannot use this identity anymore, so if you decide to continue you will need a new identity, a new authorization token, and clean, healthy storage.

If you have several HDDs, you can run several nodes. If they are behind the same /24 subnet of public IPs, they will act as one big node: the ingress will be shared between them (because we select only one node from the subnet for one piece of one segment of the customers’ data), so your nodes will act like a RAID at the network level.
If one disk fails, you will lose only that part of your total shared space.
With traditional RAID and today’s disks you have a high probability of losing everything during the rebuild after a disk failure.

See also RAID vs No RAID choice for the rationale.

Ok, I think I got it. So for instance, if I have 3 nodes under the same IP and 1 out of 3 fails, it will just increase the ingress to the 2 nodes which are still working.
But I’m still confused about reputation. If I understood correctly, the main factors for reputation are node availability and node tenure. As a Node Operator I can work on availability (backup power, a backup internet connection, etc.), but one day I will lose all my reputation regardless of my efforts on availability, just because a disk died, and I have no control over that.
What I’m trying to say is that the vetting period is perfectly fine for newcomers, because the system has to know who is who. But why does a Node Operator who has provided a sufficient level of availability for, say, the last year, and about whom the system already has some knowledge, have to pass the vetting period again, especially considering that the system is designed to be resistant to data loss?

The main parts of the reputation are data integrity and availability.
Data integrity is more important than availability. Your node would have to be offline for more than a month to be disqualified for downtime. The audit score is a different story: it can be lost in a few hours (in case of catastrophic data corruption) or in minutes (in case of running several nodes with the same identity, or with empty storage if it was used before).

Satellites vet the node (connectivity, reliability, retrievability), not the Operator.
Each node works independently, so it should be checked independently.

You can use RAID to protect against HDD failure. Whether you should or should not use RAID has been discussed a lot here, but you can do it. Sadly, it is not practical to back up the node (maybe if the backups were done every minute or so, but definitely not normally).

The vetting process is part of that system. A node can run for a long time due to luck and not necessarily because of the skills of the operator. Also, a disqualified node means that the operator could not keep it running, right? So yeah, another vetting period.
Also, while the system is resistant to data loss, repair (rebuilding the pieces that were in the disqualified node and distributing them to other nodes) costs money and the held amount may not be enough. For example, I doubt the $90 held amount of my node would be enough to repair 19TB of data if my node got disqualified.


or because the system froze in a weird way and the node operator was sleeping at the time instead of monitoring the audit score.
