Storj Vetting & Operation - Multinode - SingleIP

geeksheikh · January 22, 2021, 1:15pm

Hello,

I have been working with several members in the unraid forums using the V3 docker, and several questions came up that perhaps someone here can help us answer. Several folks running unraid have 10s, maybe 100s, of TBs to loan, and the answers will help us set up optimal parameters for running storj nodes inside unraid.

All of the following questions assume a single public IP and all storagenodes are running inside docker containers V3 beta.

1. When running multiple nodes, does the vetting process take longer?
2. Based on this post it seems much more efficient to run, for example, 5 8TB storage nodes than it is to run 2 20TB nodes. Is this correct?
3. How can one determine the count of audits completed per satellite (vetting process)? I ask because I can see 100% everywhere but I understand that I need 100 successful audits from all satellites to be fully vetted. This will also help me track / confirm the answer to #1.

Finally, if you had 100TB to loan via docker containers and a single IP, how would you suggest setting it up for optimal utilization?

Thank you.

cc @CryptoPumpkin

Pac · January 22, 2021, 1:48pm

I believe you should use the “latest” tag now.

Yes. Not necessarily as much as the number of nodes, but still (e.g. 3 vetting nodes won’t take 3 times as much time as a single vetting node).

The post explains that the more available data storage you have, the longer it will take to fill up a node in a non-linear fashion. You should copy the “more realistic earnings estimator” linked in the post you cited to your google account to fill in your numbers and see for yourself.

Having 5x8TB or 2x20TB or even 1x40TB won’t change a thing in this regard, as long as they are all within the same /24 subnet (and even more so behind the same IP), because they are all going to be treated as a “big single node” for ingress.
(This doesn’t apply to egress, which only depends on the quantity of data you’re storing.)

That said, having several nodes scattered among several HDDs is always a better option, as it mitigates the risk of losing nodes if some HDDs fail, whereas having only one huge HDD could make you lose everything if it were to fail.

Vetting progress is not shown on the web dashboard.
One needs to run some technical commands to uncover them.
See this for instance:

As of today, 100TB would never ever be filled up even after 20 years.
The best approach is to start one first node with say an 8TB disk, and wait for this node to be nearly full.
In the meantime, you may want to start another small node (500GB) so it gets vetted and is ready for the future.

Then, when your main node is nearly full, start a new one with another 8TB disk (or expand your small 500GB which would now be fully vetted and ready - and start another new one for “incubation”).

And so on. By doing so, and if usage of the Tardigrade network stays roughly as it has been for the past months, it would already take around 7 to 8 years to fill up 48TB (6x8TB).
It would also be better from an electricity cost point of view.

TheMightyGreek · January 22, 2021, 1:54pm

Yes, vetting time is, from what I understand, linearly dependent on the number of nodes behind the same IP.

It won’t change anything if the nodes are in the same subnet because the nodes will share the ingress traffic.

Run this command in your linux terminal:
for sat in $(wget -qO - localhost:14002/api/sno | jq -r '.satellites[].id'); do wget -qO - localhost:14002/api/sno/satellite/$sat | jq .id,.audit; done
Change the port if your node’s dashboard is on a port other than 14002.
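
For anyone who prefers a script over a one-liner, here is a minimal Python sketch of the same loop. It queries the same two dashboard endpoints as the command above. The helper names are made up for illustration, and the `successCount` field inside `audit` is an assumption, so check it against the JSON your own node returns.

```python
import json
from urllib.request import urlopen

VETTING_AUDITS = 100  # successful audits needed per satellite, per this thread


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def audit_report(base_url="http://localhost:14002", fetch=fetch_json):
    """Return {satellite_id: audit_object}, mirroring the wget/jq one-liner."""
    sno = fetch(f"{base_url}/api/sno")
    report = {}
    for sat in sno["satellites"]:
        detail = fetch(f"{base_url}/api/sno/satellite/{sat['id']}")
        report[detail["id"]] = detail["audit"]
    return report


def vetting_percent(audit, field="successCount"):
    """Vetting progress in percent.

    Assumption: audit[field] holds the number of successful audits.
    """
    count = (audit or {}).get(field, 0)
    return min(count, VETTING_AUDITS) * 100 // VETTING_AUDITS
```

Calling `audit_report()` on the machine running the node returns one audit object per satellite, and `vetting_percent()` turns each into a progress figure. Pass a different `base_url` if the dashboard listens on another port.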

I’d just make one node if the disks are already setup with redundancy. Otherwise make one node per HDD (that last one is my opinion but people have different preferences).

Also, all the questions you asked have been answered multiple times on the forum, so next time please do use the magic of the search function.

EDIT: you beat me to it @Pac haha

tylkomat · January 22, 2021, 2:18pm

Since your nodes are treated like a single node during the vetting process, you receive 5% of the traffic to all nodes combined. Let’s say you start 5 nodes at the same time; then this 5% of traffic is split among the 5 nodes that you are running. That means each node only receives 1% of the traffic directed to your IP (if it is evenly distributed, which I don’t know).

When you only let one node vet at a time, you can benefit from the whole 5%. Once the node is vetted it will receive 100% of the traffic and you earn money faster.

The question is now when I start a second node on the same IP is that traffic then split 95% for the vetted node and 5% for the unvetted one?

Pac · January 22, 2021, 2:29pm

@tylkomat @TheMightyGreek:

I believe vetting doesn’t work exactly like that.
As far as I can remember, on the whole network, 95% of ingress requests get sent to vetted nodes. And 5% to unvetted nodes. Your vetting nodes don’t receive 5% of what your vetted nodes receive.

Which means that you could even end up in a weird situation where your vetting node gets more ingress than your vetted nodes, if during your vetting period the number of vetting nodes in the whole Tardigrade network is very low. Very unlikely, but possible.
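
A back-of-the-envelope way to see this, as a sketch only: assume 5% of upload traffic goes to the pool of unvetted nodes and 95% to the pool of vetted nodes, and that each pool shares its traffic evenly among its members (a node here meaning everything behind one /24 subnet). The even split and every number below are hypothetical, not measured network values.

```python
def per_node_ingress(total_ingress, vetted_count, unvetted_count, unvetted_share=0.05):
    """Return (ingress per vetted node, ingress per unvetted node).

    Assumes an even split inside each pool, which is a simplification.
    """
    vetted = total_ingress * (1 - unvetted_share) / vetted_count
    unvetted = total_ingress * unvetted_share / unvetted_count
    return vetted, unvetted


# Hypothetical network: 10,000 vetted nodes and 1,000 nodes in vetting.
typical = per_node_ingress(1000.0, vetted_count=10_000, unvetted_count=1_000)

# Hypothetical edge case: only 100 nodes in vetting network-wide.
edge = per_node_ingress(1000.0, vetted_count=10_000, unvetted_count=100)
```

In the first case a vetting node gets less than a vetted one; in the second, with very few nodes in vetting, each of them ends up with more ingress than a vetted node.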

More technical insights on how it works and what they are planning to do on this matter:

github.com

storj/storj/blob/12fa50569280c8a82b099617f3bd779efd5c84ee/docs/blueprints/audit-v2.md

# Auditing V2: Random Node Selection

## Abstract

This design document describes auditing based on reservoir sampling segments per node.

## Background

As our network grows, it will take longer for nodes to get vetted.
This is because every time an upload happens, we send 5% of the uploaded data to unvetted nodes and 95% to vetted nodes.
Currently, we select a random stripe from a random segment for audits.
This correlates with auditing per byte. This means we are less likely to audit an unvetted node because only 5% gets uploaded to unvetted nodes.
It will become exponentially less likely that an unvetted node will be audited.

With a satellite with one petabyte of data, new nodes will take one month to get vetted.
However, with 12PB of data on the network, this vetting process time would take 12 months, which is much too long.
We need a scalable approach.

We want a way to select segments to audit based such that every node has an equal likelihood to be audited.


Also, even though the vetting process takes longer if you have several nodes being vetted, I remember reading somewhere that they tweaked this process so it’s not exactly linearly dependent on the number of nodes being vetted. It’s faster than that.

~~Can’t find back posts explaining all that though~~
EDIT: Found it, see my post below.

Pac · January 22, 2021, 2:36pm

Aaah there it is @tylkomat & @TheMightyGreek:

BrightSilence · January 22, 2021, 3:31pm

Thanks to already excellent responses from others in this topic, I have very little to add. Just some small remarks.

@Pac already answered this, but I want to add a caveat that my post you are linking to contains outdated information. There seemed to be a theoretical limit of about 40TB at the time. Things have changed since then. Ingress has gone down a bit, but deletes have gone down even more. To the point where the theoretical limit has gone up to about 100TB, but is no longer relevant as it would take literally several decades to get there if ever. Please refer to the sheet in the top post instead to get an indication of what to expect in the first 10 years (if traffic patterns stay exactly the same, which they won’t). I try to keep that sheet up to date with recent traffic patterns.
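
To illustrate why such a limit exists at all, here is a toy model. It is not the formula used in the estimator sheet: it simply assumes a node gains a fixed amount of ingress per month and loses a fixed fraction of its stored data to deletes. The function names and the rates mentioned afterwards are made up for illustration.

```python
def theoretical_limit_tb(ingress_tb_per_month, delete_fraction_per_month):
    """Stored data stops growing once monthly deletes equal monthly ingress."""
    return ingress_tb_per_month / delete_fraction_per_month


def months_to_reach(target_tb, ingress_tb_per_month, delete_fraction_per_month,
                    max_months=1200):
    """Simulate month by month; return None if the target is not reached."""
    stored = 0.0
    for month in range(1, max_months + 1):
        # Each month: add new ingress, then lose a fraction of what is stored.
        stored += ingress_tb_per_month - stored * delete_fraction_per_month
        if stored >= target_tb:
            return month
    return None
```

With an invented 0.5 TB of ingress per month and 0.5% of stored data deleted per month, `theoretical_limit_tb(0.5, 0.005)` gives 100 TB, yet `months_to_reach(50, 0.5, 0.005)` shows the model needs more than 11 years just to get halfway there. Growth keeps slowing as the node approaches the limit, which is why the limit itself is of little practical relevance.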

While we’re linking to my stuff, I recommend this script. It will show earnings and among other things the vetting process on each satellite. If you have further questions regarding this script you can post in that topic.

One last remark on @Pac’s last post. The linked post refers to a priority auditing that’s done for nodes that haven’t been fully vetted yet. This indeed makes it not linear to the number of nodes in vetting. But keep in mind that this doesn’t cover all audits. You still get audited through the normal process as well based on random piece selection. Additionally, even this priority audit process for unvetted nodes also audits all other nodes that hold pieces for the same segment. So while this process makes sure there is a minimum number of audits for all unvetted nodes, it is quickly overtaken by random audits as soon as you get more data. All those normal audits do scale linearly with the number of nodes in vetting.

This leads to 2 options. Vet as many at once to get them vetted a little faster, or vet them one by one. One by one has the vetting process take a little longer, but as long as you also have fully vetted nodes with free space, this could actually be a slight advantage as traffic for unvetted nodes isn’t shared with traffic for vetted nodes. So you can get slightly more on one IP if you have nodes in vetting and vetted. In general, this difference is probably not worth jumping through hoops for. So choose whatever makes managing your nodes easier on your end.

geeksheikh · January 22, 2021, 4:46pm

Wow, you guys are amazing. I can’t believe the responses in such a short time, thank you so much!

I do have parity drives, and the 24TB node running now is made up of 6 disks. I also have 1Gbps up/down and unlimited bandwidth.

Awesome! Thanks, will switch on next upgrade, looks like both images are the same ATM

So is there any egress benefit to having multiple nodes? As per your statement seems like egress is a function of data present, thus whether it’s on 1 node or 10, doesn’t really matter…agree?

Sounds like we need a bigger sales/marketing team. Perhaps this bull run will help Tardigrade be more widely adopted.

I did look around quite a bit and search the forums but I did not see answers that I thought were current or I didn’t see the answer. I try to not clutter up forums with redundancy

I did use this script, thank you. I actually found it before this post but it still only shows %vetted which is actually sufficient assuming 100 successful audits are still the number required.

That said, I ran the script successfully. My nodes have been up 24/7 since Jan 16 and the vetted status is between 0-2% across satellites. I’m still only running a single node. This seems VERY slow; at this rate it will take >1 yr for all satellites to vet my server. I’m pasting a screenshot below. Any thoughts?

Given all of the discussion and my setup, it seems that a single node with 24TB is sufficient for quite some time (especially if it never gets vetted). My disks are optimized for sleep with a cache drive in front plus parities for redundancy…as such, 1 24TB will get vetted and filled up as quickly as possible and egress will service the requests as they come once data is hosted, right?

Thanks again to everyone for the great feedback and assistance.

BrightSilence · January 22, 2021, 5:00pm

Yep, you got it exactly right.

Keep in mind that they aim for balanced growth. So with more customers come more node operators. And if your large node would fill in a year that would also mean the network has only about a year of additional space. That would be a dangerous level to be at. Don’t expect this to suddenly change a lot. It may go up, but it won’t be x10.

It is. So the percentage is the same as the count. It’s currently hard coded at 100. Don’t think that’s likely to change, but if it does I will update the calculator accordingly.

Yep, currently you mostly see audits for unvetted nodes. The more data you get, the more normal audits you will also get. Vetting speeds up exponentially over time because of this. Nothing to worry about. It takes on average about a month. But the first week you only get a couple. What you’re seeing is normal, it likely will still take about a month only.

If you mean to make your drives spin down, I would highly advise against that. Read caching is of limited use as download patterns are quite random. It mostly helps with the database access. If your drives spin down they will definitely time out downloads due to spin-up time. But realistically they will soon no longer get the chance to spin down because of the constant access.

geeksheikh · January 22, 2021, 5:23pm

Thanks again for all your feedback – very helpful

I have a 2TB NVME device in front of the entire array. The mounted volume writes to NVME first and the data then gets replicated down to the disks on a schedule. From the mount, all data appears in a single location, so anything reading/writing to/from the mount doesn’t need to worry about where the data is physically located, as it’s irrelevant to the reader/writer.

Regarding spin-up times: 100% irrelevant on write, as writes are completed on the nvme. Furthermore, all of the identity and db files are maintained ONLY on the NVME drive. The disks backing the mount are configured in such a way that all data will go to a single drive first, fill that drive up and then proceed to the next drive.

You may be correct on not spinning down the first disk that’s going through ingress right now as it will have spin up time on egress. Is there an egress timeout? If so, is it longer than spin-up time or no?

Thanks again.

Pac · January 22, 2021, 5:29pm

There is a “natural” egress timeout: Whenever a client requests a segment of data, corresponding pieces are queried from nodes (29 which is the minimum number of pieces for rebuilding a segment + some more to be safe).
All these nodes are then concurrently trying to retrieve these pieces as fast as possible.

The first 29 nodes to successfully send their pieces win the race. Other transfers are cancelled (~~and do not get paid for this egress query~~ EDIT: we in fact get paid for what got partially sent, see @BrightSilence’s post below).

If one of the nodes needs to spin up its disk to read the piece, it will surely lose the race.
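
The race can be sketched in a few lines. This is a toy illustration, not how the uplink is actually implemented: the 29 comes from the description above, while the number of nodes asked (35), the latencies and the 8-second spin-up penalty are all hypothetical.

```python
def race_winners(latencies_ms, needed=29):
    """Return the names of the nodes whose pieces arrive among the first `needed`."""
    ranked = sorted(latencies_ms, key=latencies_ms.get)
    return set(ranked[:needed])


# Hypothetical race: 35 nodes are asked (29 needed plus a few spares).
latencies = {f"node{i}": 100 + 10 * i for i in range(34)}  # 100 ms to 430 ms

# Same node, once with its disk asleep (hypothetical 8 s spin-up) and once awake.
asleep = dict(latencies, my_node=120 + 8000)
awake = dict(latencies, my_node=120)

lost = "my_node" not in race_winners(asleep)
won = "my_node" in race_winners(awake)
```

With these numbers the node wins comfortably when its disk is already spinning and finishes last when it has to spin up first, long after the 29th piece has arrived.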

BrightSilence · January 22, 2021, 5:33pm

Download in Storj terms means a download by the Tardigrade customer from the network. That would be a read on your end. This, by the way, is also what makes you the most money. As soon as your cache is exceeded, those spin-up times will matter for pieces requested from one of the HDDs.

@Pac addressed the time out. If enough other nodes are faster than you, your download will be canceled. So you want to be among the faster nodes to make money from downloads.

nerdatwork · January 22, 2021, 5:39pm

If I remember correctly, you get paid for whatever bandwidth you sent for that query, even when your transfer is canceled. For example, if 1 MB was requested and 100KB was uploaded before the transfer was canceled, then that 100KB is paid.
(Correct me if I am wrong)

Pac · January 22, 2021, 5:41pm

Ah really? I thought it wasn’t, but maybe you’re right.
Does anyone have the answer to that?

BrightSilence · January 22, 2021, 5:43pm

Yes you do get paid for partial transfers, but if the spin up time is the problem it’ll be cancelled before it even starts transferring in most cases.

geeksheikh · January 22, 2021, 5:44pm

Got it, thanks again all – I have disabled spin-down on the first drive – once the second drive begins receiving data I will disable spin down on it as well and so forth as the array fills up.

peem · January 22, 2021, 5:46pm

Is that a JBOD array, or something else?

kevink · January 22, 2021, 6:55pm

Constant spin-down is bad for drive health anyway. I would advise against it.

fmoledina · January 22, 2021, 9:10pm

Having used Unraid in the past, I would advise against using the Unraid cache for Storj data. I would recommend having a dedicated share for your node and setting Cache=No in the share settings. While Unraid’s cache and Mover system is likely okay for large files, the Mover run may choke on the sheer number of small files that your node will generate on a regular basis, which will also translate to heavy parity activity during this time. Furthermore, once you’re storing data, ideally you’re able to serve up that data on demand and get paid for egress. Having to spin up drives in order to win egress races is going to be detrimental to node performance and drive health. You might as well write the data straight to the array and have it available to read from there without spinning down drives.

For others not familiar, the Unraid community is obsessed with drive spindown.

kevink · January 22, 2021, 9:40pm

Thanks for the info. Why though? There’s not a single advantage besides electricity costs.