Expected Disk and Bandwidth utilization

Hello Storj Community

I am currently thinking about expanding my storage capacity. Currently I am providing 2TB of space to the Storj network and I would like to expand this to 50-150 TB. I have connections to a hoster that can provide me with high-volume dedicated storage servers. For this to run smoothly I will have to invest about $2 per TB in operational costs.

1 TB of used space over a month will give me $1.50, so I would also need a little bit of egress to break even.

I wanted to ask how your experiences with Storj have been. My 2TB node is filling up nicely, but I don't know what to expect when I am providing 50-150 TB of available space. What can I expect the average disk and bandwidth utilization to be?

Also, how are you handling data redundancy? Do you recommend a RAID 5 setup? What would happen if there were no RAID setup and the data was lost?

Thank you for your help.


Random. Pretty much, if Storj decides to do a test, I’ll get more egress. If not, then no.

Used space:

The sharp drop to zero was a network wipe. The gaps in the graph are because I managed to screw up monitoring for a while and did not notice it.

Most of it is test data.

ZFS raidz2 (raid6 equivalent).

Your node would get disqualified and you would lose the escrow money.


Thank you very much for your insights :slight_smile:

At $2 per TB, plus maybe some bandwidth and other related costs on top, but let's say $2, you'd still be running at roughly a 33% loss if you receive data mainly for storage, which might never be accessed before it's just deleted again… of course it would be nice if one could rely on income from downloads, which at the moment runs at about 5% of stored capacity per month, so that's an additional 5% of the $20 per TB egress rate.
So $1, making the total earnings per TB about $2.50 a month.

That makes your best current profit about 25 cents and your potential loss about 50 cents a month per TB.
And with the activity currently being mostly test data, I would say it's kinda risky… of course it might still be better than buying the hardware for it…
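
Just to sanity-check the per-TB numbers above, here is the arithmetic as a quick sketch; all figures are the ones assumed in this thread, not official rates, and the flat $2 cost ignores the extra bandwidth and related costs mentioned above.

```python
# Quick per-TB monthly profit check using the numbers assumed above.
cost_per_tb = 2.00           # assumed hosting cost per stored TB per month ($)
storage_pay_per_tb = 1.50    # assumed storage payout per TB-month ($)
egress_pay_per_tb = 20.00    # assumed egress payout per TB downloaded ($)
egress_ratio = 0.05          # assumed: ~5% of stored data downloaded per month

worst = storage_pay_per_tb - cost_per_tb
best = storage_pay_per_tb + egress_ratio * egress_pay_per_tb - cost_per_tb

print(f"storage only: {worst:+.2f} $/TB/month")   # -0.50
print(f"with egress:  {best:+.2f} $/TB/month")    # +0.50 before any extra costs
```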

And of course you also have to take into account that this profit assumes a graceful exit after a 15-month period so you get 100% of the held amount back, which puts your setup even more at risk from changing market prices…
You cannot rely on the $1.50 being permanent; it will vary with how storage pricing develops, which like anything else is always changing…

So yeah, it might be a profitable idea or just a potential loss. Let's say you run your node for 6 months and then decide to quit; it will barely have even started to run right…
And at 50 to 150 TB of rented space, you will have risked somewhere in the $600 to $1,800 range, of which your node might offset a little bit, but because Storj is still new… who knows.
It might at best have doubled or tripled your money, but it might just as well end at a loss of 90% because the node and Storj haven't had time to really get started.

And hey, at $2 per TB I think many people in here would rent you storage… xD After all, we are here to rent it out at maybe $1.50, hehe.

By the way, I always kept some free space on my node - when the free space got low, I just increased the virtual disk size and still only have 7TB or so. I have no idea if a 50TB node would ever be filled.

Remember - Storj wants the data to be distributed, so for your node to have 50TB, a lot of other nodes would have to have 50TB as well. Also, having a bigger node does not mean that you will get more data. I have asked a few times about this and got the same answer from Storj - as long as your node has “enough” free space, you get the same chance to be chosen to store a new file - it does not matter if you have 20GB or 20TB free.
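
To illustrate that point (this is a toy model, not Storj's actual node-selection code): if every node with free space has the same chance of being picked for each piece, your share of ingress depends on how many eligible nodes there are, not on how big your free space is.

```python
import random

# Toy model only: assume every node with free space gets an equal chance
# per uploaded piece, regardless of how much free space it has.
nodes = {"node_with_50TB_free": 0, "node_with_20GB_free": 0, "third_node": 0}

for _ in range(30_000):                  # simulate 30k piece uploads
    winner = random.choice(list(nodes))  # uniform pick among eligible nodes
    nodes[winner] += 1

print(nodes)  # each node ends up with roughly a third of the pieces
```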


I have been wondering how that distribution worked, so that's nice to know.
Going to expand my node soon to 36TB or so… I've got a bit more than a week before I run out at the present speed.

It will be very interesting to see if it slows down before it reaches the end of that. Egress has been pretty weak so far, though… but I'm not relying on having egress at all to make it profitable…

Best advice I can give is to expand your node only when it’s getting full. I normally don’t advise RAID, but a setup the size you’re suggesting would get messy if you run separate nodes on separate disks. And having the ability to expand an array instead of having to spin up new nodes all the time removes a lot of complexity and processing overhead. You are probably looking at a RAID6 setup or similar. Perhaps upgrade to RAID60 later on. But don’t start with 50TB available right away. That will take years to fill up.

I'm getting like 250GB a day and still rising, and I'm about a week out from filling up my current node… so I figured I would buy myself a few months before having to deal with it again. I'm running ZFS, so it's all just located on a pool to which I can add or remove vdevs (arrays) as I see fit…
So I don't have to deal with the whole moving-it-around part… the file system handles that. Of course I believe I will need to tell the system that I want to remove an array before I do so xD, obviously,
so it can have a chance to move the data to another array.
Anyway, 36TB was the next logical step in my ZFS progression… I run raidz1 for performance reasons and to make resilvering (rebuilding / reconstructing) arrays less demanding on the system.
I'm also only working with 5 disks per vdev (array) at present, so it's hard to justify more than 1 drive for redundancy, but next step I hope to go to 10 drives with 1 or 2 redundant…

If I don't experience any real issues, then I might just go with 1 and mainly use it for node data…
I know 36TB is a big node, but I've been thinking about starting to build a small home data center for a while, so it's the perfect reason to do so… xD

If my current download speed keeps up, the 36TB would fill in 3 months, which is already 3 months less than I had hoped when I planned it… but we will see.
Also the space will be used by other stuff… but using up 36TB is tricky lol
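
For what it's worth, a flat 250 GB/day would actually take closer to five months to fill 36TB; getting there in three assumes ingress keeps climbing. A rough back-of-the-envelope check:

```python
# Back-of-the-envelope fill time, assuming a constant ingress rate.
capacity_tb = 36
ingress_gb_per_day = 250   # current rate mentioned above, assumed to stay flat

days_to_fill = capacity_tb * 1000 / ingress_gb_per_day
print(f"~{days_to_fill:.0f} days (~{days_to_fill / 30:.1f} months)")  # ~144 days, ~4.8 months
```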

I don’t think you can remove raidz vdevs yet, though recently they added the ability to remove single-drive and mirror vdevs. RAID10 for a node is probably overkill.

Not RAID10, but 10 drives with 1 or 2 redundant :smiley: Basically RAID5 or 6, depending on how my adventure with ZFS goes… I just want to make sure I don't get into trouble when a disk breaks, because they will once one starts getting up to running 10-20+ drives.

Pretty sure you can remove vdevs, but one has to rebalance the pool or something… to sort of empty the vdev the data is located on first… but I dunno… I'm still very green at ZFS…
I barely know how to maintain it… but I did learn how to check the array / pool… lol, it only took me 6 weeks… I got my redundant disk back with no issues reported from resilvering it nor since… so I think I'm good. I don't need to be able to remove the vdevs either… it would be cool if I could, but I plan on getting a 36-bay 3.5" DAS to hook up over SFF-8088, I think it's called, the external SAS connector.
Then it doesn't really matter if I've filled most of the bays in the host, and I can just migrate to a new pool by then if I need to re-evaluate my setup.
Running 5 drives with 1 redundant is wasteful… imo 20% is a lot, especially since these are old drives, so another 10% for 512-sector overhead, making it about 30% total with redundancy… while with 10 4Kn drives I could get about 3% and 10%, so 13%… that would be nice… but I'm kinda running a bit of a ghetto setup… so I've got to work with what I can afford. xD
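
A small sketch of that overhead math (the ~10% allocation loss on the old 512-sector drives and ~3% on 4Kn drives are the rough figures used above, not measured values):

```python
def usable_fraction(drives, parity, alloc_overhead):
    """Rough usable share of raw raidz capacity: lose `parity` drives'
    worth to redundancy, plus an assumed allocation/sector overhead."""
    return (drives - parity) / drives * (1 - alloc_overhead)

# Figures assumed above: ~10% overhead on 512-sector drives, ~3% on 4Kn.
layouts = [("5-wide raidz1, 512-sector", 5, 1, 0.10),
           ("10-wide raidz1, 4Kn",       10, 1, 0.03),
           ("10-wide raidz2, 4Kn",       10, 2, 0.03)]

for label, d, p, o in layouts:
    frac = usable_fraction(d, p, o)
    print(f"{label}: {frac:.0%} usable ({1 - frac:.0%} overhead)")
```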

Thank you all for your help and information. I think I will start with a 40TB server in RAID 5, which would give me 30TB of available space. I will try to migrate my old node to the new server so I can start out with 2TB almost filled already, and hope the current inflow of data continues :slight_smile:

Very cool community, I must say :smiley:

Please do not use RAID5 with big consumer disks that have an unrecoverable read error rate of 1 bit in 10^14; you can lose the whole array when you rebuild it after a single disk failure.
Consider using RAID6 instead.
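
To put a rough number on that: with an unrecoverable-read-error (URE) spec of 1 bit in 10^14, the chance of hitting at least one URE while reading the surviving drives during a rebuild climbs quickly with array size. This sketch treats the spec-sheet rate as an independent per-bit probability, which is a simplification, but it shows the order of magnitude:

```python
import math

# Rough chance of at least one URE while rebuilding a degraded RAID5 array,
# assuming the spec rate of 1 bit error per 1e14 bits read behaves like an
# independent per-bit probability (a simplification).
def rebuild_ure_probability(data_read_tb, ure_rate_bits=1e14):
    bits_read = data_read_tb * 1e12 * 8              # decimal TB -> bits
    return -math.expm1(-bits_read / ure_rate_bits)   # 1 - exp(-expected UREs)

for tb in (4, 12, 30):  # data the surviving drives must read during a rebuild
    print(f"{tb} TB read -> ~{rebuild_ure_probability(tb):.0%} chance of a URE")
```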


How do you feel about raidz1?
I mean, it shouldn't be prone to bitrot like RAID5, right… sort of the best of both worlds… even if it's kinda a huge spaceship to run… lol

raidz1 is the same as RAID5 in terms of parity, though if you get a bad sector during a rebuild the result will probably be acceptable for a node (you might fail one audit, but that’s it), unless the bad sector is in the metadata etc.

With ZFS it used to be impossible to remove vdevs at all, but with version 0.8.0 of ZoL they made it possible to remove mirror vdevs and single-drive vdevs. Removing raidz vdevs is still not possible.

I am using a 6 drive raidz2 for my node.

Not exactly; RAID5 will not catch all errors and thus can easily corrupt 50% of the array if not worse, and raidz1 doesn't have that issue. But yeah, I'm not a long-time ZFS user; I just hate performance penalties, so I went with raidz1… I've also heard resilvering is faster… it sure didn't take too long, it was done resilvering my drive in about 1½ hours, which is odd, because I think I lost the drive when I shuffled the drives around in the bays, which would mean it hadn't been connected while I've been running my storage node on the array…

Thus there should be 3.2TB split over 4 drives plus parity… so it should have been something like 800GB of data behind, but it only resilvered 250GB… which was kinda odd…
But it says it's all okay, so I guess I'll just have to monitor whether it starts throwing errors again.

I dunno more about ZFS than what I've seen in a good number of lectures by the developers and enthusiasts.
I just seem to remember some of them talking about that feature… but I dunno… I do know there was one guy who switched drives in vdevs one by one to increase his array capacity, because he had run out of bays and space… lol, he said it was hell… and that's the main reason I keep 2 free bays out of the 12 in the server… because ZFS can be a bit of a heavy dance partner at times…
I also tried to take what advice I could from the guys that really know their stuff… like Allan Jude.

It's a massive undertaking learning ZFS, and I've barely got my toes in… so I'm sure I'm wrong more than I am right… also I mainly watched the FreeBSD ZFS stuff… not sure if their features are ahead of the Linux version… I just know I really like ZFS, it's amazing. And I don't mind running raidz1… until I change my mind lol from learning the hard way… xD

And long-term I want to set up a cluster… then I will essentially end up having a parity server… so an array going corrupt is just a growth problem lol

There are multiple reasons for that; however, we were talking about this scenario:

  1. You have a raidz1 array.
  2. One drive dies completely; you replace it with a new (and empty) drive.
  3. During resilvering, one of the other drives gets an unrecoverable read error.

That data is lost. Not the entire array, but just what was in the sector that could not be read (since zfs has checksums it would know that the data got corrupted, but has no way of recovering it because the array is degraded).

With raidz2 you have two parity sectors, so if one drive fails completely and one other drive gets a read error, you can still recover the data using the remaining drives.

Yes, especially if the array is mostly empty as zfs only bothers resilvering the actual data and not empty space.

That can be done. You cannot shrink the array, though. For example, let’s say I have an array made from two vdevs, each vdev being 10 drives in raidz2. I can slowly replace the 10 drives with bigger ones to get more space, but I cannot replace them with smaller ones or take 10 drives out entirely; that array will always have at least 20 drives as big as or bigger than the originals.
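
As a rough illustration of the replace-one-drive-at-a-time approach (ignoring ZFS padding and slop space): a raidz vdev's usable size is limited by its smallest member, so the extra capacity only shows up once every drive in that vdev has been swapped.

```python
# Rough raidz vdev capacity: (drives - parity) * smallest drive, ignoring
# padding/slop. Illustrates why replacing drives one by one only pays off
# once the whole vdev has been upgraded.
def raidz_capacity_tb(drive_sizes_tb, parity):
    return min(drive_sizes_tb) * (len(drive_sizes_tb) - parity)

vdev = [4] * 10                              # 10 x 4TB in raidz2 -> 32TB usable
print(raidz_capacity_tb(vdev, 2))            # 32
vdev[0] = 8                                  # swap one drive for 8TB: still 32TB
print(raidz_capacity_tb(vdev, 2))            # 32
print(raidz_capacity_tb([8] * 10, 2))        # all swapped: 64TB usable
```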

Yeah, but then with raidz1 we come to the use case… worst case for me is that I lose my storage node… so it's the age-old question of risk management… I may in the future go to raidz2, but for now, with as few drives as I've got, it doesn't make sense…

And like I said, I've got 2 bays free, so I can always just put in 2x20TB drives, transfer all my array data and then rebuild from scratch if need be… or move it to a new… errr, pool on a DAS when I get one.

The other option is that when I add my new 5 drives, I add them in their own pool, transfer the array data… goddammit, POOL data, to the much larger pool, and then use the 5 old drives as the first 5 in a 10-drive raidz1… and with it being mainly for Storj data, it's not like I've got angry customers or bosses that will fire me; at worst I lose my node… and have to start over… if it was really important I would also go the extra mile… but 6 drives with 2 redundant… ouch, my overhead… I really hope those aren't 512-sector drives… else that's, let's say, 36TB minus 10% for partition overhead, so 32.4TB, and then 4/6 of that, so about 21.6TB.
That's about 40% overhead… I've got like 30% with my 5 drives… that's already pretty painful imo…