Bandwidth utilization comparison thread

Hardware usually doesn’t care about 24/7 if the environment isn’t too bad. You can use a normal PC for CPU mining as long as the temperature isn’t too high. I don’t think giving a CPU a break for a week will make a difference in the long run.

1 Like

When I say break, I don’t mean I just shut it down and don’t use it; I just don’t keep full loads on it 24/7, especially hard drives. CPUs are OK, I have plenty of cooling for my servers, aka water cooled.
But anything mechanical I don’t really like running at full anything for a long time. I wouldn’t run my truck 24/7 and I wouldn’t run hard drives 24/7 under full load all the time. The older they get, the worse it gets.
Also, I lied, I have been shutting my servers down because I don’t wanna pay the extra power to keep them on.

1 Like

Consumer drives might not be made for running 24/7, but enterprise ones are specifically made for that; doubt that your truck is meant for that though :smiley:
But 100% load is certainly something that will decrease an HDD’s expected lifespan significantly. Then again, other than SGC I don’t know anyone who puts their HDDs through that much stress :rofl: :wink:

2 Likes

Same, I cringe every time, especially at stress that isn’t really necessary all the time…

2 Likes

And yet they’ll probably survive that torture for at least 5 years.

2 Likes

My HDDs are all second-hand enterprise drives, with I dunno 50k hours of spin time on them…
most of them are that way… had a couple break down… but only two out of 13 or however many it is…

I don’t run them under full load and they don’t run hot; in fact they run too cool… they are never above 30°C and usually a fair bit lower… tsk tsk… The high humidity I’m working on… but the room is still kinda leaky, so outside air leaks in and saturates it, and I’m also doing concrete work, which doesn’t really help… I have a dehumidifier running that tries to keep it down, but it has trouble actually keeping it below 80%; it should be more like 40-60% for the room temps I’ve got… something to do with the condensation point of water vapor in air… if it can condense it will stick to stuff and then it corrodes, which is why stuff gets damaged when stored in rooms without heating.
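For reference, that “condensation point” is the dew point, and you can estimate it from temperature and relative humidity with the Magnus approximation. A small sketch (the coefficients and example numbers are just illustrative):

```python
import math

def dew_point_c(temp_c: float, rel_humidity_pct: float) -> float:
    """Approximate dew point in °C using the Magnus formula."""
    b, c = 17.62, 243.12  # Magnus coefficients for water vapor over a typical range
    gamma = math.log(rel_humidity_pct / 100.0) + (b * temp_c) / (c + temp_c)
    return c * gamma / (b - gamma)

# Example: a ~25 °C room at 80 % RH vs. 50 % RH
print(dew_point_c(25, 80))  # ~21 °C -> any surface colder than this collects condensation
print(dew_point_c(25, 50))  # ~14 °C -> far more margin before anything condenses
```

That gap is why 80% RH is uncomfortable: almost anything slightly cooler than the room will sweat, while at 40-60% you have a lot more headroom.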

Sure, I put the drives to work… I’ll have moved a few hundred TB in total by the time I’m done… which is spread across 10 drives, so that’s like 10-20 TB each, and their workload rating is something like 3-5 PB.
I don’t plan on doing this all the time… but I’ve been testing out what kind of setups I like… the main node I don’t expect to ever move again… which will also become a bit difficult because it will now have a total capacity of 33TB or so, and then I can add another set of 4 drives for more space… but even without that, it would get pretty close to the theoretical max node size, because at some point there will be so much data that the deletes will exceed the ingress and the node will stop growing…
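That “theoretical max node size” is easier to see with a tiny steady-state calculation; the numbers below are made up purely for illustration, the point is just that growth stops where deletes equal ingress:

```python
# Rough illustration (made-up numbers): if deletes are roughly a fixed fraction of
# stored data per month, the node stops growing once deletes equal ingress.
ingress_tb_per_month = 1.5        # hypothetical steady ingress
delete_fraction_per_month = 0.05  # hypothetical: ~5 % of stored data deleted each month

steady_state_tb = ingress_tb_per_month / delete_fraction_per_month
print(f"growth stops around {steady_state_tb:.0f} TB stored")  # 30 TB with these numbers
```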

If I thought it was going to damage or kill my drives, I wouldn’t do this… really, the drives I’ve had die died because I bumped the table I have the server on… every damn time…

If I were to hazard a guess, that’s the main cause of drive deaths… vibration.
Dunno exactly how many HDDs I’ve had over the years… but I know I’ve killed at least 7 or so by moving them or exposing them to vibration while running…

When a drive is already bad though… or runs very hot… then torturing it isn’t the best idea… otherwise it doesn’t seem to matter.

I have a Z2 10x10TB WD Gold pool at work that would beg to differ- they’re singing along right now at 95-99% utilization due to a constant queue depth of 5 from either the current scrub going on, or the constant iSCSI or SMB data being sent to them, despite the 128GB RAM and 512GB L2ARC. The 3x2x6TB WD Gold striped mirror pool for the Virt host, on the other hand, gets hammered constantly but chewed through its scrub while still serving data.

…then there’s the Ceph based systems- I’ll save that for another day.

Edit: FYI, this is a ZFS system that sees 150-287.5 MBps of traffic on the daily. Yes, part of that is the Storj node.

3 Likes

Have you considered doing a mirror pool… sounds like your pool is a bit short on IOPS.
The main problem with running big raidz vdevs is that IOPS doesn’t go up along with more drives being added, only per vdev.

So a mirror pool will literally have roughly 30% less space, 500% more write IOPS and 1000% more read IOPS, plus the ability to remove vdevs while the pool is live, and you gain the ability to rebalance data across the vdevs.
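A back-of-envelope for where numbers like that come from, assuming 10 identical drives of ~150 random IOPS each and the usual rule of thumb that a raidz vdev delivers roughly one drive’s worth of random IOPS while a mirror vdev can read from every disk in it (the drive count and per-drive IOPS are illustrative, not measured):

```python
# 10 identical drives, ~150 random IOPS each.
# Rule of thumb: a raidz vdev does roughly one drive's worth of random IOPS;
# a mirror vdev does one drive's worth of writes but can read from every disk in it.
drives, size_tb, iops = 10, 10, 150

# Single 10-wide raidz2 vdev
raidz2 = {"space": (drives - 2) * size_tb, "write": 1 * iops, "read": 1 * iops}

# Five 2-way mirror vdevs
mirrors = {"space": (drives // 2) * size_tb, "write": 5 * iops, "read": 5 * 2 * iops}

for key in ("space", "write", "read"):
    print(key, raidz2[key], "->", mirrors[key])
# space 80 -> 50, write 150 -> 750 (5x), read 150 -> 1500 (10x)
```

With these assumptions the IOPS ratios line up with the 5x/10x figures, though the space penalty for a 10-drive pool works out closer to ~37% than 30%.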

Ofc if the workload is from scrubbing then maybe you don’t lack IOPS, and I can’t really say too much about mirror pools yet, since I haven’t really used them yet…

Just made my 2nd pool from 4 x 3TB drives to test it out… but I suspect that this is how most of my pools will be going forward. Sure, I hate to lose the space… but there’s the creature comfort of being able to change the pools while they are live, and to rebalance the data… and then there are the added levels of redundancy. I know some won’t agree that a mirror pool has better redundancy, but really it does, because the data is not spread across all drives… sure, one cannot lose two specific disks… but on the other hand one can lose data on either of the two HDDs in a mirror vdev, and so long as the damage doesn’t line up on both sides, the data can still be recovered.

Also, if 1 drive is bad, it will only slow down 1 vdev, so 2 drives… while on raidz2 you can run into silent death / damage of the HDDs, which can eventually slow down the entire pool while rebuilding it…

So yeah, there are certainly some advantages to raidz2, but I would say in most cases mirror pools are superior.

I look forward to starting to really test it… might add two more drives… but I kinda like my test being 8 x 6TB drives, all the same, in a 2x 4-drive raidz1 configuration (wanted to do raidz2 but couldn’t find the free space, so ended up doing a dual raidz1, which also has twice the IOPS)… that should make it even more difficult for the 2x 2-drive mirror pool to keep up, but it should still be able to…

The mirror pool should have twice the read IOPS and the same write IOPS.

Should get my 3rd node installed on the mirror pool this morning… if I don’t feel lazy… so close now though, basically just need to flick the switch, hehe. Yes, I can be that lazy.

Oh, the total space used number keeps moving :smiley: two days in a row… wowie lol
Almost 30 GB ingress… I really should get my new node created so it can start vetting.

1 Like

Nodes that are below the minimum allowed version will keep running until the next restart/crash.

We have the check set to 1 minor version below, including every patch. E.g. if the satellite runs on v1.14.7, then everything from v1.13.X is still getting ingress data.
Once the satellite is bumped to the next minor version, let’s say v1.15.1, then all nodes with version v1.14.X will keep getting ingress.

It’s pretty straightforward and simply gives an incentive to update.
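For anyone who wants it spelled out, the rule described above boils down to something like this (not the actual satellite code, just a sketch of the version check):

```python
# Sketch of the rule: a node keeps getting ingress as long as it is at most
# one minor version behind the satellite's version.
def gets_ingress(node_version: str, satellite_version: str) -> bool:
    n_major, n_minor, _ = (int(x) for x in node_version.lstrip("v").split("."))
    s_major, s_minor, _ = (int(x) for x in satellite_version.lstrip("v").split("."))
    return (n_major, n_minor) >= (s_major, s_minor - 1)

print(gets_ingress("v1.13.3", "v1.14.7"))  # True  (one minor behind)
print(gets_ingress("v1.13.3", "v1.15.1"))  # False (two minors behind)
print(gets_ingress("v1.14.8", "v1.15.1"))  # True
```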

Well, this is why we have Virt vs Vault pools- Virt is for our XCP-NG clusters and the Vault pool hosts anything that is for the endpoints. This means things like the common “public” SMB share, personal drives, etc.; but it is also doing iSCSI for any of the endpoints that require “scratch” drives or additional storage that is infrequently accessed (think code repos and Syncthing targets). It’s very much struggling right now because of the scrub.

I have been trying to run my VM disks directly on my pool with everything else… I’m aware of how difficult that can be… but I really think it’s the right approach to collect everything in one dynamic pool / Ceph storage that then distributes the data as needed, or moves it to faster and faster caching or slower and slower storage depending on how often it’s used…

Ofc changing such enterprise setups can be hell… sure, the concept of splitting it into two segments is kinda what I’m doing with my 2x 2-drive mirror pool and my 2x 4-drive raidz1 pool.

The mirror pool I expect to have the VM drives on, and then ofc L2ARC and SLOG on both pools.

I’m very interested in seeing how fast my new double mirror pool will scrub… ofc right now it doesn’t have any data so… on the old pool it took about 1 hour per 1 million files, and tomorrow or so I’ll get a chance to scrub my current raidz1 pool, which is only 1 vdev, so it should be interesting to see just how long that takes compared to my old 3x raidz1 pool… maybe in for a multi-day scrub… yay… lol
Think it was like 14-16 hours, so maybe a 42 to 48 hour scrub… for only 14 TB…
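That guess is just the old scrub time scaled by the vdev count; a quick sanity check of the arithmetic, assuming scrub throughput scales roughly with the number of raidz1 vdevs:

```python
# Same data, one third of the vdevs -> roughly 3x as long (assumption, not a measurement).
old_scrub_hours = (14, 16)   # observed on the 3x raidz1 pool
old_vdevs, new_vdevs = 3, 1

estimate = tuple(h * old_vdevs / new_vdevs for h in old_scrub_hours)
print(estimate)  # (42.0, 48.0) hours
```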

If memory serves, the second scrub with a good-sized L2ARC doesn’t take nearly as long as the first one, before the L2ARC has cached the metadata; I seem to remember it being maybe a 20% improvement.

How long does a scrub take for your setup, and how much data does the pool have stored?

I guess once we can see those boundaries on the dashboard it will be… Until then you have to search for that information.

1 Like

unloaded? I actually don’t remember- These pools are on the higher side of their 5yr warranties (1yr left on the 6TB and 22mo on the 10TB) and haven’t exactly been left idle much of their life (around 101-320TB/yr each drive). Due to their age, we’re already evaluating moving to a full Ceph shop centered around either Proxmox, if we go HCI, or PetaSAN, if we stay traditional… we’re leaning traditional.

Edit: the Vault pool is 84% full, which is not helping the matter either.

Not really familiar with the other options; I’m running Proxmox, would love to run Ceph, but that’s like a 3-server setup just to make it work…

Dunno if Ceph is the future; it sure sounds like it from what I’ve learned about it, which sadly isn’t enough… but since I’m nowhere near a level where I could even consider making the change, I haven’t had the need to deep-dive it…

Traditional can be good… though the way RAID and redundancy works is… antique these days… with Ceph one moves away from all that; the entire storage becomes one fluid medium for efficiency, to my understanding anyway… can’t overstate how little I actually know about it, but I have been really getting into storage stuff these last few years… and Ceph keeps popping up everywhere.

If you are in charge of the corporate network, you should really make sure Ceph isn’t for you before ruling it out, even if it does require a bit of a change… you already have a working setup, even if it’s getting older.

So there is time to start building an experimental Ceph setup, if you did classify it as meeting the requirements set… though I’m sure it would do that easily… the real advantage is its storage-media fluidity, which could well save you a lot of future trouble…

Traditional is nice and secure, and one can easily predict the outcome, but one will also miss all the new features which make life easier; horse and cart was also very traditional once…

Anyways, just saying… be sure you don’t make a choice you will regret, because you may be paying for it long term…

Ofc in today’s world, if the corporation’s computing demands aren’t going up… you might be able to exchange the server room for a cellphone lol. Never an easy answer for anything, sadly.

Meet our 6 OSD x 2TB single-node setup that’s colo’d in a client’s space:


Those are SATA 512e drives, specifically HUS726020ALE610, connected to a single LSI 9300-8i on an ASRock Rack board with 64GB ECC RAM and an R9 3900X. The WAL and everything is colocated on the disks, with an SSD boot drive.

If you have a use case, and acknowledge and work to mitigate all risks, you can build whatever you want.


in other news- pretty solid day today:

| date | Ingress standard | Ingress repair | Egress standard | Egress repair | daily space |
|---|---|---|---|---|---|
| 2020-10-01 | 5.43 | 7.1 | 19.69 | 4.55 | 135.49 TBh |
| 2020-10-02 | 5.1 | 10.13 | 22.53 | 6.08 | 149.56 TBh |
| 2020-10-03 | 5.17 | 8 | 22.92 | 4.25 | 144.35 TBh |
| 2020-10-04 | 4.94 | 12.88 | 22.61 | 6.8 | 134.96 TBh |
| 2020-10-05 | 5.78 | 15.49 | 21.9 | 11.55 | 152.21 TBh |
| 2020-10-06 | 3.02 | 6.89 | 23.56 | 5.85 | 135.96 TBh |
| 2020-10-07 | 3.66 | 3.27 | 23.87 | 2.27 | 150.97 TBh |
| 2020-10-08 | 6.43 | 2.91 | 27.14 | 2.16 | 141.06 TBh |
| 2020-10-09 | 7.43 | 2.93 | 27.81 | 2.14 | 128.51 TBh |
| 2020-10-10 | 4.07 | 3.99 | 27.99 | 2.87 | 143.46 TBh |
| 2020-10-11 | 3.87 | 5.13 | 29.44 | 3.71 | 164.17 TBh |
| 2020-10-12 | 3.76 | 4.71 | 32.38 | 3.28 | 150.67 TBh |
| 2020-10-13 | 11.49 | 14.89 | 30.31 | 16.52 | 137.56 TBh |
| 2020-10-14 | 21.59 | 8.03 | 26.23 | 5.83 | 147.76 TBh |
| 2020-10-15 | 17.52 | 5.07 | 23.06 | 3.51 | 67.50 TBh |

Well, the whole cluster concept doesn’t work with only 1 server; the idea in Ceph is like RAID5 but on a server level, so if any of the storage starts giving corrupt data, the two others will know, because they can agree that the last one is wrong…

Ceph is like a cluster-based file system that I guess is almost as much a SAN as it is RAID,
at least to my understanding, but yeah, never used the thing… nor learned enough about it to really be wise…

Those are great drives… I got the 6TB ALA version, basically just 512 native instead of 512e, or is the ALE for encryption… something like that… I really like them, no issues at all so far.

Had to turn the drives’ power management off using hdparm, because some of the drives would cause latency and iowait due to falling asleep, or that was the theory… it didn’t really seem to help until I replaced my SATA SLOG SSD with a PCIe SSD. Apparently the SATA SSD was getting so much IO work that it would hit 150ms latency, and that was long enough for the first HDD in the first raidz1 of the 3x raidz1 pool to go into power-saving mode and spin down…

Then ofc it would have to spin up shortly after… which ended up taking the better part of a second… gave me so much grief to find that issue… thought that damn disk was broken…
Alas, turns out ZFS was being weird and my SATA SSD SLOG was slow…
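For anyone curious, disabling the drives’ power management with hdparm looks roughly like this; the device names are just examples, and -B 255 / -S 0 (disable APM / disable the standby timer) may not be supported on every drive:

```python
# Rough sketch of what "turning power management off with hdparm" means in practice.
# Device names are examples -- adjust for your system; requires root.
import subprocess

drives = ["/dev/sda", "/dev/sdb"]  # example device names
for dev in drives:
    subprocess.run(["hdparm", "-B", "255", dev])  # disable APM (255 = off, if supported)
    subprocess.run(["hdparm", "-S", "0", dev])    # disable the standby/spin-down timer
```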

I’ve got no idea what that means, I don’t speak Ceph :smiley: I just saw some lectures with some smart people dumbing it down.

The 45 Drives one, I think it was, on YouTube; their storage engineer apparently prefers Ceph.

Just watched a bit of this a few years ago when I was trying to decide what to set up… ended up buying an LSI 9280 16i card, and before I even got Storj up and running again for v3 I had switched it to dual HBAs instead lol.

Anyways, I watched some of their videos and they were kinda cool, so it stuck…

1 Like

Ceph is interesting, though with storagenode you may have to use the block device instead of CephFS (as I do not know if SQLite has the same problems with it as with NFS).

I probably would use ceph (or glusterfs) if I had a node in a remote location, where I could not get there quickly in case of a problem. Then I could have three or so storage servers with a bunch of hard drives, two HVs for running the node VM, two routers, switches and UPSs. It would be able to run the node with some failed hardware until I managed to get there and fix it.

Yeah, in some cases ofc if downtime is acceptable, then going all the way to near-100% uptime is just wasteful…

It’s always a question of the economics of it… distributed storage tho… :smiley:

The cluster thing gives so much bandwidth and load balancing too… but ofc if one doesn’t need it… then there isn’t any point…

Anyways, yeah, the network is really picking up… the logs seem to be scrolling really fast now.

I think I will be very happy moving towards running mirror pools instead… and now I’ve got one to test on… and it will even have its own little storagenode with a max capacity of 5 TB or so… so cute lol
Or I’ll set it to 4, as I expect to run some VMs on it, at least until I see what happens and how it performs and how I like the features.

Clustering sounds so cool too, but also very wasteful if one doesn’t need the insane bandwidth of basically 3 servers and near-perfect HA.

I’m familiar with Brett and 45 Drives, solid person/people. They used to be a ZFS-only shop, and now they’re keeping with the times by offering clustered stuff.

And yes, while I agree that the single server kinda defeats the purpose of Ceph, in that it’s a singular node, it was created in a way that allows scaling the storage up and the number of nodes out at that colo site- they’re already on track for a second copy of that node (PVE2) to be created, with an RPi operating as a 3rd-node stand-in to avoid split-brain issues (always make your node count odd, kiddies), plus adding another two 2TB OSDs to PVE1. And the Ceph options we’re quoting out for the main office are all either triple or quintuple setups, depending on drive size, IOPS need, and CRUSH rules/pool setup.
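The “keep the node count odd” advice is just quorum math- a strict majority of monitors has to agree, so an even count buys you no extra failure tolerance over the odd count below it. A tiny illustration:

```python
# Quorum needs a strict majority of nodes, so even counts tolerate no more
# failures than the odd count one below them (which is why odd counts win).
for nodes in range(2, 8):
    quorum = nodes // 2 + 1
    print(f"{nodes} nodes: quorum {quorum}, tolerates {nodes - quorum} failure(s)")
# 2 -> 0, 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2, 7 -> 3
```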

2 Likes