Most cost effective build (SNO)

ZhiYuan · December 18, 2020, 2:07am

What is your most cost-effective SNO build?
Would you consider a Raspberry Pi the most cost-effective build, counting both power consumption and hardware cost?

deathlessdd · December 18, 2020, 2:59am

My most cost effective system would be a free system that I got off a marketplace. I started with a 1TB hard drive, then I ran 2 nodes with 1TB each on drives that I had sitting around. I then used Storj funds to upgrade the hard drives.

Most cost effective would be hardware you already have. As soon as you go out and buy hardware it’s no longer cost effective.

If you’re talking about efficiency, a Raspberry Pi will win for sure. But you will need to take all costs into consideration, and keep in mind that it will take a while to recoup the amount if you were to buy hard drives and an rpi4.
If you already have the hardware, that is the best way to start off.

joesmoe · December 18, 2020, 3:04am

You will find many differing opinions on this.

Please consider that failure due to some weak usb/sata/cheap external hdd could cost you all your held earnings.

For the single-drive, single-node beginner, I hear these ODroids are really good.

TheMightyGreek · December 18, 2020, 6:18am

Indeed I started with an Odroid HC2 because I didn’t want to leave my main PC on all the time, and it works like a charm.
It’s tucked behind a closet with a 120mm fan blowing over the housing to keep it cool. I just hacked the fan to connect to the USB 5V line, and that’s all the power it needs to keep the CPU under 50C. Almost set and forget.

andrew2.hart · December 18, 2020, 7:07am

I don’t think you can beat a pi4 booting directly off a WD Elements 12TB. If you already have an old phone charger to power it, even better.
But only when the WD Elements are on sale.

SGC · December 18, 2020, 7:36am

i think scaling up is the cheapest way, because basically you don’t want processing to be tied to a fixed amount of storage, you want to get the best utilization out of both your processing power and your storage.

the best way to compare is watts per TB, and when you start on that it turns out that RPis might not be as efficient after all, due to their limited CPU bandwidth.

sure, if you plan on hooking up 1 or 2 drives then an RPi seems to be the popular choice, being at i think a few watts per TB…

i mean a HDD itself is anywhere between 2 watts and 0.25 watts per TB. ofc you can get higher watts per TB, but then it quickly becomes an issue of actually turning a profit, depending on a lot of other factors… so let’s hope nobody is buying hdds that pull more than 2 watts per TB these days,
which would be 2-3 TB HDDs.

based on my math i can hook up like 100 hdd’s which would put me at lets say a kilowatt with a system that might not be able to utilize their bandwidth and may lack cpu to keep up… but i could just say i have newer hardware… but it was bought with around the 100hdd mark in mind.

100 hdd with system power, redundancy… hell we could even say it was running a mirror raid setup…
if we say 20TB hdds then that’s 0.5 watts per TB
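The arithmetic behind that figure can be sketched in a few lines of Python. The function name and the `mirrored` flag are made up for illustration; the inputs (100 drives, 20TB each, roughly 1 kilowatt for the whole system) are the ones quoted above. Note that 0.5 watts per TB is the figure per raw TB; if the pool really were a two-way mirror, the figure per usable TB would be double that.

```python
def watts_per_tb(total_watts: float, drive_count: int, tb_per_drive: float,
                 mirrored: bool = False) -> float:
    """Power draw per TB of capacity (per usable TB if mirrored)."""
    raw_tb = drive_count * tb_per_drive
    # A two-way mirror stores every byte twice, halving usable capacity.
    usable_tb = raw_tb / 2 if mirrored else raw_tb
    return total_watts / usable_tb


# Figures from the post: ~1 kW system, 100 drives, 20TB each.
per_raw_tb = watts_per_tb(1000, 100, 20)
per_usable_tb = watts_per_tb(1000, 100, 20, mirrored=True)
print(f"raw: {per_raw_tb} W/TB, mirrored: {per_usable_tb} W/TB")
```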

doubt one can really go any lower than that… and ofc with that one also has redundant power supplies, server grade hardware, which pay off in other ways down the line… imo

ofc the real issue when one starts looking at storage isn’t the power usage… because a hdd uses about 10-20% of its value in power over its lifetime, if we consider them used up or outdated in like say 5-6 years.

so really, trying to conserve power by cutting overhead will only reduce that 10-20% of the total cost.
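A rough way to check that 10-20% claim for your own drives is sketched below. Every number in the example call is a hypothetical placeholder (an 8 watt drive, 0.12 per kWh, a drive price of 300, 5 years of service), not a figure from this thread, so plug in your own.

```python
HOURS_PER_YEAR = 24 * 365


def lifetime_power_share(drive_watts: float, years: float,
                         price_per_kwh: float, drive_price: float) -> float:
    """Lifetime electricity cost of a drive as a fraction of its purchase price."""
    kwh_used = drive_watts * HOURS_PER_YEAR * years / 1000
    return kwh_used * price_per_kwh / drive_price


# Hypothetical inputs: 8 W drive, 5 years, 0.12 per kWh, drive bought for 300.
share = lifetime_power_share(8, 5, 0.12, 300)
print(f"power is about {share:.0%} of the drive price")
```

With these placeholder numbers the share lands inside the 10-20% range mentioned above, but a cheaper drive or pricier electricity shifts it quickly.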

complex topic, but you get the idea… the real thing that has to be cost effective is the hardware and the man hours… the smaller the setup and worse the setup the more man hours will go into keeping it running… but thats ofc my opinion.

node1 · December 20, 2020, 10:17pm

I can’t see more than a few percent better performance on a PC vs an rpi4 (2 drives), but the power consumption is a few times higher on the PC. So no doubt, go for an rpi4 with 2-4GB of RAM plus a couple of drives. I have never seen more than about 450MB used (running two nodes). Somebody here on the forum tested about 20 drives/nodes on an rpi4, and there were no problems.

kevink · December 21, 2020, 6:02am

Main argument for PC vs RPI is that PC offers SATA connection while the PI has a possibly unreliable USB connection (and a possibly unreliable external HDD enclosure).
This could be solved by using an Odroid HC2 (or the newer one with 2 SATA ports). It has a SATA connector for one HDD and 2GB RAM. It also has only USB2, which is a disadvantage compared to RPIs, but for a 2nd HDD that might be ok. Still, it’s the most cost effective solution while still being very reliable. I would recommend it over RPIs. However, RPIs are easier to find/buy than the Odroid HC2.

SGC · December 21, 2020, 9:48am

the number of drives i think is almost irrelevant, i think it comes down to memory management… there seems to be an increasing amount of memory used as the TB stored increases, also there seems to be some minimal memory a storagenode wants to use.

maybe that’s because my nodes have no fixed memory ceiling and if they get close to what they got i would most likely just raise it…

i would expect something around 80TB on 2-4GB ram to be close to the limit before the storagenode at least wouldn’t get the ram it might want…

but this is ofc a projection based on the limited data i got and not knowing how the storagenode keeps track of its saved data… but no matter what… the bigger the storage capacity, the more memory will be required to keep track of all the allocations,

even if that kind of stuff could be stored on things like SSDs and other solutions… and even the storage media itself… but moving the “memory” framework to disk would ofc slow down the process greatly and thus would most likely result in higher latency for data requests.

and yeah like kevink says, i certainly wouldn’t want to use usb if i can avoid it… might work fine… but then again it might not… usb is built to be disconnected and reconnected live… and thats part of the problem, tho eventually i’m sure the storagenode software will have ironed out most of those sort of issues.

but at present a brief disconnect would shut down the storagenode, so if the usb bus decided to reconnect your drive…

no matter how short the reestablishing of the connection is… the data stream, at least in windows, would be disrupted… because the hdd will basically be running as a new instance / new instance id or hardware id, whatever we want to call it… it may look the same from a top down view… on the desktop…

but lots of software doesn’t deal well with it… because it’s usually not a problem, because most stuff doesn’t need to run 24/7 “unattended”.

alas usb has some drawbacks… might be great for temporary solutions, and it sure is nice to be able to hook up near infinite disks…

kevink · December 21, 2020, 3:55pm

The difference is not that relevant tbh… maybe once you have a 20TB node but maybe it’s different if you don’t use zfs like I do… that’s possible I guess.

SGC · December 21, 2020, 6:10pm

i think the problem might be that you are looking from one point in time… tho i would say it’s kinda interesting that your nodes seems to use much less memory than mine… ofc maybe thats a matter of how i track it…

might have something to do with the filewalker also… been having a lot of node reboots since i’ve been tinkering with my network, seems like there may be a correlation in that… but duno…

this is my 14.4tb node

this is a 280GB or so node

this is a node near 1TB

spot the worst node: the one with the highest latency is the near 1TB one.
it’s currently running on a single hdd which is almost filled

then there is the big storagenode… which usually will run with l2arc and slog, but hasn’t for the last week… but it doesn’t matter either… if i don’t start and stop it… i think it might not use this much memory…

and keep in mind that this is the memory for the full container… not an individual program or docker which may in some ways i’m unaware of share memory with other parts of the system.
sure it will run lower in memory usage, but it also seems very dependent on activity of egress and such.

i duno why… i have tried to cap the max memory and it seems to accept that… but why would i want to slow it down… also all hdd’s are running at sub 20ms latency, iowait is maybe 2% avg or lower

i duno what the big storagenode is doing some of the time… but it likes to use memory, which doesn’t really surprise me, will be interesting to see how it will behave long term, because i haven’t tracked it this way for more than a week…

this is the full memory usage for the container and docker, of which some of the information will be shared across KSM so this is not an accurate memory usage for the 3 … it’s individually accurate memory usage… but since they all share some of it, one cannot add them and get the total of how much memory i am using for running 3 nodes, because they share… well 4… err only 300mb… but they have been restarted recently… so maybe thats why KSM sharing is so low…

also not exactly sure how proxmox tracks that… KSM thing …

but those 300mb shared memory could account for why my avg looks so much higher than yours, since mine is tracking the total usage and yours is tracking how much each individual instance or whatever it was container is using and then docker or whatever accounts for the rest of the memory usage which they all share…

i doubt there is such a big difference…
if i run docker stats on the 14TB node (this also showed a rather big difference of 250MB between the memory utilization proxmox reported for the container and what its docker would report as used)

and this is for the near 1tb node, which has a usage of 150mb
but in docker the storagenode only takes 27mb

so yeah the docker memory number seems to not include the tables or whatever it’s called for data storage allocation, and thus isn’t that great for actually estimating the memory required to make a system run smoothly.
granted i should just put my storagenodes into the same container … and i plan to
but i wanted to figure out what was going on with the memory and would be better at tracking it…

maybe i’ll move them over after new year when my 2nd isp runs out, just to see how it would change the memory usage of the container with more nodes…

anyways… i can see where you get the numbers, and if i looked in docker from time to time… when my node hadn’t been rebooted for a while… then i doubt it would show much more than what yours seems to… but still the OS also seems to use more memory when dealing with larger storagenodes, and that is something docker doesn’t seem to take into account in its totals…

would be nice if you tried to collect some data on your memory usage over time… if you still have netdata then that might be able to help… it tracks the storagenodes docker container usages.

maybe i should try the linux app… to go around docker…

or it may be down to proxmox, different zfs version, kernel version, or other unique hardware / software configurations related to my system…
i can exclude my ssd now since it’s offline lol

both smaller nodes are exactly the same, even tho one holds like 3 or 4 times the data of the first one…
and i mean exactly… . within less than a few % which kinda tells me that there might be a minimum that they are still on…

joesmoe · December 21, 2020, 6:32pm

I also just wanted to mention that Synology units are great too.

SGC · December 21, 2020, 6:36pm

how much data you got on one of those ?
and how is the memory utilization looking of the system?
gimme some of them graphs you know you want to

@everyone
what i was trying to say with my earlier post was that it seems to me that beyond a certain node size, there is a substantial increase in memory usage… not all the time, but some time…
and personally i wouldn’t want to run into memory issues, and it’s common for servers that control / supply vast stores of data to use loads of memory to run smoothly…

the data also seem to suggest that it may be only when past a certain point in size… which could indicate either a programming thing or something similar… also if people store their databases in different locations, that might also affect it … i duno

i certainly wouldn’t skimp on memory if i was to build a cost effective system, which i believe is the larger systems… if one wants many big nodes…

deathlessdd · December 21, 2020, 8:45pm

Here is my test node 200gigs

max ram is 512mb 1 core

Proxmox doesn’t show actual ram usage though.

SGC · December 22, 2020, 8:08am

you should turn off swapping immediately… it’s basically a useless feature for 99% of people, which was made to avoid crashing and so that a small computer could work with a big data set…
a system should never be swapping. ofc maybe if you are running the vm’s on ssd… then it might be semi okay, but it certainly doesn’t help with much unless you are running out of ram.

the ram thing might be down to KSM… i got 1.2GB shared

https://pve.proxmox.com/mediawiki/index.php?title=KSM&redirect=no

14.4 node (docker in the debian container shows 153mb used, so about the 250mb offset from yesterday)

one of the small ones, the other is still basically a mirror image
docker shows 36.33mb used so an offset of just about 160mb which is also just a few % away from yesterday…

mine for the last hour looks fine.

their offset seems about the same as last night, even if the 14.4 node has finished its boot up work or whatever and now is using much less memory.

to test this further, i spun up another debian container, on which i’m going to install docker and update it to the same level as the other containers.

deathlessdd · December 22, 2020, 11:34am

That’s just a default Ubuntu server install. I don’t normally custom set anything to run.
I’ve never had any issues with swapping, probably because my RAID is on SSDs (M.2 to be precise), but I will keep that in mind. It’s only a test node running the binaries with the updates to see how well it works out.

ZhiYuan · December 22, 2020, 11:55am

I’m sure the swapping is not very important if you do have enough RAM.
I was thinking of putting the gateway together with the SNO on an rpi. If there is enough RAM and processing power, why not run some simple software like Nextcloud with the gateway, or maybe Nagios, and upload the logs to Tardigrade, if the rpi can handle it?

deathlessdd · December 22, 2020, 12:15pm

My idea was to simulate a budget low-powered SoC or PC, so I gave it the minimum for a reason. But I will also disable swap to see how bad it can get. It might not be an exact comparison, though, since the host is a really high end server: a Threadripper with 32 cores, with only 1 core assigned to this node. I wish I could assign just a single thread to this node, but I don’t think Proxmox can do that.

SGC · December 22, 2020, 2:04pm

Summary:
RAM requirement - you should expect it to depend on workload and to increase the more data capacity you expect to serve to tardigrade.
CPU requirement - will change depending on the tardigrade network activity and how many unique connections to the network you make.

and so lets put some numbers on it
4GB of RAM (well i wouldn’t expect more than 40TB to be served optimally on that) maybe 8GB
so let’s be generous and say 10TB per 1GB of memory sounds realistic to me… from what i see (YMMV)

CPU anything should do unless if it’s like some tiny low core semi old mobile thing.
so far as i can tell anyways, i’m sure there are a wide range of opinions.
i wouldn’t go less than 1 thread per unique tardigrade network connection, that should be enough…
ofc gross estimations, YMMV and i am not a financial or health advisor nor your local witch doctor…
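For quick sizing, the rule of thumb above (roughly 10TB served per 1GB of RAM) can be written down as a small Python sketch. The function names are invented for illustration, and the ratio is only the rough estimate from this post, not an official Storj requirement.

```python
TB_PER_GB_RAM = 10.0  # rough estimate from this post, YMMV


def ram_gb_for_capacity(stored_tb: float) -> float:
    """RAM in GB suggested by the rule of thumb for a given stored capacity."""
    return stored_tb / TB_PER_GB_RAM


def max_tb_for_ram(ram_gb: float) -> float:
    """Capacity in TB the rule of thumb expects a given amount of RAM to serve."""
    return ram_gb * TB_PER_GB_RAM


print(max_tb_for_ram(4))          # the 4GB case from the summary
print(ram_gb_for_capacity(14.4))  # the 14.4TB node discussed in the thread
```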

Rants and reasons
i think i got a node running on 1 core, for testing exactly that… totally forgot about that… been running like that for a while i think
yeah both of them actually… 1 core on a 2.16ghz xeon 5630L maybe, can’t remember the last letter
antique cpu basically, not too shabby for general computation tho…

so got almost to the 3months of that test done, am running on a 2x mirror zfs hdd pool, so no realistic normal storagenode limitations on it either…

this is the avg utilization from 1st oct until now. the first month of flatline is vetting, then it spikes a bit with some activity and then activity drops off, i think because there was an extra node that came back onto my subnet…
the node is at 500gb currently… thought it was a bit more… so not very much in size yet… currently set to max 4tb but will most likely increase it in size before it reaches that.

and this is the max ram usage graph over the same period of just a bit less than 3 months, the max ram peak is 458 MB, when i point the peak to check… looks a bit lower on the graph

and traffic for the same period. i know the graphs look a bit rough… it’s a zoom on the proxmox yearly graphs, so not finely detailed, last month up next

the yearly cpu avg utilization graph was a bit useless.
weekly cpu max utilization looks more useful, got an odd spike on the monthly graph… might be some sort of server error or high activity i have caused… as it seems to go across all the containers.
500gb 3 month old node.

there is also a pretty clear correlation between ram max usage and cpu utilization, most likely in relation to restarts of the storagenode / container thus starting the filewalker process.

280gb 2 month old node, only finished vetting on the 15th dec

furthermore it was moved over onto its own internet connection around the 19th, which clearly shows an increase in cpu activity, indicating that the more activity the more cpu is required, and thus if multiple nodes share 1 connection their demands for cpu will be significantly lower…
not really a big surprise.

the big node of 14.4 TB has 4 cores on its container, and shows a significantly higher demand for processing and ram, even tho it has been sharing internet with other nodes…

each of the nodes are on their own storage.

so what can we conclude…

the more data one stores the higher the demands will be for particularly memory, the more activity a node has the more demand will be for cpu, thus there will be a minimum cpu requirement to supply 1 internet / subnet ip with node processing, this minimum will vary with how much data is moving on the storj / tardigrade network.

my architecture is 10 years old, it’s basically what ended up in the I3, i5 and i7 series back when they launched initially, and tho xeon it will be slower per core than almost all consumer grade cpu people will be using today, but i got 2cpu’s of 4c/8t @ 2.16ghz
on my 14.4TB node i do see peaks of 30%, tho the avg is like 5% for 4 cores for that day…

so really processing seems to be the least important factor, it would be near impossible to actually max out any cpu.

not even 5% avg on yesterdays peak activity.
processing nearly 100GB, so a single core of close to the worst cpu you could imagine might be able to deal with close to 500GB of data a day, if it doesn’t totally choke and start failing uploads and whatnot… but then again the worst cpus are like dual core and the network has never even broken 500GB a day to my knowledge…
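The extrapolation above (about 5% average load across 4 cores while moving nearly 100GB in a day) can be reproduced with the following Python sketch. The function name is made up for illustration, and it assumes CPU load scales linearly with traffic, which is a simplification that may not hold near saturation.

```python
def gb_per_day_per_core(gb_per_day: float, avg_utilization: float,
                        core_count: int) -> float:
    """Extrapolate how much daily traffic one fully loaded core could handle.

    Assumes CPU load grows linearly with the amount of data processed.
    """
    cores_actually_used = avg_utilization * core_count
    return gb_per_day / cores_actually_used


# Figures from the post: ~100GB in a day at ~5% average on 4 cores.
estimate = gb_per_day_per_core(100, 0.05, 4)
print(f"roughly {estimate:.0f}GB per day per core")
```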

deathlessdd · December 22, 2020, 2:20pm

It’s very interesting to see the differences in CPU usage, because this is my test node and this is a week’s average.

It hasn’t had any peaks, it’s just 4 to 6% pretty much all week. It’s about a 2 month old node now.
But I will see if anything changes.