Most cost-effective build (SNO)

Wait until it updates and runs the filewalker process; that always shows me some usage.

I switched away from the avg utilization graph because it didn't show very useful numbers, though I did use it to estimate my node's maximum throughput in GB per day.

The top graph is the 3-month avg; as you can see, it peaks at 2.2% and then drops off as activity did.

Weekly avg graph for my 14.4 TB node, with uptime and the latest spike marked for time reference.

So it just seems to indicate that your storagenode hasn't been restarted for a week… :smiley:
My CPU utilization is lower because it's split across more nodes :smiley: or because it's on 4 cores…

The max usage graphs might be a bit better, because then one can see peak usage, which might be relevant for node performance at some points. Of course, as long as the avg is enough, I'm sure the node would survive.

I've been reorganizing my network infrastructure, so I had many, many restarts over most of December.

But yeah, storagenodes sure don't use much CPU… :smiley:

The spikes in CPU usage and uptime seem to correlate.

And the full 0% flatline was downtime for my network.

I have an idea: heavy RAID systems can affect the storagenode's ability to store data, hence higher RAM and CPU usage. All my nodes have low memory and CPU usage, and I do not use any RAID for them:
Windows

  • native:
    CPU: 1386.09375 seconds used since December 20, 2020 4:13:32 AM, or 0.62%
    RAM: 38.63671875 MiB
  • docker:
CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O  PIDS
67294afebfb0        0.14%               53.46MiB / 24.82GiB   0.21%               70.3GB / 3.42GB     0B / 0B  38
93605d6fc4b3        1.95%               32.07MiB / 24.82GiB   0.13%               801MB / 19.3GB      0B / 0B  22

Raspi3:

CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
a3fc3ad34f34        storagenode         2.32%               62.12MiB / 800MiB     7.76%               0B / 0B             0B / 0B             16
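
For reference, that 0.62% works out to CPU seconds divided by wall-clock seconds since the start timestamp; a quick check (the "now" value below is only my assumption of roughly when the counters were read):

    # Sanity-check the 0.62% figure: CPU seconds / wall-clock seconds since start.
    from datetime import datetime

    cpu_seconds = 1386.09375
    start = datetime(2020, 12, 20, 4, 13, 32)
    now = datetime(2020, 12, 22, 18, 0, 0)  # assumed reading time, ~2.6 days later

    print(f"{cpu_seconds / (now - start).total_seconds():.2%}")  # ~0.62%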

Well, I think the avg gives you a good idea of how much CPU is being used all the time. If you compare it to my Windows VM, it tells a different story.


Max CPU could be anything causing the CPU usage to spike.

Agreed, the avg utilization tells one how much CPU is really needed.
But when considering a storagenode, peaks could be relevant, because high CPU activity could add latency or even cause outright failures; even if that's doubtful, most likely it will just result in dropped or cancelled uploads/downloads. I suppose I could test that with my 14.4 TB node.

It seems to be just at the edge of what a single core can handle, at least at peaks. The avg is much more sensible, but still, 30% cancelled uploads for 12 hours while the filewalker is running might be perfectly okay for some; but why would I accept it if I can avoid it, when CPU time is so cheap and can be turned off when not used?
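
If anyone wants to check their own cancel rate during a filewalker run, a rough sketch like this works; the exact log wording differs between node versions, so treat the path and the two substrings below as placeholders and compare them against your own log first:

    # Rough sketch: count upload outcomes in a storagenode log file.
    # The path and the matched substrings are placeholders; adjust them to your setup.
    from collections import Counter

    PATTERNS = {"completed": "uploaded", "canceled": "upload canceled"}
    counts = Counter()

    with open("storagenode.log", encoding="utf-8", errors="replace") as log:
        for line in log:
            for label, pattern in PATTERNS.items():
                if pattern in line:
                    counts[label] += 1

    total = sum(counts.values())
    if total:
        print(counts, f"cancel rate: {counts['canceled'] / total:.1%}")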

Well, the only thing running in my containers is Docker and one storagenode in each :smiley:
But yeah, it did seem like I had some sort of iowait event permeate through all the containers; that's usually not the case, though.

Of course, if I look at peaks: 30% of 4 of my cores for processing 100 GB on the 18th of December would put me at 300 GB at 90%, and then x4 to reach my 16 threads, so 1200 GB in a day if I do the approximate math from my max CPU usage. Maybe only 800 GB, depending on how Proxmox counts the cores; in VMs I select based on cores and threads, but in containers it seems like they are all cores.

While I get 20 x 100, so 2 TB, x4, so 8 TB of data processed in a day for the network, if I use the avg.
Sure, the avg will work fine in most states, but not in cases like now where I'm rebooting often; or if we imagined more daily traffic than 100 GB, then it would also go up.
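
Writing that back-of-the-envelope math out so it's easier to follow (the 100 GB/day, 30% peak, and 20x avg headroom are my observed numbers from above; the rest is just arithmetic):

    # Back-of-the-envelope throughput estimates from observed CPU usage.
    observed_gb_per_day = 100   # ~100 GB processed on the 18th of Dec
    peak_fraction = 0.30        # peak CPU usage, as a fraction of 4 cores
    cores_observed = 4
    threads_total = 16

    # Peak-based: scale up to ~90% of 4 cores, then from 4 cores to 16 threads.
    peak_estimate = observed_gb_per_day * (0.90 / peak_fraction) * (threads_total / cores_observed)

    # Avg-based: ~20x headroom on the average, times the same 4x scaling.
    avg_estimate = observed_gb_per_day * 20 * (threads_total / cores_observed)

    print(round(peak_estimate), round(avg_estimate))  # 1200 and 8000 GB per day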

But yeah,
avg is good for seeing the minimum requirements of the CPU,

while max is good for seeing the maximum the CPU could possibly use.
Each has its place.

But only one of them will tell you what you need for optimal performance specs.

Though using the phrase "CPU performance" in relation to a storagenode seems to be a bit of a misnomer;
lol, the CPU requirements are so low that one could get out an abacus and might still keep up.

I wonder if enterprise CPUs have an advantage over consumer CPUs for this kind of workload; of course, the throughput most likely isn't high enough for that to matter.

Now that you mention that Proxmox build with an SNO, I'm thinking of something a little more wild.
Has anyone here tried FreeNAS with dedupe and compression on, running the SNO in a jail off that particular path? I've got a feeling it might push the storage limit to double or triple with that build, unless the data stored on the SNO is totally unique.

I think they renamed FreeNAS to TrueNAS a while back.

And no, ZFS will not gain you any extra space with dedup or compression; we already tested that. The data is encrypted and thus cannot be compressed, because encryption schemes try to make data look random.

Or, I dunno if anyone tested dedup, but I would be very, very surprised if it works; and if it does, it will only be on test data, and your system would just die trying to keep up eventually.

Compression will still get you a little something, though :smiley: zero-length encoding; it won't be a ton, just a few bits, which I guess is sort of the same number :smiley:

Rant and reasons

Don't use dedup. It's like filling your swimming pool with corn starch and playing with non-Newtonian mechanics: so much fun at first, and it works great…

until you realize you cannot pump it out, it's difficult to dig, it basically eats things alive if they get stuck in it, and it most likely goes bad at some point.

I made a ZFS dataset using dedup for my VMs' drives, knowing I could always abandon it by copying them to a different dataset. It worked fine for a while; and keep in mind my VM drives total less than 60 GB, stored on 3x raidz1 with SSD L2ARC and SLOG, and I've got 48 GB of RAM.

After a few months I noticed my iowait was skyrocketing and nothing I did seemed to fix it. Meanwhile I had sort of forgotten about running dedup on it, and most of the VMs hadn't actually been used much, though one of them had been running a service; nothing that really required a lot of data. Like I said, there were basically no real amounts of data in the dedup dataset, but it kept my entire server from operating whenever anything on the dedup dataset was in use, and the worst was the VM with the most uptime. I'm not sure why; since the entire dataset could almost fit in memory, I don't really understand how it could go so bad, but it has something to do with how it cross-references stuff.

I'm sure there are some use cases for it, I just don't know what they are; surely not regular data storage/usage. Maybe long-term storage?

I was also advised never to use dedup and never understood why; it sounds so awesome, but it just isn't viable long term for everyday usage.

Compression is awesome on a file system. Of course, for SNOs both are basically useless, because the data is encrypted and most encryption schemes try to make data look random, which is the polar opposite of compressible.

Dedup you might be able to get something out of, but as I understand erasure coding, which is how the data held by Storj storagenodes is generated, the pieces are all uniquely generated, so they are very unlikely to be the same in any useful way.

Compression is based on searching for repeatable patterns, not on "making data look random" :slight_smile:
The simplest compression algorithm is run-length encoding: 222344445555 can be compressed to 23314454 (each digit followed by its repeat count).
However, encrypted data is almost random, and it is way too hard to find repeatable patterns in it.
The same problem applies to deduplication.
However, you can use deduplication at a higher level for VMs; some hypervisors are able to reuse the same base image for VMs if they are based on the same OS.
But again, not for encrypted data, because the duplication rate is very low.
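
A tiny sketch of both points: the run-length example above, and zlib on repetitive vs. random bytes (os.urandom is only a stand-in for encrypted data here):

    # Run-length encoding of the example, and why random-looking data does not compress.
    import os
    import zlib
    from itertools import groupby

    def rle(s: str) -> str:
        # each symbol followed by its repeat count
        return "".join(f"{ch}{len(list(group))}" for ch, group in groupby(s))

    print(rle("222344445555"))               # -> 23314454

    repetitive = b"A" * 4096
    random_ish = os.urandom(4096)            # stand-in for encrypted data
    print(len(zlib.compress(repetitive)))    # a few dozen bytes
    print(len(zlib.compress(random_ish)))    # ~4096 bytes or slightly more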

KSM is a good example of reusing data without much performance loss.
Kernel Samepage Merging (also called Kernel Shared Memory) basically applies a sort of dedup process to RAM and can make one able to run three times the number of VMs with the same amount of memory.

According to the Proxmox documentation, KSM is a Linux-only thing, though, I believe.
Right now my Proxmox host has 2 GB in KSM, shared across 4 Debian containers and 2 Debian VMs.
Can't complain about that, and there's basically no performance loss; I think they claim it's something like 1% or 2%.
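
If you want to see the same number on your own Linux box, KSM exposes its counters under /sys/kernel/mm/ksm/; a rough sketch of reading them (assuming 4 KiB pages):

    # Rough sketch: read KSM counters from sysfs on a Linux host (e.g. Proxmox).
    from pathlib import Path

    KSM = Path("/sys/kernel/mm/ksm")
    PAGE_SIZE = 4096  # assuming 4 KiB pages

    def read_counter(name: str) -> int:
        return int((KSM / name).read_text())

    if read_counter("run") == 1:
        # per the kernel docs, pages_sharing roughly corresponds to "how much saved"
        saved_gib = read_counter("pages_sharing") * PAGE_SIZE / 2**30
        print(f"memory saved by KSM: {saved_gib:.2f} GiB")
    else:
        print("KSM is not running")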

But outside of RAM, dedup is just a bad plan. Perhaps on SSDs; but then again it would end up scanning and comparing your data all the time, thus wearing down the SSD. So it would work, but you would really want a practical use case for it.

The thing I was alluding to before was zero-length encoding, which works, but it has nothing to do with the Storj data and everything to do with how HDDs and ZFS store data; I'm not even sure it would work on most other filesystems.

NTFS (born out of HPFS) has had a compression feature since version 1.1 (1995).

But ZFS has variable block/record sizes, which is why I think shrinking a data block will actually save space, and why I'm unsure it would work on other filesystems without that feature.

There's no point in keeping mostly empty sectors on the HDD.

On Windows, if you have a 4Kn HDD and you reduce the data from 2 KB to 1 KB, it will still take up a 4K sector, because that's the smallest unit it will write.

ZFS, meanwhile, should be able to go down to the minimum the hardware allows; of course, if that is 4K, then that's 4K. But ZFS blocks/recordsizes are much larger, in what is usually the RAID range,
most likely because it was built mostly for RAID. :smiley:

But still, it will cut data to length, within the hardware limitations of course; and because of the larger and variable record sizes, it can cut data blocks very accurately and utilize space very efficiently.
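
A tiny illustration of the rounding being described here, with made-up sizes (anything written gets rounded up to the device's minimum block, so compression only saves space when it crosses a block boundary):

    # Made-up sizes, just to illustrate the allocation rounding described above.
    import math

    SECTOR = 4096  # 4Kn drive

    def allocated(nbytes: int) -> int:
        return math.ceil(nbytes / SECTOR) * SECTOR

    # Small file: 2 KB compressed to 1 KB still occupies one 4K sector.
    print(allocated(2 * 1024), allocated(1 * 1024))      # 4096 4096

    # A 128K record compressed to a hypothetical 70K drops from 32 to 18 sectors.
    print(allocated(128 * 1024), allocated(70 * 1024))   # 131072 73728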

Of course, that comes at a cost in other aspects; for example, its RAID is terrible for adding and removing disks in an existing array/pool.

ZFS is magic and way too complex for me to explain well. Just magic! That's how it works.

Hi,

I'm running 3 nodes on 3 HP ProLiant MicroServer N40s. All of them were bought used on eBay for an average price of 90 euros per device; I bought new disks, installed Windows 10, and they have been running flawlessly for almost a year. One of the best things about these devices is that they can hold 4 disks and draw about 50 W while running.
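
For anyone comparing running costs, that 50 W figure translates roughly as follows (the electricity price is just an assumed example, adjust for your region):

    # Rough monthly running cost for a ~50 W device; the price per kWh is an assumption.
    watts = 50
    hours_per_month = 24 * 30
    price_per_kwh = 0.20  # assumed EUR/kWh

    kwh_per_month = watts * hours_per_month / 1000
    print(kwh_per_month, kwh_per_month * price_per_kwh)  # 36.0 kWh, ~7.2 EUR per month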

Cheers,
Tiago


I read about WD Purple. I mean, does it really matter, surveillance disk vs. NAS disk? I mean… really?

Pretty sure WD just relabeled their drives for both NAS and surveillance. It's all about marketing; it's like when they add "Gaming" to their names.


From what I understand, it sometimes comes down to the algorithms used on the HDD's controller PCB, so what you may be paying for could be that, or rated throughput vs. length of warranty.
It may very well be the exact same hardware, or maybe a better bearing or a few other components that are a bit more wear-resistant.

HDDs are pretty difficult to wear out in general, so I'm sure a surveillance disk should be fine for most of us.

Another thing that just came to mind is that NAS drives may be rated for different vibration levels, and rated to run in arrays of, say, 24 disks without creating harmonic resonance; so really it may be the same drive that has just gone through a lot more testing, or maybe an older series, or maybe there are insurance-related requirements for surveillance drives that must be met.

There are many small and big reasons why some disks have a different model designation.
I'm not really that familiar with it all, aside from knowing enough to say the reasons are most likely perfectly valid, even if they're not relevant to many of the people who buy them.