Yes, 0 is the default for this parameter.
But now it doesn't need to be specified even for weak devices.
So you may comment it out (place the # character before the option), save the config and restart the node.
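A minimal sketch of the restart step, assuming the docker setup from the documentation with the default container name storagenode (for a Windows GUI or systemd install you would restart the service instead):

docker restart -t 300 storagenode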
Have you ever encountered a combination - a basic Linux system, with VMware installed on it and virtual Windows 10 systems running on it?
Could such a combination be better if the base system were also Windows 10?
The default:
# how many concurrent requests are allowed, before uploads are rejected. 0 represents unlimited.
# storage2.max-concurrent-requests: 0
Why do all the issues start with "I'm running a Windows VM on VMware on a Linux host"?
Please do not take it as an attack, but it's true.
First of all - do not use Windows, then - do not use a VM, then - do not use VMware to run the storagenode.
Why can't you run a docker container directly on the Linux host? You would likely never run into any issues if you ran the node like that.
Since it's VMware, you likely configured something specifically for the storagenode instead of using what you already have, am I correct?
You may still keep catching all the issues related to this complex combination, or consider reconfiguring things. If that's the case, then first of all - do not use a Windows VM and NTFS. Or use them only on bare metal - then they would work great, as proven by @Vadim.
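For reference, running the node directly on the Linux host is a single docker run; this is just a sketch based on the documented command, with all values and paths being placeholders you need to replace (see the official docs for the full current command):

docker run -d --restart unless-stopped --stop-timeout 300 \
  -p 28967:28967/tcp -p 28967:28967/udp -p 14002:14002 \
  -e WALLET="0x..." -e EMAIL="you@example.com" \
  -e ADDRESS="your.ddns.tld:28967" -e STORAGE="2TB" \
  --mount type=bind,source=/mnt/storj/identity/storagenode,destination=/app/identity \
  --mount type=bind,source=/mnt/storj/storagenode,destination=/app/config \
  --name storagenode storjlabs/storagenode:latest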
Because I don’t know how. I haven’t dealt with Linux at all. So I work with what I can. I would like to, but I haven’t found detailed instructions on how to do this.
OK, my node is assembled from decommissioned HW, so yes, it isn't too fast. Can you point me to what disk speed is required for a storagenode? If I have a problem with disk speed, the guys with RPis must be down, or?
Yes, after a start the filewalker takes a lot of the disk speed, and if too much data is sent to my node during that time, this can be the consequence.
I'm just thinking out loud - is it possible to run the filewalker with the lowest possible IO nice? I mean, slowing down all processes which aren't required and speeding up processes which need to be done quickly.
Let's start with tuning:
storage2.monitor.verify-dir-writable-timeout: 1m30s
BUT
# how many concurrent requests are allowed, before uploads are rejected. 0 represents unlimited.
#storage2.max-concurrent-requests: 25
Disabling the limit on concurrent requests looks like a bad idea to me on a slower node. I think I should have the parameter active and set a lower value. Or am I missing something?
I am still curious - how do the guys with "weaker" RPis handle the situation?
UPDATE:
#storage2.max-concurrent-requests: 25
This was definitely a bad idea for me.
The load of the machine was unpredictable and sometimes too high (the CPU was at its limit and RAM buffering was jumping like on a roller coaster).
Many times the CPU really hit its limits (on IO wait):
Now trying:
storage2.max-concurrent-requests: 20
Looks better, but I will be watching the node very closely.
Why do all the issues start with "I'm running a Windows VM on VMware on a Linux host"?
I have only Linux directly on the HW and I have the Docker daemon on the host OS. This "container under virtualized virtualization" looks like a great recipe for disaster and a waste of resources …
Because I don’t know how. I haven’t dealt with Linux at all. So I work with what I can. I would like to, but I haven’t found detailed instructions on how to do this.
If you only know how to work with Windows, it is better to stay there: keep Windows directly on the HW, install Docker on it and run the storagenode there.
OR
You can take running the storagenode as a challenge: throw Windows and all its licensing garbage into the trash, start reading about Linux distributions, choose one (I use Debian), install it directly on the HW, and that's it!
For installing Docker you can find a bunch of step-by-step documentation, so that isn't a problem either.
For installing the storagenode there is documentation as well.
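For example, on Debian the Docker install via the official convenience script is just a couple of commands (a sketch; check the official Docker docs before running it):

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh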
Sorry, I'm still missing the point - what is the problem??? Time? You can spin up a SN under Windows and start doing the same thing on a second machine with Linux … You can learn soooo much and get valuable experience!
Thank you. I will try.
I tried to install a storagenode on Linux Mint using the instructions on this forum. But I encountered errors during installation and the installation was not completed. I didn't understand the essence of the errors, so that's where it ended.
Thank you. I will try.
If you are still unsure or feel like a newbie, try Ubuntu Server. It looks stable and has a bigger community for help in case of trouble. Or you can use Debian. Many of the guidelines for Ubuntu work for Debian as well, and vice versa.
My opinion of M$ products is more like negative, near zero, so I highly recommend migrating from this "master of vendor lock-in" to somewhere where every update iteration is in your hands and you are the only one using your hardware. You can look up topics like "BitLocker enabled by default" or "Recall enabled by default" (luckily, the backlash was stronger than the managers' bullshit).
This is off topic, so I'm ending it here. Let's focus on tuning slower nodes …
I see, but would you like to?
We can help.
If I have a problem with disk speed, the guys with RPis must be down, or?
Yes and no. Windows and NTFS are the bad guys here… Initially NTFS (HPFS in the past) was a good filesystem compared to FAT, but much more efficient filesystems have been implemented since.
However, back to the topic. If you do not plan to migrate (and this is OK), you likely need to tune it to be faster than it is now.
I'm just thinking out loud - is it possible to run the filewalker with the lowest possible IO nice?
It's already doing so; it's called the "lazy filewalker" and is controlled by the option pieces.enable-lazy-filewalker (it's true by default). This option affects almost all filewalkers (except the "collector", I believe), so they run with the lowest possible priority.
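For completeness, this is how the option would look in config.yaml if you ever wanted to set it explicitly (it already defaults to true, so this line is normally not needed):

# run the filewalkers with the lowest possible IO priority (default: true)
pieces.enable-lazy-filewalker: true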
Disabling the limit on concurrent requests looks like a bad idea to me on a slower node. I think I should have the parameter active and set a lower value. Or am I missing something?
You do not need to touch this parameter; it would directly affect your payout (a lower value - fewer upload and download requests = less payout). It also abruptly affects customers (they will receive the response "that node is overloaded, it's too busy to provide you the piece you pay for"), so it is better not to change it from the default (0, i.e. no limit). Especially after the "choice of n" is implemented:
Ok, while you are removing any concurrency limit you might have set, I will explain how the node selection actually works. When a segment gets committed to the database, it will contain the nodes that have been fast enough and it will be missing the nodes that got long-tail canceled. The satellite calculates a success rate for each node with that. The node selection takes that success rate. Instead of 110 total nodes it selects 220 nodes at first and compares them in pairs and picks the one with th…
and RAM buffering was jumping like on a roller coaster)
which again suggests that you have issues with the disk subsystem. The RAM usage always grows if the disk is not able to keep up.
How is it connected? What's the filesystem? How much RAM do you have?
I have only Linux directly on the HW and I have the Docker daemon on the host OS. This "container under virtualized virtualization" looks like a great recipe for disaster and a waste of resources …
Exactly. Except that docker is not virtualization; it's actually running on the host, moreover - on the same kernel, just with restrictions. So there are no penalties, unlike any virtualization (regardless of how good it is).
Yes, I use btrfs. It's the recommended filesystem on Synology. I had space left over, so I decided to use it for a node.
Without checking the complications of Synology btrfs & Storj first.
Somewhere in the coming months I will push out a new storage machine of some sort at another location, which will probably also have a node.
But no Synology that time.
But anyway, it's a 923+ with 32 GB RAM (2 x 16 works fine). It's two IronWolfs and a single Exos.
I will disable access time recording. At setup I turned it on, but I haven't used it, so it's not for me.
I will check your guide.
Any knowledge about write caching? I can see the cache SSDs wearing out quickly with the writes Storj is pushing out now. But it would be a single drive for me, as I already have one for reads, and it would probably lighten the load a bit?
Or I could just put a 4th drive in on a separate volume and use that just for Storj. But that requires me to buy a drive earlier than I expected.
BTRFS on Synology should be fine for the Storj load. We have many SNOs who run nodes on Synology. And now it shouldn't be worse than running on ext4, but it will likely require more RAM.
From your description I can assume that RAM should not be a restriction.
Do you have any issues other than high memory usage (it's kind of expected for BTRFS)?
Yes, 0 is the default for this parameter.
But now it doesn't need to be specified even for weak devices.
My 2 cents on this: I have 2 nodes, one weak and one lightspeed fast.
Even one unlimited node would eat up all my bandwidth (100 Mbit), not letting me watch streams.
Now both nodes have the "5" - it translates to roughly 25 Mbit per node, which is the requirement. The nodes get ca. 500 GB per day per node of average ingress -> 1 TB/day.
The average disk space used this month is still crazy wrong, even for the fast node, so I ignore it.
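Presumably the "5" above refers to the same concurrency option discussed earlier; in config.yaml that would be (an assumption about which parameter is meant):

storage2.max-concurrent-requests: 5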
However, back to the topic. If you do not plan to migrate (and this is OK), you likely need to tune it to be faster than it is now.
Yes, Storj was launched as a project where any HW can work. Correct me, please, if I am wrong. And this "hey you, migrate from HDD to SSD right now" is a little annoying, I think.
Yes, I do think I will need some optimization. But what can I do next?
You do not need to touch this parameter; it would directly affect your payout (a lower value - fewer upload and download requests = less payout). It also abruptly affects customers (they will receive the response "that node is overloaded, it's too busy to provide you the piece you pay for"), so it is better not to change it from the default (0, i.e. no limit). Especially after the "choice of n" is implemented:
The machine has only non-ECC RAM modules. It is an old gaming machine, not server equipment designed for 24/7. If I constantly overload it, the chance of an unexpected OS crash rises. I have tested (with CPU mining) that it can run for one or two weeks, but after that period many strange things start happening. I suspect the platform as such, because it isn't built for constant load.
A solution could be restarting it regularly, e.g. every week, but thanks to the filewalker, where one iteration costs a few days, it would be checking the inventory constantly.
I restart the node once per month to install OS updates, but letting it run longer is risky too. I mean, restarting once per month is acceptable for me because of the updates, but restarting it weekly would mean the filewalker runs constantly.
which again suggests that you have issues with the disk subsystem. The RAM usage always grows if the disk is not able to keep up.
How is it connected? What's the filesystem? How much RAM do you have?
First of all, I use a non-recommended setup, surprisingly.
I have a bunch of old HDDs with various capacities and reliability. So I have to keep pairs in mirrors for the case that one goes away. This approach has saved my node many times, btw!
The MDs (pairs of HDDs with the same capacity, but different models) are merged together into one big LVM volume (I have a few volumes, but one is allocated for Storj only). The second volume is "root" and the third is my small NAS (SMB, FTP, NFS, …), which is basically used very occasionally during the week (uploading a few photos, watching videos on my desktop …). I mean, nearly all of the storage load comes from the storagenode.
On the volume I have EXT4 with noatime set, but the other attributes are default. Nothing special here, but if you have a specific interest, let me know and I will share it.
Looks like:
$ pvscan -v
PV /dev/md10 VG default lvm2 [465,63 GiB / 0 free]
PV /dev/md3 VG default lvm2 [465,63 GiB / 0 free]
PV /dev/md5 VG default lvm2 [<5,46 TiB / 3,96 TiB free]
PV /dev/md6 VG default lvm2 [1,36 TiB / 0 free]
PV /dev/md8 VG default lvm2 [465,63 GiB / 0 free]
PV /dev/md2 VG default lvm2 [14,55 TiB / 0 free]
PV /dev/md11 VG default lvm2 [465,63 GiB / 0 free]
PV /dev/md9 VG default lvm2 [<7,28 TiB / 0 free]
PV /dev/md4 VG default lvm2 [465,63 GiB / 0 free]
PV /dev/md1 VG default lvm2 [<7,28 TiB / <7,28 TiB free]
PV /dev/md7 VG default lvm2 [<3,64 TiB / 0 free]
Total: 11 [41,84 TiB] / in use: 11 [41,84 TiB] / in no VG: 0 [0 ]
$ lvs -a -o +devices
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
LV-StorJStore default -wi-ao---- 28,78t /dev/md2(3840)
LV-StorJStore default -wi-ao---- 28,78t /dev/md11(0)
LV-StorJStore default -wi-ao---- 28,78t /dev/md10(0)
LV-StorJStore default -wi-ao---- 28,78t /dev/md3(0)
LV-StorJStore default -wi-ao---- 28,78t /dev/md8(0)
LV-StorJStore default -wi-ao---- 28,78t /dev/md4(0)
LV-StorJStore default -wi-ao---- 28,78t /dev/md6(0)
LV-StorJStore default -wi-ao---- 28,78t /dev/md7(0)
LV-StorJStore default -wi-ao---- 28,78t /dev/md9(472282)
LV-StorJStore default -wi-ao---- 28,78t /dev/md5(0)
home default -wi-ao---- <1,80t /dev/md9(0)
root default -wi-ao---- 15,00g /dev/md2(0)
swap default -wi-ao---- 2,00g /dev/md9(471770)
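For reference, the noatime mount mentioned above would look something like this in /etc/fstab (the mount point is an assumption; the mapper name follows from the VG/LV names in the lvs output):

/dev/mapper/default-LV--StorJStore  /mnt/storj  ext4  defaults,noatime  0  2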
My strategy is simple:
In summary: what do you think? Or what do you suggest? Go with a UPS and risk a damaged FS due to the gaming machine, or go with LVM caching and risk a damaged FS due to the lifespan of the SSDs?
If you have a better idea, it will be very valuable for me.
Edit:
Again thinking out loud: is it possible to somehow serialize the walkers? It seems that they run in parallel and this exhausts all available IOs.
Is it possible to somehow set the storagenode worker thread count?
$ ps -T aux | grep "storj" -c
169
If all threads do some storage operations, the storage subsystem is totally overloaded just by switching between them.
Some very small drives in your setup, are you sure it's worth the electricity cost?
From my experience most HDD failures still allow data recovery, so I've decided against any sort of redundancy even for pretty worn-down drives. One of the HDDs had a few badblocks at the time I was setting up nodes there. However, I've decided to run a large number of small nodes (many nodes per single HDD, against Storj recommendations) to sort-of compartmentalize the space in the hope that potential badblocks will affect only one of them.
As such, I’m just prepared to run ddrescue
in case a HDD starts getting any non-trivial number of badblocks.
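A minimal ddrescue sketch for that scenario (device names and map file path are just examples - double-check source and destination before running anything):

ddrescue -f -n /dev/sdX /dev/sdY rescue.map    # first pass: copy everything readable, skip the slow scraping phase
ddrescue -f -r3 /dev/sdX /dev/sdY rescue.map   # second pass: retry the remaining bad areas up to 3 times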
No UPS here. I did get some cheap consumer SSDs with decent TBW for LVMcache and databases. So far it’s fine, I’m monitoring TBW and so far it seems like I’ll not run out of TBW before the warranty period ends.
I do have ECC memory in this box, but only because it was cheaper than non-ECC. Data errors would have to be extra severe to affect nodes. My old desktop had bad memory chips, but it was enough to run the kernel with the memtest
option to guard against using the faulty regions.
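For reference, a sketch of how that kernel option can be set via GRUB on Debian/Ubuntu (assuming the kernel was built with CONFIG_MEMTEST; the pattern count 4 is just an example):

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet memtest=4"
# then apply and reboot:
sudo update-grub && sudo reboot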
So, if I had your box, I’d give up on using drives smaller than 4 TB for Storj (they may make sense for personal use if you can power them down—and if you don’t need them, selling is an option to free up bays for bigger drives). And I’d give up on RAID, instead I’d just accept potential losses if HDD goes totally bad before I manage to rescue it.
many strange things start happening
I have used consumer hardware for 24/7 loads several times and only once did I have to replace a bad RAM chip that was failing in this sort of use case. Other than that it was pretty stable. So, I find this a bit worrisome: I think even gaming hardware should be stable working 24/7 with storage nodes.
I want to. But I don't understand HOW you can help. We don't sit in the same office and we don't communicate by phone when necessary.
32 GB RAM, so that should, I hope, be enough. The node does a lot of caching.
I disabled recording file access time for now, so we'll see if that improves things.
I also saw that I can use a smaller stripe cache size. Not sure how that will affect the rest of my work. Transferring to it now is fine. The GUI seems a bit slow since the third drive was added.
Some very small drives in your setup, are you sure it's worth the electricity cost?
Yep, they were free of charge and they are notebook HDDs, so electricity consumption is low. So if they decide to go away, no big deal, I will keep replacing them as long as I have spare ones.
Next I will replace them with bigger ones for a reasonable price.
And to be honest, after the last "price change" Storj is again a loss-making project for me :(. But if I get more data here, it will start to be profitable again.
From my experience most HDD failures still allow data recovery, ...
My last crashed HDD was a 1 TB one, which had been working without any problems and without any suspicious SMART attributes … One fine day it stopped working completely. It looked like physical damage to the drive; the electronics were still alive, but it reacted to every command or query only after a few minutes, so just listing the SMART status cost half an hour.
Thanks to redundancy I didn't lose my node.
Thank you for the suggestion, but I don't trust the HDDs and I like to have some redundancy.
As such, I’m just prepared to run ddrescue in case a HDD starts getting any non-trivial number of badblocks.
If I get a notification that an HDD has "pending sectors", this is a reason to temporarily remove it from the array and "zero it". Many times this has saved me some headaches.
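A sketch of that routine (device and array names are examples, and the dd step destroys all data on the removed disk):

smartctl -A /dev/sdX | grep -i pending               # check the Current_Pending_Sector count
mdadm /dev/md5 --fail /dev/sdX1 --remove /dev/sdX1   # drop the member from the array
dd if=/dev/zero of=/dev/sdX bs=1M status=progress    # overwrite it with zeros, forcing sector reallocation
# afterwards re-check SMART and, if it looks fine, re-add the disk: mdadm /dev/md5 --add /dev/sdX1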
No UPS here. I did get some cheap consumer SSDs with decent TBW for LVMcache and databases...
I don't have a UPS either, but sometimes, when the electricity was cut, I found the FS in a bad state. fsck did its job perfectly and I was still able to recover from the disaster. I cannot imagine having "some caches without backup".
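For completeness, a sketch of the manual check after such a power cut (the mapper path follows from the lvs output above; run it only on an unmounted volume):

umount /dev/mapper/default-LV--StorJStore
fsck.ext4 -f -y /dev/mapper/default-LV--StorJStore   # force a full check and fix what it finds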
My old desktop had bad memory chips, but it was enough to run the kernel with the memtest option to guard against using the faulty regions.
My monitoring server is a 20-year-old Dell OptiPlex (Core2Duo, 8 GB RAM, …), which has been working perfectly without any problems for half a year (my maximum). I don't know why, but I am thankful to have such a reliable monitoring server and I will replace it only when it is unrepairable.
So, if I had your box, I’d give up on using drives smaller than 4 TB for Storj (they may make sense for personal use if you can power them down—and if you don’t need them, selling is an option to free up bays for bigger drives). And I’d give up on RAID, instead I’d just accept potential losses if HDD goes totally bad before I manage to rescue it.
My node has been on duty for more than 5 years. Without redundancy I would have stopped every year due to some malfunction and never filled it up to get a reasonable payout.
Thank you for the hint, but I will keep using my redundancy as long as possible.
I have used consumer hardware for 24/7 loads several times and only once did I have to replace a bad RAM chip that was failing in this sort of use case. Other than that it was pretty stable. So, I find this a bit worrisome: I think even gaming hardware should be stable working 24/7 with storage nodes.
This is the reason why I wrote about the platform, not about specific components. The RAM modules work without any problems, the CPU and power supply as well, but when I let it run at full throttle, I have observed that after ~2-3 weeks the kernel starts reporting strange and random problems. Without a restart, the machine crashes after a few hours.
If I let it run at ~75% load, the situation repeats after 2-3 months. Going longer without a restart is impossible.
Thank you for the tips, but this isn't helpful for me.