It’s a public forum, so I’m not going to entertain that request.
Node consumes RAM to save the day on potato hardware. It’s better than dropping requests. On non-potato hardware, the node consumes under 2 GB of RAM, and that is pretty stable. It’s not a bug, it’s behavior by design. It’s not an issue on a properly configured disk subsystem.
OOM killers are needed to maintain system stability. They operate on a different level.
Did you read what I wrote above?
Please explain to me (but more importantly, to yourself) how you are planning to serve the needs of multiple customers with a 200 IOPS budget. Not based on just wishful thinking of developers pulling off some magic, but in actual technical words and numbers.
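To save you some time, here is my own back-of-the-envelope version, just as a starting point. All the numbers are assumptions for illustration (roughly 200 random IOPS, a few IOs per piece for metadata plus the data read, and the ~250 kB average piece size quoted further down this thread), not measurements:

```python
# Back-of-the-envelope ceiling for a single HDD serving Storj pieces.
# Assumptions (mine, for illustration): ~200 random IOPS, ~3 IOs per piece
# (directory lookup + inode + data read), average piece size ~250 kB.
HDD_IOPS = 200
IOS_PER_PIECE = 3          # metadata lookups + the data read itself
AVG_PIECE_BYTES = 250_000  # average piece size quoted later in this thread

pieces_per_second = HDD_IOPS / IOS_PER_PIECE
throughput_mb_s = pieces_per_second * AVG_PIECE_BYTES / 1e6

print(f"~{pieces_per_second:.0f} pieces/s, ~{throughput_mb_s:.1f} MB/s ceiling")
# ~67 pieces/s, ~16.7 MB/s -- and that is before the filewalker, garbage
# collection and trash cleanup take their share of the same IO budget.
```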
To reiterate, you either [get storj to] throttle [requests to] your node, making it for all intents and purposes useless (it’s neither interesting nor a high priority to create a whole load-balancing system for potato nodes; not at this stage).
Or you fix your disk subsystem in one of a number of ways discussed in this thread to make it tolerant of high-IO workloads. I’m not going to repeat them here.
The bottom line: if you can only offer 200 IOPS, you might as well shut down the node.
To put it more directly: the whole thread is a non-issue if you set up your hardware correctly. Instead, you expect Storj to work around your improperly configured system.
Braaah, no, you don’t need more memory; my VMs have 4 GB, 1 disk = 1 node. It’s just the RAM’s speed (and the controllers).
When I have 1 VM running, it counts small files in 30 min; when I run 13 of them, it takes like 60-70 h for the same work. Has to be the physical limits of the machine with 4 x 16 GB 3200 MHz RAM, and some optimization in newer controllers can also help! (So I’ve got to try newer versions soon.)
Edit: no, I mean 60-70 h, for filewalking about 6.5 TB of Storj’s current files.
Yeah, 1 machine. I use VPNs because it’s the easiest way to set up all that networking.
(nothing to do on my router)
I’m suggesting how to try to solve the issue which your node has (my three nodes - two on Docker Desktop for Windows, the worst thing I can imagine, and one Windows GUI - do not have it; however, YMMV).
If the disk subsystem is slow, it could end up with more active threads than designed, so you may be able to solve this with this suggestion; it’s up to you.
I would say that I have never used RAID for Storj and my nodes operate normally, even on Windows. Well, two are on Docker Desktop for Windows (what could be worse?) and one is the GUI one.
No big problems so far (except a rare “database is locked”, and seemingly only for the piece expiration database so far); I still have the databases on the data drives.
This is the number of threads (processes) limited by the Linux kernel according to its default configuration; the CPU is not involved.
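If you want to see what those limits are on your own box, here is a quick sketch (assuming Linux and Python 3; which limit actually bites depends on your setup):

```python
# Quick check of the kernel/user limits that cap the number of threads
# (threads count as tasks/processes on Linux).
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("per-user process/thread limit (soft, hard):", soft, hard)

with open("/proc/sys/kernel/threads-max") as f:
    print("system-wide threads-max:", f.read().strip())
```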
In this case it’s caused by a slow disk subsystem, unfortunately. This is why we perform a stress test - to show problems in the setups.
Not necessarily. My system has 32 GB, which is barely used (~10 TB stored). Everything depends on how fast your disk subsystem can operate. My disks are connected via SATA and I do not use any RAID or caching device (on Windows it’s close to useless unless you use some advanced cache software).
I use Windows specifically because most Operators are using it. I had a Linux one on a Pi3, but the SD card died and I’m too far away from the installation to fix this simple issue. However, it worked perfectly fine even with 1 GB of RAM.
If you store data on an HDD you will always have a ~200 IOPS limit. You can buffer somewhere in front of it, but eventually you will blow through your buffer. I think nobody stores the data on SSDs (yet).
And that’s precisely the point. You have a hard ceiling with an HDD, and therefore need to be wise in how you spend this limited HDD IO budget; this requires some configuration beyond plopping the disk into a Raspberry Pi and hoping for the best, otherwise it’s just a waste of a good HDD.
for example:
sending all small bits of data, including metadata, to a small SSD, leaving the HDD to handle the large-ish blobs. This will offload the majority of IOPS and may be sufficient on its own. The goal is to make seek time small compared to data transfer time, so the disk stays busy transferring data instead of moving heads around most of the time. I would start there (see the rough numbers after this list).
batching and coalescing writes into some sort of transaction groups (ZFS does this). This further reduces IO by combining multiple writes into a single request.
doing all the other filesystem tweaking discussed here multiple times
avoiding databases sending IO to the data disk, if for some reason they are still there.
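To put rough numbers on the first point: the win comes from spending fewer random IOs (seeks) per piece. The figures below are assumptions for illustration (8 ms per random IO, 150 MB/s sequential throughput, ~250 kB average piece), not measurements:

```python
# Rough model: each piece read costs some number of random IOs (seeks)
# plus the sequential transfer of the piece itself.
# All numbers below are assumptions for illustration, not measurements.
SEEK_MS = 8.0            # average seek + rotational latency per random IO
TRANSFER_MB_S = 150.0    # sequential throughput of the HDD
PIECE_KB = 250.0         # average piece size (see the estimate further down)

transfer_ms = PIECE_KB / 1024 / TRANSFER_MB_S * 1000  # ~1.6 ms per piece

def pieces_per_second(ios_per_piece):
    return 1000 / (ios_per_piece * SEEK_MS + transfer_ms)

# 3 random IOs per piece (metadata still on the HDD) vs 1 (metadata on SSD)
print(f"metadata on HDD: ~{pieces_per_second(3):.0f} pieces/s")
print(f"metadata on SSD: ~{pieces_per_second(1):.0f} pieces/s")
```

Under these assumptions the same HDD goes from roughly 39 to roughly 104 pieces per second just by keeping metadata lookups off the spinning disk.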
First, estimate the number of pieces you expect to have.
You can look at the average segment size here. As of writing this post this is 7.25 MB.
Divide it by 29 (the number of pieces required to reconstruct a segment) to get average piece size. We get 250 kB.
Divide your allocated disk space by the average piece size. For 1 TB this is 4M pieces expected.
Now, estimate the amount of RAM you need per file. These numbers depend a lot on your software stack, as any additional layer (like storage spaces, RAIDs, whatever) adds its own requirements. Assuming you have a file system set up directly on a raw partition of a single HDD, this would be:
For default ext4 this is around 300 bytes (inode + direntry data structures).
For default NTFS this is probably around 1kB.
You multiply the number of pieces expected by the amount of RAM you need per file to get the estimate.
Remember, this is an estimate of the amount of RAM that needs to be free after your OS, the node itself (assume less than 0.5 GB), and all other software running on the same system take their chunks.
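Putting those steps together as a quick calculation for a 1 TB allocation (using the segment size and per-file costs quoted above; adjust them for your own setup):

```python
# Worked example of the estimate above, for a 1 TB allocation.
AVG_SEGMENT_BYTES = 7.25e6      # average segment size at the time of writing
PIECES_PER_SEGMENT = 29         # pieces needed to reconstruct a segment
ALLOCATED_BYTES = 1e12          # 1 TB allocated to the node

avg_piece_bytes = AVG_SEGMENT_BYTES / PIECES_PER_SEGMENT   # ~250 kB
expected_pieces = ALLOCATED_BYTES / avg_piece_bytes        # ~4 million

for fs, bytes_per_file in (("ext4", 300), ("NTFS", 1000)):
    ram_gb = expected_pieces * bytes_per_file / 1e9
    print(f"{fs}: ~{ram_gb:.1f} GB of free RAM for metadata caching")
# ext4: ~1.2 GB, NTFS: ~4.0 GB -- on top of the OS, the node itself
# (< 0.5 GB) and whatever else runs on the same box.
```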
The idea being that the amount of unused RAM available for caching shall be enough to fit the metadata.
How much exactly depends on your filesystem. Metadata sizes differ, the granularity of caching differs, and caching implementations differ. Some filesystems don’t take advantage of all available RAM at all (e.g. NTFS).
All I can say is that for a 10 TB node on ZFS, 8 GB is probably too little, 128 GB is definitely overkill, and 32 GB seemed OK; I saw almost all metadata fetches come from ARC.
I understand that a slower node is likely going to have a higher load average, as it will take longer for requests to be serviced, so there will be more requests in flight at any given moment.
However, why not just set a reasonable default for max-concurrent-requests? I can’t think of any good reason for there to be thousands of requests in flight on a given node, because the node would just lose those races anyway and the SNO wouldn’t earn anything from them. If anything, it’s just going to spiral downwards, where the increased load causes it to lose even more races.
Edit: apparently it used to have a default, but it was removed at some point? Why? There can’t possibly be a valid use case for allowing more than 1000 concurrent requests.
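For context, the behavior being asked for here is just a bounded gate: accept up to a cap, reject the rest immediately rather than queueing them. If I remember correctly, the knob is storage2.max-concurrent-requests in the node’s config.yaml, with 0 meaning unlimited. Below is only a conceptual sketch of such a gate, not the storagenode’s actual implementation, and the cap of 40 is a made-up example:

```python
# Minimal sketch of a "max concurrent requests" gate: accept up to a cap,
# reject anything above it immediately instead of queueing it.
# Illustrative only; not how the storagenode implements it.
import threading

class ConcurrencyGate:
    def __init__(self, limit: int):
        self._sem = threading.BoundedSemaphore(limit)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False when the node is already "full".
        return self._sem.acquire(blocking=False)

    def release(self) -> None:
        self._sem.release()

gate = ConcurrencyGate(limit=40)  # hypothetical cap

def handle_request(serve):
    if not gate.try_acquire():
        return "rejected: too many concurrent requests"
    try:
        return serve()
    finally:
        gate.release()
```

Rejecting above the cap is what keeps a slow disk from accumulating thousands of doomed transfers and spiralling further down.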