Should I change max-concurrent-requests? >> Note: The implementation has changed. You shouldn't use this setting any longer. Leave it commented out

Yes, it is. If the choice is to have a limited node or no nodes at all, then the former is still preferable.

Okay, I’ll start testing some values then.

I do not want to touch the config.yaml file as I’d rather keep all settings in an external utility sh script that I can version (with git). With that in mind, my understanding is that I can also set parameters directly in the docker command, as follows:

docker run -d --restart unless-stopped \
    -p xxxx:28967 \
    [...]
    --mount type=bind,source="xxxx/storj",destination=/app/config \
    --name storj_node_1 \
    `#--log-driver none` \
    storjlabs/storagenode:beta \
    --filestore.write-buffer-size="128KiB" \
    --storage2.max-concurrent-requests=5

Right?
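
(For reference, if I ever go back to config.yaml after all, my understanding is that the same two settings would look roughly like this in there; untested on my side, so just a sketch:)

    filestore.write-buffer-size: 128 KiB
    storage2.max-concurrent-requests: 5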

This setting counts all active requests towards the limit but only rejects uploads/ingress.

Using this setting can break uploads for customers. So I think its use should be avoided in any scenario.

I posted a suggestion for this a while ago so SNOs could have a way to limit load without breaking uploads. But it hasn’t been picked up yet.

The topic got automatically locked so I don’t think you can vote for it anymore.

Oh my, why is it locked? Very interesting exchange, and there are amazing ideas and explanations in there.

So what… Should I just GE my SMR node? :cry:
For now, I’m kind of keeping a close eye on it whenever heavy tests (ingress) are in progress, and when the load average goes tits up, I stop it and reconfigure it so it does not accept any more data…

There must be a way to implement something that keeps nodes from crashing, by putting them on hold for a little while (a few minutes) to give them time to recover when the disk is stalling because of writes… surely? ^^’

Back then suggestions got locked after 3 days of no responses. That’s no longer the case but it was when I posted this one.

Honestly, when it comes to this setting, your interests and the interests of the network/customers aren’t aligned. I can’t answer this question for you. Do whatever feels right to you.

Then I take it it was a misconfiguration of the forum software. Isn’t this a good reason to ping someone at StorjLabs so they unlock it, to give it a chance to be upvoted more, if people feel like it? (In fact, maybe they should unlock all ideas that got locked this way.)

It is sad to configure something that is going to reject requests, but if there is nothing else that could potentially make the node work without crashing… I’ll run a few tests and see if it changes anything.

Thanks for your insights :slight_smile:

i used max concurrent for a long time until very recently… it did help my system manage much better… but after turning it off… i get a lot fewer errors in my logs…

best way to make your SMR drive run better is putting up a few more nodes… 1 or 2 will make a massive difference, because the workload on your subnet is split between nodes… thus if you have two nodes… the smr will only have 50% of the workload… and at 3 nodes only 33% obviously :smiley:

max concurrent can be a stopgap measure if nothing else works, but it will make your node run weird even though it will protect your hardware…

with max concurrent i would have something like 0.2% - 0.5% rejected, if not less
it’s been a while… and it was usually during boot or super high traffic peaks…

but i cannot recommend using max concurrent if you can avoid it…
something is weird when using it…
these days, with everything fine-tuned on my overkill system, i can sometimes run for 22 hours without a single error in the logs…

I already have another node, but it is smaller and now full. And I do not have any spare disk at hand and do not want to invest in more storage. Besides, that is not a great solution for all the people with SMR drives… we should find a more long-term solution that people can apply to their nodes when facing issues with SMR drives (or any other technology not keeping up with what the Storj network demands).
Ideally the software should handle that itself, but it’s way more complex to say than to do :slight_smile:

Well, it depends what errors we’re talking about. Obviously “upload rejected, too many requests” errors are to be expected. I also see a lot of “download canceled” entries, which simply mean that another node delivered the piece faster than mine, so I’m not sure they are concerning. Did you face any other errors?
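
In case it’s useful, this is roughly how I count them, assuming the node logs through the Docker log driver and the messages keep the exact wording quoted above:

    # count rejected uploads and canceled downloads in the container logs
    docker logs storj_node_1 2>&1 | grep -c "upload rejected, too many requests"
    docker logs storj_node_1 2>&1 | grep -c "download canceled"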

As a side note, I really think these shouldn’t be flagged as “errors” in the logs, but as “warnings”.

Anyway, in my case the node ends up not responding or gets killed by the kernel, so… I’d rather have errors in my logs than an as-good-as-dead node. I’m currently trying the “max-concurrent-requests” option set to “5”. The node is rejecting a lot of requests, but it’s been running for almost 7 hours with a load average of 2~3 and an I/O wait of 20~30%. Doesn’t seem ideal, but if it stays stable, I say good enough, as it’s supposed to be a problem only during massive ingress periods.
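
For what it’s worth, those load and I/O wait numbers come from standard tools, nothing fancy (the 5-second interval is arbitrary):

    # load average over the last 1/5/15 minutes
    uptime
    # the "wa" column is the I/O wait percentage; take 3 samples, 5 seconds apart
    vmstat 5 3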

If your node is rejecting uploads, that is the cutoff for when it could become a problem for customers. You don’t need more ingress than that for it to be an issue.

well i think my setup using 3x raidz1 peaks at 4000 iops, maybe a bit less… afaik smr drops to about 40 write iops when “overloaded”
so basically what my setup will do in 1 minute would take yours 100 minutes…
granted, regular single harddrives are more like 400 iops… so that’s only a 1 to 10 factor… but really not much to do about that… aside from not overloading the smr drive with writes…

which is easy at first, but when they are over 30-60% capacity they start to have trouble with write speeds, because they first write incoming data the way regular hdds do and then rewrite it properly later to optimize space usage…

this process can take hours after the initial copy is completed…

consumer SMR drives are not designed for 24/7 operation… and unless the manufacturers find a way around the current issues… they never will be…

ofc there are many different SMR technologies on the market, and not all are terrible… but the consumer versions are terrible at 24/7 or high sustained writes

i ran my server at 14 max concurrent i think and ended up at 20… something around that mark… you don’t want it to reject a lot… just enough that your disk can keep up… i would start at 15 then go lower if need be… use the disk latency to determine that… the latency should go off the chart when you overload the drive… or your write speeds will drop immensely
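
something like this shows the per-disk latency… sda is just an example device name, use whatever your smr drive is, and iostat comes with the sysstat package:

    # extended per-device stats every 5 seconds; watch the r_await / w_await columns (latency in ms)
    iostat -x sda 5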

at 20 max concurrent i might reject 20 in a day… but long term it made a world of difference for my node… not sure exactly why… maybe because incoming requests come at a certain pace, and when latency starts to go up… the incoming rate stays the same… and then you start getting more canceled transfers, which still hit your system… and slowly but surely latency keeps going up until it’s choking because the system cannot keep up…


That’s what’s happening right now, I think: it took around 9 hours to get there, but the load average is going up. I thought it was stable around 2.5, but it’s now above 4, so it’s slowly but surely getting worse, hour after hour. The disk is now taking more than 500ms to respond to reads and writes.
So I guess a max-concurrent of “5” is still too much. It feels pretty low already though :frowning:

I’m not sure to see what you’re getting at, what are you suggesting?

It feels like no matter what, this disk will eventually fail to respond as long as it is asked to write data continuously, which is basically what’s happening right now on the Tardigrade network, apparently.

… should I try to lower the value again?

What I meant is that if you and enough other nodes reject a transfer, the upload will fail for the customer. You implied that it wouldn’t be an issue unless ingress is high, but for customers, rejections can be an issue at any time.

I don’t know how to fix a node on an SMR drive. They’re simply not really fit for storj. Adding nodes on other disks can fix it but isn’t exactly a viable option for many SNOs. I don’t see a solution that doesn’t hurt customers. So yeah, up to you.

if 5 won’t do it… then i doubt anything will…

you could try some sort of tiered storage arrangement… read speed isn’t the issue… it’s storing the data fast enough that’s the problem…

move your databases off the smr drive… basically move anything you can off the drive and keep only the required storagenode data on it…

SMR drives can read and handle iops just like regular hdds… so really all you need is time to write the data in at the reduced speed…

something like Microsoft Storage Spaces or similar can do tiered storage…
basically the idea is that the Storage Spaces software classifies the drives… when a drive is slow it becomes a long-term storage drive… and stuff that isn’t really used gets moved to it…

but setting up such solutions might be more work than it’s worth… you may simply be able to buy a new drive and sell the smr to somebody else for a near equal price…

smr is basically useless in some workloads… and trying to make them work might just give you a lot of future pain and agony…

if you are at the 6 month mark i might just do a graceful exit… ofc that ends a perfectly good node, so if you want to be a SNO you might as well migrate it…

i cannot imagine a way around it… the problem is this…

you got, let’s say, 10mbit of incoming random data… and your hdd can write maybe 8mbit of random data… so either you buffer the extra data somewhere until the disk can catch up… or you have to reject it…

but the ingress goes on for weeks at a time… still, an extra 2mbit, how bad can that be…

well right now we are getting like 100gb a day… so 20% of that is 20gb a day… and 10 days is 200gb
well, the math works out, you could easily buffer that on some type of tiered storage setup…

but aside from that… you cannot store data faster than the disk will take it… unless you add more nodes…

I tend to agree. I did not stop my node or change the setting though, and its load average is fluctuating between 2 and 4… it’s definitely more stable than without this parameter set. I’ll leave it as is for the night and we’ll see what happens. I think @BrightSilence is right that it makes the network less reliable, but it’s that or completely shutting down the node… so… :confused:

Sure. But in fact SMR drives can usually write at a decent speed even when overloaded, as long as it’s big sequential chunks of data. With Storj though, it’s many, many little pieces, which makes it really hard for the disk to keep up, apparently.

Anyway, my disk will be full soon, so… the problem won’t be solved, but at least it will go away until a massive number of files gets deleted.

Thanks a lot to you all for all your insights and pieces of info :slight_smile:
It feels like there is no silver bullet that would solve the problems around SMR drives; my understanding so far is that they just are not appropriate for use as Storj storage disks.

At least I’m learning many things :wink:

If you have another HDD you could move the databases to a different HDD. That would almost certainly fix the issue.
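
Something along these lines should do it, assuming your node version already supports the storage2.database-dir option (check the release notes first; the /mnt/other-hdd path and the /app/dbs mount point are just placeholders, the rest mirrors your existing command):

    # stop the node first and move the existing *.db files into the new location,
    # then start it again with the extra mount and option
    docker run -d --restart unless-stopped \
        -p xxxx:28967 \
        [...]
        --mount type=bind,source="xxxx/storj",destination=/app/config \
        --mount type=bind,source="/mnt/other-hdd/storj-dbs",destination=/app/dbs \
        --name storj_node_1 \
        storjlabs/storagenode:beta \
        --storage2.database-dir=/app/dbs \
        [...]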

SMR drives have recently been a very popular topic in tech media, due to some lawsuits against disk manufacturers that sold SMR drives as NAS HDDs, something they are extremely poor at… for a long time the tech was referred to as archive drives, i think it’s from 2014 and it’s terrible for some workloads… like NAS and RAID… works fine for most casual consumers tho… for plain storage… i know people that use them and have no trouble with them…

tho occasionally they will lose a file if the drive is shut down before it’s finished its… post-processing… relocation of data… whatever one wants to call it…

in the benchmarks and other such tests done on SMR drives… they would, when overloaded, give write speeds of 700kb/s… and for rebuilding raid arrays, a 1-day operation would end up taking like 9-10 days…

well bright might not look at it the same way you do… sure, a working node is better than a non-operational node… also keep in mind that you can damage your hardware by overloading it… or at least cause extra wear due to temperatures and whatnot… especially in tech that’s not built for 24/7 operation.

since max concurrent does have a disruptive effect on the storagenode, it may lead to stuff like failed audits or other such issues… suspensions or DQ in extreme cases… especially if your node is right on the edge…

i did run mine with max concurrent for like 2-3 months… so i wouldn’t worry too much about it…
but i didn’t reject many… and i had many weird errors… which i today attribute to limiting max concurrent.

i don’t think it will affect customer data, but it may… bright knows the code much better than most

Where did you get that information? Unless we’re talking about a shutdown while data is being written, which could damage data on any HDD, this should not happen after data has been successfully written to the disk.

RAID arrays actually should work just fine in most cases. Normal RAID levels use linear writes which works well on SMR HDDs with barely any slow down. This has been tested. However I would still agree that it’s best to avoid them. And with ZFS they simply can’t be used.

I’m not in this situation. I tend to be upfront with my opinion, but in this case I really don’t know what I would do. So in this case I just provide information and let everyone decide for themselves. Well that and I make suggestions of how storj could provide an option that doesn’t hurt customers. I understand both sides very well here, so when I say it’s up to you, I mean that. No judgement from me.

I haven’t used this option in a long time, so I have no personal experience with this. While I like to discourage the usage of this setting to some extent, I don’t know of any examples like what you’re mentioning here. Rejections only happen for uploads and should in fact lower the chance of your node being overloaded. I would not expect any impact beyond that. Do you have any examples of errors you saw related to this?

i used to have a lot of errors… nothing major, but when the system did garbage collection it would act like it ran out of allowed concurrent requests and run into the db locked thing… that seems to have completely vanished now that i set max concurrent back to unlimited…

it caused a lot of weirdness which seems to have all gone… ofc it could be due to other stuff, it’s not easy to say without really working with it across a ton of systems… it worked fine for me… works better now without it… but before, i had to use it because i had a drive that was “bad” and i was running sas mixed with sata, which made everything weird… so ofc that doesn’t help the conclusion either…

in regard to the SMR drives, the case of lost files was on USB drives that were disconnected too early after a file transfer claimed it was finished… and it seems to be repeatable…
so not very relevant in most cases… but it’s one of those annoying things once one loses a few files to it… so something to be aware of that can happen with them.

i forget how ServeTheHome did their test of SMR drives, but it sure didn’t run well, and that seems to be the general consensus… ofc there are many different kinds of SMR drives, the consumer grade ones are drive-managed (DM-SMR or DSMR, can’t remember the acronym)… and then there are host-managed and host-aware variants, or something like that… the latter two are fine for enterprise raid, because the raid controller is built for using smr tech and simply accounts for it… also if you have enough smr drives it shouldn’t matter… it would also run with zfs i’m pretty sure…

it’s the whole single raid6 or raidz2 setup where one has only a single disk’s worth of io, which can give your whole array a limit of like 40 iops, which is… well, not much. ofc so long as you don’t end up choking the array it will run fine… also the real thing that pissed sysadmins off was the rebuild times on the NAS-marketed SMR drives… afaik they don’t do well in rebuilds no matter what one is running… which greatly increases the time spent with limited redundancy and thus ups the odds of secondary bit failures and data corruption

should be
storjlabs/storagenode:latest \
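
Roughly like this if you want to switch the existing container over; keep your own ports, mounts, and parameters, this only sketches the steps:

    docker stop -t 300 storj_node_1   # give the node time to shut down cleanly
    docker rm storj_node_1
    docker pull storjlabs/storagenode:latest
    docker run -d --restart unless-stopped \
        -p xxxx:28967 \
        [...]
        --mount type=bind,source="xxxx/storj",destination=/app/config \
        --name storj_node_1 \
        storjlabs/storagenode:latest \
        [...]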

I thought we would receive an e-mail notification when the time to change that comes.

But alright, I switched to this tag, if Leader @Alexey says so :wink:
Thx.