Node goes down/restarts every 10-15 minutes; thread allocation error in the logs

You can… but it would affect customers, so I won’t guide you on how to harm customers, but this option does exist…

unlimited would also hurt customers; what if there is a malicious customer :slight_smile:?

Does Storj have protection against attacks like https://www.cloudflare.com/learning/ddos/ddos-attack-tools/slowloris/? If not, unlimited is a disaster: nodes will die left and right and cause instability within the Storj network. A high limit like 1000 or 2000 makes a lot of sense.

1 Like

Unlikely, see the white paper: https://storj.io/whitepaper and the section B. Attacks.

1 Like

I just skimmed through it and didn’t see an attack on the network protocol (TCP stack) or anything related. The idea is this: when a customer requests a file, nodes race to send pieces to the customer, but the customer never intends to actually receive the pieces; they ack 1 TCP packet from each node every 5 s, while the node has already prefetched the piece from the filesystem and keeps it in memory. The connection between node and customer is kept open almost indefinitely, and nodes crash like wildfire (run out of memory). This is all just hypothetical, I have no idea if it would work; seeking advice from @Toyoo.

if the file is smaller than the metadata needed to store it, it will be stored directly on the satellite, so this request will not reach any node.
And there is a timeout: the customer cannot keep the connection open indefinitely, it must receive or send something, otherwise the connection will be dropped, by the satellite in the case of an inline segment or by the nodes in other cases.
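
As a rough illustration of that timeout idea, here is a minimal Go sketch (not the actual storagenode or satellite code; the 2-minute value is made up). It only covers the fully idle case; a client that trickles one packet every few seconds needs a minimum-speed check like the one discussed further down.

package sketch

import (
	"io"
	"net"
	"time"
)

// idleTimeout is a made-up value; the real node/satellite timeouts may differ.
const idleTimeout = 2 * time.Minute

// serve drops any client that goes completely silent: the read deadline is
// renewed before every read, so a connection that neither sends nor receives
// for idleTimeout gets a timeout error and is closed.
func serve(conn net.Conn) error {
	defer conn.Close()
	buf := make([]byte, 32*1024)
	for {
		if err := conn.SetReadDeadline(time.Now().Add(idleTimeout)); err != nil {
			return err
		}
		n, err := conn.Read(buf)
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err // includes "i/o timeout" for idle clients
		}
		_ = buf[:n] // hand the received bytes to the actual piece logic here
	}
}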

Assume the file is big, 20 MB, and as I wrote earlier, the customer does receive/send data, just very slowly: 1 TCP packet every several seconds…

P.S.: if the file is small, would the satellite also be subject to slowloris?

I’m not ungrateful, but it seems it wasn’t a good suggestion. If it’s a symptom of disk IO not keeping up, then giving it more processes/RAM isn’t a solution at all, unless it’s only a brief spike in traffic.

Respectfully, I judge this as really bad behavior by the node software. There should be a (default) upper bound for RAM caching of requests. It could be (for example) 1 GB per TB of node capacity, or some percentage of system RAM. Once that’s consumed, the node could reject further requests until a low-water mark is reached, or even gracefully shut down. You could even penalize with “offline” time, and even send an email to the SNO as you’ve already implemented for node downtime. But you seem to shoot yourself in the foot with the current behavior, since the cached requests are lost when the container gets killed. Right?
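
For what it’s worth, the high/low-water-mark idea could look roughly like this in Go (a sketch only, with made-up names and limits, not a proposal for the actual storagenode code):

package sketch

import (
	"errors"
	"sync"
)

// ErrOverloaded is returned while the node is shedding load.
var ErrOverloaded = errors.New("node overloaded, rejecting new requests")

// Gate sketches the high/low-water-mark idea: once the memory buffered for
// in-flight requests crosses highWater, new requests are rejected until
// usage falls back below lowWater.
type Gate struct {
	mu        sync.Mutex
	used      int64 // bytes currently buffered for in-flight requests
	highWater int64 // e.g. 1 GB per TB of node capacity, or a % of system RAM
	lowWater  int64
	shedding  bool
}

func NewGate(high, low int64) *Gate {
	return &Gate{highWater: high, lowWater: low}
}

// Admit reserves size bytes for a request, or refuses it while shedding load.
func (g *Gate) Admit(size int64) error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.shedding || g.used+size > g.highWater {
		g.shedding = true
		return ErrOverloaded
	}
	g.used += size
	return nil
}

// Release returns the reservation once the request finishes; shedding stops
// only after usage drops below the low-water mark.
func (g *Gate) Release(size int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.used -= size
	if g.used < g.lowWater {
		g.shedding = false
	}
}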

Makes perfect sense. But there must be a better way to handle problems in setups than letting RAM consumption (even filling up swap!) and/or the process count run away until the container gets killed. Although I suppose it’s an effective way to bring setup problems to the SNO’s attention! :grin:

You confess above that if a node uses an SMR drive, the node software will completely exhaust the server’s RAM/resources with no apology. But at present there is not even a mention of HDD type or speed in the recommended hardware doc (or of available RAM?!).

There could be a simple solution: recommend in the docker run command doc using suitable --memory-reservation or --memory flags to avoid the risk of an underperforming disk killing the server.
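
Something like this in the documented run command would be enough (the 2g/1g values are placeholders, not an official recommendation, and the usual mounts, ports and environment variables are omitted):

docker run -d --restart unless-stopped \
    --memory=2g --memory-reservation=1g \
    <the usual mounts, ports and environment variables> \
    storjlabs/storagenode:latest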

on the node side there are several parameters to handle this situation, like

PS C:\Users\User> ./storagenode setup --help| sls slow

      --storage2.min-upload-speed memory.Size                    a client upload speed should not be lower than
MinUploadSpeed in bytes-per-second (E.g: 1Mb), otherwise, it will be flagged as slow-connection and potentially be
closed (default 0 B)
      --storage2.min-upload-speed-congestion-threshold float     if the portion defined by the total number of alive
connection per MaxConcurrentRequest reaches this threshold, a slow upload client will no longer be monitored and
flagged (default 0.8)
      --storage2.min-upload-speed-grace-duration duration        if MinUploadSpeed is configured, after a period of
time after the client initiated the upload, the server will flag unusually slow upload client (default 10s)
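
For anyone who wants to try them, they can be passed as extra flags at startup or, assuming the usual flag-name-to-key mapping, set in config.yaml; the values below are only an illustration (the speed value is the example from the help text), not a recommendation:

# config.yaml
storage2.min-upload-speed: 1Mb
storage2.min-upload-speed-grace-duration: 10s
storage2.min-upload-speed-congestion-threshold: 0.8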

Hi @Alexey, I mean this:

Default config matters a lot; it gives me as an operator peace of mind, collecting a few pennies instead of having to firefight this. I hope you understand :frowning:.

2 Likes

you are correct, however it’s something you can do to improve the situation without deeply reconfiguring your setup, right?

this is not possible across the wide variety of OS, FS and platform implementations where the node can be run.
Your OS and platform must be capable of handling the load, otherwise it will be a loser in this race. Yes, it’s rude, but it’s true. If your setup is not capable when many others are, then, well, your setup is not as good as you expected, and you can likely improve it, for example by running fewer nodes on the same HW.

perhaps. However, why are my nodes not killed and not using so much RAM?
They are simple bare-metal nodes: two are Docker Desktop for Windows and one is a GUI service, and all work perfectly. But there is no RAID, no VMs on a foreign OS and no complicated FS; one node, one disk, nothing else.

that’s true. We do not want to put limits into our docs too aggressively. We do have a section in the prerequisites though: Step 1. Understand Prerequisites - Storj Docs, but only regarding the filesystem choice, because it’s not 100% confirmed that storagenode cannot be run on an SMR disk. And all workarounds for them are known: run more nodes in your subnet, and/or limit concurrency (the worst choice, but applicable).

we have it for the Raspberry Pi nodes. It’s not expected to place this limit on PC/Mac nodes, unless you are trying to squeeze as much as possible from your hardware. In the latter case you got a signal: you need to relax your rules and either run fewer nodes on the same HW, or bring more resources to it.

Though indeed, there are no good safeguards in the storage node code right now. Besides, DoS would probably be better handled at a lower layer.

First, thank you for your reply!

Wrong. An analogy: if I am spending more money than I earn, you recommend that I go to the bank and take out loans until they refuse. If I don’t spend less (or earn more), taking out a loan makes matters worse. In this instance, it would worsen the situation by tying up even more server resources before running out. Only in the specific case of a short spike in demand could allocating even more resources possibly help.

If that’s really true, then perhaps a generous docker memory limit should be recommended (i.e. a maximum of 50-75% of installed RAM).

I wasn’t the only one who experienced this behavior. I guess it’s because you manage to divide the test load between 3 nodes? I have 3 nodes with 3 disks, but 2 of them were full, so the complete load went to the last one: a single ext4 partition on a 2.5" SMR drive connected via USB. Even that looked okay until the demand test was activated. No RAID, and the issue occurred with the VM shut down and only <3 of 8 GB RAM in use. So clearly a 2.5" USB SMR drive isn’t enough for a single node on a subnet, not for the anticipated demand. The solution isn’t to allocate more RAM or processes; it’s to add more nodes in the subnet, migrate to a faster disk, or shut down the node, right?

But instead of saying “works fine for me”, maybe you should see us as canaries in the coal mine? I can imagine a snowball effect: if there is too much demand on the network, the weaker nodes get brought down, demand increases on the remaining nodes, taking down the weakest of those, increasing demand on the rest, ad nauseam.

I understand you don’t want to drive potential SNOs away with a high barrier to entry. But why not suggest avoiding SMR? Or at least explain the ramifications of using them?

In principle, it would serve as a backstop before the node swallows up all of the server’s resources: a protection from misbehaving software, instead of relying on the OOM killer. Interesting that you recommend this for the RasPi. My server would have been a RasPi, but they were rare/expensive during COVID, so I ended up with an AMD SoC thin client, still very meager hardware. One might even call it a potato. :stuck_out_tongue_winking_eye:

I prefer rude truth over polite lies every time, so thank you. I get some conflicting messaging, though: Storj doesn’t even want to mention avoiding SMR drives, offers specific advice for running on a RasPi, but then blames SNOs when their hardware isn’t good enough to cope with runaway software?

I maintain my stance that the node software doesn’t handle an overburdened/underperforming disk correctly. SNOs will need to improve their hardware, but the node software should also handle those situations better and have reasonable default settings rather than overly greedy ones (everything unlimited).

2 Likes

How? Past a certain point, setting a higher concurrency limit does not improve the throughput of a node. It will only cause the node to fall further and further behind. At some point, it has such a long queue depth that it would be losing every single egress race and provide nothing of value to the network. Whereas, if it were limited to a reasonable amount, the node would have a chance at contributing.

It’s true that you can somewhat improve HDD throughput (at the expense of latency) by queuing several requests, but that only works up until a point. After that point, you’re just trashing latency for no gain in throughput.
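
To make the “limited to a reasonable amount” idea concrete, a bounded-concurrency admission check could look roughly like this (a Go sketch only, not storagenode’s actual implementation; names and limits are made up):

package sketch

import "errors"

// ErrBusy tells the caller to fail fast so the uplink can pick a faster node
// instead of waiting behind a long disk queue.
var ErrBusy = errors.New("too many requests in flight")

// Limiter caps the number of in-flight requests with a counting semaphore.
type Limiter struct {
	slots chan struct{}
}

func NewLimiter(maxConcurrent int) *Limiter {
	return &Limiter{slots: make(chan struct{}, maxConcurrent)}
}

// Do runs fn if a slot is free and rejects immediately otherwise, rather than
// queueing up work the disk will never catch up with.
func (l *Limiter) Do(fn func() error) error {
	select {
	case l.slots <- struct{}{}:
		defer func() { <-l.slots }()
		return fn()
	default:
		return ErrBusy
	}
}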

Part of the reason to perform a stress test is to make sure that the system fails gracefully when overloaded. I don’t think the current behavior fits that bill.

Yeah, I agree with this 100%. Part of maximizing performance in a system like this is recognizing that not every node will perform the same, and to handle overloads in a much more graceful manner.

If the system were designed to handle this more gracefully, then a node that performs at only 25% of a typical node would only contribute approximately 25% of the throughput of that typical node. Instead, it ends up contributing 0% because it just gets tipped over completely.

3 Likes

Avoiding SMR drives was suggested many, many, many times on the forum. For some time almost all threads complaining about performance started with the question whether the drive is SMR.

I really wonder though why this suggestion was not added to the Prerequisites page in the documentation. Bad performance on SMR drives has been known for years now.

That would mean admitting that “unused, reused” was a far-fetched dream and shattering their green ideology; they would fight with their lives on the line before admitting that.

2 Likes

You’re dishonest, or at least ignorant.

Firstly, taking advantage of unused resources is still the goal. Nobody claims we’re there yet; we’re not good at it, but we’re getting better with time. With the right software even SMR drives will be good enough, but it takes time to figure that out.

Secondly, nobody 5 years ago would have predicted what kind of crap hard disk companies would throw at the unsuspecting market.

Thirdly, nobody cries that we’re not trying to reuse 20-year-old hardware either. Surely it’s reasonable to draw a line somewhere and say: sorry, no, this is too slow. Just as we draw a line at 500 GB as the minimal reasonable amount to share, there is also a certain level of performance to expect.

1 Like

That was your point; I just spiced things up a notch so that they would add SMR to the list of things to avoid, but maybe that went a bit overboard, so I’ll hide the old comment.

1 Like

@Alexey already answered a few posts up:

So it looks like when I set up my node, I should have spent some hours reading the forum instead of following the documentation. :roll_eyes:

I’m 100% okay with this. But Storj should be candid about it, including recommending a minimum amount of RAM per TB of storage, recommending against SMR drives in the documentation, etc. Additionally, since the node software doesn’t handle slower hardware gracefully, the docker --memory flag belongs in the docs, too.

Intentionally leaving that stuff out of the docs, making the software greedy without apology, running a stress test, and then blaming SNOs for using insufficient hardware all feels like an abusive relationship.

3 Likes

Yeah, this is not an answer, given that the page has two sets of requirements: minimal and recommended. I don’t see any reason not to include a recommendation against SMR drives, at least in the latter set.

Yep!