Updates on Test Data

Yep, this is why we suggest using the --memory docker option…

Just use the value that you have after the OS is fully booted:

free -h
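For instance, a minimal sketch (assuming a Linux host with procps installed) of deriving a `--memory` cap from the "available" column that `free` reports; the headroom ratio and the storagenode run command are assumptions, adapt them to your own setup:

```shell
# Read "available" memory in bytes from the Mem: row of free.
avail=$(free -b | awk '/^Mem:/ {print $7}')
# Leave ~25% headroom for the OS; use the rest as the container cap.
limit=$((avail * 3 / 4))
# docker accepts a plain byte value with a "b" suffix for --memory.
echo "docker run --memory=${limit}b ... (plus your usual storagenode options)"
```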
1 Like

No. Not yet. (But there will be a time when data inflow will be 13-15 times what it is now.)

I’m not complaining about the traffic. I love it, and want more.
I just feel obligated to report any unstable node behavior or errors I find, in order to help improve things. The testing never ends. I hope you welcome feedback?
Better to have too much than none, right?

So far only one node crashed yesterday (OOM).
There were periods when multiple people reported problems with ever-growing RAM usage by the node process, easy to find by searching for the phrase “OOM”.
What if that’s a flaw in the software and not the hardware?
If so, reporting precisely when it happens can immensely help you find the cause.
I hope it’s hardware, but it doesn’t look like it to me.
Also, @Mitsos reported five times higher RAM usage around that hour, and that’s bare metal?!
I would want to know more and investigate! (Enough of making a scapegoat out of VMs :smile:)

For example, at the current moment the traffic is the same at ~30% load, and the RAM usage is normal, around ~500MB.

Yesterday I was sitting and passionately watching the test live, and that problem occurred only during a short window of time, which means something specific was done to cause it.
I would like to know what it was. Or we don’t even have to know, but I insist you don’t ignore it,
as it really made RAM grow crazily, with no signs of the disk being slow or the network being overloaded.

2 Likes

This instability is influenced by using a VM (this is known and considered a user error); my bare metal nodes don’t have any issues. Would you like to try? And please, do not use Windows…

Unlikely, otherwise we would have 20,000+ more complaints.
I wouldn’t report any issue related to the usage of a VM, sorry… Especially VMware + Windows.
I consider VMware + Windows a dead end. Either migrate or shut down. Sorry. Too many issues out of the blue with this combination. If your host is Windows, purge VMware and use Hyper-V; likely all your issues will be in the past. If your host is Linux, use anything other than VMware with Linux as a guest. Or do not use any VM at all (the best choice) and use docker instead.

1 Like

Ignorance at its finest!
@Mitsos help, or finish me and my VMs once and for all! :grin:

1 Like

Maybe I have more nodes than th3van :innocent:

2 Likes

The 5x RAM usage wasn’t due to buffers or cache, actually both of those fell at the same time that used RAM went up. As far as I can tell, it’s not an issue with the hardware side keeping up (otherwise buffers would have shot up) or trying to cache something to serve it faster. Mind you I had GCs and trash-cleanups running as well.

Picking a random node, I see ~1GB used RAM. That value was way higher when RAM usage shot up. I haven’t had any node getting killed with OOM (on this test, had them killed in the early testing days).

I don’t think it’s a bug or anything; otherwise, usage wouldn’t have come down (i.e. in the case of a memory leak). I think whatever storagenode was doing with the data is what was causing the high usage. I’m not a programmer, so I can’t troubleshoot it any further; I’m just reporting my observations.

3 Likes

Then you need to replace your “potato” router with something more enterprise-grade?

1 Like

Wasn’t @agente pushing 2Gbps in one of the last tests?

I have a MikroTik CCR2004-1G-12S+2XS.
Btw, we are on the dark side of Storj where we can speak freely. How many SNOs like Th3Van are there in the project?
Let me transform into him for a moment. I built a setup for 100 nodes. Am I going to maximize performance? Of course not! Do I know how to push the performance to the limit? Of course yes! I managed many CDNs in the past (probably). Storj was not about pure performance. Let’s build a conservative (money-wise) setup. JBOD? 12G? Naa… 6G used is OK. Cache? Nah, just a bunch of RAM (1TB). Exotic FS? Nah… ext4 will be good. Storj is about storage… we are fast all together, and we are not a CDN or a proxy.
Today something big changes (maybe a big customer arriving). They are going to ask for PERFORMANCE! Storing fast… deleting faster! We can see 40Mbit up per node, with a total of 4Gbit in Th3Van’s case. Filewalker, GC, cleanup of many TB every day… OK, I was wrong. I need a bunch of SSDs to cache everything, and I need to work on an FS migration soon!
Is this THE NEW STORJ? A project more oriented toward a CDN than toward the “old slow customers storing photo albums and WhatsApp backups for years”?

4 Likes

I guess we all should keep only 3 nodes and not use a proxy. At least then we could write in every topic that our 3 nodes are fine. Also, I guess the size of the Storj network would then be 1 PB. :slight_smile: And many big clients with exabytes of data would come to Storj then.

2 Likes

I think used SAS2 JBODs full of refurb HDDs will be fine. Maybe the systems running the nodes will need a bit more RAM. No exotic filesystem setups: keep it simple and stick with the conservative configs.

There seem to be at least 100 large SNOs that will always expand when they fill disks, so performance and capacity should always be covered.

Customers will bring money, which will help fix bugs.

Are the people who complained about a lack of traffic the same people who now complain about too much traffic? If not, then it shouldn’t be a surprise.

My ISP’s router, which I cannot change, has some problems dealing with the number of connections Storj generates. It sometimes just locks up, and only a hard hardware reset helps. Recently this has happened almost daily due to all the testing. It’s an ADB Epicentro. It seems I have almost full admin access to the router, so I’ll probably figure something out, but it is a bit of a problem right now.

The power of two choices is not helping here on its own, because there are no early indicators before the lock-up: the node works just fine, then suddenly no new connections are handled at all. All I can probably do node-side is lower the max connections limit, but that’s not a full remedy either unless I go to extremely low values: connections seem to linger on the router long after they are closed by the node and have stopped counting toward the limit.

A single non-full node is enough to trigger this behavior.

2 Likes

I can only suggest switching it to bridge mode, then using your own “cabbage” router instead of the ISP’s “potato” one.

2 Likes

Obvious question, I know, but does it have a bridge mode you can enable, so you can get yourself a decent router instead?

Yeah, that’s one of the ideas I’m considering. I would prefer not to run two devices though, my network closet is pretty tight already.

OK, OK! I’m done complaining… Just fill my nodes already and pay me $300/month. :money_mouth_face::money_mouth_face::money_mouth_face::money_mouth_face::money_mouth_face:

4 Likes

This will stop being a viable option very soon. At the current rate, my node behind Oracle VPN will blow through 10TB of free traffic in about 10 days.

So I either need to find an alternative way around CGNAT (I’m strongly considering AirVPN) or shut it down. Paying for extra traffic won’t make sense.

How fast are you receiving ingress? Note that the 10TB quota on the free tier is for outbound data transfer only.

You could use a tool like wondershaper to limit the bandwidth, or adjust your max concurrent requests to slow it down.
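As a sketch of the first option: the classic wondershaper script takes an interface and down/up rates in kbit/s (the newer magnific0 fork uses `-a`/`-d`/`-u` flags instead). The interface name and rates below are hypothetical placeholders; the command is only echoed here as a dry run:

```shell
# Hypothetical interface and limits; classic wondershaper rates are in kbit/s.
iface=eth0
down_kbps=50000   # ~50 Mbit/s down
up_kbps=50000     # ~50 Mbit/s up
cmd="wondershaper $iface $down_kbps $up_kbps"
echo "would run: $cmd"   # drop the echo to actually apply the limit (needs root)
```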

Exactly. If you use this VM as your gateway, all ingress to the node will be egress from the VM. Actually, it’s true for egress from the node too: it will be egress traffic from this VM to the customers.

Right:

Node ingress: traffic from the internet to the bridge (free) → forwarded to the node → traffic from the bridge to the node (counts toward the 10TB free allowance)
Node egress: traffic from the node to the bridge (free) → forwarded to the requestor → traffic from the bridge to the requestor (counts toward the 10TB free allowance)

Any traffic through the bridge will contribute to the 10TB allowance, once.
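As a rough sanity check on how fast that allowance goes: the 1 TB/day figure below is an assumption that matches the “10TB in about 10 days” estimate mentioned earlier in the thread:

```shell
# Assumed sustained bridge traffic of 1 TB/day against the 10 TB quota.
quota_tb=10
daily_tb=1
days=$((quota_tb / daily_tb))
echo "quota exhausted in ${days} days"
# Average rate at 1 TB/day: 8e12 bits / 86400 s ≈ 93 Mbit/s sustained.
```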

BTW, I moved my node to AirVPN. It works fine. I also got IPv6 support by doing nothing… Another upside: I don’t have to maintain my instance on Oracle anymore. Throwing money at the problem is the best way to solve problems that are solvable by throwing money at them.

1 Like