So a couple of things I’ve noticed: I start to fail audits when my Nextcloud server gets hit with large copies like ISO files. It looks like the router isn’t coping, as Nextcloud is exposed to the internet for sharing with other team members. Now, I am still planning to replace a lot of the networking with 10Gbit on the LAN for most of the servers and have started doing that, and I’ll replace the router with pfSense as well. I’ve built a new pfSense box but it needs some final hardware upgrades before I put it into production use. So, as an interim measure, are there QoS settings I can apply to give Storj priority?
Depending on the setup, bottleneck, and the devices involved, you could classify any traffic to the node port/IP as higher priority.
Something else that may have been contributing: my Veeam Backup and Replication VM is on the same hypervisor, and its storage is on my TrueNAS box, so it is possible that when those backups are running they choke the NIC for Storj. So I have moved the Veeam box to a different port on the quad gigabit Ethernet card. The load on the hypervisor itself is minimal, so that shouldn’t be contributing.
If you’re sure your node definitely is healthy and holds all the pieces it is supposed to, it would mean that the node takes more than 5 minutes to reply back to the satellite with the right audited piece (that’s the only thing I can think of that could cause an audit fail, other than actually returning corrupted or missing data).
What could cause such a delay? Or do you think your router is dropping some requests altogether?
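One rough way to tell a slow router from one that drops connections is to time plain TCP connects to the node port from outside the LAN while a big Nextcloud transfer is running. The snippet below is only a sketch: the host name is a placeholder, 28967 is assumed to be the node port (adjust it to whatever you actually forward), and the function names are made up for illustration. It measures connection setup only, not a real audit.

```python
import socket
import time
from typing import List, Optional


def tcp_connect_time(host: str, port: int, timeout: float = 5.0) -> Optional[float]:
    """Return the seconds needed to open a TCP connection, or None if it failed."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return None


def summarize(samples: List[Optional[float]]) -> dict:
    """Count failed attempts and report the slowest successful connect."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failed": len(samples) - len(ok),
        "slowest": max(ok) if ok else None,
    }


# Example usage (hypothetical host, assumed port):
# samples = []
# for _ in range(20):
#     samples.append(tcp_connect_time("node.example.com", 28967))
#     time.sleep(1)
# print(summarize(samples))
```

If the failed count climbs only while the large copies are running, the router dropping connections becomes the more likely explanation; if connects succeed but get slow, it points more towards plain congestion.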
The Linksys E4200 I was using died (after 9 years!) and was replaced with a crappy TP-Link on a temporary basis. Combine that with some downloads/uploads being 300+ GB and then yeah, 5+ minutes is certainly possible. The Nextcloud server box is currently down due to a power supply failure, and after moving the Veeam VM the NIC problem hasn’t come back, so it is looking like those two things are the culprits. Certainly looking forward to getting on pfSense soon.
But still… I’m no network expert, I must confess, but it feels to me like the size of the file being transferred to/from Nextcloud shouldn’t affect response times from other services that much.
Unless Nextcloud uses a massive number of requests in parallel completely overloading the router… ?
There’s something weird going on here, I say.
Whilst I used to be on Gigabit Ethernet in Omsk, here we have only a 100Mbit connection, and that carries Storj, Nextcloud, my Exchange server and sometimes backup traffic between my employer and here, plus also my wife’s gaming. (She is currently playing in a pro game where 1st prize is $200k USD.) The internet speeds are such that she can’t stream on Twitch at all. So I am not surprised the router gets swamped. The ISP (a government company) will not upgrade us, and they have refused permission for another ISP to supply this building.
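For a sense of scale, here is a back-of-the-envelope calculation of how long one of those 300 GB transfers would occupy a 100Mbit link. It is only a sketch: it assumes decimal gigabytes, that the transfer gets the whole link, and no protocol overhead, so real transfers will take longer. The function name and the `efficiency` parameter are made up for illustration.

```python
def transfer_seconds(size_gb: float, link_mbit: float, efficiency: float = 1.0) -> float:
    """Seconds needed to move size_gb (decimal GB) over a link of link_mbit Mbit/s.

    efficiency is an assumed fraction of the link the transfer actually gets.
    """
    bits = size_gb * 1e9 * 8
    return bits / (link_mbit * 1e6 * efficiency)


hours = transfer_seconds(300, 100) / 3600
print(f"300 GB over 100 Mbit/s: about {hours:.1f} hours")  # about 6.7 hours
```

So a single large copy can keep the line saturated for most of a working day, which leaves plenty of time for other traffic to queue up behind it.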
Looking through my Proxmox logs I have found an additional problem.
[ 19.375455] netxen_nic 0000:01:00.0: Incompatibility detected between driver and firmware version on flash.
HP only supplies the firmware in RPM format, so I’m currently working out how to apply that under Debian. I then also need to apply a similar update to the 10Gbit card.
Well, a bit over a day later I got the firmware updates done…
I couldn’t upgrade under Debian/Proxmox (I could get into the RPM fine, but the upgrade bombed with “unsupported os” messages), so I installed RHEL 7 and failed there with dependency errors.
I also tried ESXi 6.5, but there were known driver limitations stopping the upgrades from working there too.
In the end I whacked in a spare 80GB SATA drive and installed Windows Server 2012. Then the upgrades worked. So, the errors on both my quad gigabit NIC and the 10Gbit card are now gone. Yay!
Now, if only the SFP+ modules I ordered from eBay hadn’t gone missing, I’d be able to test out the 10Gbit card. I’ve already got the fibre patch leads, so the SFP+ modules are the last part remaining, at least for the first link.
Oh well, we made progress.