I started the S3 gateway for Storj and began mirroring my MinIO server (50 GB of files ranging in size from 100 KB to 3 MB). The S3 gateway could not sustain my max upload speed of 500 Mbps (a separate topic documented in other threads), but whenever it was pushing 300 to 500 Mbps, it was using 400% CPU (4 cores fully loaded). I am running an Intel i5-8400, which has the AES-NI and AVX extensions, so I would not expect encryption and upload to take that much CPU. In comparison, when I upload a backup to MinIO (using restic, which encrypts the backup and breaks it up into smaller blobs), I can max out my 500 Mbps connection and neither restic nor the MinIO server (using HTTPS) uses much of my CPU.
I was planning to use Storj in a cloud setup where I would be uploading lots of data directly from cloud servers to the Storj network, but such high CPU usage from the Storj uplink would slow down my servers a lot (CPU is very expensive in the cloud). Is there a way to minimize CPU usage?
Please stop trolling. Telling me that I can minimize CPU usage by not using the network is not really an answer, is it? The whole point is to use the network and upload files to it. I know it won't use CPU if I am not uploading files.
Can you give us more info about how you mirror the data? It would also be nice to know your costs in terms of cloud CPU. From my perspective it's super cheap nowadays, but maybe you are using a special setup. We have uploaded petabytes of data via cloud servers for load testing.
The erasure encoding is taking up most of the CPU. Hypothetically, you could change some config in your uplink so that it does not upload so many files at the same time, which should reduce the CPU load to the level you want.
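The suggestion above amounts to capping upload concurrency: fewer objects being erasure-encoded at once means less CPU in use at any moment, at the cost of a longer total transfer. A minimal sketch of that pattern, assuming hypothetical names (this is not an actual uplink config option, just an illustration of the knob being described):

```python
import threading
import queue

# Illustrative only: MAX_PARALLEL stands in for whatever concurrency
# setting the uplink/gateway exposes; lower values trade speed for CPU.
MAX_PARALLEL = 2
sem = threading.BoundedSemaphore(MAX_PARALLEL)
results = queue.Queue()

def upload(obj):
    # At most MAX_PARALLEL threads get past this point concurrently,
    # so the CPU-heavy erasure-encode step is throttled.
    with sem:
        # erasure-encode + upload of `obj` would happen here
        results.put(obj)

threads = [threading.Thread(target=upload, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.qsize())  # 8 -- all uploads finish, just never more than 2 at once
```

The total work is unchanged; only the peak CPU draw drops, which is why the transfer takes proportionally longer.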
I am using restic to create hourly backups of my server. Currently all the files are stored in MinIO. I ran the Storj S3 gateway and used mc mirror to copy to Storj. While mirroring the data, I observed the CPU and network load. As for pricing, you can use DigitalOcean's prices: 4 vCPUs is around $40. However, my CPU is a dedicated 6-core i5-8400 (4 GHz max turbo boost), which probably has double the compute power of DigitalOcean's shared vCPUs.
The server is running mail, Nextcloud, Grafana+Prometheus, and other services. Right now I only had a huge load because of the initial sync. However, I need to run a monthly cleanup job so stale blobs are removed from the repo, and it requires some data download/upload and repository management from restic (index rebuilding). That will probably take 30 minutes to a couple of hours, and I expect CPU load from the network I/O through the Storj S3 gateway. I will set the lowest priority on the S3 gateway so it does not affect other services; the process will take longer than it would otherwise, but that's not really a problem. I just wanted to bring it up for the team, since in the cloud my speed will be limited by my CPU and not the network, which is unexpected for a data service (usually network and HDD are the bottlenecks).
In other words: if you were to run a benchmark on the S3 gateway where the HDD and network are stubbed out (instant read of data and instant upload), what would the maximum theoretical throughput of the S3 gateway be? According to my results it is around 10 MB/s per dedicated CPU core.
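The per-core estimate follows directly from the numbers observed earlier in the thread (these are this thread's measurements, not official figures):

```python
# Back-of-envelope check of the "~10 MB/s per core" figure.
observed_mbps = 400                  # mid-range of the 300-500 Mbps observed
cores_used = 4                       # 400% CPU = 4 cores fully loaded
throughput_MBps = observed_mbps / 8  # megabits -> megabytes: 50 MB/s total
per_core = throughput_MBps / cores_used
print(per_core)  # 12.5 -- same ballpark as the 10 MB/s per-core estimate
```

At that rate, saturating a 10 Gbps link would take on the order of 100 dedicated cores, which is why the question matters for cloud deployments.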