Relationship between S3 operations and uplink

Suykerbuyk · April 26, 2024, 5:06pm

My question is rather basic and primitive.

I initially uploaded a snapshot of a 30TB data set with uplink. However, the source data set changes over time. It appears that uplink cp uploads a fresh copy every time, regardless of whether the data object has changed.

Ergo, I switched to S3 access credentials and rclone (the binary with StorJ support, from the StorJ home pages), version 1.66.

My question - in terms of retrieval (or onboarding), does it make a difference whether I use uplink or an S3 client?

John “S”

Suykerbuyk · April 26, 2024, 5:52pm

On a side note, rclone, when used to download with the “sync” option, is now crashing with an out-of-memory condition every single time. It happens even for a --dry-run, as soon as it begins indexing files that have not been downloaded, and even with a --parallel of “1”.

jtolio · April 26, 2024, 5:52pm

Hey John!

uplink cp just unconditionally copies, as you noticed, and doesn’t attempt to determine whether the source is already at the destination. rclone does give you this behavior (both rclone copy and rclone sync attempt to determine whether the destination already exists).
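To make the difference concrete, here is a minimal Python sketch of the kind of check a sync-style tool can run before transferring anything. It is not rclone's or uplink's actual code: the names `Entry`, `needs_transfer`, `plan_copy` and `plan_unconditional` are hypothetical, and it assumes objects are compared by size and modification time only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    size: int      # object size in bytes
    mtime: float   # modification time, seconds since the epoch


def needs_transfer(src: Entry, dst: Optional[Entry], tolerance: float = 1.0) -> bool:
    """Return True if the source object should be copied to the destination."""
    if dst is None:              # nothing at the destination yet
        return True
    if src.size != dst.size:     # sizes differ, so the content differs
        return True
    # Same size: fall back to modification time, with a small tolerance
    return abs(src.mtime - dst.mtime) > tolerance


def plan_copy(source: Dict[str, Entry], dest: Dict[str, Entry]) -> List[str]:
    """Sync-style behaviour: only objects that are missing or different."""
    return sorted(n for n, e in source.items() if needs_transfer(e, dest.get(n)))


def plan_unconditional(source: Dict[str, Entry]) -> List[str]:
    """Unconditional-copy behaviour: every object, every time."""
    return sorted(source)


# Hypothetical listing: a.dat is unchanged, b.dat changed size, c.dat is new.
source = {
    "a.dat": Entry(100, 1000.0),
    "b.dat": Entry(200, 2000.0),
    "c.dat": Entry(300, 3000.0),
}
dest = {
    "a.dat": Entry(100, 1000.0),
    "b.dat": Entry(150, 2000.0),
}

print("sync-style plan:   ", plan_copy(source, dest))
print("unconditional plan:", plan_unconditional(source))
```

With a mostly unchanged data set, the sync-style plan shrinks to the handful of objects that differ, while the unconditional plan always lists everything.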

I believe (at least at one point) rclone had support for both native direct Storj network communication and also S3 protocol support (it may be listed under Tardigrade, an old brand of ours). I’m not sure which exists in rclone today.

If you use native communication (the uplink CLI, for instance, or rclone’s native Storj mode), then you are talking directly to the storage nodes. Your uploads can be much faster in the sense that you are eliminating a hop, but it can require some tuning to get there. This is also the right way to get end-to-end encryption. On the other hand, native uploads have an expansion factor: every 1 TB uploaded this way will cost you ~2.7 TB of egress.

If you use the S3 protocol route, you get S3-protocol-standard server-side encryption instead of end-to-end encryption, and your client no longer talks directly to storage nodes. In exchange, your egress use scales back to the 1 TB you’re uploading, since the S3 gateway is now the origination point of that expansion factor.

We probably have a little table somewhere comparing the pros and cons of the two, but in summary:

Native/uplink: end-to-end encryption, direct node communication (fewer hops), 2.7x egress for uploads, perhaps a need for some use-case-specific tuning.
Gateway/S3: server-side encryption, 1x egress for uploads, using the Storj-tuned gateway.
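As a rough illustration of that summary, the sketch below applies the ~2.7x figure quoted above to the 30 TB data set from the original question. The constant and function names are made up for this example, and the factor is the approximate number from this thread, not a guaranteed value.

```python
# Approximate expansion factors as quoted in this thread (assumptions, not guarantees).
NATIVE_EXPANSION_FACTOR = 2.7    # client sends the expanded data to storage nodes
GATEWAY_EXPANSION_FACTOR = 1.0   # the S3 gateway does the expansion instead


def client_upload_tb(data_tb: float, native: bool) -> float:
    """Estimate how many TB leave the client's network for an upload of data_tb."""
    factor = NATIVE_EXPANSION_FACTOR if native else GATEWAY_EXPANSION_FACTOR
    return data_tb * factor


for label, native in (("native/uplink", True), ("gateway/S3", False)):
    sent = client_upload_tb(30, native)
    print(f"{label}: roughly {sent:.0f} TB sent by the client for a 30 TB upload")
```

On a metered or bandwidth-limited uplink, that difference can matter more than the extra hop through the gateway.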

Hope this helps and that you are well!

Suykerbuyk · April 26, 2024, 5:56pm

I’m quite sure that I’m using rclone’s native StorJ support. When setting up a new endpoint with rclone config, it’s something like option 44. The default Ubuntu 22.04 LTS version of rclone does not support StorJ directly, but upstream version 1.66 does.

If I upload with uplink, can I download with S3 (and vice versa)?

jtolio · April 26, 2024, 6:09pm

Yes you can! As long as the access grant you use with uplink and the access key you use with S3 have been constructed with the same passphrase, it should work great.

Suykerbuyk · April 27, 2024, 3:14pm

I’ll just add some notes here. I’ve been using StorJ to distribute multi-terabyte full blockchain copies to and from various continents.

uplink works absolutely wonderfully and flawlessly IF one waits until the complete data set is on the StorJ network. It is unaware of changes to data sets: it blissfully overwrites objects and files that are already complete, and it cannot perform any form of “delta/diff” transfer. Ergo, uplink is most definitely the way to go if you have a static, relatively immutable data set AND you can wait until the data set is fully uploaded before beginning an uplink-based download.

Unfortunately, I did not wait until the upload was complete before I started downloading it to various endpoints. With ‘uplink sync’, this means that fully complete data files are overwritten, regardless of their state.

I therefore transitioned to rclone, with both native StorJ and generic S3 configs. The objective was to regularly refresh the download destinations as the upstream StorJ bucket was added to.

Unfortunately, under every scenario, rclone has a very nasty OOM bug when there are LOTS of files. In my case, the source file count is > 65,000.

The rclone bug appears in every release repo of rclone I’ve tested (~ version 1.6) as well as in the latest upstream v1.66. It also appears regardless of whether the --dry-run flag is present. The mitigation suggested in 2018, adding an --attr-timeout of >= 60 seconds, appears to be no longer supported.

I then pivoted to using Minio’s ‘mc’ client to pull from StorJ. This appears to mostly work fine for both full runs and --dry-run with regards to memory and OOM issues. However, when pulling from a StorJ S3 bucket, I always end up with an unexpected EOF error on an object stream:

Below is a typical example of the random, but always present, failures when I mc mirror my StorJ archive. Note that I had to paste a screenshot, as the text was interpreted as containing too many links (more than 2).

Alexey · April 28, 2024, 9:48am

I see. You may be able to reduce rclone’s memory footprint by reducing its parallelism, but that will likely affect the speed.
Hm. You could try using the xargs command to run rclone in parallel, with a reduced number of threads for each instance, because right now we do not have an option in the uplink CLI to check the destination; it will always upload, regardless of whether the object has changed or not…
Hm… That wouldn’t work either, I guess, if the source is the same for each instance…
I passed your question to the team.

The only other thing I can think of is to use a backup tool instead, such as Duplicacy, restic or HashBackup. They can take hashed snapshots (storing only the differences) and reduce your storage costs (because of the bigger packs); this should also speed up recovery.
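For anyone unfamiliar with how hashed snapshots avoid re-uploading unchanged data, here is a toy Python sketch of the idea. It is not how Duplicacy, restic or HashBackup are implemented: the `ChunkStore` class is hypothetical, the chunks are fixed-size and absurdly small so the example stays readable, and there is no packing, compression or encryption.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # bytes; tiny on purpose, real tools use far larger chunks


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkStore:
    """Stores each distinct chunk once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def snapshot(self, data: bytes) -> Tuple[List[str], int]:
        """Record a snapshot; return (manifest, bytes actually uploaded)."""
        manifest: List[str] = []
        uploaded = 0
        for piece in split_chunks(data):
            digest = hashlib.sha256(piece).hexdigest()
            if digest not in self.chunks:   # only new content is stored
                self.chunks[digest] = piece
                uploaded += len(piece)
            manifest.append(digest)
        return manifest, uploaded

    def restore(self, manifest: List[str]) -> bytes:
        """Rebuild the original data from a snapshot manifest."""
        return b"".join(self.chunks[d] for d in manifest)


store = ChunkStore()
first = b"AAAABBBBCCCC"
second = b"AAAABBBBCCCCDDDD"   # same data with one new chunk appended

manifest1, uploaded1 = store.snapshot(first)
manifest2, uploaded2 = store.snapshot(second)
print("first snapshot uploaded bytes: ", uploaded1)
print("second snapshot uploaded bytes:", uploaded2)
```

The second snapshot only stores the chunk that was not already present, which is the property that makes this approach attractive for a data set that keeps growing.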

Suykerbuyk · April 29, 2024, 12:28pm

Adding some more notes…

The rclone OOM issue appears to plague high-core-count EPYC processors the most. I have a bunch of older (Xeon gen 1 & 2) boxes with 8 cores or fewer, and the problem does not occur on them.

Doing a web search for “rclone out of memory oom bug” yields a ton of similar results going back to 2018, most relevantly:

github.com/rclone/rclone

rclone using too much memory

opened 05:33PM - 20 Mar 18 UTC

closed 11:17AM - 24 Mar 18 UTC

calisro

bug

rclone v1.39-211-g572ee5ecβ
- os/arch: linux/amd64
- go version: go1.10

I… recently upgraded from rclone.v1.38-235-g2a01fa9fβ to rclone v1.39-211-g572ee5ecβ

I'm using the following mount:

    export RCLONE_CONFIG="/home/robert/.rclone.conf"
    export RCLONE_BUFFER_SIZE=0M
    export RCLONE_RETRIES=5
    export RCLONE_STATS=0
    export RCLONE_TIMEOUT=10m
    export RCLONE_LOG_LEVEL=INFO
    export RCLONE_DRIVE_USE_TRASH=false
    /usr/sbin/rclone -vv --log-file /data/log/rmount-gs.log \
      mount robgs-cryptp:Media $GS_RCLONE \
      --allow-other \
      --default-permissions \
      --gid $gid --uid $uid \
      --max-read-ahead 1024k \
      --buffer-size 50M \
      --dir-cache-time=72h \
      --umask $UMASK 2>&1 > /data/log/debug &

This worked flawlessly before. In the most recent version, it is using WAY too much memory causing OOM issues on my linux server. The above uses upwards of 7GIGS of resident memory before my OOM killer terminates the mount when activity is being read/written to the mount. I used to be able to crank up the buffer-size to 150M without issues.

There is NOTHING else different between the setup. I can restore my old version and re-run the batch process without issues and then replace it with the new version and it consistently terminates due to OOM after consuming everything left. If I restore the old version and rerun the exact same process, rclone consistently uses no more than 1.8G.

I can provide logs but there is nothing them. They simply show transfers. Even in the fuse-debug there is nothing but normal activity. There seems to be either a leak or the use of memory has changed DRASTICALLY between the versions making things unusable. I've rolled back till this is sorted.

But again recently documented here:

github.com/rclone/rclone

S3 sync high memory usage (leads to OOM)

opened 03:30AM - 16 Jul 23 UTC

closed 11:40AM - 24 Aug 23 UTC

ahmedjafri

bug Remote: S3

#### The associated forum post URL from `https://forum.rclone.org`

#### What is the problem you are having with rclone?

rclone runs for about 20-40 minutes and then memory usage starts to climb up to 11-13 GB until it's killed by the kernel for taking too much memory. The machine this is running on has 16GB of memory.

#### What is your rclone version (output from `rclone version`)

1.63.0

#### Which OS you are using and how many bits (e.g. Windows 7, 64 bit)

Debian 64-bit

#### Which cloud storage system are you using? (e.g. Google Drive)

S3

#### The command you were trying to run (e.g. `rclone copy /tmp remote:tmp`)

```
$ rclone --config ./rclone.conf -v --stats 5s --transfers 3 --dry-run --filter-from ./rclone-exclude.conf --rc --s3-upload-concurrency 1 --max-backlog=1000 sync /<redactedPath> remote:<redacted>/
```

rclone.conf

```
[remote]
access_key_id = <redacted>
secret_access_key = <redacted>
max_upload_parts = -3
endpoint =
region = us-east-2
type = s3
folder = /
bucket = <redacted>
storage_class = DEEP_ARCHIVE
fast_list = false
server_side_encryption =
chunk_size = 100M
```

rclone-exclude.conf

```
- .zfs
- .zfs/**
- *.DS_Store
- /iocage/**
- /jails/**
- **/.recycle/**
- /ix-applications/**
```

#### A log from the command with the `-vv` flag (e.g. output from `rclone -vv copy /tmp remote:tmp`)

```
2023/07/15 20:23:32 NOTICE: Serving remote control on http://127.0.0.1:5572/
2023/07/15 20:23:32 NOTICE: s3: s3 provider "" not known - please set correctly
...
Transferred: 3.317 GiB / 263.962 GiB, 1%, 8.313 MiB/s, ETA 8h55m8s
Checks: 6365 / 6365, 100%
Transferred: 71 / 1082, 7%
Elapsed time: 2m0.3s
```

memory dump

```
flat flat% sum% cum cum%
11GB 99.92% 99.92% 11GB 99.92% github.com/rclone/rclone/lib/pool.New.func1
0 0% 99.92% 11GB 99.89% github.com/rclone/rclone/backend/s3.(*Fs).Put
0 0% 99.92% 11GB 99.89% github.com/rclone/rclone/backend/s3.(*Object).Update
0 0% 99.92% 11GB 99.89% github.com/rclone/rclone/backend/s3.(*Object).uploadMultipart
0 0% 99.92% 11GB 99.89% github.com/rclone/rclone/fs/operations.Copy
0 0% 99.92% 11GB 99.89% github.com/rclone/rclone/fs/sync.(*syncCopyMove).pairCopyOrMove
0 0% 99.92% 11GB 99.92% github.com/rclone/rclone/lib/pool.(*Pool).Get
```

#### How to use GitHub

* Please use the 👍 [reaction](https://blog.github.com/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/) to show that you are affected by the same issue.
* Please don't comment if you have no relevant information to add. It's just extra noise for everyone subscribed to this issue.
* Subscribe to receive notifications on status change and new comments.

What has worked extremely well is MinIO’s ‘mc’.

It would be great if uplink supported delta sync/mirror operations natively!

Dominick · April 29, 2024, 2:51pm

Rclone should work great for this. I’m happy to do a call if you want to run a lab @Suykerbuyk

It sounds like your process might benefit from the use of rclone sync vs rclone copy.

Example command
rclone sync --progress --checkers 100 --fast-list --disable-http2 --transfers 64 --dry-run /local/path mount:bucket

--progress real-time transfer statistics
--checkers default is 8, scale up to improve checking throughput
--fast-list can improve listing speed by being recursive, drop this if you have memory issues
--disable-http2 a must as it improves performance substantially
--transfers 64 files transferred in parallel; memory usage will be this number multiplied by the file size, up to 64 MB per file (64x64=4096 MB). Normally 64 is enough to be “really fast” when moving a bunch of smaller files, but if you have the resources you can try 96 or 128. If the files being bulk uploaded (synced) are large, don’t use such high transfers, as the default --s3-upload-concurrency is 4 (64x64)
--dry-run for testing, does not make any changes
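To pick a --transfers value that fits in RAM, the rule of thumb above can be turned into a quick calculation. This is only a sketch of that rule as stated in this post (transfers multiplied by file size, capped at 64 MB per file); the function name is hypothetical and this is not rclone's actual memory accounting.

```python
def estimated_transfer_memory_mb(transfers: int,
                                 file_size_mb: float,
                                 per_file_cap_mb: float = 64) -> float:
    """Rule-of-thumb buffer memory: transfers x min(file size, per-file cap)."""
    return transfers * min(file_size_mb, per_file_cap_mb)


# Large files hit the 64 MB cap; small files use less memory per transfer.
for transfers in (8, 64, 128):
    large = estimated_transfer_memory_mb(transfers, file_size_mb=1024)
    small = estimated_transfer_memory_mb(transfers, file_size_mb=10)
    print(f"--transfers {transfers}: ~{large:.0f} MB with large files, "
          f"~{small:.0f} MB with 10 MB files")
```

If the estimate is anywhere near the machine's free memory, lowering --transfers (and dropping --fast-list) is the first thing to try.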

Native vs Hosted S3
As for native vs hosted S3: Rclone supports both options. My advice is to start with hosted S3 and experiment with native after you are successful. Load will be a lot higher with native.

Hosted S3
As of version 1.61.1

5 / Amazon S3 Compliant Storage Providers including AWS, Alibaba, Ceph, China Mobile, Cloudflare, ArvanCloud, DigitalOcean, Dreamhost, Huawei OBS, IBM COS, IDrive e2, IONOS Cloud, Liara, Lyve Cloud, Minio, Netease, RackCorp, Scaleway, SeaweedFS, StackPath, Storj, Tencent COS, Qiniu and Wasabi

Then…

21 / Storj (S3 Compatible Gateway)
\ (Storj)

Native
As of version 1.61.1

41 / Storj Decentralized Cloud Storage
\ (storj)

Suykerbuyk · April 29, 2024, 3:40pm

Thanks, Dominick!

I agree. I tried multiple permutations of sync. More often than not in the past, the OOM was “solved” with an --attr-timeout flag of 60 seconds or more. However, this seems to be missing in all the v1.6 versions I’ve tested.

I had not played with the --fast-list and --checkers flags. I suspect they will help, and I will resume testing this afternoon.