Uploads via various apps produce different results

I uploaded a 1 GB file [1.00 GB (1,073,741,824 bytes)] to different buckets via:

  • FileZilla
  • S3 Browser
  • WinSCP
  • web interface

And then I downloaded each file the same way it was uploaded. Interestingly, the results are not the same in terms of Segments and Bandwidth. (Furthermore, I byte-compared the downloaded files with the original and they are identical, so there is no issue with upload/download corruption.)

Could you explain why the same file takes a different number of Segments of storage and a different number of megabytes of download? I can't understand how downloading a 1 GB file can count as half a gigabyte of transferred data. It's not bad, but…
Because Segments and Bandwidth are billed, I would like to know where the difference comes from, and whether I would be better off using the web interface when working with files. But that's neither feasible nor sensible…

1 Like

Please review our guide to fine-tune performance for help in optimizing the segment size you use when requesting file transfers. The different methods you used have different defaults, which might not be ideal to use as-is in conjunction with Storj.

Specifically, our guide for the S3 file browser states:

By default, S3 Browsers will download everything using 5 MB chunks, whereas you have the configuration option to increase that download size to 64 MB, the segment size for Storj.
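To put rough numbers on that difference, here is a back-of-the-envelope sketch, assuming each multipart part smaller than 64 MB is stored as its own segment (parts are not merged into full segments, as the gateway discussion later in this thread explains):

```python
import math

FILE_SIZE = 1 * 1024**3      # the 1 GB file from the original post
SEGMENT_SIZE = 64 * 1024**2  # Storj segment size (64 MB)

def rough_segment_count(part_size: int) -> int:
    """Rough estimate: parts are not merged, so every part becomes at
    least one segment, and parts larger than 64 MB are split further."""
    parts = math.ceil(FILE_SIZE / part_size)
    return parts * max(1, math.ceil(part_size / SEGMENT_SIZE))

print(rough_segment_count(5 * 1024**2))   # 5 MB parts  -> 205 segments
print(rough_segment_count(64 * 1024**2))  # 64 MB parts -> 16 segments
```

The same file can therefore be billed as a handful of segments or a couple of hundred, depending only on the part size the client uses.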

Similarly, FileZilla and WinSCP may have settings that you can use to specify the preferred segment/chunk size.

The web interface is not meant to be used as your main method to download files; it is only intended for one-off downloads, as a quick introduction to Storj. It does not lend itself to fine-tuning the segment size to save on bandwidth the way the options we offer for more professional use cases do. You can find more guides in our documentation under the How To Guides section.

5 Likes

Thank you for the clarification.
Yes, I have had S3 Browser set to a 64 MB part size since the beginning. FileZilla does not have any such options, and the only option WinSCP allows is "Virtual Host" vs. "Path". I can provide screenshots if it helps.

Initially I expected the S3 gateway to do the 64 MB segment chopping on its own; my mistake.
What would happen if I set the part size to more than 64 MB? Will the S3 gateway chop it anyway?

OK, I'm using Synology Hyperbackup anyway, so hopefully these segments will not screw it up.
By the way, how about creating a Storj-native plugin for Synology Hyperbackup (or providing a Hyperbackup Vault running on the Storj platform) so users can avoid overloading your S3 gateway?

You could also add Synology Hyperbackup to Backups - Storj DCS Docs

I am using Duplicati for that reason. It puts all of my small files into big zip archives with compression and deduplication. Of course, this depends on your use case.

A lot of S3 clients will default to a 5 MB multipart size. Now we have a problem. The S3 clients expect the first part to finish uploading before sending the next one (a client might send multiple parts in parallel, but it still expects them to get stored before continuing). There isn't a good way for the gateway-mt to just pretend the part was uploaded while combining it silently in the background. It also doesn't know in which order the S3 client would like to have all of these parts at the end.

If you upload with a 128 MB multipart size, gateway-mt will split that into 2 segments again.

Currently, there isn't a way to increase the segment size or tune any of the other parameters. There is a refactoring plan that would give us some more flexibility. My dream setup would be a 1 GB segment size and no long-tail cancellation for downloads. That is currently not possible, but after the refactoring it might become an option.

2 Likes

Wouldn't this cause issues if a node doesn't respond, sends the wrong data, or is just terribly slow?

It will wait for 29 pieces (80 for uploads) to finish before cancellation.
So I expect it would work anyway.
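To illustrate the counting behind that, here is a toy sketch; the real client uses Reed-Solomon erasure coding and long-tail cancellation, and only the 29/80 figures come from the post above:

```python
PIECES_UPLOADED = 80  # pieces stored per segment (upload target from the post above)
PIECES_NEEDED = 29    # pieces sufficient to reconstruct a segment on download

def segment_reconstructable(finished_pieces: int) -> bool:
    """Once 29 pieces have arrived, the remaining slow nodes can be cancelled."""
    return finished_pieces >= PIECES_NEEDED

print(segment_reconstructable(35))  # True  -- plenty of fast nodes responded
print(segment_reconstructable(28))  # False -- one piece short, keep waiting
```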

I reviewed the guide and now I have two last questions. They may help other Synology users as well. What settings do you recommend for:

  1. Request style: Virtual Hosted style or Path style?

  2. Multipart upload size: 64 MB or the maximum available 512 MB?

But if even one node doesn't respond at all, the transfer would fail without long-tail cancellation.

@xsys use path style and 64 MB.
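If you are setting this up through an S3 library rather than the Synology UI, here is a minimal sketch of those two settings with boto3 (the endpoint, credentials, file, and bucket names are placeholders; substitute your own gateway credentials):

```python
import boto3
from boto3.s3.transfer import TransferConfig
from botocore.config import Config

# Path-style addressing, as recommended above; credentials are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://gateway.storjshare.io",
    aws_access_key_id="<access key>",
    aws_secret_access_key="<secret key>",
    config=Config(s3={"addressing_style": "path"}),
)

# 64 MB parts so each multipart part maps onto one Storj segment.
transfer_config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    multipart_chunksize=64 * 1024 * 1024,
)

s3.upload_file("backup.hbk", "my-bucket", "backup.hbk", Config=transfer_config)
```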

1 Like

Yes, and this is why you need to use software with a retry feature, like rclone.
I hope that the uplink CLI will have this feature too.
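For comparison, the coarse-grained kind of retry an S3 client or rclone gives you today looks roughly like this (a sketch with boto3; `s3` is any configured S3 client, and the retry policy is just an example):

```python
import time
import botocore.exceptions

def download_with_retries(s3, bucket: str, key: str, dest: str, attempts: int = 3) -> None:
    """Retry the whole object download -- the coarse-grained retry discussed
    here, as opposed to the piece-level retries mentioned below."""
    for attempt in range(1, attempts + 1):
        try:
            s3.download_file(bucket, key, dest)
            return
        except (botocore.exceptions.ClientError, botocore.exceptions.BotoCoreError):
            if attempt == attempts:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff before retrying
```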

Well, retrying the whole segment would be really inefficient. I would hope any native implementation of this could retry just the missing piece(s). This would still cause significantly slower transfers, but it would be the most efficient way of using resources, and much of the slowdown can be overcome with parallelism. I'd love to see more flexibility for that too.

1 Like

There is a refactoring ongoing that comes with retries at the piece level and concurrent piece transfers instead of concurrent segment transfers. So I would just download 10 more pieces of the next segment while I am waiting for the terribly slow node to finish. If it doesn't finish, I just retry with a different node.

4 Likes

Oh nice! That is super cool. Does that also work across objects, so you don't have different settings for multiple segments vs. multiple objects in parallel?

Any update on larger segment sizes?

1 Like

You may use larger chunks in multipart uploads; as a result, you would likely get fewer segments.