Amazon S3 has a minimum multipart part size of 5 MB. Apparently that limit is not enforced on the S3MTGW. I just tested HashBackup with a part size of 1 KB (normally HB gives an error, but I removed the error check), and Storj let me create a file with this tiny 1 KB part size without error. Uploading a 42 KB file this way means 42 × 1 KB segments, 80 pieces each, so the upload with HB is very slow (854 bytes/s) and downloading with uplink is also very slow (6.7 KiB/s).
There should be an error check for the minimum part size, because allowing very small parts on a multipart upload enables a denial-of-service attack. I suppose it would allow even smaller part sizes than 1 KB. See below for Amazon's error when a part is smaller than 5 MB.
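The check being asked for is small. A minimal sketch of a client-side guard, modeled on the check the post says HashBackup normally performs (`check_part_size` and its exact rule are illustrative, not HB's actual code); S3's documented minimum applies to every part except the last, so a single-part upload of any size is fine:

```python
S3_MIN_PART = 5 * 1024 * 1024  # Amazon S3's 5 MiB minimum part size

def check_part_size(part_size, total_size):
    """Reject multipart uploads whose non-final parts would fall below
    the S3 minimum. A 1 KiB part size on a 42 KiB file would otherwise
    produce 42 segments, slowing both upload and download."""
    if total_size > part_size and part_size < S3_MIN_PART:
        raise ValueError(
            "part size %d is below the S3 minimum of %d bytes"
            % (part_size, S3_MIN_PART))
```

With this in place, a 1 KB part size on a multi-part file is rejected before any request reaches the gateway, while an upload that fits in a single part is allowed regardless of size.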
[jim@mb hbrel]$ py backup.py -c sj backup.py
Backup directory: /Users/jim/hbrel/sj
Backup start: 2021-10-27 02:21:22
Using destinations in dest.conf
# for /Users/jim/hbrel/sj/DESTID numchunks=1
This is backup version: 0
Dedup not enabled; use -Dmemsize to enable
/
/Users
/Users/jim
/Users/jim/hbrel
/Users/jim/hbrel/backup.py
/Users/jim/hbrel/sj
/Users/jim/hbrel/sj/inex.conf
# for /Users/jim/hbrel/sj/arc.0.0 numchunks=42
Cache size: 29 MB (3600 pages)
Waiting for destinations: sjs3
Copied arc.0.0 to sjs3 (42 KB 49s 854 bytes/s)
Writing hb.db.0
# for /Users/jim/hbrel/sj/hb.db.0 numchunks=5
Waiting for destinations: sjs3
Copied hb.db.0 to sjs3 (4.8 KB 12s 375 bytes/s)
# for /Users/jim/hbrel/sj/dest.db numchunks=36
Waiting for destinations: sjs3
Copied dest.db to sjs3 (36 KB 38s 962 bytes/s)
Time: 8.2s
CPU: 0.1s, 1%
Wait: 101.3s, 1m 41s
Mem: 58 MB
Checked: 7 paths, 136192 bytes, 136 KB
Saved: 7 paths, 136192 bytes, 136 KB
Excluded: 0
Dupbytes: 0
Compression: 68%, 3.2:1
Efficiency: 0.00 MB reduced/cpusec
Space: +42 KB, 79 KB total
No errors
[jim@mbp ~]$ uplink cp sj://hbtest/testmb/arc.0.0 x
41.55 KiB / 41.55 KiB [------------------------------------------------------------------------------------------------] 100.00% 6.74 KiB p/s
Downloaded sj://hbtest/testmb/arc.0.0 to x
[jim@mbp ~]$ ls -l x
-rw-r--r-- 1 jim staff 42544 Oct 27 02:42 x
Amazon:
[jim@mb hbrel]$ py backup.py -c hbs3 ../test10
Backup directory: /Users/jim/hbrel/hbs3
Backup start: 2021-10-27 02:58:51
Using destinations in dest.conf
This is backup version: 0
Dedup not enabled; use -Dmemsize to enable
/
/Users
/Users/jim
/Users/jim/hbrel
/Users/jim/hbrel/hbs3
/Users/jim/hbrel/hbs3/inex.conf
/Users/jim/test10
Cache size: 29 MB (3600 pages)
Waiting for destinations: s3
dest s3: error #1 of 9 in send arc.0.0: [S3ResponseError] S3ResponseError: 400 Bad Request
<Error><Code>EntityTooSmall</Code><Message>Your proposed upload is smaller than the minimum allowed size</Message><ProposedSize>1048576</ProposedSize><MinSizeAllowed>5242880</MinSizeAllowed><PartNumber>1</PartNumber><ETag>f4be738b24def28664ab0e947bc7abb0</ETag><RequestId>9Y3HACXJH6Y3AX6J</RequestId><HostId>QPa7MWgvs9x5MpauzNG59ZyPu73UH61JeEXPsSPqjuvA9ZaeKklQvCFMejjg3tppCEMN9POx35k=</HostId></Error>
Perhaps related, perhaps not: I tried to use the share trick to verify the number of pieces, but it didn't quite work. It displayed the file size but said there are 0 pieces. Maybe it's an access issue - not sure. Here's the share URL
Thank you for the input! A minimum part size check is code-complete and slated for an upcoming Satellite release. This check had previously been implemented on Gateway-MT but was recently disabled due to issues it created in edge cases with high network latency.
I have been doing testing with many segments yesterday and today and had a question about billing. My account page says:
My First Project - Estimated Total: $0.29
Storage ($0.004 per Gigabyte-Month): Oct 1 - Oct 27, 0.86 Gigabyte-month, $0.00
Egress ($0.007 per GB): Oct 1 - Oct 27, 41.75 GB, $0.29
Objects ($0 per Object-Month): Oct 1 - Oct 27, 88.09 Object-month, $0.00
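For what it's worth, the line items above follow directly from the listed unit prices; the whole $0.29 estimate is egress, and storage rounds to zero at this scale:

```python
# Reproduce the account-page line items from the listed unit prices.
storage_gb_month = 0.86   # Gigabyte-months used Oct 1 - Oct 27
egress_gb = 41.75         # GB downloaded
objects_month = 88.09     # Object-months

storage_cost = storage_gb_month * 0.004  # $0.004 per GB-month
egress_cost = egress_gb * 0.007          # $0.007 per GB
object_cost = objects_month * 0.0        # $0 per object-month

print(round(storage_cost, 2))  # 0.0  (rounds to $0.00)
print(round(egress_cost, 2))   # 0.29
```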
Where does the charge for excessive segments show up? I’m still on the free tier and it looks like I can go pretty crazy on segments without accruing charges, but maybe I’m just not near the limits.
A related accounting question: does accounting work by scanning my account every hour, or are logs kept when objects are created and deleted? If there is only an hourly poll, it seems possible to go nuts creating and deleting objects and segments and never be charged. I.e., the degradation-of-service attack I mentioned would cost nothing.
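To illustrate the concern, here is a toy model of the two accounting schemes (this is an assumption about how metering might work, not a claim about how the Satellite actually does it). An object that lives and dies between two hourly polls accrues nothing under snapshot polling, but its full lifetime is billed when create/delete events are logged:

```python
def polled_gb_hours(events, poll_times):
    """Snapshot accounting: at each poll, charge one hour for whatever
    exists at that instant. events: list of (create_t, delete_t, gb)."""
    total = 0.0
    for t in poll_times:
        for create_t, delete_t, gb in events:
            if create_t <= t < delete_t:
                total += gb
    return total

def logged_gb_hours(events):
    """Event-log accounting: charge the actual lifetime of each object."""
    return sum((delete_t - create_t) * gb
               for create_t, delete_t, gb in events)

# 100 GB created at t=0.1h and deleted at t=0.9h: both hourly polls
# (t=0 and t=1) miss it entirely, yet it really used 80 GB-hours.
events = [(0.1, 0.9, 100.0)]
print(polled_gb_hours(events, [0, 1]))  # 0.0
print(logged_gb_hours(events))          # ~80.0
```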
Did this bug get fixed? Small multipart uploads are no longer working, which is fine, but the error message is poor (if that's what is causing this failure):
Traceback (most recent call last):
File "/hb.py", line 154, in <module>
File "/destcmd.py", line 442, in main
File "/destcmd.py", line 241, in dotest
File "/s3dest.py", line 1024, in sendfile
File "/s3dest.py", line 1001, in sendmulti
File "/opt/lib/python2.7/site-packages/boto/s3/multipart.py", line 319, in complete_upload
self.id, xml)
File "/opt/lib/python2.7/site-packages/boto/s3/bucket.py", line 1779, in complete_multipart_upload
headers=headers, data=xml_body)
File "/opt/lib/python2.7/site-packages/boto/s3/connection.py", line 668, in make_request
retry_handler=retry_handler
File "/opt/lib/python2.7/site-packages/boto/connection.py", line 1071, in make_request
retry_handler=retry_handler)
File "/opt/lib/python2.7/site-packages/boto/connection.py", line 1028, in _mexe
raise BotoServerError(response.status, response.reason, body)
BotoServerError: BotoServerError: 500 Internal Server Error
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InternalError</Code><Message>We encountered an internal error, please try again.</Message><Key>hbtest/test.tmp</Key><BucketName>hbtest</BucketName><Resource>/hbtest/test.tmp</Resource><RequestId>16B6942493190103</RequestId><HostId></HostId></Error>
For reference, see Amazon S3's EntityTooSmall error above for parts that are too small. Returning a generic 500 Internal Server Error and asking the client to retry is not helpful, since retrying an undersized part can never succeed.
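The status code matters beyond wording: clients classify retryability by status class, so a sketch of the distinction, assuming standard HTTP semantics (the function name is illustrative):

```python
def is_retryable(status):
    """Per HTTP semantics, 5xx responses look transient and worth
    retrying, while 4xx responses (like Amazon's 400 EntityTooSmall)
    signal a client error that retrying can never fix. Returning 500
    for an undersized part makes a permanent failure look transient."""
    return 500 <= status < 600

# With Amazon's 400 EntityTooSmall, a client stops immediately; with
# the gateway's 500 InternalError, it burns through its whole retry
# budget (note "error #1 of 9" in the HB log above) for nothing.
```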