Best Practice re Segments

I have a folder I would like to back up to Storj. It consists of hundreds of thousands of other folders, each with some small files (1-4MB would be usual). It's about 50GB or so in total.

Whilst backing up to Storj does work, eventually, from reading the docs it's somewhat inefficient and results in far too many objects / segments to be Storj friendly. Also, deleting the bucket to start again (after I lost the credentials due to “idiot between keyboard and chair” in association with an overkeen password manager) is somewhat long-winded: the website keeps timing out.

I am using TrueNAS Scale to store the files and to back them up directly, file by file, but I suspect there ought to be a better way of doing this. If I used Duplicati to back up directly to Storj, the problem I can foresee is that Duplicati is going to want to download, change and then re-upload large numbers of objects as files in the source change, which is not going to be bandwidth friendly.

Not sure where to go with this. Am I overthinking it, or am I right?

It depends (isn’t that always the best answer you want to hear :smiley: )

Usually I wouldn’t be too worried about the segment size if it’s for personal backups, but 100k files would cost you $0.88 per month in segment fees alone. It’s worth checking how many files you have in total. Performance is not ideal, but for backups that might not be a big concern.
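
To make the arithmetic concrete, here is a minimal sketch of the fee estimate. The per-segment rate is not quoted from a price list; it is an assumption derived from the $0.88 per 100k files figure above, and it assumes each small file is stored as exactly one segment.

```python
# Rate implied by the figure in this post: $0.88 for 100,000 segments.
# Assumption: derived from this thread, not taken from an official price list.
FEE_PER_SEGMENT = 0.88 / 100_000


def segment_fee(segment_count: int) -> float:
    """Estimate the segment fee in USD for the given number of segments."""
    return segment_count * FEE_PER_SEGMENT


if __name__ == "__main__":
    # Assumption: one small file (well under the segment size) = one segment.
    for count in (1_000, 10_000, 100_000):
        print(f"{count:>7} segments -> ${segment_fee(count):.2f}")
```

Counting the files in the source folder and plugging that number in gives a rough idea of whether file-by-file upload is worth it.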

If costs are going to be a problem, you can use Duplicati as you suggested. Use these two options to prevent as much egress as possible.

--no-backend-verification=true
--no-auto-compact=true

The first will stop it from downloading every uploaded file to verify it, and the second will prevent auto compacting, which works by downloading files and re-uploading them.

If you don’t frequently add or change files, you might want to leave the verification on though as it’s used to verify everything was backed up successfully and it’s basically a one time operation on upload or change. I’m not sure of the impact of compacting on egress. You’ll also use a little more storage if data isn’t compacted. So you’d have to play around a little with what works best for you.

Since you only store 50GB, chances are even without these settings you won’t go over the 150GB egress for the free tier, unless a large part of that 50GB changes frequently.

Changes are infrequent and tend to be minor (ish)

This user may not exceed the 150GB storage/egress limit included with a free account, but if they store too many segments they may still exceed the segment limit. In that case they should add a payment method: after seeing several unpaid bills, Storj may opt to reduce storage and egress limits to zero until the unpaid invoices for usage beyond the free tier are fully paid.

Right… if your total costs won’t stay under $1.65, either use Duplicati to limit the number of files or add a payment method if you’re ok with the additional costs.

By default Duplicati stores 50MB remote volumes for remote backups, I believe. It would be best to increase that to 61MB, but 50MB isn’t bad either. There are smaller chunks within those volumes and it’s possible Duplicati will only download those chunks for verification. You’d have to check their docs for that. Storj CAN download only parts of files (stripes), but this isn’t true for updating, so keep that in mind: it may need to rewrite whole 50MB files when backing up, though the upload won’t be charged.

You may have already found this, but just to be sure: Duplicati - Storj DCS Docs

Edit: @heunland it looks like Duplicati uses MiB (with the wrong unit displayed) while Storj uses MB. Setting 64MiB would use 2 segments for each remote volume I think. The docs now suggest 64MiB (in the screenshot) but it might be better to set it to 61MiB or 60MiB to stay below the segment threshold.
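
A quick sketch of the unit arithmetic behind that suggestion. It assumes, as this post does, that the threshold to stay under is 64 MB in decimal units (64,000,000 bytes) and that Duplicati's volume size is really in MiB; neither is confirmed here.

```python
MIB = 1024 * 1024  # bytes in one mebibyte (binary unit)
MB = 1000 * 1000   # bytes in one megabyte (decimal unit)

# Assumption from this post: the threshold is 64 MB (decimal).
ASSUMED_THRESHOLD_BYTES = 64 * MB


def mib_to_bytes(size_mib: int) -> int:
    """Convert a size in MiB to bytes."""
    return size_mib * MIB


def fits_under(size_mib: int, limit_bytes: int = ASSUMED_THRESHOLD_BYTES) -> bool:
    """Return True if a volume of size_mib MiB does not exceed limit_bytes."""
    return mib_to_bytes(size_mib) <= limit_bytes


if __name__ == "__main__":
    for size_mib in (50, 60, 61, 62, 64):
        size_bytes = mib_to_bytes(size_mib)
        print(f"{size_mib} MiB = {size_bytes:,} bytes "
              f"({size_bytes / MB:.1f} MB), fits: {fits_under(size_mib)}")
```

Under that assumption, 61 MiB is the largest whole-MiB setting that stays below the threshold, which is why 61MiB or 60MiB is the safe choice.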

I am currently running a test to see what happens.
I didn’t change the 50MB, as that seemed to work out at 1000 or so segments. Well inside the 10K segments.
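
For anyone wanting to check the "1000 or so" sum, a rough sketch follows. It assumes the 50MB volume setting is really 50 MiB, that each volume fits in one segment, and that the data is neither compressed nor deduplicated. It counts data volumes only, so any extra index or file-list objects the backup tool uploads would push the real number higher.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def estimated_volumes(total_bytes: int, volume_bytes: int) -> int:
    """Number of volumes needed to hold total_bytes (ceiling division)."""
    if total_bytes < 0 or volume_bytes <= 0:
        raise ValueError("sizes must be non-negative and volume size positive")
    return -(-total_bytes // volume_bytes)


if __name__ == "__main__":
    total = 50 * 1000**3  # about 50 GB of source data (figure from this thread)
    volume = 50 * MIB     # assumed volume size: 50 MiB
    print(f"about {estimated_volumes(total, volume)} data volumes")
```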

When I tried the initial backup (before I deleted it) I had 37824.11 segment-months registered which I felt was an issue.

Given that I am testing I want to keep well within the 150GB limit for the moment. I can always delete everything and start again. 50GB ain’t so much

Currently there are 288,500 individual files in 121,743 folders.

Just a follow up.
<4000 segments, which is a massive improvement (so my sums were wrong, but still inside the sensible limit)

Seems to be running well.

Fellas, I’m thoroughly confused. What is the maximum segment size, in bytes please?

64,000,000 or 67108864?

It seems that on the forum and in the documentation, MB is used where MiB is implied when referring to segment size and segment fee calculation. When I upload a file of size 64MiB, which I assume is the segment size, the Storj UI shows it as 67.1 MB (which is correct).

Is this supposed to result in 1 segment or 2? My use case allows me to create files of size equal to any power of two, so I hope the segment size is 64MiB. Correct?

(Because it would make no sense to set the segment size in decimal units, but I did not see a single confirmation, only a lot of variations of statements like “Segment size is 64MB”.)

Maximum segment size is 64 MiB, or 67108864 bytes.

As evidence, I will point to: https://github.com/storj/storj/blob/471111122b297af385b67acbca325e5039c6e2a3/satellite/metainfo/config.go#L136 .
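
With that limit, the segment count for a given object size can be sketched as a ceiling division. This is only an illustration: it ignores any encryption or metadata overhead that might be added before the limit is applied, so sizes exactly at the boundary are worth testing with a real upload.

```python
MAX_SEGMENT_SIZE = 64 * 1024 * 1024  # 64 MiB = 67,108,864 bytes


def segment_count(object_size_bytes: int) -> int:
    """Estimate how many segments an object of the given size occupies.

    Assumption: the limit applies to the raw object size, with no overhead.
    """
    if object_size_bytes <= 0:
        raise ValueError("object size must be positive")
    # Ceiling division without floating point.
    return -(-object_size_bytes // MAX_SEGMENT_SIZE)


if __name__ == "__main__":
    for size in (64_000_000, 67_108_864, 67_108_865, 2 * 67_108_864):
        print(f"{size:>11,} bytes -> {segment_count(size)} segment(s)")
```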
