Updates on Test Data

As I said, just for fun… :wink:

3 Likes

It's probably logical that no one will expand their connection or rush out for new disks for now; what's enough is enough. Why make investments that can be risky when the profit doesn't correspond to the risk?

Think of it more as a label for a generation/series of drives. Like they manufactured a variety of capacities in the X20 series… then a bunch more capacities in the X22 series, etc. But yes, the max size of each series tends to go up to match the X-number :slight_smile:

1 Like

I would bring capacity online, but I'm limited in bandwidth, so it's impossible without violating the requirements. (Buying a new 20+TB external drive and migrating/replacing one node is still off the table. I think I'm fine with 26TB, currently half full.)

After moving to the new house, I will ask the neighbours if I can use their guest network for my 4TB cold-storage drive (though I'd still be missing the Raspberry Pi to run the node).

In practical terms, base58 is not OK for the filesystem. The Windows API presents filesystems as case-insensitive (*) and APFS (macOS) is “case-insensitive but case-preserving”. Hex would be fine but is needlessly long (64 characters for a 32-byte ID). A base32-encoded satellite ID is 52 characters, which isn’t a huge savings, but over all the times it shows up in logs, it really adds up. Also, the internals of the blob storage directory are not really meant to be user-facing. Finally, we encode piece IDs with base32 (for similar reasons), and it makes the mental load a little lighter to use the same encoding for all filesystem paths.

(*) Yes, I know NTFS isn’t really case-insensitive, and you can use special flags to get at the case-sensitivity, but most Windows tools expect case-insensitivity and it would be a fragile state. Imagine someone trying to copy their node directory to another location on disk without knowing they needed to use a special program to maintain case.
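For illustration, here's a rough Go sketch of that length difference. It uses the standard-library encodings rather than the exact path-safe alphabet the storage node code uses (which is an assumption I'm not reproducing here); the point is only the character counts:

```go
package main

import (
	"crypto/rand"
	"encoding/base32"
	"encoding/hex"
	"fmt"
)

func main() {
	// A node/satellite ID is 32 bytes; random bytes stand in for a real ID here.
	id := make([]byte, 32)
	if _, err := rand.Read(id); err != nil {
		panic(err)
	}

	hexID := hex.EncodeToString(id)
	// Unpadded base32 keeps the name case-insensitive and filesystem-safe.
	b32ID := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(id)

	fmt.Println(len(hexID), hexID) // 64 characters
	fmt.Println(len(b32ID), b32ID) // 52 characters
}
```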

2 Likes

It also bloats file system metadata.

(me being me, sorry!)

1 Like

OK, so the ID format used as the name of the directory was chosen to 1) be case-insensitive and 2) save a few bytes (compared to hex). Why not use it for the dashboard as well?

We’ve been using base58 for node IDs since well before we had to make that decision about the filesystem encoding. Changing it once people were already aware of node IDs would have been very confusing.

2 Likes

Just to make things clear, your dashboard is showing data after expansion, correct?

Is that 20PB target after expansion?

Results after the last Saltlake GC for one of my nodes: 1928063 pieces deleted, took 68h while receiving ingress. The amount deleted was around 500GB.

I hope my drive can keep up with all the pieces to be deleted from these tests.

2 Likes

The answer should be obvious. Take a look at the Grafana dashboard once more. Is it showing customer data before or after expansion? And why is it called customer data in the first place, and not total used space?

I noticed an extreme CPU consumption over the last few days. It has never been a problem on my system… has something changed? Is this a shared problem or just my configuration?

1 Like

It’s not obvious. Before expansion it’s customer data. After expansion… it’s still customer data (that just takes up more space). In a graph like this I think a reasonable person would say if there’s 29.1PB used by customers now…

(screenshot: 2024-06-17 usage graph)

…that it’s “not even half full”. Like you could easily fit another 29.1PB of customer data in the free 37.2PB. But are you saying that because that free space is “raw”, it really couldn’t take an expanded (29.1PB customer data × 2.2 expansion ≈ 64PB) worth of data? That we’re effectively closer to two-thirds full now? That would be a misleading report: not something you could make decisions from.

I’m getting confused as to what I’m seeing now :slight_smile:

SNOs deal in expanded/actually-used space. And it sounds like satellites can now handle multiple different expansion factors at once. So any report with un-expanded/variable-expansion-factor numbers isn’t showing useful data because you don’t know how much space expanded customer data will take?
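To make the arithmetic explicit, here's a tiny sketch of the math I'm doing (assuming the 2.2x expansion factor and the 29.1PB / 37.2PB readings from the screenshot above; none of these are official constants):

```go
package main

import "fmt"

func main() {
	// Numbers read off the public dashboard screenshot; the 2.2x expansion
	// factor is the figure quoted in this thread, not a reported metric.
	const (
		customerPB      = 29.1 // stored customer data, before erasure-coding expansion
		freeRawPB       = 37.2 // estimated free (raw) capacity on nodes
		expansionFactor = 2.2
	)

	usedRawPB := customerPB * expansionFactor // ~64.0 PB actually occupied on disks
	totalRawPB := usedRawPB + freeRawPB       // ~101.2 PB of raw capacity overall

	fmt.Printf("raw space used: %.1f PB\n", usedRawPB)
	fmt.Printf("effective fullness: %.0f%%\n", 100*usedRawPB/totalRawPB) // ~63%, roughly two-thirds
}
```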

Within that graph, the following two values (over ALL satellites) are compared:

max(storj_stats_storage_free_capacity_estimate_bytes) - green

statistical estimate of free storage node capacity, with suspicious values removed

sum(storj_stats_storage_remote_bytes) - blue

number of bytes stored on storage nodes (does not take into account the expansion factor of erasure encoding)

Because the expansion factor is not taken into account, I called that value stored customer data. Of course it's a bit like comparing apples to oranges, but that's the closest I could get without modifying (and thus possibly falsifying) the reported data.

Yes, you are right, so it is important to understand the values you are looking at. In my opinion it is still useful data because I, as an SNO, am interested to see if/how the network is utilised, which is reflected by those numbers.

3 Likes

Does that answer my question about the 20PB target somehow?

I’m not interested in the Grafana dashboard. I’m interested in what you referred to here:

Thank you for explaining it! I can remember to sorta halve the free-space number to account for expansion: and it does make it clear why Storj has been asking the community for more nodes lately (as we’re closer to ‘full’ than I thought).

And even if it’s mixing slightly different stats: it is an accurate report on data-customers-pay-$4/TB/m-for vs. space-SNOs-could-be-paid-$1.5/TB/m-for.

1 Like

I know you aren’t asking me: but what I think I heard is that Storj is willing to pay SNOs their $1.50/TB/m rate to hold 20PB-on-disk-on-node of capacity-reservation data. But really, with a 2.2x expansion, that only represents the space required to hold about 9PB of paid customer data?

(If I got it wrong: somebody will tell me shortly :wink: )

What I think I heard is someone crying “the wolves are coming!” in the distance, if you know what I mean :wink:.

1 Like

Most of my nodes are now full and will not be able to deliver the same throughput as they have been for the last few weeks.

I did order an additional 0.2PB worth of node hardware, though it will take a month before they can all be brought online.

5 Likes

The forum search can tell you whether the Grafana dashboard is showing the numbers including or excluding the expansion factor. For example here: Publicly Exposed Network Data (official statistics from Storj DCS satellites) - #30 by Arkina

The rest is simple math. I have given you some numbers. You can try to make them match the Grafana dashboard, and you will quickly find out that our internal dashboard has to show the same thing, otherwise it would have to show a different scale.