As I said, just for fun…
It is probably logical that no one will expand their connection or run out for new disks for now; enough is enough. Why make investments that can be risky when the profit does not correspond to the risk?
Think of it more as a label for a generation/series of drives. Like they manufactured a variety of capacities in the X20 series… then a bunch more capacities in the X22 series, etc. But yes, the max size of each series tends to go up to match the X-number.
I would bring capacity online, but I’m limited in bandwidth, so it’s impossible without violating requirements. (Buying a new 20+TB external drive and migrating/replacing one node is still off the table. I think I’m fine with 26TB, currently half full.)
After moving to the new house, I will ask the neighbours if I can use their guest net for my 4TB cold-storage drive (though I’m still missing the Raspberry Pi to run the node).
In practical terms, base58 is not OK for the filesystem. The Windows API presents filesystems as case-insensitive (*) and APFS (macOS) is “case-insensitive but case-preserving”. Hex would be fine but is needlessly long (64 characters). A base32-encoded satellite ID is 52 characters, which isn’t a huge savings, but over all the times it shows up in logs, it really adds up. Also, the internals of the blob storage directory are not really meant to be user-facing. Additionally, we encode piece IDs with base32 (for similar reasons), and it makes the mental load a little lighter to use the same encoding for all filesystem paths.
(*) Yes, I know NTFS isn’t really case-insensitive, and you can use special flags to get case-sensitive behavior, but most Windows tools expect case-insensitivity and it would be a fragile setup. Imagine someone trying to copy their node directory to another location on disk without knowing they needed to use a special program to maintain case.
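For a rough sense of the length difference, here is a minimal Go sketch (using the standard library encodings, not necessarily the exact base32 alphabet Storj uses) comparing hex and unpadded base32 for a 32-byte ID:

```go
package main

import (
	"encoding/base32"
	"encoding/hex"
	"fmt"
)

func main() {
	id := make([]byte, 32) // a 32-byte ID, e.g. a satellite or piece ID

	hexStr := hex.EncodeToString(id)
	b32Str := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(id)

	fmt.Println(len(hexStr)) // 64 characters
	fmt.Println(len(b32Str)) // 52 characters
}
```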
It also bloats file system metadata.
(me being me, sorry!)
OK, so the ID format used as the directory name was chosen to 1) be case-insensitive and 2) save a few bytes (compared to hex). Why not use it for the dashboard as well?
We’ve been using base58 for node IDs since well before we had to make that decision about the filesystem encoding. Changing it once people were already aware of node IDs would have been very confusing.
Just to make things clear, your dashboard is showing data after expansion, correct?
Is that 20PB target after expansion?
Results after the last Saltlake GC for one of my nodes: 1928063 pieces deleted, took 68h while receiving ingress. The amount deleted was around 500GB.
I hope my drive can keep up with all the pieces to be deleted from these tests.
The answer should be obvious. Take a look at the grafana dashboard once more. Is it showing customer data before or after expansion? Why is it called customer data in the first place and not total used space?
In the last few days I’ve noticed extreme CPU consumption. It has never been a problem on my system… has something changed? Is it a shared problem or just my configuration?
It’s not obvious. Before expansion it’s customer data. After expansion… it’s still customer data (that just takes up more space). In a graph like this I think a reasonable person would say if there’s 29.1PB used by customers now…
…that it’s “not even half full”. Like you could easily fit another 29.1PB of customer data in the free 37.2PB. But are you saying because that free space is “raw” that it really couldn’t take an expanded (29.1PB-customer * 2.2-expansion = 64PB) worth of data? We’re effectively closer to 2/3rds-full now? That would be a misleading report: not something you could make decisions from.
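A quick back-of-the-envelope check of that reading (a sketch using the numbers above and an assumed average 2.2x expansion factor):

```go
package main

import "fmt"

func main() {
	customerPB := 29.1 // customer data before expansion, PB (from the graph)
	freePB := 37.2     // raw free capacity, PB (from the graph)
	expansion := 2.2   // assumed average erasure-coding expansion factor

	expandedUsedPB := customerPB * expansion // ~64 PB actually occupied on disk
	totalRawPB := expandedUsedPB + freePB    // ~101 PB of raw capacity overall

	fmt.Printf("fill level: %.0f%%\n", expandedUsedPB/totalRawPB*100) // ~63%, i.e. closer to 2/3 full
}
```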
I’m getting confused as to what I’m seeing now
SNOs deal in expanded/actually-used space. And it sounds like satellites can now handle multiple different expansion factors at once. So any report with un-expanded/variable-expansion-factor numbers isn’t showing useful data because you don’t know how much space expanded customer data will take?
Within that graph, the following two values (over ALL satellites) are compared:

- `max(storj_stats_storage_free_capacity_estimate_bytes)` (green): statistical estimate of free storage node capacity, with suspicious values removed
- `sum(storj_stats_storage_remote_bytes)` (blue): number of bytes stored on storage nodes (does not take into account the expansion factor of erasure encoding)

Because the expansion factor is not taken into account, I called that value stored customer data. Of course it’s a bit like comparing apples to eggs, but that’s the closest I could get without modifying (and thus possibly falsifying) the reported data.
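If one wanted to put the two values on the same footing, a rough (purely illustrative) adjustment would be to multiply the stored bytes by an assumed average expansion factor, e.g.:

```
# assumes ~2.2x average erasure-coding expansion; illustrative only
sum(storj_stats_storage_remote_bytes) * 2.2
```

That would approximate the raw space actually occupied on nodes, which is the same unit the free-capacity estimate is measured in.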
Yes, you are right, so it is important to understand the values you are looking at. In my opinion it is still useful data because I, as an SNO, am interested to see if/how the network is utilised, which is reflected by those numbers.
Does that answer my question about the 20PB target somehow?
I’m not interested in the grafana dashboard. I’m interested in what you referred to here:
Thank you for explaining it! I can remember to sorta halve the free-space number to account for expansion: and it does make it clear why Storj has been asking the community for more nodes lately (as we’re closer to ‘full’ than I thought).
And even if it’s mixing slightly different stats: it is an accurate report on data-customers-pay-$4/TB/m-for vs. space-SNOs-could-be-paid-$1.5/TB/m-for.
I know you aren’t asking me: but what I think I heard is that Storj is willing to pay SNOs their $1.50/TB/m rate to hold 20PB-on-disk-on-node of capacity-reservation data. But really, with a 2.2x expansion, that only represents the space required to hold about 9PB of paid customer data?
(If I got it wrong: somebody will tell me shortly )
What I think I heard is someone crying “the wolves are coming!” in the distance, if you know what I mean .
Most of my nodes are now full and will not be able to deliver the same throughput as they have been for the last few weeks.
I did order an additional 0.2PB worth of node hardware, though it will take a month before it can all be brought online.
The forum search can tell you whether the grafana dashboard is showing the numbers including or excluding the expansion factor. For example here: Publicly Exposed Network Data (official statistics from Storj DCS satellites) - #30 by Arkina
The rest is simple math. I have given you some numbers. You can try to make them match the grafana dashboard, and you will quickly find that our internal dashboard has to show the same numbers, otherwise it would have to use a different scale.