Mismatch of proportions, or where is our money, Lebowski?

I’ll start with the background…
Since December 1, my total storage has grown by 100+ TB.
However! I did not see a proportional increase in income.
If on December 26th I am storing 100 TB more than on December 1st, that is about $150/month more, so by December 26th my daily payout should be roughly $5/day higher ($150 / 30 days) than it was on December 1st. But this did not happen. Then I started looking at my charts.
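
To make the expectation explicit, here is the back-of-envelope math (a minimal sketch; the $1.50/TB-month storage rate is implied by the numbers above and mentioned later in this thread):

```python
# Expected payout increase from added storage, pro-rated per day.
# Assumes the $1.50/TB-month storage rate mentioned in this thread.
added_storage_tb = 100        # growth since December 1
rate_usd_per_tb_month = 1.50  # storage payout rate (assumption)
days_per_month = 30

extra_monthly = added_storage_tb * rate_usd_per_tb_month  # 150 USD/month
extra_daily = extra_monthly / days_per_month              # 5 USD/day
print(f"expected: ~${extra_monthly:.0f}/month, ~${extra_daily:.0f}/day")
```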

Let’s look at the previous month, November (one of many nodes; the size is not important, the proportion is).

disk space used: 12.7 / 11.4 = 1.114 (+11.4%)
storage summary: 7.69 / 7.44 = 1.034 (+3.4%)
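
The same comparison in code (an illustrative check using the chart values above; it assumes the storage summary is the metric the payout is based on):

```python
# Month-over-month growth of the two dashboard metrics for November.
disk_growth = 12.7 / 11.4    # physical disk usage:  +11.4%
paid_growth = 7.69 / 7.44    # storage summary:      +3.4%

# If payment tracked disk usage, the two ratios would match;
# here roughly 8 percentage points of the growth are unpaid.
print(f"disk +{disk_growth - 1:.1%}, summary +{paid_growth - 1:.1%}, "
      f"gap {disk_growth - paid_growth:.1%}")
```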

Dear Storj! Is this how it should be? Is everything right?

6 Likes

Yeah, actually I keep asking the same question. The explanations so far aren’t satisfactory, to be honest.

  1. Cluster size: if your clusters are too big, for example 2K+, you’re losing half a cluster (1K for 2K clusters) on average for each file. Since the average file size is 16KB (as I thought), this can explain up to a 12% discrepancy if you’re using 4K clusters. But I’m using 128B clusters on ext4, so less than 1% of the discrepancy should be attributable to this (see the sketch after this list).
  2. Metadata: inode tables/trees, sizes, dates and so on; typically 1-2% of the file system.
  3. Missed deletions: when nodes are down, deletions can be missed, and then it depends on the bloom filters (a general filter describing all the file names of data on your node) to catch files that should have been deleted in the past. Since bloom filters are overgeneralized (to make them more compact), they typically also match about 10% of deleted data. But my nodes are all 100% up.
  4. Slow file systems, not keeping up with the pace of deletions. But I have discrepancies over 10% even on SSD file systems.
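
A quick sanity check for point 1 (an illustrative calculation; the 16 KB average file size is the assumption stated above):

```python
# Expected allocation overhead from the filesystem cluster size.
# Assumes file tail sizes are uniformly distributed, so the average
# waste per file is half a cluster.
def cluster_overhead(cluster_bytes: int, avg_file_bytes: int = 16 * 1024) -> float:
    return (cluster_bytes / 2) / avg_file_bytes

for cluster in (1024, 4096):
    print(f"{cluster} B clusters: ~{cluster_overhead(cluster):.1%} overhead")
# 1024 B clusters: ~3.1% overhead
# 4096 B clusters: ~12.5% overhead
```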

So, I actually can’t fully answer the question.

1 Like

It is not even theoretically possible to use a 128-byte cluster size; the smallest ext4 supports is 1 kB. You are probably thinking of the inode size, which is a different thing.

You’re correct, it’s the inode size I mistook for the cluster size. But in the end it doesn’t make a difference here.

BTW, right now deletions aren’t sent to nodes. It’s all bloom filters now.
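
For context, the garbage that survives a bloom filter is governed by its false-positive rate. A minimal sketch of the standard formula (the sizing numbers below are made up for illustration, not taken from Storj’s implementation):

```python
import math

# Standard bloom filter false-positive probability:
#   p = (1 - e^(-k*n/m))^k
# for n inserted items, m bits, and k hash functions. A false positive
# here is a deleted piece the filter still "recognizes", so garbage
# collection leaves it on disk until a later, luckier filter catches it.
def false_positive_rate(n_items: int, m_bits: int, k_hashes: int) -> float:
    return (1 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

# Illustrative sizing: 10M pieces, ~4.8 bits per piece, 4 hash functions.
print(f"{false_positive_rate(10_000_000, 48_000_000, 4):.1%}")  # ~10%
```

This is why the ~10% figure mentioned above is plausible: the filter size is a deliberate trade-off between compactness and leftover garbage.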

2 Likes

The used space reported by your node (based on signed orders) may differ from the physically used space for many reasons, described above, and also (mostly) because the filewalker didn’t finish its job and didn’t move the garbage to the trash.
You need to search your logs for errors related to gc-filewalker, lazyfilewalker and retain, and fix them if possible (see the sketch below).
If all filewalkers are working without issues, the discrepancy should be a few percent.
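
A minimal sketch of that log search, assuming a plain-text node log at a placeholder path (adjust to your setup; with Docker you would redirect the container log to a file first):

```python
# Scan a storagenode log for errors from the subsystems named above.
# The log path is a placeholder for your node's actual log location.
KEYWORDS = ("gc-filewalker", "lazyfilewalker", "retain")

with open("/mnt/storj/node.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if "ERROR" in line and any(k in line for k in KEYWORDS):
            print(line.rstrip())
```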

P.S. I do not think that opening more topics for the exact same problem will speed up fixing it.

1 Like

The sheriff doesn’t care about Indian problems…

Alexey!
The people here are mostly adults, and some of those gathered are even quite educated.
These are very convenient answers. I have seen them all, but they do not answer the question I started the topic with: where did the income from the 100+ TB of storage added since the beginning of December disappear to?

For myself, I concluded that such growth without income growth is not interesting to me; I am putting all my nodes on hold (stopping new ingress).

Thank you for your attention.

PS: I think it would be nice to compensate operators for all these existing problems in the software, say by 25%.

1 Like

Yeah, some extra tokens, like a one-time Christmas gift, or a New Year gift now.
I have some screenshots too, from the very beginning of 2023, when egress was still paid at $20/TB. Even then there was some discrepancy between the average TB stored in the stats on the left side and what was used on the right side, about 1-2 TB, but that really didn’t matter; it was peanuts when I got paid $22-26 for that 7 TB node. Now I get about $7.5-8 for the same node, and I only recently managed to complete the full filewalker, which turned out to find 2 TB of files to be deleted that weren’t even in the trash. But now it’s fine.
(I mean, as fine as it can be with a $1.5/TB storage rate instead of the $2/TB storage rate that SNOs would much prefer, but hey! hah)

Instead we all complain endlessly about that discrepancy in many topics.
Yes, sure, many nodes lost some pennies, like $1.5-3 of potential profits, but that’s what happens sometimes when you create great things and constantly make them better.

STORJ Inc., you could just, I don’t know, make a one-time extra surge payment and apologize for the inconvenience of the situation, and we will be one big happy family again, at least for some time! :smiley:

$2/$2 would be much better.

How can we check the filewalker calculation? Do you have a manual way of calculating it?

Hello @Odmin,
Welcome back!

I do not have this information. The only known source is the node’s logs, which contain information about the bloom filter size and how many pieces were moved to the trash or deleted.

1 Like

There was a problem with the garbage collection process during the last run, so you may see more overhead than usual.

You can calculate it manually (see the sketch below), or you can check the cached value from the database. But usually it’s enough to check whether you had any errors in the log.
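
A minimal sketch of the manual calculation, totaling the blobs directory on disk (the path is a placeholder; pointing at blobs only already excludes the trash folder):

```python
import os

# Manually total the bytes stored under the node's blobs directory,
# then compare against the dashboard / satellite figure.
def dir_size(path: str) -> int:
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # a file may vanish mid-walk (e.g., moved to trash)
    return total

# Placeholder path; use your node's storage location.
print(f"blobs: {dir_size('/mnt/storj/storage/blobs') / 1e12:.3f} TB")
```

To exclude deprecated satellites you would skip their subfolders under blobs; each satellite has its own directory there.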

(I started to write more detailed instructions about debugging space issues… Will be posted soon.)

4 Likes

Will you also report back here?
I’m really curious, because I also have several nodes with unexplainable differences, as far as I can see.

Sure thing. Here it is.

Hope it helps. Let me know if you have more questions…

Or if the size of the blobs dir (without trash and deprecated satellites) is very different from what you got from the satellite or the storagenode.

2 Likes

Yes, for me it’s like 1 TB or more of difference.

Between which numbers? Satellite and storagenode? Or storagenode and disk usage? Did you exclude the trash usage?

GC might have failed last time… Did you see any errors in the log? Or metrics about processed pieces?

1 TB sounds fishy (unless you store 1 PB :wink: on one node). I’d be happy to help debug further; just send me more information (here/private, or marton@ and the storj domain).

2 Likes