That's why in many, but not all, cases it is simply another problem.
If we rule out similar symptoms and other known problems, and a new problem then occurs, we can be happy to have discovered it.
This will bring us all forward.
I'm sure the problem will be brought to the devs via Bre or Knowledge.
Based on simulations (see the GitHub issue), the use of bloom filters and delayed deletion shouldn't cause a problem: at most 0.5%-1% overhead, or even less.
The exception is nodes with a high number of blob files, because there is a size limit on the bloom filters (due to an RPC limit). There is an active planning phase to fix this limitation.
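To illustrate why the size cap matters, here is a minimal sketch of the standard bloom filter math (my own illustration, not Storj code; the 2 MB cap below is an assumed number, not the real RPC limit). The false-positive rate is roughly the fraction of garbage pieces a single garbage-collection pass would wrongly keep, and it grows quickly once the blob count outgrows a size-limited filter:

import math

def false_positive_rate(num_blobs: int, filter_bytes: int) -> float:
    # Classic estimate: p ~ (1 - e^(-k*n/m))^k, using the optimal number of hashes k.
    m = filter_bytes * 8                               # filter size in bits
    k = max(1, round(m / num_blobs * math.log(2)))     # optimal hash count for n elements
    return (1 - math.exp(-k * num_blobs / m)) ** k

CAP_BYTES = 2 * 1024 * 1024   # hypothetical filter size cap
for n in (1_000_000, 5_000_000, 15_000_000, 30_000_000):
    print(f"{n:>11,} blobs -> ~{false_positive_rate(n, CAP_BYTES):.1%} of garbage kept per pass")

With plenty of bits per blob the kept-garbage fraction stays well under 1%, but once the same capped filter has to cover tens of millions of blobs it shoots up, which matches the exception described above.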
These figures are estimates from a simulator. You can help fix the issue by providing real data. If you have more than 15-20M blobs in one satellite folder, you can send me the list of your blobs (here or to marton@ company domain).
I can compare it with the list from the database and calculate the differences. (It would also help us test an improved version of the bloom filter.)
You can do it with:
cd storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
du --all | gzip > ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa.txt.gz
This is expected to be a big file (1-2 GB), so you may need to upload it somewhere temporarily (for example to Storj with a free account).
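In case it helps to picture the comparison step, here is a rough sketch of how such a diff could be done (my own illustration, not the actual tooling; the piece-ID extraction and the satellite_pieces.txt export file are assumptions):

import gzip

def piece_ids_from_du(path):
    # Pull blob file names out of a gzipped "du --all" listing (size, a tab, then the path on each line).
    ids = set()
    with gzip.open(path, "rt") as f:
        for line in f:
            _, _, name = line.rstrip("\n").partition("\t")
            if name.endswith(".sj1"):                # blob files use the .sj1 extension
                ids.add(name.rsplit("/", 1)[-1])
    return ids

on_disk = piece_ids_from_du("ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa.txt.gz")
in_db = set(open("satellite_pieces.txt").read().split())   # hypothetical export from the database
print("on disk but not in the database (likely garbage):", len(on_disk - in_db))
print("in the database but not on disk (missing):", len(in_db - on_disk))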
Let me show you in pictures that this is not about the 1 percent.
I have several dozen spare nodes, 1000-1100 GB in size; they have all been full for about 3 months.
Sorry about the trash: I automatically delete trash on these nodes once a day.
The node we are talking about is located on an SSD.
Well, now the actual pictures…
And so… we have a used disk size of 1106 GB.
It is logical to expect 1.1 TB (or 1 TiB) * $1.5 = $1.65 (or $1.50) from it.
But something went wrong: only $1.30 was received.
This is 15-26 percent less than what you would expect to receive for such storage.
Once again about the system: the node is located on an SSD, and the filewalkers are enabled.
I’m embarrassed to ask again… But… Where is my money, Lebowski?
PS:
For the respected Th3Van (http://storj.dk), the situation is also far from $1.5/TB…
You could expect that, but the reality looks different. Since there is movement on the node, and you get paid for holding a piece for x amount of time, the real value will always be less. These minor differences may add up to the percentage you calculated.
Blob storage is calculated in byte-hours.
How is it calculated if a blob gets removed before the next hour is finished? Is that hour free, or is it counted partially?
OK, no further questions.
Then it would be fair to state the payout rate for node operators not as $1.5 per TB, but as $1.5 minus 15-30%, due to the peculiarities of the algorithms.
That would remove all unnecessary questions and would be honest and transparent.
Sorry for noob questions…
I’m on Docker/Synology/Linux.
Where will I find the .gz file? In the blob folder?
How long does the command run for 14TB?
Should I stop the node?
Yes, in the folder you cd'd into (storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa in the example). You can also point it to another folder if you want; the part after > is the output filename. You could point it to /tmp, for example: du --all | gzip > /tmp/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa.txt.gz
It may be faster if you stop the node, since the HDD head doesn't have to jump back and forth between node operations (reads, writes) and your du command.
I just tested it on my 4 TB node; only 1.1 TB is in that folder, and it took 12 minutes. I didn't stop my node.
@elek I'm asking the same, providing a screenshot from my fast node.
I'm a bit concerned about reaching 14 million files soon at only 1.6 TB max (the node is half a year old), as the other blob folders have at most 1.2 TB but only up to 2 million files.
There are no discrepancies in the dashboard, so I'm not concerned about that yet.
My guess would be that someone BIG is rejecting small files on that satellite and living the easy storage life with their setup???
Or some client has a lot of small files.
Not sure if any of my nodes has >15M blob files per satellite, but we will know in a day or two.
The du --all processes are still running on all 105 nodes while I am typing this…
[17-01-2024 08:25 CEST]
The du --all processes have finished, resulting in 420 log files containing 3.721.670.434 lines and consuming ~131 GB.
It's accounted in byte-hours, but calculated down to byte-nanoseconds (or maybe even more precisely; it's just a difference between timestamps), so a partial hour is counted only for the precise time the piece was actually stored.
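As a rough illustration (my own sketch, not the satellite's actual accounting code), byte-hours for a piece that is stored for only part of an hour can be computed directly from the two timestamps:

from datetime import datetime, timezone

def byte_hours(size_bytes: int, stored_from: datetime, stored_to: datetime) -> float:
    # Byte-hours accrued between two timestamps, with sub-hour precision.
    hours = (stored_to - stored_from).total_seconds() / 3600
    return size_bytes * hours

# Hypothetical example: a 2 MiB piece kept for 25 minutes still earns 25/60 of an hour.
start = datetime(2024, 1, 17, 8, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 17, 8, 25, tzinfo=timezone.utc)
print(byte_hours(2 * 1024 * 1024, start, end))   # ~873813 byte-hours: not a full hour, and not zero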
See
It's not $1.5/TB, it's $1.5/TB-mo. See:
8.2.1. Storage of Storage Materials on Space - use of Space on the Storage Node by the Storage Services calculated in GB hours per month is paid at a rate of $0.0015 (USD) per GB month;
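To make the unit concrete, here is a quick back-of-the-envelope check (my own numbers; the 730-hour month is an assumed average) showing that $0.0015 per GB-month corresponds to roughly $1.5 per TB stored for a full month, and that anything stored for less time earns proportionally less:

RATE_PER_GB_MONTH = 0.0015   # rate quoted in the terms above
HOURS_PER_MONTH = 730        # assumed average month length in hours

def payout_usd(gb_hours: float) -> float:
    # Convert accumulated GB-hours into USD at the quoted rate.
    return gb_hours / HOURS_PER_MONTH * RATE_PER_GB_MONTH

print(payout_usd(1000 * HOURS_PER_MONTH))   # 1 TB stored for the whole month -> $1.50
print(payout_usd(1000 * 365))               # 1 TB stored for half the month  -> $0.75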
Where did you vanish to for 4 months, Th3Van?
We were worried…
Are you going for the certified part of the Storj offer?
How's your setup? How many TB are you storing right now?
I do my best to read/follow up on the forum… when I have time for it
I'm currently waiting for delivery of an SM JBOD chassis with 60 x 20 TB HDDs that is going to be dedicated to the Commercial SNO program.
It's currently running fine. You are welcome to find more info on my stats page (www.th3van.dk), which shows that all my nodes are currently storing 1.318.193.166.050.159 bytes (~1.3 petabytes).
I ran the du command on a Synology DS220+ (18 GB RAM, 2 nodes), on the oldest node, after stopping it.
Statistics:
Node version: 1.94.2
Used: 14.39 TB / 14.5 TB
Trash: 108.33 GB
Average disk space used this month (from satellites): 12.99 TB
ap1 (from satellite): 1.13 TB
us1 (from satellite): 4.82 TB
eu1 (from satellite): 7.21 TB
du run time for ap1: 1 hour
du run time for us1: 14 hours
du was not run for eu1
I will PM @elek with Mega links. It's the only service I use for big files; storkshare is blocked by my antivirus.