I wasn’t aware of that. What information is stored in this header?
So noted, thanks Toyoo
A 512-byte header is a positive thing; it's not breaking nice round numbers at all, rather it's quantifying availability.
Yes, most pieces of course aren’t exactly 64M big; I never said that. But it would give a substantive measure of how much inefficient RAM usage could possibly be saved. There are people scratching their heads over why they have 20 GiB instances running in Windows, and chasing their tails by trying to give the same 20 GiB of RAM to PrimoCache instead - lol. It needs to be fixed.
The parity algorithm need not be cached; I suppose that’s a tx/rx parity check.
Agreed, there’s no need to exactly match the RS overhead; once again, that’s not the point. Just an overestimation is needed for more efficient allocation.
Appreciate the succinct responses.
2 cents for 2 days
What’s the parity algorithm?
That which Toyoo mentioned.
Interesting that there’s a header, and of that size - surely it could include the piece size if we needed it, in order to create a variable-length cache array in memory. It couldn’t possibly be that much work to include that info from the uplink side, in order to coordinate a corresponding dynamic buffer size/array allocation on the node side.
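A minimal Go sketch of the idea, assuming a hypothetical piece-size hint reached the node (chooseBufferSize and the constants below are made-up names, not the real storagenode API):

```go
package main

import "fmt"

const (
	defaultBufferSize = 128 * 1024       // currently documented default (128 KiB)
	maxPieceSize      = 64 * 1024 * 1024 // theoretical maximum piece size
)

// chooseBufferSize picks a write buffer no larger than the piece we expect,
// falling back to the configured default when no hint is available.
func chooseBufferSize(expectedPieceSize int64) int64 {
	if expectedPieceSize <= 0 || expectedPieceSize > maxPieceSize {
		return defaultBufferSize
	}
	return expectedPieceSize
}

func main() {
	// e.g. a ~2.3 MB piece would get a ~2.3 MB buffer instead of 64 MiB
	fmt.Println(chooseBufferSize(2_400_000))
}
```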
P.S. I did check the current binary with ‘storagenode.exe config --help’, and the buffer size still lists the default as 128k, but I really don’t think that’s the case currently.
2 cents
Oh, that’s a different thing. It’s done on the uplink’s side, not the node’s or the satellite’s side. Reed-Solomon erasure coding has several parameters, the most commonly cited being 80/29. This gives an expansion factor applied after the file is encrypted and split into segments; with this expansion factor, the resulting pieces increase the amount of uploaded data. The pieces are generated after the segmenting and erasure coding:
See Introduction to Storj - Storj Docs.
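For concreteness, a rough Go sketch of that arithmetic, assuming the commonly cited 29-of-80 parameters and a 64 MiB maximum segment (the actual parameters are chosen on the uplink/satellite side, so treat this only as an illustration):

```go
package main

import "fmt"

func main() {
	const (
		segmentSize = 64 * 1024 * 1024 // bytes, maximum segment size
		required    = 29               // pieces needed to reconstruct a segment
		total       = 80               // pieces actually uploaded
	)

	pieceSize := segmentSize / required // bytes per piece (segment / k)
	uploaded := pieceSize * total       // bytes leaving the uplink for one segment
	expansion := float64(total) / float64(required)

	fmt.Printf("piece size: ~%.2f MiB\n", float64(pieceSize)/(1024*1024)) // ~2.21 MiB
	fmt.Printf("uploaded:   ~%.0f MiB\n", float64(uploaded)/(1024*1024))  // ~177 MiB
	fmt.Printf("expansion:  ~%.2fx\n", expansion)                         // ~2.76x
}
```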
Ok brick wall of communication here… giving up.
M_M, please DM me sometime on here, if that’s possible.
2 cents
Click the person’s name and send a message
Afternoon - So it’s been about three weeks since I last posted on this subject, and while the bloom filter did momentarily improve things, I’ve now checked and around 75% of storage is currently unpaid - unpaid storage going up, no room for paid data. This is not sustainable.
Node updated to v1.108.3, the free space filewalkers run and complete with no errors, and I am seeing GC filewalkers. df confirms the used space - but 5.6 TB of uncollected garbage?
REPORTED BY       TYPE      METRIC                   PRICE                      DISK   BANDWIDTH   PAYOUT
Node              Ingress   Upload                   -not paid-                        203.77 GB
Node              Ingress   Upload Repair            -not paid-                          2.67 GB
Node              Egress    Download                 $ 2.00 / TB (avg)                  20.08 GB   $ 0.04
Node              Egress    Download Repair          $ 2.00 / TB (avg)                   1.12 GB   $ 0.00
Node              Egress    Download Audit           $ 2.00 / TB (avg)                 255.49 KB   $ 0.00
Node              Storage   Disk Current Total       -not paid-              8.20 TB
Node              Storage   ├ Blobs                  -not paid-              7.70 TB
Node              Storage   └ Trash             ┐    -not paid-            496.43 GB
Node+Sat. Calc.   Storage   Uncollected Garbage ┤    -not paid-              5.62 TB
Node+Sat. Calc.   Storage   Total Unpaid Data <─┘    -not paid-              6.11 TB
CC
These calculations are based on what the satellite is reporting your node should hold. And these reports have not been very reliable recently, so it most likely is just the sat calculation lagging behind, causing the node and the script to report an incorrect “Average disk space used this month” (as it is called on the dashboard).
This won’t affect the payouts as that calculation is done as a separate process after each month AFAIK.
I also have a node using 16 TB of data but reporting only 2 TB, again because the sat report is either missing or incomplete.
I also believe there is work planned to modify this in such a way that, if a report is missing, an average will be calculated, which should fix these huge swings we are all seeing.
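Just to illustrate the idea (this is my assumption about how such gap-filling might work, not Storj’s actual plan), a minimal Go sketch:

```go
package main

import "fmt"

// fillMissing replaces zero (missing) daily GB-hour values with the average
// of the days that were actually reported, so the graph doesn't swing to zero.
func fillMissing(dailyGBh []float64) []float64 {
	var sum float64
	var reported int
	for _, v := range dailyGBh {
		if v > 0 {
			sum += v
			reported++
		}
	}
	if reported == 0 {
		return dailyGBh // nothing to average from
	}
	avg := sum / float64(reported)
	out := make([]float64, len(dailyGBh))
	for i, v := range dailyGBh {
		if v > 0 {
			out[i] = v
		} else {
			out[i] = avg
		}
	}
	return out
}

func main() {
	// the two missing days get the average of the four reported ones
	fmt.Println(fillMissing([]float64{180, 0, 175, 182, 0, 179}))
}
```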
Hi Climbingkid,
Did you also see the topic “When will ‘Uncollected Garbage’ be deleted?” There’s a form there where you can provide some information to Storj if you believe you have a lot of uncollected garbage from SL.
It’s a good way to contribute to finding a solution!
Why do you think the payout calculation is accurate while everything else is not?
@edo
Thanks for this - it seems this issue has moved threads. Happy to follow that instead, as it’s exactly what we were tracking here. We had thought that the lack of SL GC bloom filters was the problem, as things had begun to resolve, but it looks like that’s not really the case.
Thanks
CC
The calculator is not going to be accurate at this time; it appears US1 is still having trouble broadcasting disk usage to nodes.
Because it’s based on the orders sent by your node. These orders are cryptographically signed by three sides: by the uplink (the customer), by the satellite, and by the node. They account for used space in GBh, unlike the average estimation, so they are pretty precise.
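A toy Go sketch of why GBh accounting is precise compared to a point-in-time estimate; the struct below is hypothetical, not the real order format:

```go
package main

import (
	"fmt"
	"time"
)

// usageInterval is a hypothetical record of how many bytes were held over
// a time window; the real signed orders carry more than this.
type usageInterval struct {
	bytes int64
	from  time.Time
	to    time.Time
}

// gbHours sums byte-hours over all intervals and converts them to GB-hours.
func gbHours(intervals []usageInterval) float64 {
	var byteHours float64
	for _, iv := range intervals {
		byteHours += float64(iv.bytes) * iv.to.Sub(iv.from).Hours()
	}
	return byteHours / 1e9
}

func main() {
	day := time.Date(2024, 8, 1, 0, 0, 0, 0, time.UTC)
	// 8.2 TB held for a full 24 hours is about 196,800 GBh
	fmt.Printf("%.0f GBh\n", gbHours([]usageInterval{
		{bytes: 8_200_000_000_000, from: day, to: day.Add(24 * time.Hour)},
	}))
}
```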
Are these orders also used for the node’s internal purposes?
If not, why not?
It sounds like they could be used as a substitute if the satellite is not sending data in a timely manner.
It also sounds like it would not be necessary to hammer the databases with each upload if we could scan the orders at larger intervals and update the databases then.
I don’t know, maybe this is already done today, but it sounds like these order files could be useful.
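A minimal Go sketch of that batching idea, purely as an assumption about how it could work rather than how the storagenode actually does it:

```go
package main

import (
	"fmt"
	"sync"
)

// usageBatcher accumulates per-upload byte counts in memory so the database
// only sees one write per flush interval instead of one write per upload.
type usageBatcher struct {
	mu    sync.Mutex
	bytes int64
}

// Add records an upload; it never touches the database.
func (b *usageBatcher) Add(n int64) {
	b.mu.Lock()
	b.bytes += n
	b.mu.Unlock()
}

// Flush hands the accumulated total to a single persist call (one DB update).
func (b *usageBatcher) Flush(persist func(int64)) {
	b.mu.Lock()
	n := b.bytes
	b.bytes = 0
	b.mu.Unlock()
	if n > 0 {
		persist(n)
	}
}

func main() {
	b := &usageBatcher{}
	for i := 0; i < 1000; i++ {
		b.Add(2_300_000) // simulated uploads
	}
	// in a node this would run on a timer, e.g. every few minutes
	b.Flush(func(n int64) { fmt.Println("one DB write for", n, "bytes") })
}
```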
They are not used. I also asked this question; no answer so far. However, since this is again GBh, the graph would look like this:
In my opinion it is always the same numbers in a database which are used. If the disk space graph is missing data, there is also no payment information for that day. The only question is why Storj doesn’t get this fixed after such a long time.
If you would like to help - please send a pull request with the solution on the node’s side.
This is unlikely to be fixed on the satellite side, especially when we have even more data. The report would take even longer to calculate, and it’s unlikely there would be daily reports.