Reduce disk size

skibboo · December 1, 2021, 8:16pm

How can I do this?
When I delete the container and start it with a smaller size, it stops immediately after starting.

Any idea?

Alexey · December 1, 2021, 9:04pm

The space will not be released anyway; the node just will not accept more data. The default minimum size is 500GB.
The space can be released, though, if the customers delete their data from your node.
The only way to free up the space yourself is to call a Graceful Exit: Graceful Exit Guide, but it’s a one-way ticket.

Pac · December 3, 2021, 7:32am

By the way, a partial graceful exit was something that StorjLabs wanted to implement in the past: is it still in the works?

I must say this feature really is needed ultimately because as we’re supposed to use spare space for Storj, anyone wanting to reclaim some of it shouldn’t have to kill their whole node to do so, as they would probably be happy to keep sharing spare space, just a bit less.

But currently, node operators with only one large node can reclaim either nothing or everything.

Any news with regards to this?

(Reducing allocated space to a node does make it shrink with time, but it is a very very slow process that one cannot realistically rely on)

andrew2.hart · December 3, 2021, 11:16am

You can edit the yaml file and change the default minimum (but why do this? If you go below 500G you may as well stop completely).

Regarding reducing the actual disk space used, you can graceful exit just one of the satellites. Choose the one that removes the right amount of disk space for you.

If you really want to chance being disqualified, you can switch off your node for a couple of weeks. When you re-connect it will trash some data in the first week.

Yours Faithfully Andy

Pac · December 3, 2021, 9:58pm

Didn’t think of that. It does work, as I experienced it unintentionally when my ISP went down for 40+ hours: it made me lose around 1.6% of data during the following 7 days. That’s because any node offline for more than 4 hours is flagged as potentially lost by satellites, which start migrating/reconstructing the data held by this node elsewhere. When the node gets back online, this migration process stops, but my understanding is that what’s already been moved away gets trashed. Which is weird, now that I think of it: why not keep these duplicates? Don’t they make the network more reliable?

Anyways, if a node’s scores are looking good, I guess that’s technically an approach that would work, but playing with scores like that really isn’t something I would want to do…

Alexey · December 4, 2021, 9:03am

Because of complaints like:

Seems not, at least I do not see any work items related to it, except mention in blueprint Code search results · GitHub

Pac · December 4, 2021, 10:16am

I’m sorry, I don’t follow how this thread is related?

I’ll try to give more details on my thoughts and assumptions.
Right now, my understanding of what’s happening is as follows:

1. A node goes offline.
2. Once it has been offline for more than 4 hours, satellites assume it may be lost forever and start repairing pieces it was holding, little by little. I assume they do that only for pieces that fall under the threshold (52 currently IIRC), and that these pieces get back to 80 pieces amongst other nodes.
3. Progressively, pieces that get repaired successfully get flagged as removed from the offline node.
4. The node gets back online at some point.
5. Satellites stop the repairing process.
6. During the following days, bloom filters sent by satellites make the node delete all pieces that got repaired elsewhere while it was offline.
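The steps above can be sketched as a toy model. This is only an illustration of the description in this post, not the satellite's actual code: the `Segment` class, `repair_if_needed`, and the node names are hypothetical, and the 52/80 figures are the ones quoted here from memory.

```python
from dataclasses import dataclass, field

# Numbers quoted in the thread; illustrative, not authoritative.
REPAIR_THRESHOLD = 52  # repair starts when healthy pieces fall under this
REPAIR_TARGET = 80     # number of pieces a segment has after repair


@dataclass
class Segment:
    # Maps piece number -> id of the node currently linked to that piece.
    pieces: dict = field(default_factory=dict)

    def healthy(self, online_nodes):
        """Pieces whose holder is currently online."""
        return {p: n for p, n in self.pieces.items() if n in online_nodes}


def repair_if_needed(segment, online_nodes, spare_nodes):
    """Hypothetical repair step (steps 2 and 3 of the list above).

    Returns the (piece, old_node) pairs that were unlinked. Those are the
    pieces an offline node would later be told to delete (step 6).
    """
    if len(segment.healthy(online_nodes)) >= REPAIR_THRESHOLD:
        return []  # still enough healthy pieces, nothing is touched
    unlinked = []
    spare = iter(spare_nodes)
    for piece, node in list(segment.pieces.items()):
        if node not in online_nodes:
            segment.pieces[piece] = next(spare)  # link piece to a new node
            unlinked.append((piece, node))       # old holder loses the piece
    return unlinked


if __name__ == "__main__":
    nodes = [f"n{i}" for i in range(REPAIR_TARGET)]
    spares = [f"s{i}" for i in range(30)]
    seg = Segment({i: nodes[i] for i in range(REPAIR_TARGET)})

    # One offline node: 79 healthy pieces, above the threshold, no repair.
    print(repair_if_needed(seg, set(nodes[1:]), spares))

    # Thirty offline nodes: 50 healthy pieces, so the segment is repaired.
    print(len(repair_if_needed(seg, set(nodes[30:]), spares)))
```

In this model a returning node only loses pieces of segments that dropped under the threshold while it was away, which matches losing a small fraction of data rather than all of it.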

What I don’t get is why you would do step 3. If pieces weren’t deleted from the offline node, it would mean even more pieces available when (if) it comes back online, which means even more network resilience?

And if it would be a concern to have too many pieces on the network because of that (and too much money to pay SNOs as a consequence), the repair process (step 2) could repair data chunks to 79 pieces (instead of 80) just in case the node comes back online. Let’s call that “partial repair”.

Now, there are probably elements above that I don’t get right as I do not know precisely how the Storj network works to be honest, but hopefully the reason I find this behavior strange is clearer now? ^^

Something may be wrong with my reasoning, please let me know

Oh
I took the liberty of reviving the following thread then:

Alexey · December 4, 2021, 11:19am

We have had complaints regarding unpaid trash, and this is exactly the same situation: the customers will pay only for 80 pieces, they will not pay for extras. So the extra pieces would not be paid, similar to trash. But unlike trash, which you store only for 7 days, you would store extra pieces until the customer deletes their data.

The description of the process is correct.

The problem with your suggestion is the same as with piece loss/corruption: there is no guarantee that the pieces are still valid. This means that, to make sure they are still valid, we would need to execute an audit for every “offline” piece, for the same price as a repair.
This makes no sense, because if the minimum threshold is reached, the segment will just be repaired for the same price and without additional complicated logic around audit and repair (this process should be as lightweight as possible, since we have many nodes in the loop). And unlike an immediate audit, this repair may never happen (the customer may just delete their data).

Pac · December 4, 2021, 11:31am

Ah, I guess I get what you mean. Sort of.
Although… if you don’t trust pieces that get back online and that got repaired, I see no reason why you would trust all the other pieces still on the node…

What you’re saying is basically that money has already been spent for repairing some pieces, so instead of paying again for checking they’re still fine on the “back online node”, StorjLabs prefers deleting them altogether to avoid “double spending”.
It makes sense, but then my suggestion of “partial repair” would make sense. Wouldn’t it? ^^

Alexey · December 4, 2021, 11:41am

You wrapped up the problem exactly around the right word - “trust”.
See

exactly.

Unfortunately not. The repair has already happened, since you saw deletions of offline pieces, so we cannot save any expense on that. If the satellite does not delete pieces, it should pay for them. The customer will not pay the satellite operator for extra pieces, so the payment for this additional redundancy would have to come from the satellite operator’s pocket. Why would they want this?
Please note: there is a great chance that the second repair may never happen, so why pay in advance?

BrightSilence · December 4, 2021, 12:09pm

Having the same piece on the network twice provides very little value. Storj uses erasure coding to ensure that any piece could be a part of repair when other pieces are lost. But as a result every piece is different and unique. This provides very strong protection without massive data inflation rates. Having a copy of the same piece only helps if that exact piece gets lost on the other node, which brings us back to the horrible inflation vs protection trade-off of simple replication. You can’t just lower the number of pieces to repair, because repair can only happen with at least 29 unique pieces. If you have 29, but two or more are the same, the data is still lost. So keeping exact copies is essentially useless.
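To make the "unique pieces" point concrete, here is a toy Reed-Solomon-style code in Python. It is a minimal sketch under loud assumptions: it works over the small prime field GF(257) with k = 3 instead of 29, and `encode`, `decode`, and `lagrange_eval` are made-up helper names, not anything from the Storj codebase.

```python
P = 257  # small prime field, large enough to hold byte values


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Systematic encoding: pieces 1..k are the data, the rest is parity."""
    base = list(enumerate(data, start=1))
    return {x: lagrange_eval(base, x) for x in range(1, n + 1)}


def decode(pieces, k):
    """Rebuild the k data symbols from any k pieces with distinct numbers."""
    unique = dict(pieces)  # copies of the same piece number collapse here
    if len(unique) < k:
        raise ValueError(f"need {k} unique pieces, got {len(unique)}")
    points = list(unique.items())[:k]
    return [lagrange_eval(points, x) for x in range(1, k + 1)]


if __name__ == "__main__":
    data = [10, 20, 30]
    pieces = encode(data, n=6)

    # Any three distinct pieces are enough, even parity-only ones.
    print(decode([(4, pieces[4]), (5, pieces[5]), (6, pieces[6])], k=3))

    # Three pieces, but two are copies of piece 4: only two are unique.
    try:
        decode([(4, pieces[4]), (4, pieces[4]), (5, pieces[5])], k=3)
    except ValueError as err:
        print(err)
```

The second call fails even though three pieces were supplied, which is the same situation as holding 29 pieces of which two are identical.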

Besides, data loss is a good incentive to keep your node online consistently. And if you don’t, more data will be more frequently repaired to more reliable nodes, which is also better for data availability.

Pac · December 4, 2021, 12:49pm

If the repair was partial, I fail to see how we couldn’t save expense there. But maybe it’s too risky and makes the system even more complicated which is rarely a good thing, I’ll admit.

Yes, I agree, and I was implicitly suggesting to reconstruct new pieces, not to copy existing ones (they’re unreachable anyway because they’re offline).
That’s what the repair job does, right? At least that’s what I thought.

As there can be way more than 80 pieces for a single data chunk (that’s my understanding of the Reed-Solomon magic), I thought the repair job was just regenerating new pieces that had almost no chance of already existing. But maybe my understanding of the Reed-Solomon mechanism is wrong, which may well be the case.

Anyway, thanks for your time @Alexey & @BrightSilence
If it cannot be done there must be good reasons even though I might not grasp the whole picture. That’s fine

BrightSilence · December 4, 2021, 12:56pm

The repair that takes place when a node is offline recreates the piece that node used to have on another node. So that means there is now an exact copy. Though I guess it is in theory possible for repair to always create new pieces, or at least do that for offline nodes, so that when they come back online, each piece would still be unique. However, I think the savings on repair costs are minimal, as at best it can postpone subsequent repair, and I do wonder how many segments even see repair happen more than once. Repair replaces the unreliable nodes with new ones, so with every repair you get a more stable total set of nodes holding pieces for that segment. So there is probably limited value to keeping pieces on offline nodes around anyway, especially since there are added storage costs to pay for. And I think the incentive to keep nodes online reliably also adds value.

That said though, it is an interesting idea to use that expandability of the RS encoding concept. So yeah, filing that idea away for when it might become more useful.

Alexey · December 4, 2021, 1:03pm

I believe that the pieces will be unique, not copies. We just remove pointers and add new ones.
There is no point in having two implementations of Reed-Solomon encoding for the same case. An offline piece doesn’t differ much from a corrupted or missing one. So we download 29 healthy pieces, reconstruct the missing ones, then upload them. During upload, the unlinking and linking of pointers happens automatically.
The only case when pieces could be mirrored is a Graceful Exit, but the case will be almost the same: we unlink the old node and link a new one.

BrightSilence · December 4, 2021, 1:07pm

Yes but I always thought that the missing ones were recreated. So initially you create the first 130 pieces (only wait for 80 to finish uploading). And repair would just recreate piece numbers in that range that were missing because they were either lost or offline. You can of course just keep numbering and instead create pieces 131+. Maybe it has already been implemented that way. Didn’t look into the code for this.
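The two numbering options can be compared with a small sketch. Both functions are hypothetical and only mirror the wording of this post (130 piece numbers created up front, 80 kept); which of the two the real repair job resembles is exactly the open question here.

```python
def pick_reuse(healthy, needed, total=130):
    """Option A: recreate missing piece numbers inside the original range.

    Taking the lowest free numbers first is an arbitrary choice for this
    sketch.
    """
    free = [n for n in range(1, total + 1) if n not in healthy]
    return free[:needed]


def pick_fresh(highest_used, needed):
    """Option B: keep counting and never hand out a piece number twice."""
    return list(range(highest_used + 1, highest_used + 1 + needed))


if __name__ == "__main__":
    offline = set(range(1, 31))   # piece numbers held by offline nodes
    healthy = set(range(31, 81))  # piece numbers still reachable

    reused = pick_reuse(healthy, needed=30)
    fresh = pick_fresh(highest_used=130, needed=30)

    # With option A the returning nodes would hold exact copies.
    print(len(offline & set(reused)))
    # With option B every piece on the network stays unique.
    print(len(offline & set(fresh)))
```

Under option A the pieces on a returning node duplicate the repaired ones, so keeping them adds nothing. Under option B they would still be unique, which is the case where keeping them could add redundancy.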

I’m aware that repair creates pieces different from the ones it downloads to trigger the repair, but the question is: are they entirely new pieces or replacements for the pieces lost?