The repair queue has been increasing for the past two days. Any clue why?

Given how closely this lined up with the large GC batch, we figured it wouldn’t hurt to ask and see what others are seeing.

Here is the number of pieces being deleted across the network due to GC. On the far right it shows that while some nodes are still working on it, at least 50% of the network is already back to purple (no GC deletions happening).

6 Likes

That is SO cool! :smiley:

(But still needs 20 characters, apparently)

I’m wondering if this is just GC… I have a node which has been full for 6+ months now.
Yesterday morning, starting at 4 AM, it became unavailable for about 4 hours according to my uptime monitor. It is an RPi3B with a good old WD RED HDD connected via USB.
I checked it as soon as it became available again and it had 70 GB in trash. Before that, it was always around 2-3 GB max.
As the node was full, it did not have any significant ingress, certainly not 70 GB over the past few days…
The node is otherwise in perfect condition, no errors in the log.

GC has been off for a while. The fact that your trash increased a lot only confirms that that’s likely what it was.

2 Likes

And how much data was deleted during that?

So as @BrightSilence warned me, my node is being hammered again. Hopefully not for as long as last time.
Thanks for the warning that this would happen again, I’ll just sit back and wait :slight_smile:

(When I win Euromillions I’ll replace all the spinning rust with SSDs)

Maybe not just the bloom filter? I think it would generally be great if the node software were smart enough to recognize when resource-intensive activity can be performed and when not. Things like the filewalker or GC should not run while the node is busy serving downloads and uploads. And if they must run, maybe the software should be smart enough to pace itself and reduce the overall load on the node.
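Just to make the idea concrete, here’s a rough sketch of what such pacing could look like. This is not actual storagenode code; the names (`activeRequests`, `pace`, `walkPieces`) and hook points are made up for illustration.

```go
// Toy sketch of a load-aware pacer for background work such as the
// filewalker or GC. Not Storj code; the hook points are hypothetical.
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// activeRequests would be incremented/decremented by the real upload and
// download handlers (hypothetical hook point).
var activeRequests int64

// pace blocks background work while the node is busy serving clients and
// adds a small delay between items even when idle, so the walk never
// saturates the disk.
func pace(busyThreshold int64, idleDelay, busyBackoff time.Duration) {
	for atomic.LoadInt64(&activeRequests) > busyThreshold {
		time.Sleep(busyBackoff) // let uploads/downloads win the disk
	}
	time.Sleep(idleDelay)
}

// walkPieces stands in for a filewalker/GC loop over stored pieces.
func walkPieces(pieces []string, visit func(string)) {
	for _, p := range pieces {
		pace(4, 2*time.Millisecond, 250*time.Millisecond)
		visit(p)
	}
}

func main() {
	walkPieces([]string{"piece-1", "piece-2", "piece-3"}, func(p string) {
		fmt.Println("visited", p) // real code would stat, hash, or delete here
	})
}
```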

5 Likes

Well, the network seems to be coping OK with this blip, so it’s probably not worth spending a lot of developer time implementing a solution to a minimal problem…

Repair is costly though, and individual nodes do seem to suffer. It’s worth preventing nodes from toppling during GC. Though I agree that these runs are abnormally large.

2 Likes

GC and the filewalker should probably run at reduced priority so that regular requests can still be served.
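For the CPU side, one possible (hypothetical, Linux-oriented) approach is to pin the walking goroutine to its own OS thread and renice just that thread. For spinning disks, I/O priority (e.g. the idle ionice class) would arguably matter even more, but CPU niceness is the simple part to sketch:

```go
// Hypothetical sketch: run a background walk on a reniced OS thread so
// regular request handling keeps CPU priority. Not Storj code.
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// runLowPriority executes fn on an OS thread whose nice value is raised to 19
// (lowest CPU priority). On Linux, setpriority with who=0 affects the calling
// thread, so the rest of the process is untouched.
func runLowPriority(fn func()) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		// Intentionally never UnlockOSThread: when a locked goroutine exits,
		// Go discards the thread, so the reniced thread is not reused.
		runtime.LockOSThread()
		if err := syscall.Setpriority(syscall.PRIO_PROCESS, 0, 19); err != nil {
			fmt.Println("could not lower priority:", err)
		}
		fn()
	}()
	<-done
}

func main() {
	runLowPriority(func() {
		// A real node would run the filewalker or GC pass here.
		fmt.Println("walking pieces at low priority")
	})
}
```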

3 Likes

That’s possibly a better solution, yes…

At least that would be a good start. Maybe it would even be sufficient.
But Storj should do something. :grimacing:

While this process is rare, it might be worth introducing limits on the filewalker and GC. My NAS has suffered quite a bit…

Looks like it’s eu1’s turn today. I hope nodes are holding up ok.

Not here they’re not…

1 Like

Not all nodes get bloom filters at exactly the same time, but I have several nodes running GC for EU1 right now.

Most likely it was not counted as “stored”. This came about when @BrightSilence noticed a discrepancy between the amount of data the node says it has and the amount the satellite says the node has.

For some reason my node was not as affected by this: before, I had about 50-70 GB of trash; right now it’s ~170 GB. That’s with 24.5 TB stored.

It wasn’t paid. GC cleans up pieces on your node that aren’t accounted for on the satellite end; your node shouldn’t have them to begin with. So no, it wasn’t included in the storage graph or payout overview. But it was included in the pie chart, as that uses your node’s local storage totals.
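For anyone curious what “not accounted for on the satellite end” means in practice: conceptually, the satellite sends a bloom filter of the pieces it still knows about, and the node trashes anything that is definitely not in it. Here’s a toy sketch of that decision (not the real storagenode code, and a deliberately simplistic filter):

```go
// Toy sketch of bloom-filter-based GC. Names and the filter itself are
// illustrative only, not the actual storagenode implementation.
package main

import (
	"fmt"
	"hash/fnv"
)

// bloomFilter is a stand-in for the filter the satellite sends: it can say
// "definitely not referenced" (eligible for trash) or "probably referenced".
type bloomFilter struct {
	bits []bool
	k    int
}

func newBloomFilter(size, hashes int) *bloomFilter {
	return &bloomFilter{bits: make([]bool, size), k: hashes}
}

func (b *bloomFilter) positions(pieceID string) []uint32 {
	pos := make([]uint32, 0, b.k)
	for i := 0; i < b.k; i++ {
		h := fnv.New32a()
		fmt.Fprintf(h, "%d:%s", i, pieceID)
		pos = append(pos, h.Sum32()%uint32(len(b.bits)))
	}
	return pos
}

func (b *bloomFilter) Add(pieceID string) {
	for _, p := range b.positions(pieceID) {
		b.bits[p] = true
	}
}

func (b *bloomFilter) MaybeContains(pieceID string) bool {
	for _, p := range b.positions(pieceID) {
		if !b.bits[p] {
			return false // definitely not referenced by the satellite
		}
	}
	return true // probably still referenced; keep the piece
}

// runGC walks local pieces and returns the ones that should move to trash:
// anything the satellite's filter definitely does not contain.
func runGC(localPieces []string, filter *bloomFilter) (toTrash []string) {
	for _, id := range localPieces {
		if !filter.MaybeContains(id) {
			toTrash = append(toTrash, id)
		}
	}
	return toTrash
}

func main() {
	// Pieces the satellite still accounts for.
	filter := newBloomFilter(1024, 3)
	filter.Add("piece-A")
	filter.Add("piece-B")

	// Pieces the node has on disk, including one the satellite no longer tracks.
	local := []string{"piece-A", "piece-B", "piece-orphan"}

	fmt.Println("move to trash:", runGC(local, filter))
	// Typical output: move to trash: [piece-orphan]
}
```

Because a bloom filter only answers “definitely not” or “probably yes”, false positives mean a few genuinely deleted pieces can survive a GC run; they never cause a referenced piece to be trashed.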

2 Likes

As you said, The Hammering was much shorter than the first time. Everything nominal again. :slightly_smiling_face:

Makes me wonder whether I should move my nodes to a higher-specced machine, although if this was just a one-off I’ll probably just wait and see… :thinking:

This time it was likely just a different satellite, most likely EU1. New runs from the same satellite normally won’t happen within the same week. I wouldn’t worry about the specs; this shouldn’t normally happen to this extent.

2 Likes