The repair queue has been increasing for the past two days. Any clue why?

Given how closely this lined up with the large GC batch, we figured it wouldn’t hurt to ask and see what others are seeing.

Here is the number of pieces being deleted across the network due to GC. On the far right it shows that while some nodes are still working on it, at least 50% of the network is already back to purple (no GC deletions happening).

6 Likes

That is SO cool! :smiley:

(But still needs 20 characters, apparently)

I’m wondering if this is just GC… I have a node which has been full for 6+ months now.
Yesterday morning, starting at 4 AM, it became unavailable for about 4 hours according to my uptime monitor. It is an RPi3B with a good old WD RED HDD connected via USB.
I checked it as soon as it became available again and it had 70 GB in trash. Before that, it was always around 2-3 GB max.
As the node was full, it did not have any significant ingress, certainly not 70 GB over the past few days…
The node is otherwise in perfect condition, no errors in the log.

GC has been off for a while. The fact that your trash increased a lot only confirms that that’s likely what it was.

2 Likes

And how much data was deleted during that?

So as @BrightSilence warned me, my node is being hammered again. Hopefully not for as long as last time.
Thanks for the warning that this would happen again, I’ll just sit back and wait :slight_smile:

(When I win Euromillions I’ll replace all the spinning rust with SSDs)

Maybe not just the bloom filter? I think it would generally be great if the node software were smart enough to recognize when resource-intensive activity can be performed and when not. Things like the filewalker or GC should not run while the node is busy serving downloads and uploads. And if they must run, maybe the software should be smart enough to pace itself and reduce the overall load on the node.
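Just to make the idea concrete, here’s a rough sketch of what such pacing could look like. This is not actual storagenode code; the names (`activeRequests`, `pace`, `walkPieces`) and hook points are made up for illustration.

```go
// Toy sketch of a load-aware pacer for background work such as the
// filewalker or GC. Not Storj code; the hook points are hypothetical.
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// activeRequests would be incremented/decremented by the real upload and
// download handlers (hypothetical hook point).
var activeRequests int64

// pace blocks background work while the node is busy serving clients and
// adds a small delay between items even when idle, so the walk never
// saturates the disk.
func pace(busyThreshold int64, idleDelay, busyBackoff time.Duration) {
	for atomic.LoadInt64(&activeRequests) > busyThreshold {
		time.Sleep(busyBackoff) // let uploads/downloads win the disk
	}
	time.Sleep(idleDelay)
}

// walkPieces stands in for a filewalker/GC loop over stored pieces.
func walkPieces(pieces []string, visit func(string)) {
	for _, p := range pieces {
		pace(4, 2*time.Millisecond, 250*time.Millisecond)
		visit(p)
	}
}

func main() {
	walkPieces([]string{"piece-1", "piece-2", "piece-3"}, func(p string) {
		fmt.Println("visited", p) // real code would stat, hash, or delete here
	})
}
```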

5 Likes

Well, the network seems to be coping OK with this blip, so it’s probably not worth spending a lot of developer time implementing a solution to a minimal problem…

Repair is costly though, and individual nodes do seem to suffer. It’s worth preventing nodes from toppling during GC. Though I agree that these runs are abnormally large.

2 Likes

GC and the filewalker should probably run at reduced priority so that regular requests can still be served.
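For the CPU side, one possible (hypothetical, Linux-oriented) approach is to pin the walking goroutine to its own OS thread and renice just that thread. For spinning disks, I/O priority (e.g. the idle ionice class) would arguably matter even more, but CPU niceness is the simple part to sketch:

```go
// Hypothetical sketch: run a background walk on a reniced OS thread so
// regular request handling keeps CPU priority. Not Storj code.
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// runLowPriority executes fn on an OS thread whose nice value is raised to 19
// (lowest CPU priority). On Linux, setpriority with who=0 affects the calling
// thread, so the rest of the process is untouched.
func runLowPriority(fn func()) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		// Intentionally never UnlockOSThread: when a locked goroutine exits,
		// Go discards the thread, so the reniced thread is not reused.
		runtime.LockOSThread()
		if err := syscall.Setpriority(syscall.PRIO_PROCESS, 0, 19); err != nil {
			fmt.Println("could not lower priority:", err)
		}
		fn()
	}()
	<-done
}

func main() {
	runLowPriority(func() {
		// A real node would run the filewalker or GC pass here.
		fmt.Println("walking pieces at low priority")
	})
}
```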

3 Likes

That’s possibly a better solution, yes…

At least that would be a good start. Maybe it would even be sufficient.
But Storj should do something. :grimacing:

While this process is rare, it might be worth introducing limits on the filewalker and GC. My NAS has suffered quite a bit…

Looks like it’s eu1’s turn today. I hope nodes are holding up ok.

Not here they’re not…

1 Like

Not all nodes get bloom filters at exactly the same time, but I have several nodes running GC for EU1 right now.

Most likely it was not counted as “stored”. This came about when @BrightSilence noticed a discrepancy between the amount of data the node says it has and the amount the satellite says the node has.

For some reason my node was not as affected by this: before, I had about 50-70 GB of trash; right now it’s ~170 GB. That’s with 24.5 TB stored.

It wasn’t paid. GC cleans up pieces on your node that aren’t accounted for on the satellite end; your node shouldn’t have them to begin with. So no, it wasn’t included in the storage graph or payout overview. But it was included in the pie chart, as that uses your node’s local storage totals.
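For anyone curious what “not accounted for on the satellite end” means in practice: conceptually, the satellite sends a bloom filter of the pieces it still knows about, and the node trashes anything that is definitely not in it. Here’s a toy sketch of that decision (not the real storagenode code, and a deliberately simplistic filter):

```go
// Toy sketch of bloom-filter-based GC. Names and the filter itself are
// illustrative only, not the actual storagenode implementation.
package main

import (
	"fmt"
	"hash/fnv"
)

// bloomFilter is a stand-in for the filter the satellite sends: it can say
// "definitely not referenced" (eligible for trash) or "probably referenced".
type bloomFilter struct {
	bits []bool
	k    int
}

func newBloomFilter(size, hashes int) *bloomFilter {
	return &bloomFilter{bits: make([]bool, size), k: hashes}
}

func (b *bloomFilter) positions(pieceID string) []uint32 {
	pos := make([]uint32, 0, b.k)
	for i := 0; i < b.k; i++ {
		h := fnv.New32a()
		fmt.Fprintf(h, "%d:%s", i, pieceID)
		pos = append(pos, h.Sum32()%uint32(len(b.bits)))
	}
	return pos
}

func (b *bloomFilter) Add(pieceID string) {
	for _, p := range b.positions(pieceID) {
		b.bits[p] = true
	}
}

func (b *bloomFilter) MaybeContains(pieceID string) bool {
	for _, p := range b.positions(pieceID) {
		if !b.bits[p] {
			return false // definitely not referenced by the satellite
		}
	}
	return true // probably still referenced; keep the piece
}

// runGC walks local pieces and returns the ones that should move to trash:
// anything the satellite's filter definitely does not contain.
func runGC(localPieces []string, filter *bloomFilter) (toTrash []string) {
	for _, id := range localPieces {
		if !filter.MaybeContains(id) {
			toTrash = append(toTrash, id)
		}
	}
	return toTrash
}

func main() {
	// Pieces the satellite still accounts for.
	filter := newBloomFilter(1024, 3)
	filter.Add("piece-A")
	filter.Add("piece-B")

	// Pieces the node has on disk, including one the satellite no longer tracks.
	local := []string{"piece-A", "piece-B", "piece-orphan"}

	fmt.Println("move to trash:", runGC(local, filter))
	// Typical output: move to trash: [piece-orphan]
}
```

Because a bloom filter only answers “definitely not” or “probably yes”, false positives mean a few genuinely deleted pieces can survive a GC run; they never cause a referenced piece to be trashed.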

2 Likes

As you said, The Hammering was much shorter than the first time. Everything nominal again. :slightly_smiling_face:

Makes me wonder whether I should move my nodes to a higher-specced machine, although if this was just a one-off I’ll probably just wait and see… :thinking:

This time it was likely just a different satellite, most likely EU1. New runs from the same satellite normally won’t happen within the same week. I wouldn’t worry about the specs; this shouldn’t normally happen to this extent.

2 Likes