I have not specified a debug port, but I am good at following directions. Although if it is easier to wait for someone else who already has access to the debug port, no worries. I am happy to help if I can though. I do not know how to determine the debug port.
Sorry, not able to help on this one. My nodes have only been up for about 98 hours. I was installing some new light switches the other day and stopped my nodes for about 30 minutes while the circuit to my office was turned off, as I wasn't sure my connected UPS would last the entire duration with the various other devices connected to it.
I'm not mad... I'm disappointed lol
I have run:
-p 127.0.0.1:7777:7777
uptime 172h 31m (2020-09-21T15:27:03.043Z)
What command should I enter?
I have 331 hours uptime, but... apparently I forgot to set a debug port again after the last time I let the node recreate the config.yaml. Having to do this through the CLI inside docker, I'm struggling a little to find what you're looking for. I only found this.
/app # wget -O - http://127.0.0.1:39736/mon/stats | grep piecedeleter-queue-full
Connecting to 127.0.0.1:39736 (127.0.0.1:39736)
writing to stdout
- 100% |*********************************************************************************************************************| 586k 0:00:00 ETA
written to stdout
/app # wget -O - http://127.0.0.1:39736/mon/stats | grep piecedeleter
Connecting to 127.0.0.1:39736 (127.0.0.1:39736)
writing to stdout
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces count=170960.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces sum=4910217986971.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces min=3862.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces avg=28721443.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces max=5324050202.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces rmin=29596.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces ravg=2584719.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces r10=38486.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces r50=70223.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces r90=9623895.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces rmax=22486988.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces recent=53903.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces count=170960.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces sum=2697468.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces min=0.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces avg=15.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces max=291.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces rmin=0.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces ravg=1.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces r10=0.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces r50=0.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces r90=2.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces rmax=31.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces recent=0.000000
- 100% |*********************************************************************************************************************| 586k 0:00:00 ETA
written to stdout
No mention of piecedeleter-queue-full
Not sure if this helps. Let me know if I can try something else. (P.S. A little surprised curl wasn't available inside the container, but this worked. That thing is lightweight.)
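If a full dump would be more useful, I could also write the stats to a file inside the container and copy it out, something like this (assuming the container is named storagenode):
/app # wget -O /tmp/stats.txt http://127.0.0.1:39736/mon/stats
and then from the host:
docker cp storagenode:/tmp/stats.txt .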
Nice one. Just to be sure, is that from your node with the 400 GB of trash?
Yes, it is most definitely so (needed 20 chars )
So I assume this parameter cannot be placed just anywhere in the run command, since it seems it's not the parameter itself but its location that makes it work...
@littleskunk or @BrightSilence so what do I add to set this up... so it's easy to find in the future?
Tried adding it at the end of the run command, but that didn't seem to take, so I added it before the --mount
which seems to have done the job; now I can access the info by going to http://storagenode-ip:5999/mon/funcs
Oh yeah, and of course I used the storagenode-ip instead of 127.0.0.1.
Adding it before the other -p parameters seemed like a bad idea, so it made sense to put it just before the --mount's started... I guess that once the --name storagenode storjlabs/storagenode:latest part is given, maybe the parameter input ends...
Seems a bit confusing that everything is a -p parameter, but that's a docker thing I suppose.
And I also added debug.addr: ":5999" to config.yaml.
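For reference, roughly what my run command looks like now (the usual wallet/email/address/storage env vars and my real paths swapped for placeholders; 5999 is just the port I picked):

docker run -d --restart unless-stopped \
    -p 28967:28967 \
    -p 5999:5999 \
    -e WALLET="<wallet>" -e EMAIL="<email>" -e ADDRESS="<external address>:28967" -e STORAGE="<size>" \
    --mount type=bind,source="<identity dir>",destination=/app/identity \
    --mount type=bind,source="<storage dir>",destination=/app/config \
    --name storagenode storjlabs/storagenode:latest

and the matching line in config.yaml:

debug.addr: ":5999"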
So it turns out Stefan deleted ~100 TB of old data from old buckets. Somewhere between the satellite and the storage nodes we lost a lot of deletes, and garbage collection later fixed that. The issue I see here is not so much that the storage nodes are not getting paid; 400 GB unpaid for 2 weeks is just 30 cents. The bigger problem is what happens if we ever introduce a bug in garbage collection. Within 7 days we want to be able to recover the data from the trash folder back to the storage node. This only works if the storage nodes trust garbage collection. As soon as storage node operators start deleting the trash folder from time to time, we also lose that option to recover. I hope we can find the issue and fix it. In the meantime, please do not delete anything in the trash folder.
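If you want to check how much your node currently holds in trash, something like this from the host should work (assuming the default docker setup with the data mounted at /app/config and the container named storagenode):

docker exec storagenode du -sh /app/config/storage/trash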
Docker node update is coming... can I do it?
I'm curious how you determined the random port that monkit was attached to inside the container.
I didn't have to, I have this in my docker run command:
-p 127.0.0.1:7777:7777
That was actually meant for Bright. I will definitely be adding that port map and config file entry for future use.
Oh I'm not worried about the payout. In my specific case even the 30 cents doesn't apply, as I have plenty of free space. So whether it temporarily stores garbage or is sitting there empty really has no impact. I'm just looking to help find out if there is something bad going on.
Open a shell inside the container
docker exec -it storagenode /bin/sh
Then list ports being listened to
netstat -tulpn | grep LISTEN
You'll find 28967, 7778, and the one you're looking for.
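If you prefer, the same thing works as a one-liner from the host without opening a shell first:

docker exec storagenode netstat -tulpn | grep LISTEN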
Brilliant, thanks! I'll add my data to the pile:
/app # wget -O - http://127.0.0.1:41969/mon/stats | grep piecedeleter-queue-full
Connecting to 127.0.0.1:41969 (127.0.0.1:41969)
- 100%
/app # wget -O - http://127.0.0.1:41969/mon/stats | grep piecedeleter
Connecting to 127.0.0.1:41969 (127.0.0.1:41969)
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces count=42607.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces sum=325940.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces min=0.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces avg=7.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces max=98.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces rmin=0.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces ravg=0.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces r10=0.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces r50=0.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces r90=0.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces rmax=1.000000
piecedeleter-queue-size,scope=storj.io/storj/storagenode/pieces recent=0.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces count=42607.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces sum=14030282805357.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces min=35291.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces avg=329295252.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces max=96749857825.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces rmin=133873.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces ravg=20437380.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces r10=139005.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces r50=149185.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces r90=715065.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces rmax=498660832.000000
piecedeleter-queue-time,scope=storj.io/storj/storagenode/pieces recent=177330.000000
DQ nodes that delete trash... and/or rename it to something that sounds less like trash...
filesmarkedfordeletion... terrible name, but you get the idea... trash sounds like it's not important...
Or one could just call them marked files...
because they are not trash, they are marked for deletion... anyway, I think the problem is mostly in the naming scheme...
Calling them marked I kinda like actually... because then people who don't know won't understand what they are... and thus they will have to ask, and then they will learn what they are and that they shouldn't delete them...
Of course I'm sure with enough time a better or more suitable name could be found... but that was what I could come up with off the top of my head.
@BrightSilence we noticed a second bug with our zombie segment reaper. The data Stefan deleted was very old. The zombie segment reaper checks the creation time of the segment and simply didn't clean up some leftovers from Stefan's bucket. We fixed it and ran the zombie segment reaper again. The next garbage collection run should follow this weekend. I would expect that we will see another round of garbage.
The good news is we should now be finished with the cleanup.