Graceful Exit after removing databases

joesmoe · June 5, 2020, 8:12am

When removing the database files and restarting the node, I get

By starting a graceful exit from a satellite, you will no longer receive new uploads from that satellite.
This action can not be undone.
Are you sure you want to continue? [y/n]
:y
Domain Name                        Node ID                                              Space Used
satellite.stefan-benten.de:7777    118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW   0 B
saltlake.tardigrade.io:7777        1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE   0 B
asia-east-1.tardigrade.io:7777     121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6  0 B
us-central-1.tardigrade.io:7777    12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S  0 B
europe-west-1.tardigrade.io:7777   12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs  0 B
europe-north-1.tardigrade.io:7777  12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB  0 B

when trying to GE. Is it normal that the space would be all 0’s?

I am seeing uploads and downloads in the logs.

BrightSilence · June 5, 2020, 10:23am

Why did you remove all databases? That’s a really bad idea and only a last resort measure.

No it’s not, but since you removed all databases, you removed all node metadata so it isn’t aware of the pieces it holds atm. I think GE should still work as the satellite tells your node which pieces to transfer. But I feel like I have to make this next thing clear.

DON’T REMOVE YOUR DATABASES!!!

joesmoe · June 5, 2020, 11:26am

I backed them up first.

I really had no other option that I could see. I tried rebuilding them and such, but already got DQ’d on one sat and suspended on another (even though audit 999).

This wasn’t due to a RAID error; I kind of jumped in on someone else’s topic.

After a few hours, the stats were rebuilt, and GE now does reflect the correct space used, so I’ve initiated a GE.

I have many nodes and do not want to deal with one that’s all messed up. It has too much data (more than it should) and is DQ’d already on one sat, so time to just get back what i can and move on hah.

Chalk it up to a slight loss due to not following the instructions more carefully. I should know better.

joesmoe · June 5, 2020, 11:36am

By the way, it appears that this DID NOT WORK for me at all. I’m getting errors on the GE now, so please don’t follow my steps.

2020-06-05T11:35:57.881Z	ERROR	gracefulexit:chore	failed to transfer piece.	{"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "error": "write tcp 172.17.0.51:45130->78.94.240.189:7777: write: connection reset by peer", "errorVerbose": "write tcp 172.17.0.51:45130->78.94.240.189:7777: write: connection reset by peer\n\tstorj.io/drpc/drpcstream.(*Stream).RawFlush:287\n\tstorj.io/drpc/drpcstream.(*Stream).MsgSend:321\n\tstorj.io/common/pb.(*drpcSatelliteGracefulExitProcessClient).Send:1345\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:274\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}
2020-06-05T11:35:57.920Z	ERROR	gracefulexit:chore	failed to put piece.	{"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Piece ID": "OFTEGTGZKI7Z3EBZI4T4YAXL6RAGI4OK7UOF45KIXOSLCKHL3HSQ", "error": "piecestore: rpccompat: dial tcp: lookup storjv3junior.ddns.net on 213.133.100.100:53: no such host", "errorVerbose": "piecestore: rpccompat: dial tcp: lookup storjv3junior.ddns.net on 213.133.100.100:53: no such host\n\tstorj.io/common/rpc.Dialer.dialTransport:279\n\tstorj.io/common/rpc.Dialer.dial:256\n\tstorj.io/common/rpc.Dialer.DialNode:155\n\tstorj.io/uplink/private/piecestore.Dial:51\n\tstorj.io/uplink/private/ecclient.(*ecClient).dialPiecestore:65\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:193\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}
2020-06-05T11:35:57.920Z	ERROR	gracefulexit:chore	unable to send failure.	{"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW"}
2020-06-05T11:35:57.920Z	ERROR	gracefulexit:chore	failed to transfer piece.	{"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "error": "piecestore: rpccompat: dial tcp: lookup storjv3junior.ddns.net on 213.133.100.100:53: no such host", "errorVerbose": "piecestore: rpccompat: dial tcp: lookup storjv3junior.ddns.net on 213.133.100.100:53: no such host\n\tstorj.io/common/rpc.Dialer.dialTransport:279\n\tstorj.io/common/rpc.Dialer.dial:256\n\tstorj.io/common/rpc.Dialer.DialNode:155\n\tstorj.io/uplink/private/piecestore.Dial:51\n\tstorj.io/uplink/private/ecclient.(*ecClient).dialPiecestore:65\n\tstorj.io/uplink/private/ecclient.(*ecClient).PutPiece:193\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:212\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}

2020-06-05T11:36:43.865Z ERROR gracefulexit:chore failed to transfer piece. {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "error": "v0pieceinfodb error: sql: no rows in result set", "errorVerbose": "v0pieceinfodb error: sql: no rows in result set\n\tstorj.io/storj/storagenode/storagenodedb.(*v0PieceInfoDB).Get:131\n\tstorj.io/storj/storagenode/pieces.(*Store).GetV0PieceInfo:659\n\tstorj.io/storj/storagenode/pieces.(*Store).GetHashAndLimit:439\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).transferPiece:191\n\tstorj.io/storj/storagenode/gracefulexit.(*Worker).Run.func2:110\n\tstorj.io/common/sync2.(*Limiter).Go.func1:43"}

Beddhist · June 5, 2020, 12:14pm

Looks like your DDNS is not working.

joesmoe · June 5, 2020, 12:16pm

I don’t use DDNS though. I just hardcode the IP.

Also, that’s not my DDNS or IP anywhere.

BrightSilence · June 5, 2020, 12:49pm

These errors are not on your end but on the receiving node’s end. Your GE will proceed just fine. Every piece is tried 5 times on different nodes, so the chances of them failing 5x in a row are incredibly small. Only after 5 failures will the piece be counted as failed, and you would have to fail 10% to fail GE.
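To put rough numbers on the retry rule described in this post, here is a minimal Python sketch. It only uses the two figures given above (5 attempts per piece, 10% failure threshold). The per-attempt failure rates, the function names, the assumption that attempts fail independently, and the reading of "10%" as a share of pieces are all illustrative guesses, not measured values or Storj's actual implementation.

```python
# Toy model of the retry rule from the post above.
# Assumptions (not from Storj's code): attempts fail independently, and the
# 10% threshold is applied to the count of pieces.
MAX_ATTEMPTS = 5             # from the post: each piece is tried 5 times
FAIL_THRESHOLD_PERCENT = 10  # from the post: failing 10% fails GE


def piece_failure_probability(p_attempt: float, attempts: int = MAX_ATTEMPTS) -> float:
    """Chance that every attempt for a single piece fails."""
    return p_attempt ** attempts


def graceful_exit_fails(failed_pieces: int, total_pieces: int) -> bool:
    """True once failed pieces reach the threshold share of all pieces."""
    # Integer arithmetic avoids floating-point edge cases at exactly 10%.
    return failed_pieces * 100 >= total_pieces * FAIL_THRESHOLD_PERCENT


if __name__ == "__main__":
    # Hypothetical per-attempt failure rates, chosen only for illustration.
    for p in (0.1, 0.3, 0.5):
        chance = piece_failure_probability(p)
        print(f"per-attempt failure {p:.0%} -> piece counted as failed {chance:.5%}")
```

Even with half of all transfer attempts failing, only about 3% of pieces would be counted as failed under these assumptions, which is still well below the 10% mentioned above.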

Please look into this topic

SGC · June 5, 2020, 5:28pm

yeah i also kinda misunderstood how important they actually are for regular operations…
deleting the databases without doing a GE afterwards can basically mean you keep all the files you don’t have a record of for free…

so yeah don’t delete your databases, like NASA says…

“There isn’t a problem, so bad, that you cannot make it worse…”

ofc some problems can be helped with fast reaction times, but action without thought breaks stuff… xD

great saying tho…

kevink · June 5, 2020, 5:47pm

No, the satellite knows and pays for them.

SGC · June 5, 2020, 7:48pm

All i can do is regurgitate what i was told by one of the storjlings, because i was saying the databases didn’t seem that important if they could be restored… apparently it’s not as trivial a thing as i assumed, if one loses one’s databases completely then one might as well do a GE because large portions of the files might not be paid for…

DON’T DELETE YOUR DATABASES.

complex issues rarely have clear cut answers… so trying to understand it in depth without understanding the nitty gritty of its coding, is borderline impossible, so i’ve stopped trying lol

cdhowie · June 5, 2020, 8:32pm

Here’s the summary of what happens. You are still paid for pieces you store.

The biggest issue IMO is that you will continue to accept pieces even after your storage should be full. This can cause a whole host of other problems.

SGC · June 5, 2020, 8:48pm

yeah that looks about right… not sure what exactly unsent orders means tho… lol
but i’ll leave that for those who care, i don’t plan on getting beyond 1% database loss in extreme cases so… meh

joesmoe · June 5, 2020, 10:42pm

I can’t link to it, but somewhere somebody (I think) may have said that without the DBs, the first download of each file won’t be paid for.

cdhowie · June 6, 2020, 4:20am

My understanding is that someone wanting to download a file from your node has to submit an order for it. Your node collects orders in the database and submits them to the satellite for payment in batches. If the orders database is lost, any orders your node has fulfilled but not yet submitted to the satellite would be unpaid.
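The batching described above can be pictured with a small Python sketch. Everything in it (the `ToyNode` class, its method names, the byte counts) is hypothetical and only mirrors the description in this post; it is not how the storagenode code is actually structured.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a node and its orders database."""
    unsent: List[int] = field(default_factory=list)  # bytes per fulfilled, unsubmitted order
    settled_bytes: int = 0                           # bytes already submitted to the satellite

    def fulfil_order(self, nbytes: int) -> None:
        """Serve a download and remember the order locally."""
        self.unsent.append(nbytes)

    def submit_batch(self) -> None:
        """Send all collected orders to the satellite for payment."""
        self.settled_bytes += sum(self.unsent)
        self.unsent.clear()

    def lose_orders_db(self) -> int:
        """Simulate losing the database; returns the bytes that go unpaid."""
        lost = sum(self.unsent)
        self.unsent.clear()
        return lost


if __name__ == "__main__":
    node = ToyNode()
    node.fulfil_order(100)
    node.fulfil_order(200)
    node.submit_batch()       # these two orders are safe
    node.fulfil_order(50)     # fulfilled but not yet submitted
    print("unpaid bytes after DB loss:", node.lose_orders_db())
    print("bytes already settled:", node.settled_bytes)
```

In this toy model only the orders collected since the last batch are at risk; anything already submitted is unaffected by the loss.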

joesmoe · June 6, 2020, 10:30am

Yup that’s exactly the post I was looking for - thanks!

BrightSilence · June 6, 2020, 7:15pm

Not true. You would lose out on unsent orders only, which applies to bandwidth usage and usually contains at most the last hour of downloads. So the impact is minimal. You’ll still get paid for storage of all data.

This was about a possible abuse that could happen. Shady uplinks could reuse serials for downloads if they know your node lost the used serials data. However, this only happens if uplinks explicitly exploit this. Which is highly unlikely. So this would likely have no impact at all.
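The serial-reuse scenario can be illustrated with a toy replay check in Python. The `ToySerialStore` class and its methods are made up for this illustration and say nothing about Storj's real schema or validation logic; the sketch only shows why a node that forgets its used serials can no longer reject a repeated one.

```python
class ToySerialStore:
    """Hypothetical used-serials table kept by a node."""

    def __init__(self) -> None:
        self._used = set()

    def accept(self, serial: str) -> bool:
        """Accept a serial once; reject it if it was already seen."""
        if serial in self._used:
            return False
        self._used.add(serial)
        return True

    def wipe(self) -> None:
        """Simulate losing the used-serials data."""
        self._used.clear()


if __name__ == "__main__":
    store = ToySerialStore()
    print(store.accept("serial-1"))  # first use is accepted
    print(store.accept("serial-1"))  # replay is rejected while the data exists
    store.wipe()
    print(store.accept("serial-1"))  # after the loss, the replay is accepted again
```

As the post says, this only matters if an uplink deliberately replays serials, which is why the practical impact is expected to be negligible.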

kevink · June 6, 2020, 7:35pm

They would still only be able to download the one piece of a file that this specific storagenode has. That wouldn’t make a complete file, so it would be almost useless alone. Therefore this specific abuse is more theoretical than practical. If the uplink knows your node lost the DB, it could now download all the files it has stored on your node 1/80 (or whatever the erasure split was) cheaper than before, because downloading the remaining pieces of a file from other nodes will still need to be paid for.
That makes this a very complex situation to abuse, with not much gain. The practical probability of someone even trying to abuse this is imho almost non-existent.

joesmoe · June 7, 2020, 8:20pm

Sorry, I didn’t mean to start a huge theoretical debate here.

For the record, I’m just doing the GE because I don’t want extra files for years to come, and I’m not sure how it works, now that I’ve deleted the database, as far as payments for the first download go.

(not saying it’s not explained and visible in the code - but it’s all above my head).

SGC · June 7, 2020, 9:15pm

i have trouble enough understanding my own system lol… all i really know is that the storj node needs a good deal of IOPS, and that i shouldn’t tinker with it…

apart from that, in general i find the more one digs into a subject, the more irrelevant the initial understandings and observations usually become, because it’s never really like the simplification that one uses to understand it…

it’s like when people explain rockets with one of those rockets based on compressed air and water… still sort of a rocket… and it works kinda like a rocket works… but everything about it is just wrong in so many ways that the understanding learned from it, is not going to help you build a rocket… not even one little bit…

Gross simplifications are most often like that… and tho we can understand stuff like … if we put more fuel on or bigger engines on the rocket, then it will go further or faster… still kinda works… it’s just never that simple…

i doubt the mechanics of storj are much different… we most likely have next to no clue… even if we hear it from the enginerds then their over simplifications to dumb it down to our level, might make our knowledge mostly useless… ofc some of the people here most likely know more than many of the storj engineers, but when we don’t know shit about it… then picking them out in a crowd is pretty much impossible.

my bet would be that the person that understands it the best would be the one making the very least sense to the rest of us … lol

BrightSilence · June 7, 2020, 9:28pm

If you’re serious about wanting to understand better how things work, the whitepaper does a really good job of starting at simple concepts and then increasing complexity with every chapter.

It’s a fairly long read, but I loved reading through it because of how it was set up. Just keep reading until it gets too complicated and that way you get the best understanding of the inner workings for the level of technical knowledge you have. Honestly though, I think you’ll have no trouble making it to the end.