Copied my node, now i've gotten at least 1 failed audit

bleep me sideways…
i just had an audit fail out of 542, less than 24 hours after booting up my node in its new location on the same drive, so it seems my rsync failed to copy the node correctly…

used the rsync command below, ran it like 3-4 times before shutting down the node, until it started finishing in good time, and i added the --delete parameter after checking exactly what it did xD
then i ran that, which also finished in good time… then shut down the node and ran it again a few times…

then i tried to compare the sizes of the datasets… but that was basically impossible because of different compression settings on zfs… and because the destination was just a folder in an already used dataset
it looked kinda alright so i figured rsync most likely worked like it was supposed to…
i may have been able to verify they were actually the same, and i guess i should have…
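in hindsight, a checksum-based dry run or a plain file count would have verified the copy regardless of zfs compression… something like this (just a sketch, using the same source/destination as my rsync command below; du --apparent-size is the GNU form, FreeBSD uses du -A):

# dry run with checksums: itemizes anything that still differs, should print little or nothing if the copy is complete
rsync -rcn -i /zPool/storj /zPool/storagenodes/storj

# or compare file counts / apparent (uncompressed) sizes instead of on-disk usage
find /zPool/storj -type f | wc -l
du -s --apparent-size /zPool/storj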

i’ve also run a scrub with no errors, so all checksums are correct and i should be able to rule out on-disk data corruption… it has to be the copy of the node folder that went wrong.
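(for reference, that was just the standard scrub, pool name taken from my paths:)

zpool scrub zPool         # start the scrub
zpool status -v zPool     # watch progress and check for checksum errors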

also i still have the old data, so in theory i should be able to fix it… at least if my node gave me the option xD, keep calm and keep storjing… i’m sure it will be fine, a few failed audits never hurt anyone right lol

used this command to copy the data…
how did i go wrong???

rsync -u -avHAXx --delete /zPool/storj /zPool/storagenodes/storj --progress -W -B=8192
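for context, roughly the sequence i followed (a sketch, not my exact shell history; the node is assumed to be a docker container named “storagenode”):

# a couple of passes while the node is still running, until they finish quickly
rsync -avHAXx --progress -W -B 8192 /zPool/storj /zPool/storagenodes/storj

# stop the node, then final passes with --delete so the destination matches exactly
docker stop -t 300 storagenode
rsync -avHAXx --delete --progress -W -B 8192 /zPool/storj /zPool/storagenodes/storj

worth noting: without a trailing slash on the source, rsync copies the storj directory itself into the destination, so the data ends up in /zPool/storagenodes/storj/storj and the new node config has to point there.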

also i strongly assume my node will most likely survive and that there isn’t a way to fix this…
but i would like to avoid this happening again and help prevent others from doing the same.

Is it a failed audit or just a failed attempt that got retried (and succeeded) later?

Also, you found a compression that works with encrypted data???

right, i should check my log… never had a failed audit before… so i doubt the piece data will be there, i would guess the copy was not 100%.
zfs with compression doesn’t compress the encrypted data, but it does compress the unused space in the blocks written, and thus the same data takes up different amounts of space…

i switched from lz4 to zle so that zfs wouldn’t try to compress the encrypted data all the time and just does zero-length encoding, basically using shorthand for writing long runs of zeros in blocks and files…
it only gives me about 2% less written space when working with 256K records.
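for anyone wanting to replicate it, those are just dataset properties (the dataset name here is a guess based on my paths, and only data written after the change picks up the new compression/recordsize):

zfs set compression=zle zPool/storagenodes     # zero-length encoding only, no real compression attempts on the encrypted pieces
zfs set recordsize=256K zPool/storagenodes     # 256K records
zfs get compression,recordsize,compressratio zPool/storagenodes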

looking at the logs now… apparently i dunno how to find a failed audit in my logs xD
trying to figure out what to search for, will post the relevant log entries when i find them
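(for anyone else searching: failed audits show up as failed GET_AUDIT downloads, so something like this should find them — assuming a docker node named storagenode, adjust if you log to a file)

docker logs storagenode 2>&1 | grep GET_AUDIT | grep failed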

This looks odd…

2020-05-03T16:49:13.464Z INFO piecestore download started {"Piece ID": "2G3ZYVAJLEOITB7YH2HN4B26KS4AGOA4Z24PWTKJRX3I47NBIXIA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET"}
2020-05-03T16:49:21.408Z INFO piecestore downloaded {"Piece ID": "2G3ZYVAJLEOITB7YH2HN4B26KS4AGOA4Z24PWTKJRX3I47NBIXIA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET"}
2020-05-03T16:54:12.963Z INFO piecestore download started {"Piece ID": "2G3ZYVAJLEOITB7YH2HN4B26KS4AGOA4Z24PWTKJRX3I47NBIXIA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET"}
2020-05-03T16:54:22.978Z ERROR piecestore download failed {"Piece ID": "2G3ZYVAJLEOITB7YH2HN4B26KS4AGOA4Z24PWTKJRX3I47NBIXIA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET", "error": "usedserialsdb error: database is locked", "errorVerbose": "usedserialsdb error: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*usedSerialsDB).Add:35\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:76\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:523\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:471\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:995\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:107\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:66\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:111\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:62\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:99\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

looks like most of my 80 failed download entries show that database-locked issue, as posted above.
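quick way to count them (same container-name assumption as above):

docker logs storagenode 2>&1 | grep -c "database is locked"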

========== AUDIT ==============
Critically failed:     0
Critical Fail Rate:    0.000%
Recoverable failed:    1
Recoverable Fail Rate: 0.184%
Successful:            542
Success Rate:          99.816%
========== DOWNLOAD ===========
Failed:                80
Fail Rate:             0.675%
Canceled:              224
Cancel Rate:           1.889%
Successful:            11551
Success Rate:          97.436%
========== UPLOAD =============
Rejected:              9
Acceptance Rate:       99.990%
---------- accepted -----------
Failed:                0
Fail Rate:             0.000%
Canceled:              17238
Cancel Rate:           19.572%
Successful:            70835
Success Rate:          80.428%
========== REPAIR DOWNLOAD ====
Failed:                0
Fail Rate:             0.000%
Canceled:              0
Cancel Rate:           0.000%
Successful:            14
Success Rate:          100.000%
========== REPAIR UPLOAD ======
Failed:                0
Fail Rate:             0.000%
Canceled:              271
Cancel Rate:           18.261%
Successful:            1213
Success Rate:          81.738%
========== DELETE =============
Failed:                0
Fail Rate:             0.000%
Successful:            1580
Success Rate:          100.000%

Look in the other threads about the locked-database issue. If it is an issue.

Maybe your audit is OK after all


looks like it might be a storagenode software issue in 1.3.3,
because of course i also updated while the node was down anyway…