One satellite randomly failed Audit checks and disqualified my node

millersdeathrow · April 11, 2020, 4:15pm

New to Storj and been running for about 3 months. The server has roughly 70 TB in it, with 20 TB dedicated to Storj. The node has 100% uptime but randomly failed an audit check and dropped to 94%, yet the failed satellite is still downloading data… I'm at a loss as to what happened: internet is fine, ports are still open, firewall is configured, no drive failures. I just woke up to a DQed node. The satellite that DQed me held the bulk of my storage on the network, 4 TB. Sooo, is it even worth bothering anymore?

nerdatwork · April 11, 2020, 4:35pm

Welcome to the forum @millersdeathrow!

Can you show your dashboard? Also check your log for audit failures, which might help you figure out what happened.

madbitz · April 20, 2020, 5:17pm

Same issue, just had a drop in audit check even though I have been online the whole time. Audit check is at 97% but uptime is still 100%.

SGC · April 20, 2020, 5:24pm

audits test data integrity afaik
if you failed one, it means some of your data has become corrupted or was otherwise inaccessible to the storagenode at the time of the audit.

nerdatwork · April 20, 2020, 6:10pm

Check your log for failed and GET_AUDIT entries. Check your score and see the number of failed audits so you can figure out what happened.

madbitz · April 21, 2020, 10:14am

Not sure what number I am looking for? Also, it started picking up after I posted this message. Now today it's going down again, below 96%. Haven't touched it, just left it alone, so I don't understand what's going on. I did a restart as I was having memory issues, which have been ongoing since long before I installed Storj. Apart from that, nothing else. Never had any issues on the older version.

nerdatwork · April 21, 2020, 11:14am

Use the script available here to find out how many audits are failing. Check your log for failed and “GET_AUDIT” which will show the error next to them.

madbitz · April 21, 2020, 2:03pm

Getting a lot of "upload failed" errors. Why? Ports are open, drives are OK. Not sure I understand why this is happening.
Errors such as:
2020-04-19T23:00:05.066+0100 ERROR piecestore failed to add order {"error": "ordersdb error: database is locked", "errorVerbose": "ordersdb error: database is

2020-04-20T00:35:17.558+0100 ERROR piecestore failed to add bandwidth usage {"error": "bandwidthdb error: database is locked", "errorVerbose": "bandwidthdb error: database is
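
To see which database is getting locked most often, the errors can be tallied per database. Below is a rough Python sketch, not an official Storj tool. It assumes every such log line carries an `"error": "<name> error: database is locked"` field like the two lines above; the function name `count_locked_errors` is made up for illustration.

```python
import re
from collections import Counter

# Matches the "error" field seen in the log excerpts above, e.g.
#   "error": "ordersdb error: database is locked"
# Assumption: other databases report the lock in the same shape.
LOCKED = re.compile(r'"error":\s*"(\w+) error: database is locked"')


def count_locked_errors(lines):
    """Count 'database is locked' errors per database name."""
    counts = Counter()
    for line in lines:
        # Forum software sometimes turns straight quotes into curly ones,
        # so normalise them before matching.
        normalized = line.replace("\u201c", '"').replace("\u201d", '"')
        match = LOCKED.search(normalized)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the open log file, e.g. `count_locked_errors(open(path, encoding="utf-8", errors="replace"))`, returns a `Counter` keyed by database name such as `ordersdb` or `bandwidthdb`.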

SGC · April 21, 2020, 2:07pm

the "database is locked" error is usually an issue with samba / network storage…

if you've got issues with your memory it might be a very wise idea to figure out what is wrong with it… memory is pretty important for data reliability.

madbitz · April 21, 2020, 2:08pm

Not 100% sure it's memory, as a memory test comes back good. Strange. Anyone got any suggestions for reliable memory test software?
Just tried again, not a memory issue. Any other suggestions?

SGC · April 21, 2020, 4:14pm

didn’t say it was memory, i said long term memory issues can come back and bite you in the arse…

i said that the database locked thing is often related to some types of network drives, which is something many storagenode operators run into face first…

else i would suggest you share a recent node boot log (the first part of a new log; restarting the node will also make it write “boot” information to your log.)
aside from that some more detailed information about what kind of setup you have…

the odds of us guessing blindly what’s gone wrong are not very high, aside from the most common issues… like say… NETWORK DRIVES… i said it again, because 3rd time’s the charm and maybe you will actually tell us about the network drive you thought you could run storj on…

another thing that comes to mind is mounts… it states in the storj setup documentation that incorrect docker mounts can lead to failed audits, as can over-allocation of space for the storage node or using file systems without mmap support.

https://documentation.storj.io/setup/cli/storage-node

madbitz · April 21, 2020, 4:42pm

Sure NP, I am running 1 PC with Hyper-V, with Win10 on both host and virtual machine. Drives are all attached to the host machine via SATA. I am running 2x 5 TB pre-formatted 3.5" drives with 4.54 TB available space on each, so I'm running the two drives as a two-way mirror in a storage pool.

i7 processor, 4 cores; 3 virtual processors used in Hyper-V and the rest for the system. 8 GB memory. Short, I know; really needs 16 or more. Needs upgrade.

This aside, I was running OK until the new version. I filled my drive with 2 TB as a test. But stupidly I tried to move to more disk space. Epic fail, didn’t work, so I had to start all over again with a new token etc. Whilst I was in this process I added the second drive in a pool instead of as a single drive.
That’s the only thing that changed.

Been running this node since, I think, Friday last week. It was OK until yesterday, when the audit score started dropping; it never dropped at all on my original 2 TB setup. Available disk space was still 4.54 TB, but I was just testing the water with 2 TB. The only other thing that has changed is that I have a new set of network power adaptor plugs. They're faster than the original pair of power adaptors, so should be a benefit.
Can I just add, love the new payout feature in the dashboard. Helps a lot.

SGC · April 21, 2020, 5:39pm

okay, a little more informed, tho not much wiser lol…

maybe if we apply some deduction, since everything else seems to work fine, and a two way mirror would be quite redundant so i would exclude disk failures, cable failures, data corruption and since it’s a new node i assume it wasn’t filled beyond capacity…

maybe some power management that spun down the drives, causing them not to respond fast enough for the audit… tho not sure if that’s possible…

so yeah… check that the power management setting for turning off hard drives is switched off…
i know some people with usb drives have had that issue…

else some windows storage spaces issue… maybe a disk failure, but that usually slows the mirror array down to a crawl… i assume that’s some clever engineer’s way of broadcasting that a drive is going bad… but i kinda doubt that is what’s troubling you… still worth verifying that you can read and write correctly to the drive, and checking that the SMART data on the hard drives isn’t indicating some sort of trouble…

but yeah my money is on power management… else i’m running low on ideas… for simple issues…
at least you crashed and burned hard… better to do that a few times now, rather than a year in or so… so there is at least that in the good news category xD

madbitz · April 21, 2020, 9:01pm

Can I ask why the audit values are different depending on which satellite you choose from the drop-down menu on the console?
On some satellites I am still showing 100% for audit.

SGC · April 21, 2020, 9:27pm

each satellite sends and keeps track of different data…
they can only audit their own customers’ data… some time in the future most if not all of them will be merged into working together… but for now they are their own separate “internet raid controller” and we are their drives… well if you are still at 100% on the others then it’s a sign that your error is a minor one… such as power management or similar issues.

satellites are also in charge of paying the SNOs for their services… and so an SNO can essentially choose between them, at a later date… so like say if you decided you didn’t trust a certain company having a certain satellite to pay you… then you can block it…

but i’m really not very well informed in all this detailed stuff… i’m mostly just a well informed novice / rookie

Alexey · April 22, 2020, 8:28am

Powershell:

sls "GET_AUDIT" "C:\Program Files\Storj\Storage Node\storagenode.log" | sls "failed"
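
On Linux or macOS, or anywhere else without PowerShell, a rough Python equivalent of the filter above might look like the sketch below. It is not a Storj tool: the function names are made up for illustration, and it assumes only what the command above does, namely that a failed audit shows up as a single log line containing both `GET_AUDIT` and `failed`. Matching is case-insensitive, like `sls` by default.

```python
import sys


def failed_audit_lines(lines):
    """Keep only lines that mention both GET_AUDIT and 'failed'."""
    matches = []
    for line in lines:
        lowered = line.lower()
        if "get_audit" in lowered and "failed" in lowered:
            matches.append(line.rstrip("\n"))
    return matches


def failed_audits_in_log(log_path):
    """Read a storagenode log file and return its failed-audit lines."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        return failed_audit_lines(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python failed_audits.py <path to storagenode.log>
    for entry in failed_audits_in_log(sys.argv[1]):
        print(entry)
```

Pass it the path to your own log, for example the `storagenode.log` path from the command above. The error text at the end of each printed line is what should point to the cause.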