Storage node to Satellite: I've lost this block, sorry

Just a simple scrub. If a block is lost, tell the satellite.

This is better for both ends: the satellite wants to know, and the storagenode would prefer not to get disqualified.

The storagenode pays for this leniency with disk bandwidth and power.
# enable scrub: constant, weekly, monthly
# scrub=weekly

The satellite manages this using reputation calculations (and the held amount), so you can't use it to cheat or as a shortcut to graceful exit.
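
Something like this is what I have in mind on the node side (just a sketch: `verify_piece`, `report_lost_pieces` and the piece/hash layout are all made-up names here, not the real storagenode API):

```python
import hashlib

# Hypothetical "scrub and confess" loop. None of these names exist in the
# real storagenode; the piece layout and stored hashes are assumptions.

def verify_piece(path: str, expected_sha256: str) -> bool:
    """Re-read a stored piece and compare it against its recorded hash."""
    try:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest() == expected_sha256
    except OSError:
        # Unreadable (bad sector, missing file) counts as lost.
        return False

def scrub(pieces: dict) -> list:
    """pieces maps piece_id -> (path, expected_sha256); returns lost piece ids."""
    return [pid for pid, (path, digest) in pieces.items()
            if not verify_piece(path, digest)]

def confess(satellite, lost: list) -> None:
    # The satellite would fold this self-report into its reputation
    # calculation (and the held amount), so it can't be used to cheat.
    if lost:
        satellite.report_lost_pieces(lost)
```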


This is similar to my idea about restoring from backup.

But yeah, I would like to at least have a list of pieces my node is supposed to have so I can check myself to see if there is some data loss going on.
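
If the satellite exposed such a list, the check itself could be this simple (a hypothetical sketch: `expected_ids` would come from the satellite, which has no such API today, and I'm assuming one file per piece under the blobs directory):

```python
import os

# Hypothetical: expected_ids is the satellite-provided set of piece ids this
# node is supposed to hold; blobs_dir is the node's local piece storage.

def find_missing(expected_ids: set, blobs_dir: str) -> set:
    present = set()
    for _, _, files in os.walk(blobs_dir):
        for name in files:
            # Treat the filename without extension as the piece id.
            present.add(os.path.splitext(name)[0])
    return expected_ids - present
```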

Yes. I thought mine was simpler though. (And yes, I have a bad block; what can you do!?)


Use RAID to prevent that.

Just a few bad files/blocks won't DQ you.


Posts like this make me want to scrub my pool… but 48 hours between scrubs seems excessive… on the other hand, my ingress is pretty low at the moment…

At some point you’re just wearing down your disks. :wink:

AFAIK they can read lots and lots… I mean, they are rated for something like 550TB of throughput… that means write cycles… but the wear is most likely the platter's magnetic coating, or whatever it is, losing its ability to magnetize or hold a charge.

Without patrol reads or scrubs one won't know when a disk is bad…
Usually my LSI RAID controller would run a patrol read every week by default.

Yes, but I feel like you’ve started a new scrub at least 4 times this week, based on your comments. Every two weeks is more than enough.

More like 6 in the first 3 days… lol, TWO WEEKS!!! I was thinking something more like every 4-5, maybe 6 days.

Have you ever heard the term "you cannot beat a dead horse"? This is what they mean… if the HDD is dying and one doesn't notice it through things like monitoring disk information, then a scrub is supposed to verify that it's bad, or at least push the disk into giving hints of performance issues.

If I scrub every 2 weeks, then I could easily have two drives fail… I suppose the odds are very size dependent… well, I could have one drive that, let's say towards the end of the last scrub, doesn't get to throw errors; it just starts to tank in performance.

Then I might essentially be running without redundancy for 14 days… because my redundant drive cannot handle a full read without dying, and I'm at the mercy of the condition of the other drives…

So I would rather beat drives to death with scrubbing than have my array die because I wasn't aware of a developing issue.

Besides… storagenode traffic is down again… ingress and egress are like 300kb/s.

So it's a perfect time for a scrub. Also, my array has had lots and lots of issues with reads and writes… for now I don't really trust my backplane… so I'm going a bit more crazy than I usually would…

Just google for some advice on this. From what I can tell, two weeks seems the most common; some do once a month, some do every week. Nobody does 6x in 3 days…

I personally do once a month. But I have a RAID6 array and it takes quite a bit longer to do compared to raidz2. I also do monthly extended SMART tests and daily quick SMART tests. That’s probably already overkill for my use.


To get back on track though: if I were a bad person, I would definitely tell the satellite that I accidentally lost the least-downloaded 10% of pieces every week. :+1:


OTOH, how about treating that as a self-report audit failure?

Let's say there was some kind of problem where I may have lost or corrupted some data, but I do not know if it's enough to get me disqualified or not. I run a script that figures out how many files are missing and tells the satellite. The satellite then either disqualifies me (or allows a partial GE), or the data loss was not enough to get DQed, but now I know that and do not have to wait for a DQ.
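
What I imagine on the satellite side, roughly (the 4% threshold and the responses are made up for illustration, not the real disqualification logic):

```python
# Toy decision logic for a self-report; 4% is an arbitrary example
# threshold, not anything Storj actually uses.

def handle_self_report(total_pieces: int, reported_lost: int,
                       dq_threshold: float = 0.04) -> str:
    lost_fraction = reported_lost / total_pieces
    if lost_fraction >= dq_threshold:
        # Too much loss to keep trusting the node: disqualify it,
        # or offer a partial graceful exit for what is still intact.
        return "disqualify (or allow partial graceful exit)"
    # Small loss: queue the pieces for repair, adjust reputation /
    # held amount, and let the node keep running.
    return f"tolerated: repair {reported_lost} pieces, adjust reputation"
```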


Lol, that's awesome…
Well, like I said, I've recently had a lot of trouble with my backplane, throwing out write and read errors left and right due to bad connections to the drives… it didn't show up on any SMART tests.

And I'm just about 20% into my scrub of the pool already… so it's barely even a bother… also the Storj network is slow right now… so might as well utilize the time… if traffic were high and I trusted my backplane more, I might do it every week or less…
But I would still argue that it's good practice to run through the data often to weed out bad drives… of course on a RAID6 you also have the redundancy to survive it… I dunno what common practice is… I know scrubbing eats a ton of performance while it's running… so it makes sense to minimize it for that reason alone…

For now, and until I actually get some more redundant setups made, I'm stuck with scrubbing…
I mean, to run a semi-efficient raidz2 / RAID6 I would need like 8 or 10 drives per raid.
Anything less, say 6 drives, and redundancy would be 33% of the pool capacity, at which point I could do mirrors for 50%, which are better and safer in almost every way… aside from their write performance not being nearly as great and their IO starting out halved… of course, in the end, many mirrors end up having much, much higher IO than any RAID5 or RAID6 can do at a reasonable price.
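
Quick numbers on that (raw parity fractions only; real usable space is a bit lower due to metadata and padding):

```python
# Parity overhead for the layouts mentioned above.

def redundancy_fraction(drives: int, parity_drives: int) -> float:
    return parity_drives / drives

print(redundancy_fraction(6, 2))   # 6-drive raidz2/RAID6  -> ~0.33 (33%)
print(redundancy_fraction(10, 2))  # 10-drive raidz2/RAID6 -> 0.20 (20%)
print(redundancy_fraction(2, 1))   # 2-way mirror          -> 0.50 (50%)
```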

Another thing about running mirrors, which I actually kind of want to do, is that ZFS can add and remove them from existing pools without trouble… meaning all the advantages of RAID and basically none of the flaws.

Of course, one does pay by getting only ½ the capacity… so there is that… but yeah… sadly I think with ZFS it's the only way that makes good sense long term… if nothing else, then to use a mirror pool as the average everyday adapting pool, and then have some slower storage based on "RAID6" with 10 drives.

Not sure one can tier storage in ZFS though… might need some other layer above it to do that between the different pools.

Aren't nearly all pieces the least downloaded every week? I mean, I get at best 500kb/s average egress.

So that's 300GB in a month out of over 7TB stored… and within that are the repeat cases, which, optimist that I am, let's say are the 10% that get downloaded more than once…

Because test data…

So that's 10% of 4%, so 0.4%… I'm not sure Storj would be happy if you reported that you lost 99.6% of your data every week… and how would you know which data will be downloaded in the future? You are not predicting, you are postdicting… :smiley:
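
In numbers, using the rough guesses above (nothing measured here):

```python
# Back-of-envelope for the "only keep what gets re-downloaded" cheat.
stored_gb = 7500          # a bit over 7TB stored
egress_gb_per_month = 300

downloaded_fraction = egress_gb_per_month / stored_gb   # ~0.04  -> ~4%
repeat_fraction = 0.10 * downloaded_fraction            # ~0.004 -> ~0.4%
droppable_fraction = 1 - repeat_fraction                # ~0.996 -> ~99.6%

print(downloaded_fraction, repeat_fraction, droppable_fraction)
```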

I actually really like that idea…
It would make sense to just get it over with in case of failure… and recover what can be recovered.

That's an interesting suggestion, though it would come with some challenges.
I looked at some stats for last month. My node stored on average 7.5TB and all audits added up to 6.9MB. So less than 1/1,000,000th of the data is audited every month. If you were to report all missing files as failed audits, you'd probably instantly disqualify a node that would have easily survived just a few missing files.
So there would have to be some system that weighs these self-reported failures against normal audits.
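
One way to picture that weighting, as a toy moving-average model rather than the satellite's actual reputation formula:

```python
# Toy model: a self-report counts as one "virtual audit" whose success
# value is (1 - lost_fraction), blended into the score with weight (1 - lam).
# The real audit score is computed differently; this only illustrates why
# the impact should scale with the fraction lost, not the raw file count.

def apply_self_report(score: float, lost_fraction: float, lam: float = 0.95) -> float:
    return lam * score + (1 - lam) * (1 - lost_fraction)

score = 1.0
# A few hundred missing pieces out of a hypothetical 5M stored: score barely moves.
score = apply_self_report(score, 300 / 5_000_000)
# 20% of the data lost: a much bigger hit.
score = apply_self_report(score, 0.20)
print(score)
```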

Currently traffic is quite homogeneous. With more mixed customer use in the future, this will change eventually. You can't build a system just for now; it needs to work in all scenarios.

My NAS has an option to lower the impact on system performance by slowing down the process. With a reasonable scrubbing frequency, this is a decent option to prevent annoying slowdowns. You don't want to rely on manually triggering this stuff anyway. Schedule it frequently and just let it do its thing.

Well, I'm not quite done with my system yet… so I haven't automated everything I need to… just like I haven't gotten my watchdog to work… I should really get that checked too…

My system is pretty adaptive; I'm going to be setting up a PXE server on the pool tonight… then my main computer will run its OS from the pool, lol.
That's going to be fun, or terrible… but from what I can see it should work fine… my pool might have many configuration errors, but it manages fairly okay.

Yes, I should automate scrubs… but for now I like to do them when the traffic is low… maybe I'll set up some sort of trigger for that, plus some time factors on top. I'm not sure the scrub in ZFS can be adjusted, but it does take lower priority than everything else… still, my pool does get like 80ms latency, at least from what I can measure, though I'm not sure that's real for actually requested, important data… that might just jump ahead in a way the iostat metric might not account for…

It seems to run fine while scrubbing too… but when I get my OS on the pool it should give me a much better sense of how it really reacts…

The graphs can only tell one so much… the other day I had this weird latency thing that didn't show up anywhere… all the monitoring data I got back was fine… but in nano the cursor (the blinky thingamajig) would lag… never did figure out why though, and it seemed to vanish after a while…

Only if the incentive is right. If it's treated exactly like an audit failure, I'd disable it, because I'd be paying the additional I/O of the scrub only for my audit score to potentially go lower. On the other hand, if it didn't cost anything, people would abuse it to get an instant graceful exit without having to transfer data out and without the node having to be old enough.


Just an update to say that I rsynced the data to another disk and there were 300 incomplete files, but the disk is OK. I think the cause was a power fault where the SSD failed to write.
A scrub could have fixed the problem with about 500MB of download; those cents could have been taken from my held amount, and it would have saved me a day.
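
For anyone wanting to double-check a copy like that afterwards, a rough sketch (the paths are placeholders, not my actual mount points):

```python
import filecmp
import os

# Walk the original and the copy, and list files that are missing from the
# copy or whose contents differ.

def incomplete_files(src: str, dst: str) -> list:
    bad = []
    for root, _, files in os.walk(src):
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(dst, os.path.relpath(s, src))
            if not os.path.exists(d) or not filecmp.cmp(s, d, shallow=False):
                bad.append(os.path.relpath(s, src))
    return bad

print(len(incomplete_files("/mnt/old-ssd/storage", "/mnt/new-disk/storage")))
```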