The goal is not to use as much of the available IOPs as possible, nor did I say that in the statement above. I said “utilize only available IOPs”, not “maximize use of IOPs headroom”.
It’s an implementation detail of the high-level design goal: run the SMART test without affecting user workloads.
The focus is to minimize the impact on user-initiated transactions, not to run the SMART test as fast as possible.
A naive algorithm that tried to cram SMART-related IO in at every opportunity would inevitably interfere with user traffic. Waiting until the air is clear is a good and simple strategy to avoid that: it also ensures that SMART runs only when user IO is infrequent enough that occasionally delaying some of it has low impact.
An even better approach would be to analyze access timing (most access patterns are very repeatable) and predict idle periods more accurately, but that extra complexity is 100% unnecessary. The existing algorithm is good enough at minimizing interference, and it does not need to be very efficient at utilizing all available IO: nobody cares how long the SMART test takes as long as it does not interfere with important tasks.
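To make the “wait until the air is clear” idea concrete, here is a toy sketch (illustrative only: the device name, the 5% utilization threshold and the 5-second window are made-up placeholders, and a real implementation would live inside the software rather than in a shell one-liner):
while [ "$(iostat -dx sdb 5 2 | awk '$1=="sdb"{u=$NF} END{print int(u)}')" -ge 5 ]; do :; done
sudo smartctl -t long /dev/sdb
It simply keeps sampling the disk’s %util and only kicks off the long test once a whole sampling window comes back quiet.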
We try to be productive here with arguments and counterarguments, ignoring the personal attributes of the “hominem” who makes them.
Otherwise we would have another Facebook and co. You can’t learn anything useful from Facebook.
I don’t know how long you’ve been a member, but AR is one of the oldest members here, with a lot of knowledge, and he has helped a lot of us.
And trust me, he’s been called worse than arrogant, and he usually ignores the personal attacks.
Hehe, mine is random too, generated with diceware. It’s my domain name for random stuff, along with a GitHub account; I came up with it by rolling dice until I got a word combination that is not offensive. And here I log in with that same GitHub account. I use other nicks on other forums, of course; that’s the beauty of diceware-generated names: it helps avoid the second most complex problem in computer science after cache invalidation.
I did not flag it, but whoever did perhaps did so because it does violate forum rules.
Actually, @xgDkAbzkp9yi, a pending sector is converted to a reallocated sector (=the marked bad sector is reassigned to one of the spare ones) only when a write operation is performed on that sector.
A SMART (long=extended) test does not write any data to the drive. It simply tries to read the entire surface (and by extension tests the head + actuator), hence it only marks sectors as bad. A subsequent write operation (by normal usage, outside of the test) will actually cause the sector to be re-allocated.
A pending sector can be marked as good if there is at least one successful read on it while it is marked as bad.
Please do not correct me. I’m too tired to look up sources.
I’m not sure about the parallel -P 16 flag. It seems to be faster than a single thread, but it is limited by the disk IOPS anyway.
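For reference, one way to get that kind of parallelism looks roughly like this (the blob path and the checker command are placeholders for whatever tool you use to validate a piece file; xargs -P 16 runs 16 of them at once):
find /path/to/storage/blobs -name '*.sj1' -print0 | xargs -0 -n 1 -P 16 ./check-sj1 2> stderr.log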
Then I look into the stderr file. I get a lot of “could not open file xxx.sj1: open xxx.sj1: no such file or directory”. Those are probably files deleted by the node after the start of the find command.
A wrong checksum looks like this: “valid-looking sj1 blob with hash mismatch”. I haven’t hit that one yet; I just corrupted a copy of one sj1 file by hand to see the output.
A disk error shows up as an I/O error, so it is easy to spot that file and delete it. So far I have found two bad sj1 files and two unreadable sectors. As there are not many unreadable sectors, they can be corrected by hand.
First look into dmesg to get the sector LBA:
[ +0.000002] critical medium error, dev sdb, sector 83878328 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 2
[ +0.000003] critical medium error, dev sdb, sector 83878328 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.000003] critical medium error, dev sdb, sector 83878328 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.000003] critical medium error, dev sdb, sector 1307823336 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[ +0.000002] critical medium error, dev sdb, sector 1307823336 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.000002] critical medium error, dev sdb, sector 1307823336 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.000002] critical medium error, dev sdb, sector 1307823336 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[ +0.000003] critical medium error, dev sdb, sector 1307823336 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.000002] critical medium error, dev sdb, sector 1307823336 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.000002] critical medium error, dev sdb, sector 1307823336 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
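To pull the unique failing LBAs out of a longer log, a quick filter like this works (the exact error wording can vary between kernel versions):
sudo dmesg | grep 'critical medium error' | grep -oE 'sector [0-9]+' | sort -u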
Here are two unreadable sectors, at LBAs 83878328 and 1307823336. You can check with:
sudo hdparm --read-sector 83878328 /dev/sdb
Then you can try to rewrite and reallocate that sector with:
sudo hdparm --yes-i-know-what-i-am-doing --write-sector 83878328 /dev/sdb
(hdparm refuses the write without that extra flag, and it overwrites the sector with zeros, so make sure nothing you need lives there.)
Afterwards you can check whether the sector is readable again with hdparm --read-sector.
You may also look at sudo smartctl -a /dev/sdb for reallocated and pending sectors:
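To pull out just the relevant counters, a filter like this helps (attribute names vary slightly between vendors):
sudo smartctl -A /dev/sdb | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'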
Looks like two sectors were reallocated. Not sure if there are still some unreadable sectors. I’ll wait until the sj1 check finishes and then give it another SMART long test.
Absolutely right. A SMART test just reads; it never repairs an unreadable sector. You need to write to that sector. If there are just a few, use hdparm --write-sector. If there are a lot of them, use badblocks -w and rewrite the whole disk (after you have recovered all readable data, of course).
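For reference, a destructive full-surface badblocks pass looks roughly like this (it erases everything on the disk; the device name is just an example):
sudo badblocks -wsv -b 4096 /dev/sdX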
You treat these bad blocks as if they were a one-time thing. If the disk starts to see bad blocks, then more and more will come; let the system deal with them. It’s not like you see 5 bad blocks, do a week of tests and repairs, finish with those 5 baddies, and then your drive is brand new with no more baddies for the next 5 years.
Sometimes you are correct, sometimes you are not. Some drives develop a bunch of bad sectors and, after correcting them, show no new ones for years. Other drives, by contrast, develop new bad sectors all the time, as you said. It depends.
Hence the need for periodic SMART long tests. I run one on all my disks once a month.
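If you want to automate that, a crontab entry roughly like this does the job (the device and the schedule are just examples; smartd’s -s schedule directive is another option):
0 3 1 * * /usr/sbin/smartctl -t long /dev/sdb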
A small number of reallocated sectors doesn’t mean anything. It’s a count that keeps growing that is the problem.
Example: a Toshiba with 16 reallocated sectors since the first pre-install test is still humming along fine after 2 years of dedicated Storj use, without a single additional sector in that time.
Anyway, just to end the thread: when you get your brand new drive (this does not apply to used drives), run a long scan on it, then a zero pass (i.e. write zeros to the entire surface until you run out of space), then re-read it all back (either a long test or a full read). This makes sure your drive doesn’t fail within the first week you install it. If a drive that passes that fails within 10 years, I’ll be extremely surprised (27 years in the industry, and I have never seen a drive that passed that and then failed within 10 years).
For used drives, don’t even bother; just throw them into service immediately upon unboxing. Well, maybe a conveyance test just to make sure the actuator isn’t jammed somewhere, but nothing more than that. Everything the drive would have identified has already been identified through its previous usage.
I know how to do the SMART long test in Linux, but how do I write zeros to the entire drive and then read it back? Is there a tool, or something built into Linux (Ubuntu)?
As it happens, I have a brand new Exos waiting to go into production.
Maybe you’re right; I don’t have experience with testing drives and checking for bad sectors. On old drives, 20 years ago, I did run tests, because they were way smaller, and I saw baddies, but once SSDs became so popular I stopped using HDDs. Storj got me back into them.
I agree with your post entirely apart from the last part about used drives.
Used drives especially need a full read-write-read pass. Otherwise you risk hitting a read or write error in production. It’s better to catch all bad sectors before commissioning, not after.
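For the write pass, plain dd is enough; a sketch (triple-check the device name, /dev/sdX here is only a placeholder):
sudo dd if=/dev/zero of=/dev/sdX bs=1M status=progress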
Do not mess up, otherwise you will understand why dd is named dd (=disk destroyer).
Don’t use a bigger bs, just let it run as is, no matter how long it takes.
As far as the read pass goes, just run another long test after dd is done.
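That is, kick it off and read the result once it has finished (the device name is again a placeholder):
sudo smartctl -t long /dev/sdX
sudo smartctl -l selftest /dev/sdX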
If a drive is used, most of its sectors should have been read already. SMART doesn’t need a test in order to mark a bad sector: if the drive encounters a bad sector during normal usage, it will still be marked as bad, and a later write will reallocate it.
Still, I disagree with not running any diagnostics on a used drive. Earlier in this thread I mentioned that I bought a 2TB drive for pennies on eBay; when I ran all my tests on it, more bad sectors were discovered and marked out of use. I kept scanning it with SpinRite and then a SMART long test until a pass no longer turned up new bad sectors. The drive has stabilized at 184 bad sectors, and it’s working well to this day.
I think it’s unwise not to run any tests and diagnostics on newly acquired hardware. The same goes for CPUs, GPUs, and hard drives: they should be fully stress tested before being commissioned for any meaningful use.
The vendor matters. I would not buy a disk from a random dude and expect it not to have been dropped on concrete thrice in its previous life.
I agree about not testing used drives. I also would not test new drives if for some reason the world ran out of used ones and I had to buy a new one.
Here is the thought process.
Case of a good drive, with checking:
spend two days running tests
start resilvering array, ready for use immediately
Net outcome: wasted 2 days.
Case of a good drive, without checking:
start resilvering array. Ready for use immediately
Net outcome: no time wasted.
Case of a bad drive, with checking:
spend up to two days running tests.
bad drive identified, restart the process.
Net outcome: spent up to two days but found the bad drive. Did not sacrifice array performance on a resilver that would have had to be redone.
Case of a bad drive, without checking:
start resilvering. Ready for use immediately.
bad disk identified after the resilver completes or fails; restart the process.
Net outcome: no time wasted, but sacrificed some performance on a resilver that needs to be redone.
Therefore, since the probability that the disk is bad is very low, not checking is much preferred in the long run: occasionally suffering lower performance is far preferable to wasting two days every single time. (Purely illustrative numbers: if 2% of drives arrive bad and a redone resilver costs roughly an extra day of degraded performance, skipping the check costs about 0.02 × 1 day ≈ half an hour per drive on average, versus a guaranteed two days of testing for every drive.)