While ZFS tries to aggregate random writes, that does not always work (e.g. on a heavily fragmented pool). It also turns some of what would be sequential reads into random ones.
Also, if you do not use a separate device for the ZIL, it does result in random writes.
So, the performance of raidz is about the same as with a single drive, but in some circumstances it can be faster.
So, without a SLOG, it writes sequentially to the ZIL and then to the real area? Because it seems that either 1) the ZIL is written semi-randomly or 2) the ZIL is in a slightly different place than where the data finally ends up.
Then again, most of my testing was done with zvols in O_DIRECT mode. Maybe it works differently with normal datasets.
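For context, the kind of O_DIRECT zvol test I mean looks roughly like this with fio (the zvol path, size and runtime are just placeholders, and it assumes the libaio engine is available; careful, it overwrites the zvol's contents):

# random 8k writes straight to the zvol block device, bypassing the page cache
fio --name=zvol-randwrite --filename=/dev/zvol/zPool/testvol \
    --direct=1 --ioengine=libaio --rw=randwrite --bs=8k \
    --iodepth=16 --size=4G --runtime=60 --time_based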
proxmox created my zvol or whatever it's called when you create the first pool on something…
it set the zvol block or blocksize to 8k
i dunno what that means… i assume it's the smallest block that can be written to the zvol? which the pool exists on or whatever… so it just made sense that if 32k is the max recordsize then it fits nice…
the zvol blocksize is fixed at creation and, unlike recordsize, cannot be changed… but i really don't know much about it aside from what random bits i picked up here and there…
i basically just pressed a few buttons to create the first pool i made and am still running on…
i've seen it say 8k zvol block size or something like that somewhere… i wanted to change it but found out i couldn't. i read online that i could increase my throughput, but ended up thinking this was working quite nicely… and 64k zvol blocksize seemed crazy so i just left it… to my knowledge i can change recordsize how i like… everything else is datasets inside the pool… tho i do have two pools… one for my OS just in case i want to boot while i try to repair my pool or such… saw that in a lecture called zfs for newbies… was really good… and the os is not on the HBA's so i can troubleshoot those as well if disaster strikes.
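for what it's worth, this is roughly what that looks like on the command line, as far as i understand it (the zvol and dataset names here are just made up for illustration):

# volblocksize is set once when the zvol is created (8k here) and can't be changed afterwards
zfs create -V 100G -o volblocksize=8k zPool/vm-disk-example
# recordsize on a normal dataset can be changed at any time (it only affects newly written files)
zfs set recordsize=32k zPool/some-dataset
# check what is actually set
zfs get volblocksize,recordsize zPool/vm-disk-example zPool/some-dataset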
friends don't let friends use dedup lol
well i'm sure if one vdev is full it will fill the other one… and yes i know one isn't supposed to go past 80% capacity but… we will see if i have upgraded by the time i get there…
fragmentation looks fine for now lol at 30% full
ZIL or SLOG, it does not matter. Transactions (sync and async) are accumulated in the TXG and, when it fills up or after a time interval, are written out sequentially to the disk, into free space.
O_DIRECT and reflinks are not implemented in ZFS at this time.
Yes, the ZIL is in a slightly different place than where the data finally ends up. The ZIL is the transaction log for sync writes. It can either stay on the pool itself, in a special area, or live on a separate device called a SLOG. From the ZIL, sync writes are combined into records and written to the pool in their final place, somewhere in free pool space.
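As a rough sketch of how that looks in practice (the device paths below are placeholders): a dedicated SLOG is just added to the pool as a log vdev, and it then shows up under "logs" in zpool status, like in the output further down.

# add a separate log device (SLOG), so sync writes land there instead of the in-pool ZIL
zpool add zPool log /dev/disk/by-id/ata-SOME-SSD-part5
# or, a mirrored SLOG for safety:
zpool add zPool log mirror /dev/disk/by-id/ssd-1 /dev/disk/by-id/ssd-2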
so people keep telling me… i might have to listen eventually if this keeps up… but yeah… not going to change this any time soon… seems to fit fine thus far… i got 7tb of data on the drive and it takes 7tb on the drive… couldn't be much better than that i think… i dunno all this zfs voodoo yet… and i sure won't change it before i have to or understand why i am changing it.
besides, i got no way to change it; for now, until i get more drives and more bays, i'm locked into the setup i got… whether i want it or not…
tho on second thought the last guy that told me to change it said it was to improve throughput…
Your zPool from the ZFS discussions looks like a stripe of two raidz vdevs, i.e. (4+1) + (3+1).
What does zpool status zPool show? One of the devices in the top 4+1 looks degraded.
well technically zfs doesn't stripe vdevs… so calling it that could lead to confusion, which was why i asked about how it load balances earlier, because that's basically what it does instead of striping… if one vdev is busy the other one takes over… which i can see in my numbers… because one was empty when i added it and it has been taking the main part of the data…
which was also why i added it, because i had a drive that was acting up and thus i wanted to take some load off that vdev/pool at the time…
don't mind the mess, it's fine… i just pulled a few drives while everything was running… it wasn't too happy about it… but it should all be there… takes a while for it to figure it out tho lol
WHY DO TESTING WHEN YOU CAN DO CRASH TESTING… xD xD
pool: zPool
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: scrub in progress since Thu May 14 16:52:34 2020
12.0T scanned at 464M/s, 8.92T issued at 346M/s, 13.5T total
68.3M repaired, 66.21% done, 0 days 03:50:13 to go
config:
NAME                                             STATE     READ WRITE CKSUM
zPool                                            DEGRADED     0     0     0
  raidz1-0                                       DEGRADED     0     0     0
    wwn-0x5000cca2556e97a8                       ONLINE       0     0     0
    wwn-0x5000cca2556d51f4                       ONLINE       0     0     3  (repairing)
    ata-HGST_HUS726060ALA640_AR11021EH21JAB      ONLINE       0     0     0
    ata-HGST_HUS726060ALA640_AR11021EH2JDXB      DEGRADED     0     0 3.14K  too many errors  (repairing)
    wwn-0x5000cca232cedb71                       ONLINE       0     0     0
  raidz1-3                                       ONLINE       0     0     0
    ata-TOSHIBA_DT01ACA300_531RH5DGS             ONLINE       0     0     0
    ata-TOSHIBA_DT01ACA300_Z252JW8AS             ONLINE       0     0     0
    ata-TOSHIBA_DT01ACA300_99QJHASCS             ONLINE       0     0     0
    ata-TOSHIBA_DT01ACA300_99PGNAYCS             ONLINE       0     0     0
logs
  ata-OCZ-AGILITY3_OCZ-B8LCS0WQ7Z7Q89B6-part5    ONLINE       0     0     0
cache
  ata-Crucial_CT750MX300SSD1_161613125282-part1  ONLINE       0     0     0
errors: No known data errors
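for reference, the commands that "action" line is pointing at would look roughly like this (the replacement disk path is just a placeholder):

# clear the error counters once the errors are dealt with or deemed transient
zpool clear zPool ata-HGST_HUS726060ALA640_AR11021EH2JDXB
# or swap the degraded disk out for a new one
zpool replace zPool ata-HGST_HUS726060ALA640_AR11021EH2JDXB /dev/disk/by-id/NEW-DISK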
on another note, is there something i should do for zfs before i do a shutdown… i've noticed that even tho the system is running fine, it will sometimes throw disk access errors on the server terminal during final shutdown… was i supposed to shut down zfs first with a separate command rather than just using shutdown?
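e.g. i was wondering if i should be doing something like this for the data pool (not the OS pool) before powering off, assuming nothing is still using it:

# cleanly unmount and export the data pool ahead of a shutdown
zpool export zPool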
ZFS does stripe vdevs. You can't extend a raidz vdev by adding a new HDD to it, but you can simply extend the entire pool by adding another vdev to it: a single-HDD vdev, a mirror vdev, or a raidz vdev. All top-level vdevs act as a stripe. All new writes, i.e. records, are spread smoothly across the top-level vdevs, the two raidz vdevs in your case.
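For example, extending the pool with another raidz1 vdev looks roughly like this (the disk names are placeholders):

# add a new 4-disk raidz1 top-level vdev; new writes are then spread across all top-level vdevs
zpool add zPool raidz1 /dev/disk/by-id/disk-a /dev/disk/by-id/disk-b /dev/disk/by-id/disk-c /dev/disk/by-id/disk-d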
I would also point out that BTRFS (the Linux implementation) has a lot of issues; I strongly recommend keeping away from this filesystem on Linux. Also, this FS was tested by our community for Storj purposes and the result was awful.
i'm talking about the individual vdevs, like when you add two disks without any kind of mirror or raidz function… then zfs will "load balance" between them, whichever is fastest gets the data or goes first i guess,
you say stripe, but it isn't and it is; it will never stripe across both vdevs, it will stripe on either one. this improves performance and ensures that a loss of a vdev isn't a loss of the full pool, even tho it would kinda be a random mess of what survives i guess…
learned that in one of the zfs lectures i saw by its developers… and from what i can see on my own two vdevs, it does put more data on the empty one rather than the half-full one.
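thats easy to check btw, with something like this (zPool being my pool name):

# per-vdev capacity and activity; the alloc/free columns show which vdev new writes favour
zpool iostat -v zPool 5
# or just the per-vdev space usage
zpool list -v zPool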
is there a way i can make an array of raidz1 vdevs into a raidz1? ofc i would need to create it as a raidz1 of vdevs, with each vdev also being raidz1, and then, like with a regular raidz1, nothing could be removed or added, whether vdevs or individual disks.
been looking for commands to do something like that, but most people writing about it don't really touch much upon advanced features like that… and i really wanted to create… nested raidz1's, just like hyperscale mirror setups
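for comparison, the closest thing i can actually find commands for is the normal stripe of raidz1 vdevs, roughly like this (pool and disk names made up); as far as i can tell zpool create only takes plain disks inside a raidz group, not other vdevs:

# a pool striped across two raidz1 vdevs; each group can lose one disk
zpool create tank \
  raidz1 /dev/disk/by-id/disk-1 /dev/disk/by-id/disk-2 /dev/disk/by-id/disk-3 \
  raidz1 /dev/disk/by-id/disk-4 /dev/disk/by-id/disk-5 /dev/disk/by-id/disk-6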
You should consider that the loss of a vdev is essentially the loss of the pool, even though some information may be recovered. Let's say I had ZFS on a single drive, filled it up, added a second drive, wrote some new files, and then the new drive failed.
All my old files are still on the old drive, but it would be difficult to access them.
yeah i know, but still each vdev is essentially a complete part… so essentially useless…
unless you have a vdev die and the critical stuff is on the other vdev… xD
then i'm sure it's great… dunno how zfs will behave then tho… i should set up a vm for testing zfs setups… that could be a lot of fun.
read some very interesting stuff from this one guy that did extensive testing like that… i forget what he was testing, but it was interesting lol… but it would be cool to be able to try different setups, to get a better sense of what my options are and how crazy i can go with advanced setups…
i can only regurgitate what zfs developers said in lectures, and i can see with a zpool iostat -v that my one vdev gets most of the data because it's empty… so that seems to verify what i clearly remember the developers said.
i really cannot say, but i don't trust a random guy on the internet over a zfs developer when it comes to zfs.
i have no doubt you would do the same. i do think the striping across vdevs is a common myth in the zfs userbase, if i'm not mistaken.
but it's basically all news to me… so what do i know… rookie with an attitude coming through… xD
Writes are placed on vdevs relative to their free space, that is true. But you say
you mean that all data goes only to one of the vdevs? That is ridiculous.
That is unnecessary. Use the reference guides.
man zpool
A pool can have any number of virtual devices at the top of the configuration (known as "root vdevs"). Data is dynamically distributed across all top-level devices to balance data among devices. As new virtual devices are added, ZFS automatically places data on the newly available devices.
That's why your current pool configuration is called a "stripe of raidz".
i never said all the data; it doesn't stripe across the vdevs, which is exactly what it says in the man page you just quoted… dynamically distributed…
writing to only 1 vdev would hurt overall performance, so that doesn't make any sense, unless one got filled before the other, which i assume the dynamic distribution would take into account.
clearly you want to call it a stripe, even tho it isn't… it doesn't put a stripe across two raidz or mirrors, it stripes on either of them… and then puts the next stripe on another vdev… depending on whether it's serial or parallel i guess.
manuals like that in linux are pretty handy… xD now i actually learned something useful