ZFS discussions

While ZFS tries to aggregate random writes, that does not always work (e.g. on a heavily fragmented pool). It also turns some of what would be sequential reads into random ones.
Also, if you do not use a separate device for the ZIL, it does result in random writes.
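(Whether writes go through the ZIL at all depends on the dataset's sync and logbias settings, which can be checked per dataset; the pool/dataset name below is only a placeholder:)

    zfs get sync,logbias <pool>/<dataset>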

So, the performance of raidz is about the same as with a single drive, but in some circumstances it can be faster.

There are no random writes. All writes are sequential! They all go to a new area.

So, without a SLOG, it writes sequentially to the ZIL and then to the real area? Because it seems that either 1) the ZIL is written semi-randomly, or 2) the ZIL is in a slightly different place than where the data finally ends up.

Then again, most of my testing was done with zvols in O_DIRECT mode. Maybe it works differently with normal datasets.
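(For anyone wanting to repeat that kind of test, a random-write run against a zvol with O_DIRECT might look roughly like this; the zvol path is only a placeholder and it assumes fio is installed:)

    fio --name=zvol-randwrite --filename=/dev/zvol/zPool/testvol \
        --direct=1 --rw=randwrite --bs=8k --ioengine=libaio \
        --iodepth=16 --size=1G --runtime=60 --time_based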

proxmox created my zvol or whatever it's called when you create the first pool on something…
it set the zvol block or blocksize to 8k
i dunno what that means… i assume it's the smallest block that can be written to the zvol? which the pool exists on or whatever… so it just made sense that if 32k is the max recordsize then it fits nice…
the zvol blocksize is fixed at creation and cannot, like recordsize, be changed… but i really don't know much about it aside from what random bits i picked up here and there…
i basically just pressed a few buttons to create the first pool i made and am still running on…
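(for reference, the volblocksize of existing zvols can be listed roughly like this; "rpool" is just the usual Proxmox pool name, substitute your own:)

    zfs get -r -t volume volblocksize rpool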

i've seen it says 8k zvol block size or something like that somewhere… i wanted to change it but found out i couldn't… read online that i could increase my throughput, but ended up thinking this was working quite nicely… and 64k zvol blocksize seemed crazy so i just left it… to my knowledge i can change recordsize how i like… everything else is datasets inside the pool… tho i do have two pools… one for my OS just in case i want to boot while i try to repair my pool or such… saw that in a lecture called zfs for newbies… was really good… and the os is not on the HBAs so i can troubleshoot those as well if disaster strikes.
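(recordsize is indeed a per-dataset property that can be changed at any time, though it only applies to files written after the change; something like this, with a placeholder dataset name and 32K just as an example value:)

    zfs get recordsize zPool/somedataset
    zfs set recordsize=32K zPool/somedataset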
friends don't let friends use dedup lol

well i'm sure if one vdev is full it will fill the other one… and yes i know one isn't supposed to go past 80% capacity but… we will see if i have upgraded when i get there…
fragmentation looks fine for now lol at 30% full
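(for reference, both of those numbers are visible straight from zpool list; the pool name here is the one from this thread:)

    zpool list -o name,size,allocated,free,fragmentation,capacity zPool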

ZIL or SLOG, it does not matter. Transactions (sync and async) are accumulated in the TXG pool and, when it overflows or after a timeout, are written sequentially to the disk, onto free space.
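(On Linux OpenZFS the timeout and the dirty-data limit behind that behaviour are exposed as module parameters, readable like this; defaults may differ between versions:)

    cat /sys/module/zfs/parameters/zfs_txg_timeout
    cat /sys/module/zfs/parameters/zfs_dirty_data_max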

O_DIRECT and reflinks are not implemented in ZFS at this time.

Yes, the ZIL is in a slightly different place than where the data finally ends up. The ZIL is the transaction log for sync writes. It can either stay on the pool itself, in a special zone, or on another device called a SLOG. From the ZIL, sync writes are combined into records and written to the pool in their final place, somewhere in the free pool space.
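(For reference, a dedicated SLOG can be attached to an existing pool roughly like this; the device path is only a placeholder:)

    zpool add <pool> log /dev/disk/by-id/<ssd-partition>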

Small volblocksize + raidz + ashift=12 = a lot of wasted space.
With raidz and ashift=12 you should use a 64K volblocksize.
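(volblocksize can only be set when the zvol is created, so changing it means making a new zvol and copying the data over; creation looks roughly like this, with placeholder size and name:)

    zfs create -V 100G -o volblocksize=64K zPool/new-zvol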

so people keep telling me… i might have to listen eventually if this keeps up… but yeah… not going to change this any time soon… seems to fit fine thus far… i got 7tb data on drive and it takes 7tb data on drive… couldn't be much better than that i think… i dunno all this zfs voodoo yet… and i sure won't change it before i have to or understand why i am changing it.
besides i got no way to change it, for now until i get more drives and more bays i'm locked with the setup i got… whether i wanted it or not…

tho on second thought the last guy that told me to change said it was to improve throughput…

Your zPool from ZFS discussions
looks like a stripe of two raidz vdevs, i.e. (4+1) + (3+1).
What does zpool status zPool show? One of the devices in the top 4+1 looks degraded.

well technically zfs doesn't stripe vdevs… so calling it that could lead to confusion, it was why i asked about how it load balanced earlier, because that's basically what it does instead of striping… if one vdev is busy the other one takes over… which i can see in my numbers… because one was empty when i added it and it has been taking the main part of the data…

which was also why i added it, because i had a drive that was acting up and thus i wanted to take some load off that vdev/pool at the time…
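(for reference, the per-vdev fill levels that show this can be pulled with the -v flag on zpool list:)

    zpool list -v zPool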

don't mind the mess, it's fine… i just pulled a few drives while everything was running… it wasn't too happy about it… but it should all be there… takes a while for it to figure it out tho lol
WHY DO TESTING WHEN YOU CAN DO CRASH TESTING… xD xD

  pool: zPool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub in progress since Thu May 14 16:52:34 2020
        12.0T scanned at 464M/s, 8.92T issued at 346M/s, 13.5T total
        68.3M repaired, 66.21% done, 0 days 03:50:13 to go
config:

        NAME                                             STATE     READ WRITE CKSUM
        zPool                                            DEGRADED     0     0     0
          raidz1-0                                       DEGRADED     0     0     0
            wwn-0x5000cca2556e97a8                       ONLINE       0     0     0
            wwn-0x5000cca2556d51f4                       ONLINE       0     0     3  (repairing)
            ata-HGST_HUS726060ALA640_AR11021EH21JAB      ONLINE       0     0     0
            ata-HGST_HUS726060ALA640_AR11021EH2JDXB      DEGRADED     0     0 3.14K  too many errors  (repairing)
            wwn-0x5000cca232cedb71                       ONLINE       0     0     0
          raidz1-3                                       ONLINE       0     0     0
            ata-TOSHIBA_DT01ACA300_531RH5DGS             ONLINE       0     0     0
            ata-TOSHIBA_DT01ACA300_Z252JW8AS             ONLINE       0     0     0
            ata-TOSHIBA_DT01ACA300_99QJHASCS             ONLINE       0     0     0
            ata-TOSHIBA_DT01ACA300_99PGNAYCS             ONLINE       0     0     0
        logs
          ata-OCZ-AGILITY3_OCZ-B8LCS0WQ7Z7Q89B6-part5    ONLINE       0     0     0
        cache
          ata-Crucial_CT750MX300SSD1_161613125282-part1  ONLINE       0     0     0

errors: No known data errors
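(For reference, the "action" line above boils down to roughly these two commands once the scrub has finished; the replacement disk path is only a placeholder:)

    zpool clear zPool
    zpool replace zPool ata-HGST_HUS726060ALA640_AR11021EH2JDXB /dev/disk/by-id/<new-disk>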

on another note, is there something i should do for zfs before i do a shutdown… i've noticed that even tho the system is running fine, it will sometimes throw disk access errors on the server terminal during final shutdown… was i supposed to shut down zfs first with a separate command rather than just using shutdown?

ZFS does stripe vdevs. You can't extend a raidz vdev by adding a new HDD to it, but you can simply extend the entire pool by adding another vdev to it: a single-HDD vdev, a mirror vdev, or a raidz vdev. All top-level vdevs act as a stripe. All new writes, i.e. records, are spread smoothly across the top-level vdevs, the two raidz vdevs in your case.
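(Extending the pool with another raidz vdev looks roughly like this; the disk names are placeholders:)

    zpool add zPool raidz1 <disk1> <disk2> <disk3> <disk4>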

No. Just a simple shutdown.

I would draw your attention to the fact that BTRFS (the Linux implementation) has a lot of issues; I strongly recommend keeping away from this filesystem on Linux. Also, this FS was tested by our community for Storj purposes and the result was awful.

i'm talking about the individual vdevs, like when you add two disks without any kind of mirror or raidz function… then zfs will "load balance" between them, whichever is fastest gets the data or goes first i guess,

you say stripe, but it isn't and it is, it will never stripe across both vdevs, it will stripe on either one, this improves performance and ensures that a loss of a vdev isn't a loss of the full pool, even tho it would kinda be a random mess of what survives i guess…

learned that in one of the zfs lectures i saw by its developers… and from what i can see on my own two vdevs it does put more data on the empty one rather than the half full one.

is there a way i can make an array of raidz1 vdevs into a raidz1? ofc i would need to create it like a raidz1 of vdevs, with each vdev also being raidz1, and then, like with regular raidz1, vdevs or individual disks could not be removed or added.

been looking for commands to do something like that, but most people writing about it really don't touch much upon advanced features like that… and i really wanted to create… nested raidz1s, just like hyperscale mirror setups

You should consider that the loss of a vdev is essentially the loss of the pool, even though some information may be recovered. Let's say I had ZFS on a single drive, filled it up, added a second drive, wrote some new files, and then the new drive failed.
All my old files are still on the old drive, but it would be difficult to access them.

yeah i know, but still each vdev is essentially a complete part… so essentially useless…
unless you have a vdev die and the critical stuff is on the other vdev… xD
then i'm sure it's great… dunno how zfs will behave then tho… i should set up a vm for testing zfs setups… that could be a lot of fun.

read some very interesting stuff from this one guy that did extensive testing like that… i forget what he was testing, but it was interesting lol… would be cool to be able to try different setups, to get a better sense of what my options are and how crazy i can go with advanced setups…
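(for quick experiments you don't even need a vm, sparse files work as vdevs; a throwaway sketch, where the paths and sizes are placeholders and obviously not something to keep real data on:)

    truncate -s 1G /tmp/zd1 /tmp/zd2 /tmp/zd3 /tmp/zd4
    zpool create testpool raidz1 /tmp/zd1 /tmp/zd2 /tmp/zd3 /tmp/zd4
    zpool status testpool
    zpool destroy testpool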

really would like to do raidz1 on raidz1 vdevs

It will stripe new writes across all vdevs.

This is impossible. Even with a mirror device.

i can only regurgitate what zfs developers said in lectures, and i can see with an iostat -v that my one vdev gets most of the data because it's empty… so that seems to verify what i seem to clearly remember the developers said.
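(for reference, the per-vdev view mentioned above; the trailing 5 just repeats the output every five seconds:)

    zpool iostat -v zPool 5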

i really cannot say, but i don't trust a random guy on the internet over a zfs developer when it comes to zfs.
i have no doubt you would do the same, i do think the striping across vdevs is a common myth in the zfs userbase, if i'm not mistaken.

but it's basically all news to me… so what do i know… rookie with an attitude coming through… xD

Writes are placed on vdevs relative to their free space, that is true. But you say

you mean that all data goes only to one of the vdevs? That is ridiculous.

That is unnecessary. Use the reference guides.

man zpool

A pool can have any number of virtual devices at the top of the configuration (known as ā€œroot vdevsā€). Data is dynamically distributed across all top-level devices to balance data among devices. As new virtual devices are added, ZFS automatically places data on the newly available devices.

That's why your current pool configuration is called a "stripe of raidz".

i never said all the data, it doesn't stripe across the vdevs, which is exactly what it says in the man you just quoted… dynamically distributed…
writing to only 1 vdev would hurt overall performance so that doesn't make any sense, unless one got filled before the other, which i assume the dynamic distribution would take into account.

clearly you want to call it a stripe, even tho it isn't… it doesn't put a stripe across two raidz or mirrors, it stripes on either one of them… and then puts the next stripe on another vdev… depending on if it's serial or parallel i guess.

those manuals like that in linux are pretty handy… xD now i actually learned something useful