ZFS discussions

While ZFS tries to aggregate random writes, that does not always work (on a heavily fragmented pool, for example), and it also turns some of what would be sequential reads into random ones.
Also, if you do not use a separate device for the ZIL, it does result in random writes.

So, the performance of raidz is about the same as with a single drive, but in some circumstances it can be faster.

There are no random writes. All writes are sequential! They all go to new areas.

So, without a SLOG, it writes sequentially to the ZIL and then to the real area? Because it seems that either 1) the ZIL is written semi-randomly, or 2) the ZIL is in a slightly different place than where the data finally ends up.

Then again, most of my testing was done with zvols in O_DIRECT mode. Maybe it works differently with normal datasets.

proxmox created my zvol, or whatever it’s called, when you create the first pool on something…
it set the zvol block or blocksize to 8k
i dunno what that means… i assume it’s the smallest block that can be written to the zvol? which the pool exists on or whatever… so it just made sense that if 32k is the max recordsize then it fits nicely…
the zvol blocksize is fixed at creation and, unlike recordsize, cannot be changed… but i really don’t know much about it aside from the random bits i’ve picked up here and there…
i basically just pressed a few buttons to create the first pool i made and am still running on…
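
(for anyone curious, this is roughly how you can check both properties… the dataset and zvol names here are just placeholders, not my actual setup:)

    # volblocksize is set when the zvol is created and is read-only afterwards
    zfs get volblocksize rpool/data/vm-100-disk-0

    # recordsize on a normal dataset can be changed whenever you like,
    # but it only affects files written after the change
    zfs get recordsize tank/media
    zfs set recordsize=32K tank/media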

i’ve seen it says 8k zvol block size or something like that somewhere… i wanted to change it but found out i couldn’t… i read online that changing it could increase my throughput, but ended up thinking this was working quite nicely… and a 64k zvol blocksize seemed crazy so i just left it… to my knowledge i can change recordsize how i like… everything else is datasets inside the pool… tho i do have two pools… one for my OS just in case i want to boot while i try to repair my main pool or such… saw that in a lecture called zfs for newbies… was really good… and the os is not on the HBAs so i can troubleshoot those as well if disaster strikes.
friends don’t let friends use dedup lol

well i’m sure if one vdev is full it will fill the other one… and yes i know one isn’t supposed to go past 80% capacity but… we will see if i have upgraded by the time i get there…
fragmentation looks fine for now lol at 30% full

ZIL or SLOG, it does not matter. Transactions (sync and async) are accumulated in the current TXG and, when it overflows or a timeout expires, are written sequentially to the disk, into free space.
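
On ZFS on Linux you can look at the two knobs that control this; the paths below assume the Linux module parameters:

    # a txg is forced out at least this often, in seconds (typically 5)
    cat /sys/module/zfs/parameters/zfs_txg_timeout

    # upper limit on dirty (not yet flushed) data before writes get throttled
    cat /sys/module/zfs/parameters/zfs_dirty_data_max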

O_DIRECT and reflinks are not implemented in ZFS at this time.

Yes, the ZIL is in a slightly different place than where the data finally ends up. The ZIL is the transaction log for sync writes. It can stay on the pool itself in a special area, or on a separate device called a SLOG. From the ZIL, sync writes are combined into records and written to the pool in their final place, somewhere in free pool space.
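
If you want a separate SLOG, adding one later is a single command; "tank" and the device paths below are only examples:

    # attach a dedicated log device (SLOG) to an existing pool
    zpool add tank log /dev/disk/by-id/ata-SOME-SSD-part1

    # or mirror the SLOG across two SSDs
    zpool add tank log mirror /dev/disk/by-id/ssd-a /dev/disk/by-id/ssd-b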

A small volblocksize + raidz + ashift=12 = a lot of wasted space.
With raidz and ashift=12 you should use a 64K volblocksize.
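
volblocksize can only be set at creation time, so the usual way to get to 64K is to create a new zvol and migrate onto it; the pool and zvol names below are placeholders:

    # check what the existing zvol uses
    zfs get volblocksize tank/vm-disk

    # create a replacement zvol with 64K blocks (-s makes it sparse)
    zfs create -s -V 100G -o volblocksize=64K tank/vm-disk-64k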

so people keep telling me… i might have to listen eventually if this keeps up… but yeah… not going to change this any time soon… it seems to fit fine thus far… i’ve got 7tb of data and it takes up 7tb on the drives… couldn’t be much better than that i think… i dunno all this zfs voodoo yet… and i sure won’t change it before i have to or understand why i am changing it.
besides i’ve got no way to change it; for now, until i get more drives and more bays, i’m locked into the setup i’ve got… whether i wanted it or not…

tho on second thought the last guy that told me to change it said it was to improve throughput…

Your zPool from the ZFS discussions thread
looks like a stripe of two raidz vdevs, i.e. (4+1) + (3+1).
What does zpool status zPool show? One of the devices in the top 4+1 looks degraded.

well technically zfs doesn’t stripe vdevs… so calling it that could lead to confusion. it was why i asked about how it load balanced earlier, because that’s basically what it does instead of striping… if one vdev is busy the other one takes over… which i can see in my numbers… because one was empty when i added it and it has been taking the main part of the data…

which was also why i added it, because i had a drive that was acting up and thus i wanted to take some load off that vdev/pool at the time…

don’t mind the mess, it’s fine… i just pulled a few drives while everything was running… it wasn’t too happy about it… but it should all be there… takes a while for it to figure it out tho lol
WHY DO TESTING WHEN YOU CAN DO CRASH TESTING… xD xD

  pool: zPool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub in progress since Thu May 14 16:52:34 2020
        12.0T scanned at 464M/s, 8.92T issued at 346M/s, 13.5T total
        68.3M repaired, 66.21% done, 0 days 03:50:13 to go
config:

        NAME                                             STATE     READ WRITE CKSUM
        zPool                                            DEGRADED     0     0     0
          raidz1-0                                       DEGRADED     0     0     0
            wwn-0x5000cca2556e97a8                       ONLINE       0     0     0
            wwn-0x5000cca2556d51f4                       ONLINE       0     0     3  (repairing)
            ata-HGST_HUS726060ALA640_AR11021EH21JAB      ONLINE       0     0     0
            ata-HGST_HUS726060ALA640_AR11021EH2JDXB      DEGRADED     0     0 3.14K  too many errors  (repairing)
            wwn-0x5000cca232cedb71                       ONLINE       0     0     0
          raidz1-3                                       ONLINE       0     0     0
            ata-TOSHIBA_DT01ACA300_531RH5DGS             ONLINE       0     0     0
            ata-TOSHIBA_DT01ACA300_Z252JW8AS             ONLINE       0     0     0
            ata-TOSHIBA_DT01ACA300_99QJHASCS             ONLINE       0     0     0
            ata-TOSHIBA_DT01ACA300_99PGNAYCS             ONLINE       0     0     0
        logs
          ata-OCZ-AGILITY3_OCZ-B8LCS0WQ7Z7Q89B6-part5    ONLINE       0     0     0
        cache
          ata-Crucial_CT750MX300SSD1_161613125282-part1  ONLINE       0     0     0

errors: No known data errors
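
i guess once the scrub finishes i’ll do what the action line says… either clear the counters or swap the disk, something like this (the new-disk path is obviously a placeholder):

    # if it was a one-off (like me yanking drives), clear the error counters
    zpool clear zPool

    # if that HGST keeps throwing errors, replace it in place
    zpool replace zPool ata-HGST_HUS726060ALA640_AR11021EH2JDXB /dev/disk/by-id/<new-disk>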

on another note, is there something i should do for zfs before i do a shutdown… i’ve noticed that even tho the system is running fine, it will sometimes throw disk access errors on the server terminal during final shutdown… was i supposed to shut down zfs first with a separate command rather than just using shutdown?

ZFS does stripe vdevs. You can’t extend a raidz vdev by adding a new HDD to it, but you can simply extend the entire pool by adding another vdev to it: a single-HDD vdev, a mirror vdev, or a raidz vdev. All top-level vdevs act as a stripe. All new writes, i.e. records, are spread over the top-level vdevs, the two raidz vdevs in your case.
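
For example, extending your pool with one more raidz1 vdev would look like this (the disk ids are placeholders):

    # add another top-level raidz1 vdev; new writes then spread over all vdevs
    zpool add zPool raidz1 /dev/disk/by-id/disk-a /dev/disk/by-id/disk-b /dev/disk/by-id/disk-c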

No. Just a simple shutdown.

I would point out that BTRFS (the Linux implementation) has a lot of issues; I strongly recommend keeping away from this filesystem on Linux. Also, this FS was tested by our community for Storj purposes and the results were awful.

i’m talking about the individual vdevs, like when you add two disks without any kind of mirror or raidz function… then zfs will “load balance” between them: whichever is fastest gets the data or goes first, i guess,

you say stripe, but it isn’t and it is… it will never stripe across both vdevs, it will stripe on either one. this improves performance and ensures that the loss of a vdev isn’t the loss of the full pool, even tho it would kinda be a random mess of what survives i guess…

learned that in one of the zfs lectures i saw by its developers… and from what i can see on my own two vdevs it does put more data on the empty one rather than the half full one.
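
the numbers i keep referring to are just from this, nothing fancy:

    # per-vdev capacity and io; repeat every 5 seconds to watch it live
    zpool iostat -v zPool 5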

is there a way i can make an array of raidz1 vdevs into a raidz1? ofc i would need to create it as a raidz1 of vdevs, each vdev also being raidz1, and then, like with regular raidz1, neither vdevs nor individual disks could be removed or added.

been looking for commands to do something like that, but most people writing about it really don’t touch much upon advanced features like that… and i really wanted to create… nested raidz1’s, just like hyperscale mirror setups

You should consider that the loss of a vdev is essentially the loss of the pool, even though some information may be recovered. Let’s say I had ZFS on a single drive, filled it up, added a second drive, wrote some new files, and then the new drive failed.
All my old files would still be on the old drive, but it would be difficult to access them.

yeah i know, but still each vdev is essentially a complete part… so essentially useless…
unless you have a vdev die and the critical stuff is on the other vdev… xD
then i’m sure it’s great… dunno how zfs will behave then tho… i should set up a vm for testing zfs setups… that could be a lot of fun.
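
(i guess i don’t even need a vm for that… sparse files can stand in for disks, something like this sketch, paths and sizes just made up:)

    # create six 1G sparse files to act as fake disks
    truncate -s 1G /tmp/zd1 /tmp/zd2 /tmp/zd3 /tmp/zd4 /tmp/zd5 /tmp/zd6

    # build a throwaway pool out of two raidz1 vdevs
    zpool create testpool raidz1 /tmp/zd1 /tmp/zd2 /tmp/zd3 raidz1 /tmp/zd4 /tmp/zd5 /tmp/zd6

    # break it, watch it, then toss it
    zpool status testpool
    zpool destroy testpool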

read some very interesting stuff from this one guy that did extensive testing like that… i forget what he was testing, but it was interesting lol… but it would be cool to be able to try different setups, to get a better sense of what my options are and how crazy i can go with advanced setups…

really would like to do raidz1 on raidz1 vdevs

It will stripe new writes across all vdevs.

This is impossible. Even with a mirror device.

i can only regurgitate what the zfs developers said in lectures, and i can see with a zpool iostat -v that my one vdev gets most of the data because it’s empty… so that seems to verify what i seem to clearly remember the developers said.

i really cannot say, but i don’t trust a random guy on the internet over a zfs developer when it comes to zfs.
i have no doubt you would do the same. i do think the striping across vdevs is a common myth in the zfs userbase, if i’m not mistaken.

but it’s basically all news to me… so what do i know… rookie with an attitude coming through… xD

That writes are placed on vdevs relative to their free space is true. But you say

you mean that all data goes only to one of the vdevs? That is ridiculous.

That is unnecessary. Use the reference guides.

man zpool

A pool can have any number of virtual devices at the top of the configuration (known as “root vdevs”). Data is dynamically distributed across all top-level devices to balance data among devices. As new virtual devices are added, ZFS automatically places data on the newly available devices.

That is why your current pool configuration is called a “stripe of raidz”.
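
You can watch this distribution on your own pool:

    # per-vdev allocation; the emptier raidz vdev takes more of the new writes
    zpool list -v zPool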

i never said all the data. it doesn’t stripe across the vdevs, which is exactly what it says in the man page you just quoted… dynamically distributed…
writing to only 1 vdev would hurt overall performance so that doesn’t make any sense, unless one got filled before the other, which i assume the dynamic distribution would take into account.

clearly you want to call it a stripe, even tho it isn’t… it doesn’t put a stripe across two raidz or mirrors, it stripes on either one of them… and then puts the next stripe on another vdev… depending on whether it’s serial or parallel i guess.

manuals like that in linux are pretty handy… xD now i actually learned something useful