How much "slower" is ZFS?

All nodes to a single zfs array? How many?

I want to create an 8-drive RAIDz2 with 6 nodes, so it'll still be 1 node per HDD. My case can offer space for around 12 hot-swappable HDDs; that's for the very far future to upgrade.

So just one vdev? I would go for two raidz1 vdevs instead, 4 disks in each. This will provide more balanced performance.

I would not plan on adding disks to vdevs, even though it's supported now. You can always add new vdevs.

Why are you so preoccupied with failures during resilver? While resilvering a raidz1 array, single-disk fault tolerance is maintained throughout the process: if a disk dies during the resilver, no data is lost. There is no reason to have two-disk fault tolerance for the vast majority of users. And besides, you need periodic scrubs (which are roughly equivalent to a resilver), and you won't have a disk failure every other scrub.

This is false. Very false. That's not at all how this works.

This is just how zfs works in general (see transaction group). Nothing to do with slog.

Slog will make zero difference for storj. It only affects synchronous writes. Are your main workloads predominantly synchronous? Having a fast special device is what will be crucial for storj and all other workloads.

It seems you have a lot of misconceptions about zfs. I’d suggest clearing that out first to avoid spending too much time with ill-fitting configurations.

Thanks, I never thought of that.

Why not? I would like to keep my redundancy; when adding a 3rd RAIDz1, I would have 3 parity drives instead of 2, reducing the possible capacity.

As I read it, it says it's for synchronous writes smaller than 1MiB.

  • takes a long time; adding a new vdev is instant.
  • has a performance impact during the addition,
  • performs worse than multiple vdevs (including during rebuilds),
  • adding disks to a vdev is a rather new feature and is not used by enterprises, so it has minimal real-world test coverage; I would not trust it for a few years, or ever. There is just no need.
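For comparison, with made-up pool and disk names (and as far as I understand the new expansion feature):

% zpool add tank raidz1 da8 da9 da10 da11    # new 4-disk raidz1 vdev; the extra capacity is available immediately
% zpool attach tank raidz1-0 da12            # raidz expansion: adds one disk to an existing vdev and reshuffles data in the background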

But this will keep the ratio of parity to data. If you add another disk to an existing vdev, you are making the system less reliable. Not that it matters; as I said above, I'm suggesting that two-disk redundancy, especially on such small arrays, is overkill/waste.

Not the default-threshold bit, which is indeed 1MB; it was the previous sentence that I quoted when replying in the comment above.

Slog is not a cache. It is write-only and is never read under normal circumstances. It only speeds up the acknowledgment of synchronous writes.

Storj produces none (except the databases, which you should also make async and/or push to an SSD), so it won't make any difference for Storj. A small caveat is that metadata updates are also done synchronously, and the default 5-second cadence of transaction group writes will contribute to those synchronous writes. The performance improvement due to this would be minimal.


Not a good idea in my opinion. However I am interested to see how it works.


This sounds like a really bad idea.

Running 6 nodes on an array is the same as running 6 nodes on a single disk.

  1. It's against Storj's prescriptions (one node per disk).
  2. If you have 6 nodes kicking off housekeeping tasks (filewalkers, garbage cleanup) simultaneously, it's going to hammer the array.
  3. There is still a maximum theoretical size that a Storj bloom filter can handle for garbage cleanup, which is 24TB I think. So if a single node goes over that, you're going to end up with unpaid, uncleaned-up garbage.
  4. As mentioned before, a Storj node doesn't need redundancy; Storj itself provides the redundancy.
  5. Running multiple nodes under one IP address slows your ingress, so they don't really fill much faster.

Now, if you have the array set up the way you like for your personal storage needs, then I could imagine setting up a single Storj node on it and letting it grow. Then, if it gets large enough to mostly fill a disk, migrate it to a new single disk (bear in mind migration currently takes weeks on large disks), then spin up a new node on the array…


It's nice that I'm learning something new about ZFS every day. I want to build the array because I'm also curious and becoming a fan of ZFS and its features. I have 2x4TB SATA SSDs lying around, which I could use as a special device in a mirror. I think with 4TB and a 2% ratio, I'm safe for around 180TB of data (not sure I'll reach that in my lifetime). Do you recommend using all the vdev kinds, like special device, SLOG and L2ARC? My system has around 64GB of RAM, and I would like to use some of it for myself, not only for ZFS, so I think I need to add some caching. How large do you recommend the SLOG and L2ARC should be? I also want to host my own data on the array, like Plex and Nextcloud files. I would just like to use “one device” for my files and don't want to set up several shares on several drives, like SSDs for files that need faster access. I would like to keep it “simple”; that's also why I want to use ZFS: just “one” share with all the benefits of SSDs and HDDs combined, not splitting it up across several single drives with different characteristics for each use case.

What happens if one SSD fails? Do you need to run zpool replace for the metadata device? I would expect no, because you only have one (logical) metadata device per zfs pool, right?

Why didn't you use the zpool mirror feature for the metadata devices?


He probably is using a 3-way mirror for his metadata device: but he’s mirroring 3 partitions on SSDs, not 3 entire SSDs. That’s very common, because metadata-only (no small files) doesn’t need much space.

So yeah if one of those partitions goes bad… you replace it like any other mirror in ZFS: nothing special.
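Something like this, with made-up device names:

% zpool status -x                                   # shows which member of the special mirror is degraded
% zpool replace pool1 gpt/meta-old gpt/meta-new     # resilvers just that mirror onto the replacement partition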


3-way mirroring is also possible with “normal” zfs features (zpool add X special mirror part1 part2 part3).
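Spelled out with made-up partition labels, that would be something like:

% zpool add pool1 special mirror gpt/meta0 gpt/meta1 gpt/meta2   # 3-way mirrored special vdev built from three partitions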

I thought his configuration looks more like this:

  • each node has its own zpool and a special lv* device
  • lv0-n (for zfs special device for every zpool) ← vg0 ← pv0-2 (3 ssds mirrored)

So, for example, with 2 nodes:

zpool node0

  • special device lv0
  • data hdd rust-0

zpool node1

  • special device lv1
  • data hdd rust-1
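A rough sketch of that hypothetical layout, with made-up device names and sizes (not necessarily what @JWvdV actually did):

% pvcreate /dev/ssd0 /dev/ssd1 /dev/ssd2
% vgcreate vg0 /dev/ssd0 /dev/ssd1 /dev/ssd2
% lvcreate --type raid1 -m 2 -L 50G -n lv0 vg0         # 3-way mirrored LV for node0's special device
% lvcreate --type raid1 -m 2 -L 50G -n lv1 vg0         # same for node1
% zpool create node0 /dev/rust-0 special /dev/vg0/lv0
% zpool create node1 /dev/rust-1 special /dev/vg0/lv1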

So the question is:

  • @JWvdV why did you use LVM in combination with zfs?

Please make sure these SSDs support PLP. If not, buy used enterprise SSDs with PLP support from eBay.

This highly depends on the content stored, but 2% is definitely massive overkill and will indeed last forever.

For reference, on my pool there is 70TB of data allocated, and 580GB allocated on a special device. This is less than 1%.

details
% zpool list -v
NAME                                             SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
boot-pool                                        103G  6.74G  96.3G        -         -     5%     6%  1.00x    ONLINE  -
  gptid/0dbcf727-921a-11ec-8600-5404a613ddea     103G  6.74G  96.3G        -         -     5%  6.54%      -    ONLINE
pool1                                            205T  70.4T   135T        -         -    32%    34%  1.00x    ONLINE  /mnt
  raidz1-0                                      65.4T  24.4T  41.0T        -         -    36%  37.4%      -    ONLINE
    gptid/c37ca4dd-8cec-11ee-a737-0025907f338a  16.4T      -      -        -         -      -      -      -    ONLINE
    gptid/0dd0e79b-8f5b-11ee-880d-0025907f338a  16.4T      -      -        -         -      -      -      -    ONLINE
    gptid/fe07bab4-8cec-11ee-a737-0025907f338a  16.4T      -      -        -         -      -      -      -    ONLINE
    gptid/b069f182-9060-11ee-880d-0025907f338a  16.4T      -      -        -         -      -      -      -    ONLINE
  raidz1-2                                      72.7T  25.2T  47.5T        -         -    34%  34.6%      -    ONLINE
    gptid/ea5dbbec-2df9-11ef-affd-ac1f6bbbe6a4  18.2T      -      -        -         -      -      -      -    ONLINE
    gptid/6cf2c522-2dfa-11ef-affd-ac1f6bbbe6a4  18.2T      -      -        -         -      -      -      -    ONLINE
    gptid/f8dddfc6-2dfb-11ef-affd-ac1f6bbbe6a4  18.2T      -      -        -         -      -      -      -    ONLINE
    gptid/cf57e9ee-2dfc-11ef-affd-ac1f6bbbe6a4  18.2T      -      -        -         -      -      -      -    ONLINE
  raidz1-4                                      65.4T  20.2T  45.2T        -         -    28%  30.8%      -    ONLINE
    gptid/f4b15820-55b0-11ef-9feb-ac1f6bbbe6a4  16.4T      -      -        -         -      -      -      -    ONLINE
    gptid/ce772271-c72b-11ef-af81-c46237009f95  16.4T      -      -        -         -      -      -      -    ONLINE
    gptid/f512d2b5-55b0-11ef-9feb-ac1f6bbbe6a4  16.4T      -      -        -         -      -      -      -    ONLINE
    gptid/f52f0b8a-55b0-11ef-9feb-ac1f6bbbe6a4  16.4T      -      -        -         -      -      -      -    ONLINE
special                                             -      -      -        -         -      -      -      -         -
  mirror-3                                      1.82T   583G  1.25T        -         -    89%  31.3%      -    ONLINE
    gptid/c8bd1b5f-32ca-11ef-b5fc-ac1f6bbbe6a4  1.82T      -      -        -         -      -      -      -    ONLINE
    gptid/d7a0f4ed-32ca-11ef-b5fc-ac1f6bbbe6a4  1.82T      -      -        -         -      -      -      -    ONLINE
logs                                                -      -      -        -         -      -      -      -         -
  gptid/3843364f-b073-11ec-93ce-5404a613ddea    13.4G  75.5M  12.9G        -         -     0%  0.56%      -    ONLINE
It's probably even smaller, depending on how you look at it

zdb reports a total metadata size of 70GB, which yields about 0.1%.

full log
Traversing all blocks ...


	bp count:              98408591
	ganged count:             61078
	bp logical:      17584024401920      avg: 178683
	bp physical:     16431646518272      avg: 166973     compression:   1.07
	bp allocated:    22409110564864      avg: 227714     compression:   0.78
	bp deduped:                   0    ref>1:      0   deduplication:   1.00
	bp cloned:                    0    count:      0
	Normal class:    75842432557056     used: 33.90%
	Special class      622125375488     used: 31.12%
	Embedded log class              0     used:  0.00%

	additional, non-pointer bps of type 0:     272398
	 number of (compressed) bytes:  number of bps
			 17:     34 *
			 18:    252 *
			 19:     55 *
			 20:    223 *
			 21:    140 *
			 22:  26310 ****************************************
			 23:    725 **
			 24:    166 *
			 25:   3335 ******
			 26:    696 **
			 27:   2041 ****
			 28:   4461 *******
			 29:  13965 **********************
			 30:   1215 **
			 31:   1002 **
			 32:   2049 ****
			 33:   5524 *********
			 34:   1391 ***
			 35:   1088 **
			 36:   1333 ***
			 37:   1112 **
			 38:    645 *
			 39:    897 **
			 40:   1075 **
			 41:   1111 **
			 42:   1012 **
			 43:  15272 ************************
			 44:   1731 ***
			 45:    768 **
			 46:    676 **
			 47:    910 **
			 48:    821 **
			 49:   1306 **
			 50:   1418 ***
			 51:   1859 ***
			 52:   1416 ***
			 53:   2332 ****
			 54:   4086 *******
			 55:   4027 *******
			 56:   6513 **********
			 57:  14490 ***********************
			 58:   6180 **********
			 59:   6429 **********
			 60:   4412 *******
			 61:   4979 ********
			 62:   3437 ******
			 63:   3277 *****
			 64:   2440 ****
			 65:   1742 ***
			 66:   2306 ****
			 67:   2192 ****
			 68:   3336 ******
			 69:   2721 *****
			 70:   2154 ****
			 71:   2029 ****
			 72:   1894 ***
			 73:   2091 ****
			 74:   2178 ****
			 75:   2199 ****
			 76:   2419 ****
			 77:   3519 ******
			 78:   2168 ****
			 79:   2059 ****
			 80:   2264 ****
			 81:   1891 ***
			 82:   2021 ****
			 83:   2276 ****
			 84:   2226 ****
			 85:   2191 ****
			 86:   2146 ****
			 87:   9919 ****************
			 88:   1803 ***
			 89:   1794 ***
			 90:   1590 ***
			 91:   1941 ***
			 92:   1678 ***
			 93:   1651 ***
			 94:   1818 ***
			 95:   1958 ***
			 96:   1745 ***
			 97:   1724 ***
			 98:   1720 ***
			 99:   4044 *******
			100:   3315 ******
			101:   1617 ***
			102:   1631 ***
			103:   1329 ***
			104:   1303 **
			105:   1381 ***
			106:   1528 ***
			107:   1532 ***
			108:   1486 ***
			109:   7598 ************
			110:   1494 ***
			111:   1485 ***
			112:   2656 *****
	Dittoed blocks on same vdev: 5818943

Blocks	LSIZE	PSIZE	ASIZE	  avg	 comp	%Total	Type
     -	    -	    -	    -	    -	    -	     -	unallocated
     2	32768	 8192	24576	12288	 4.00	  0.00	object directory
     4	131072	23040	135168	33792	 5.69	  0.00	    L1 object array
   221	113152	113152	4300800	19460	 1.00	  0.00	    L0 object array
   225	244224	136192	4435968	19715	 1.79	  0.00	object array
     2	32768	 8192	24576	12288	 4.00	  0.00	packed nvlist
     -	    -	    -	    -	    -	    -	     -	packed nvlist size
     1	32768	 4096	12288	12288	 8.00	  0.00	    L2 bpobj
   220	7208960	1007616	3035136	13796	 7.15	  0.00	    L1 bpobj
  8673	1136787456	70682624	212410368	24490	16.08	  0.00	    L0 bpobj
  8894	1144029184	71694336	215457792	24225	15.96	  0.00	bpobj
     -	    -	    -	    -	    -	    -	     -	bpobj header
     -	    -	    -	    -	    -	    -	     -	SPA space map header
 14054	230260736	56133632	172707840	12288	 4.10	  0.00	    L1 SPA space map
 81629	10699276288	7266537472	22575157248	276558	 1.47	  0.10	    L0 SPA space map
	 number of ganged blocks: 6
 95683	10929537024	7322671104	22747865088	237741	 1.49	  0.10	SPA space map
	 number of ganged blocks: 6
    18	745472	745472	745472	41415	 1.00	  0.00	ZIL intent log
   350	45875200	1433600	2908160	 8309	32.00	  0.00	    L5 DMU dnode
   349	45744128	1429504	2899968	 8309	32.00	  0.00	    L4 DMU dnode
   349	45744128	1429504	2899968	 8309	32.00	  0.00	    L3 DMU dnode
   352	46137344	1515520	3076096	 8738	30.44	  0.00	    L2 DMU dnode
  2784	364904448	112989184	226553856	81377	 3.23	  0.00	    L1 DMU dnode
1661428	27220836352	6732938752	13781274624	 8294	 4.04	  0.06	    L0 DMU dnode
1665612	27769241600	6851736064	14019612672	 8417	 4.05	  0.06	DMU dnode
   357	1462272	1462272	2969600	 8318	 1.00	  0.00	DMU objset
     -	    -	    -	    -	    -	    -	     -	DSL directory
    59	31744	 3072	49152	  833	10.33	  0.00	DSL directory child map
    57	87040	57856	245760	 4311	 1.50	  0.00	DSL dataset snap map
    76	578560	140800	479232	 6305	 4.11	  0.00	DSL props
     -	    -	    -	    -	    -	    -	     -	DSL dataset
     -	    -	    -	    -	    -	    -	     -	ZFS znode
     -	    -	    -	    -	    -	    -	     -	ZFS V0 ACL
    77	2523136	315392	638976	 8298	 8.00	  0.00	    L3 ZFS plain file
 22729	744783872	93669888	191963136	 8445	 7.95	  0.00	    L2 ZFS plain file
3562294	116729249792	16231137280	33005772800	 9265	 7.19	  0.15	    L1 ZFS plain file
92459063	17419918568960	16399359938048	22334742589440	241563	 1.06	 99.67	    L0 ZFS plain file
	 number of ganged blocks: 61072
96044163	17537395125760	16415685060608	22367940964352	232892	 1.07	 99.82	ZFS plain file
	 number of ganged blocks: 61072
  1028	33685504	3791360	8421376	 8192	 8.88	  0.00	    L2 ZFS directory
 30786	1008795648	135128576	285409280	 9270	 7.47	  0.00	    L1 ZFS directory
507462	5671192576	1539297792	3438469120	 6775	 3.68	  0.02	    L0 ZFS directory
539276	6713673728	1678217728	3732299776	 6920	 4.00	  0.02	ZFS directory
    44	45056	45056	507904	11543	 1.00	  0.00	ZFS master node
    19	622592	77824	155648	 8192	 8.00	  0.00	    L1 ZFS delete queue
   577	5487104	1310720	2621440	 4543	 4.19	  0.00	    L0 ZFS delete queue
   596	6109696	1388544	2777088	 4659	 4.40	  0.00	ZFS delete queue
     -	    -	    -	    -	    -	    -	     -	zvol object
     -	    -	    -	    -	    -	    -	     -	zvol prop
     -	    -	    -	    -	    -	    -	     -	other uint8[]
     -	    -	    -	    -	    -	    -	     -	other uint64[]
     -	    -	    -	    -	    -	    -	     -	other ZAP
     -	    -	    -	    -	    -	    -	     -	persistent error log
     1	32768	16384	49152	49152	 2.00	  0.00	    L1 SPA history
   214	28049408	2926592	10211328	47716	 9.58	  0.00	    L0 SPA history
   215	28082176	2942976	10260480	47723	 9.54	  0.00	SPA history
     -	    -	    -	    -	    -	    -	     -	SPA history offsets
     -	    -	    -	    -	    -	    -	     -	Pool properties
     -	    -	    -	    -	    -	    -	     -	DSL permissions
     -	    -	    -	    -	    -	    -	     -	ZFS ACL
     -	    -	    -	    -	    -	    -	     -	ZFS SYSACL
     -	    -	    -	    -	    -	    -	     -	FUID table
     -	    -	    -	    -	    -	    -	     -	FUID table size
    36	22528	 4096	12288	  341	 5.50	  0.00	DSL dataset next clones
     -	    -	    -	    -	    -	    -	     -	scan work queue
   870	759296	422400	1769472	 2033	 1.80	  0.00	ZFS user/group/project used
     -	    -	    -	    -	    -	    -	     -	ZFS user/group/project quota
     -	    -	    -	    -	    -	    -	     -	snapshot refcount tags
     -	    -	    -	    -	    -	    -	     -	DDT ZAP algorithm
     -	    -	    -	    -	    -	    -	     -	DDT statistics
 51144	27921920	27916800	418897920	 8190	 1.00	  0.00	System attributes
     -	    -	    -	    -	    -	    -	     -	SA master node
    44	67584	67584	516096	11729	 1.00	  0.00	SA attr registration
    88	1441792	360448	1015808	11543	 4.00	  0.00	SA attr layouts
     -	    -	    -	    -	    -	    -	     -	scan translations
     -	    -	    -	    -	    -	    -	     -	deduplicated block
   446	818176	722944	4534272	10166	 1.13	  0.00	DSL deadlist map
     -	    -	    -	    -	    -	    -	     -	DSL deadlist map hdr
    24	16896	 4096	12288	  512	 4.12	  0.00	DSL dir clones
     4	524288	13312	61440	15360	39.38	  0.00	bpobj subobj
     -	    -	    -	    -	    -	    -	     -	deferred free
     -	    -	    -	    -	    -	    -	     -	dedup ditto
    40	1310720	163840	491520	12288	 8.00	  0.00	    L1 other
   558	2428416	524288	4534272	 8125	 4.63	  0.00	    L0 other
   598	3739136	688128	5025792	 8404	 5.43	  0.00	other
   350	45875200	1433600	2908160	 8309	32.00	  0.00	    L5 Total
   349	45744128	1429504	2899968	 8309	32.00	  0.00	    L4 Total
   426	48267264	1744896	3538944	 8307	27.66	  0.00	    L3 Total
 24110	824639488	98980864	203472896	 8439	 8.33	  0.00	    L2 Total
3610202	118342516736	16536677376	33694310400	 9333	 7.16	  0.15	    L1 Total
94773154	17464717359104	16415006252032	22375203434496	236092	 1.06	 99.85	    L0 Total
	 number of ganged blocks: 61078
98408591	17584024401920	16431646518272	22409110564864	227714	 1.07	100.00	Total
	 number of ganged blocks: 61078
5948872	164102062592	32285892096	74362949632	12500	 5.08	  0.33	Metadata Total

Block Size Histogram

block	psize			lsize			  asize
size	Count	Size	Cum.	Count	Size	Cum.	Count	Size	Cum.
512	202282	103568384	103568384	150262	76934144	76934144	0	0	0
1024	2326134	2957831168	3061399552	2287732	2930058240	3006992384	0	0	0
2048	1557704	4288518144	7349917696	1532560	4239606784	7246599168	0	0	0
4096	7381636	33058527744	40408445440	1813787	10383113216	17629712384	93657	383619072	383619072
8192	3155435	36815309824	77223755264	2581627	30790138368	48419850752	9904189	81283747840	81667366912
16384	2886615	65362488832	142586244096	4278774	84719106048	133138956800	5020826	96235188224	177902555136
32768	3938321	174830935040	317417179136	6209236	227794521088	360933477888	4548540	200904597504	378807152640
65536	4007373	374690483200	692107662336	1103730	96534283264	457467761152	3560800	323255410688	702062563328
131072	63907816	8512431848448	9204539510784	68389320	9044851895808	9502319656960	65474659	11803360190464	12505422753792
262144	1088600	368261090304	9572800601088	934150	309772929024	9812092585984	1694704	637099204608	13142521958400
524288	2371165	1287644188672	10860444789760	2939128	1547666604032	11359759190016	2452196	1739845967872	14882367926272
1048576	5313112	5571201728512	16431646518272	5915887	6203257126912	17563016316928	5386622	7526742638592	22409110564864
2097152	0	0	16431646518272	0	0	17563016316928	0	0	22409110564864
4194304	0	0	16431646518272	0	0	17563016316928	0	0	22409110564864
8388608	0	0	16431646518272	0	0	17563016316928	0	0	22409110564864
16777216	0	0	16431646518272	0	0	17563016316928	0	0	22409110564864

                            capacity   operations   bandwidth  ---- errors ----
description                used avail  read write  read write  read write cksum
pool1                     69.5T  136T 5.68K     0 25.0M     0     0     0     0
  raidz1                  24.2T 41.2T    29     0  122K     0     0     0     0
    /dev/gptid/c37ca4dd-8cec-11ee-a737-0025907f338a                   7     0 30.5K     0     0     0     0
    /dev/gptid/0dd0e79b-8f5b-11ee-880d-0025907f338a                   7     0 31.0K     0     0     0     0
    /dev/gptid/fe07bab4-8cec-11ee-a737-0025907f338a                   7     0 30.7K     0     0     0     0
    /dev/gptid/b069f182-9060-11ee-880d-0025907f338a                   7     0 29.9K     0     0     0     0
  /dev/gptid/3843364f-b073-11ec-93ce-5404a613ddea                   (log)  808K 13.0G     0     0   528     0     0     0     0
  raidz1                  24.9T 47.8T     3     0 15.2K     0     0     0     0
    /dev/gptid/ea5dbbec-2df9-11ef-affd-ac1f6bbbe6a4                   0     0 3.81K     0     0     0     0
    /dev/gptid/6cf2c522-2dfa-11ef-affd-ac1f6bbbe6a4                   0     0 3.67K     0     0     0     0
    /dev/gptid/f8dddfc6-2dfb-11ef-affd-ac1f6bbbe6a4                   0     0 3.80K     0     0     0     0
    /dev/gptid/cf57e9ee-2dfc-11ef-affd-ac1f6bbbe6a4                   0     0 3.89K     0     0     0     0
  mirror (special)         579G 1.25T 5.65K     0 24.9M     0     0     0     0
    /dev/gptid/c8bd1b5f-32ca-11ef-b5fc-ac1f6bbbe6a4               2.82K     0 12.4M     0     0     0    60
    /dev/gptid/d7a0f4ed-32ca-11ef-b5fc-ac1f6bbbe6a4               2.83K     0 12.5M     0     0     0    60
  raidz1                  19.9T 45.5T     0     0 2.26K     0     0     0     0
    /dev/gptid/f4b15820-55b0-11ef-9feb-ac1f6bbbe6a4                   0     0   582     0     0     0     0
    /dev/gptid/ce772271-c72b-11ef-af81-c46237009f95                   0     0   573     0     0     0     0
    /dev/gptid/f512d2b5-55b0-11ef-9feb-ac1f6bbbe6a4                   0     0   591     0     0     0     0
    /dev/gptid/f52f0b8a-55b0-11ef-9feb-ac1f6bbbe6a4                   0     0   564     0     0     0     0

Depends on your workload. I definitely recommend a special device. This makes a night-and-day difference in performance. Furthermore, since you have a massive amount of space on the special device, you can configure datasets to send files smaller than a specified size to the special device as well. You can build a histogram of the files you have now and see what threshold size will end up with the best utilization of your SSDs.
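For example, something like this buckets files by power-of-two size (a sketch; assumes GNU find, and the path is just an example — on TrueNAS Core you would feed it stat -f %z output instead):

% find /mnt/pool1/media -type f -printf '%s\n' | awk '
  { s = $1; b = 512; while (b < s) b *= 2; cnt[b]++; bytes[b] += s }
  END { for (b in cnt) printf "%12d %10d files %10.1f GiB\n", b, cnt[b], bytes[b]/2^30 }
' | sort -n

Pick the largest bucket boundary whose cumulative size still fits comfortably on the SSDs and use that as the small-block threshold.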

Slog will help offload IO from the main array for the intent log. 100-200MB is plenty for most users. Buy the smallest Optane you can find, which is 16GB for $10 last I checked.

You probably don't need L2ARC. But if you have one lying around, you might as well stick it there. It's not necessary, though.

It's the other way around. ZFS will use whatever is left over for ARC. So, make sure there is some left over; 32GB will be fine.
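If you ever do want to cap ARC explicitly, on Linux OpenZFS it is a runtime module parameter (the value below is 32GiB, purely as an illustration); on FreeBSD the equivalent tunable is vfs.zfs.arc_max:

% echo 34359738368 > /sys/module/zfs/parameters/zfs_arc_max   # 32 GiB cap, applied at runtime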

Slog is not a cache. L2ARC is a cache, but it fills very slowly on purpose. See above for the recommendation.

Create one pool. On that one pool you can have multiple datasets. Each dataset can have its own parameters, like sync on/off, atime on/off, record size, compression (don't disable compression), and small block size, to fit the use case. To force files onto the SSD, all you need to do is set the small block size greater than or equal to the record size.

For example, you would disable sync and atime updates for the dataset that holds Storj data, but keep them on for, say, a dataset that keeps Time Machine bundles or user homes.
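A sketch of what that could look like (dataset names and values are just examples):

% zfs create -o recordsize=1M -o sync=disabled -o atime=off pool1/storj
% zfs create -o special_small_blocks=128K pool1/homes      # with the default 128K recordsize, every block of this dataset lands on the special vdev
% zfs create pool1/timemachine                             # keeps the defaults: sync=standard, atime=on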

To start with, keep everything at the defaults. Defaults in ZFS are the best settings for the vast majority of scenarios. Only change settings after you have found a bottleneck and fully understand the effects (both beneficial and detrimental) of the change. Most new users rush to mess with compression and record size; try to refrain from doing so until you stumble on a reason to, and then measure the effect before and after the change.

ZFS is very robust and low maintenance, and works well out of the box.


Because I created logical devices for vdevs on three SSDs with LVM, in a ratio of 5GB/TB for every hard drive. So I use three SSDs for about 12 hard drives with variable sizes. And in that way, every hard drive has three mirrored vdevs.


I would add that it doesn’t rebalance the data, so it will not improve the speed.

It should be pointed out that this can actually be bad on a low-RAM system, because L2ARC also consumes RAM to track the objects in it.


It does consume some RAM, but that small amount of RAM facilitates caching of a much larger amount of data; in predominantly repetitive read scenarios it's still a net win.

I got my M10 SSDs from AliExpress (€2.50 each) now. I created a test pool and wanted to add the SSD as SLOG; unfortunately, the SSD has 512e sectors and my pool is 4Kn, so SLOG isn't possible with this SSD. I tried the Intel MAS tool and the nvme tool on Linux, but had no luck converting this SSD to 4Kn. What can I do? How did you manage to get it to work?

Oh my…

First of all, why are you buying SSDs on AliExpress?

Second, you don’t need slog. Disable sync on storj dataset instead.

Third – it's not up to the SSD what sector size you use. Use the ashift parameter when adding it to the array to specify any sector size you want.
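For example (pool and device names are just placeholders):

% zpool add -o ashift=12 testpool log /dev/nvme0n1    # treat the new log vdev as 4K-sector regardless of what the SSD reports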

Also… wasn't the node switched to requesting async writes by default? I feel like that was part of an upgrade within the last year… but my search skills are struggling to find it.

(When @gingerbread233 mentioned M10’s: I’m guessing he found a sale blowing out those baby Optane drives?)


Not all Optane drives from AliExpress are actually Optane drives, apart from the label… and they don't always have the advertised physical capacity.
