ZFS discussions

Correct. That is why I ran the speed test on europe as well, for a better comparison. You can see the difference in speed. My target is to optimize for customer data and not for test data, even if the latter would be beneficial short term.

europe is still europe west for me. I know that we have a europe north as well.

in my command, tar simply packs many files into one stream without compression, then sends it to pipe viewer (pv) where we can see the current and average speed, and finally sends the data to /dev/null, i.e. trashes it. This emulates reading many files in essentially random file order.
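
For reference, the pipeline described above looks roughly like this; the blobs path and the satellite folder are placeholders, adjust them to your own dataset:

# tar -c /path/to/storage/blobs/<satellite-folder>/aa/* | pv -brap > /dev/null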

i really hate the du commands but i'm running them now to compare… tho most of my storagenode was written with 256k records, so it should be interesting.
great, i did a du for the entire pool… lol

How do I know which folder is for which satellite? The folder name is not satellite ID, but something else.

# du -hd0 -B1 /zPool
11743747000832  /zPool
# du -bd0 -B1 /zPool
11797343601454  /zPool
# du -bd0 -B1 /zPool/storagenodes/storj
7552277561142   /zPool/storagenodes/storj
# du -hd0 -B1 /zPool/storagenodes/storj
7697093245952   /zPool/storagenodes/storj

i get this on mine…
the biggest of the storagenode datasets turns out to be exactly 7TB, which is what zfs reports… so basically no point in doing du when i instantly get the same result from zfs
did seem like the arc finally figured out what i was doing… took forever on the first scan, but then again it is 10tb and i'm a drive short, so i'm running degraded, which makes my performance a bit low
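
For anyone who wants to pull those numbers straight from zfs instead of running du, something like this should do it (the dataset name below is guessed from the mountpoints in this thread):

# zfs list -o name,used,refer,recordsize -r zPool/storagenodes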

this is the satellite ID in Base32


like a pro lol

do i need to install something to run those tar and pv commands?
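
tar should already be part of the base system, but pv usually isn't; on a Debian/Ubuntu box something like this should cover it:

# apt install pv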

that's a really interesting and good point… not sure i can apply that sadly… as my SLOG lives on an old SATA controller on my mobo, so it's limited to like 300MB/s, maybe 600MB/s at best, and thus 5 sec is a pretty good current spot for my setting… else i would most likely end up saturating my SATA disk bandwidth, depending on how zfs handles load balancing and prioritizing in this case.
(did test that the old SATA controller isn't introducing latency, so for now, until i get more bays, this will have to do
and it's really just the low latency i need, the throughput in this case is almost irrelevant)
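
Assuming the 5 sec above refers to the transaction group timeout (my guess), that is the zfs_txg_timeout module parameter in ZoL, which can be checked or changed at runtime roughly like this:

# cat /sys/module/zfs/parameters/zfs_txg_timeout
# echo 5 > /sys/module/zfs/parameters/zfs_txg_timeout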

but i will most def write the command down into my little zfs list of commands and what they do…
not always easy to find whatever is relevant…

any of you understand how to change the L2ARC feed rate… been wanting to look at that… or the metadata ratio… haven't been able to find a straight answer on how this is done.
but if it's too convoluted i'll just wait a while until i have better time for it… the feed rate isn't too unreasonable… but it still takes 72 hours for my server to warm up lol, which can be a bit annoying
ofc persistent l2arc is around the corner soon hopefully… so maybe not worth the hassle of doing anything really, aside from upgrading when ZoL becomes OpenZFS v2.0
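
For what it's worth, the L2ARC feed rate in ZoL is governed by module parameters such as l2arc_write_max and l2arc_write_boost (bytes per second written to the cache device, default ~8MB/s), and the ARC metadata share by zfs_arc_meta_limit_percent; whether those are exactly the knobs meant above is an assumption on my part. They can be changed at runtime and persisted via modprobe.d, roughly like this (128MB/s used as an example value):

# echo 134217728 > /sys/module/zfs/parameters/l2arc_write_max
# echo "options zfs l2arc_write_max=134217728" >> /etc/modprobe.d/zfs.conf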

Okay rookie question…
how do i stop whatever i got half started with…

# tar -c /zPool/storagenodes/storj/storj/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/a2/* | pv -brap > /dev/null
-bash: pv: command not found
tar: Removing leading `/' from member names

Like you do most Linux commands - Ctrl+C

but it wasn't active in the console, it vanished into the background, like with && … or is it just &?
ended up rebooting, had to try and fix an issue anyways… so good enough excuse lol
think i got my drive back online tho… so that's a win… f'ing corroded backplane is killing me… lol
but pretty sure i got the bays with issues isolated now… will be interesting to see how the resilvering goes because i zeroed the drive, and then the first scrub should also be kinda interesting… it's been almost a week and i've been doing crazy stuff lol, zfs gets so mad when i pull more than two drives in a running zpool… amazing how it just recovers without losing a byte tho…
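
On the backgrounding question above: a trailing & sends a command to the background (&& just chains two commands), and a backgrounded pipeline can be listed, brought back to the foreground, or killed like this (the job number is illustrative):

# jobs
# fg %1
# kill %1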

Running that inside storage/blobs/6r2… gives me:

     17 4608
     28 1049088
     62 2105856
    170 2335232
    418 2630144
    452 1896960
  10316 1057280
  76033 2331136

Not sure I interpret this correctly but if I do, it means that 76k files have a size of ~2.33MB.
That would still mean that most data is 2.33MB big, not only the test data.
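
For anyone who wants to reproduce a size histogram like the one above, something along these lines should work from inside a blobs folder (count on the left, file size in bytes on the right):

# find . -type f -printf '%s\n' | sort -n | uniq -c | sort -n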

Get some contact cleaner and clean it.

i will when i've got some downtime to spare or get another server set up… don't have any more drives to put in it anyways… but yeah i intend to… just difficult when it's the only thing that can serve my storagenode presently… and i've got way too much downtime already from all kinds of issues… so i'm just trying not to get suspended… if that's active yet… :smiley: if nothing else it's good practice for when it is…

thanks for the tip tho, had been pondering how best to do that…

when the fan wires corrode over and vibrate till they break… take a hint and control the humidity and temperature of the room xD

Which satellite is it? If you go over your entire disk you will hit a lot of test data. You would need to select a specific satellite like us-central, asia-east or europe-west. They should have less or maybe even no testdata.

we were looking at file sizes and ended up making an assumption that the maximum piece size was 2.something MB, because most files seemed to cap out at that size… just seemed like that made sense at the time… ofc you've got some insider knowledge… or more educated knowledge than our hipshot…

is there a fixed max size of pieces or can they basically grow unlimited… i sort of assumed so, since this is basically live data; like with a filesystem and harddrive sectors, i figured they would have certain max sizes so that customers could stream parts of their files depending on what they demanded, instead of being forced to download an entire file… like say if they were to (i dunno what's possible and not) edit video straight off the data on the tardigrade platform.

you get the idea… so you can seek to the middle of a big file by getting a piece that represents the middle of the file… object or whatever it's called.

BTW have you set autotrim on? it's off by default in ZoL, gives you like 30% better performance on ssds. that's if you don't mind the trimming of deleted data, ofc; i'm sure it doesn't get easier to recover after trimming… xD
zpool set autotrim=on|off poolname

if you are interested in the details about it.
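
The exact syntax, plus a one-off manual trim and a way to watch its progress, would be roughly (poolname is a placeholder):

# zpool set autotrim=on poolname
# zpool trim poolname
# zpool status -t poolname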

and in case anyone wants to go really deep on zfs configuration, i've used this a lot by now…
seems this is how to do that ARC stuff i’ve been looking for… but haven’t dared to do that deep dive yet
https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/zfs-advanced.html

6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa
not sure which one that is. Got about 170GB in that directory so it isn’t stefan-benten or saltlake.

That should be europe-north, with a lot of test data.

this is kinda interesting…

had disk trouble, which might have been backplane trouble… no surprise there. pulled the drive, did some tests on another comp, deleted the partition and repartitioned it, just because i wanted to see how my resilvering time was while i was at it.
all tests seemed kinda fine, so i put the drive back and zfs resilvered it in… maybe under an hour…
maybe less… which seemed kinda odd… but okay, kinda knew that would be BS
tho nice to know that one cannot always count on a resilvered drive to have verifiably good data.
ran a scrub… and now it's thrown me 3.14k cksum errors thus far and it's only 28% done…
xD seems more right…

but watch out with resilvering: always scrub afterwards, else your data might be wrong or gone.
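
A scrub after a resilver is just the following, with progress and any checksum errors showing up in the status output (poolname again being a placeholder):

# zpool scrub poolname
# zpool status -v poolname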

  10322 16896
  16309 12800
  30719 8704
  58167 2331136
 125921 4608

on qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa
and on ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa:

   8110 258560
   8962 16896
  13696 12800
  20929 221696
  26330 8704
  49530 2331136
 109895 4608

So this clearly shows that the majority of files are about 4K-16K… but there are also a lot of files at 2.33MB.

The question is: does it make more sense to optimize the recordsize for big files or for small files? Also, would the difference be visible in success rates for uploads and especially for downloads?
What other consequences does optimizing have?

Right now on most nodes (oddly enough) the download rates are >99% so there won't be any visible change. In theory however, reading and sending a file should be faster if fewer IOPS are needed. For 4K files that makes no difference as the recordsize is already bigger, but with 2.33MB files there is a difference of 3 IOPS with a 1MB recordsize versus ~36 IOPS with a 64K recordsize. But I have no idea how much difference this would actually make in reading latency, and it will also depend on the general load on the drive (which is probably mostly defined by the db load?).
Similarly, if the file is 4K but the recordsize is 1MB, then reading takes a lot longer than needed, maybe 4ms with a 4K recordsize (assuming a 1MB/s reading rate) and 16ms with a 1MB recordsize (assuming a 60MB/s reading rate)?
I am however not sure how zfs compression plays into this if you have it activated (which you probably should, at least with higher recordsizes). Would a 4K file be compressed to a 4K record, and when reading, would only 4K then be read rather than a whole 1MB record?
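
For completeness, recordsize and compression are per-dataset properties and only affect newly written data, so an experiment would look roughly like this (the dataset name is a placeholder):

# zfs set recordsize=1M zPool/storagenodes/storj
# zfs set compression=lz4 zPool/storagenodes/storj
# zfs get recordsize,compression zPool/storagenodes/storj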

Optimizing for uploads is hardly possible because afaik the writes are async and therefore independent of the recordsize, unless the drive can't handle the load at some point and needs to reject pieces or becomes too slow. Only then might the recordsize help to reduce the IO load on the drive. To reduce the IOPS you'd need a bigger recordsize, but if the majority of files is 4K then each file already fits in a single record anyway, and it will probably make no difference for the IOPS whether the recordsize is 64K or 1MB.

So my conclusion is that the recordsize might reduce the IOPS needed, which might help some SMR drives (idk?) but (without having proof) I guess it doesn’t make much of a difference and the DB load might be a bigger problem. I will definitely test that with the new release when I can put all the DBs onto my SSD.