However, considering what’s in it, I wouldn’t expect my disk to stall so badly that it cannot sustain Storj ingress, which in my experience never goes above a few MB/s.
Besides, afaik, since Storj doesn’t store 32 kB pieces (they are way bigger than that), it shouldn’t generate too many IOPS on the disk.
My SMR disk is a 2.5" 2TB model (Toshiba MQ04UBD200) though, so it may well behave very differently from the WD tested in the article.
I agree with this. While SMR drives are slow, they’re still faster than the upstream speed of my Internet connection. As long as the IOPS are optimised, an SMR drive should be able to handle Storj. There were certainly some major caching improvements in 1.3.3, but then my SMR drive failed and I’m still waiting for the replacement to arrive so I can continue testing.
My drive still can’t keep up with the latest version (1.5.2).
I had to reconfigure it again this morning so it no longer accepts any data, because sustained ingress was causing it to stall after a few tens of minutes.
You cannot make an SMR drive keep up with random reads/writes… Have you tried moving the databases to the SSD your OS is running from?
That should significantly improve the workload… otherwise you need a big SSD cache to turn the random writes into sequential writes…
Or you can use some kind of tiered storage solution, like Microsoft Storage Spaces.
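One way to do the database move on a Docker node (a sketch only; I’m not sure which storagenode release first exposed the storage2.database-dir option, so check storagenode setup --help or the comments in config.yaml first, and the paths below are placeholders): add an extra mount for an SSD-backed directory, e.g.

--mount type=bind,source=/mnt/ssd/storj-dbs,destination=/app/dbs

to the docker run command, and point the node at it with

storage2.database-dir: /app/dbs

in config.yaml. The .db files then live on the SSD while the blobs stay on the HDD; the existing databases need to be copied over while the node is stopped.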
The thing is running on an RPi, so moving the databases to the SD card doesn’t seem like a great idea, as it would probably kill the SD card fairly quickly. Or I’d have to add an additional storage device dedicated to that.
I think some SD cards are better suited for that than others… but I’m not 100% sure.
I believe it was after they became hugely popular as RPi OS drives that manufacturers started creating special higher-IOPS cards for that use case…
Of course, a regular old SSD might be better suited and cheaper…
Hopefully 1.6.3 will show an improvement on SMR HDDs. That release will move used_serials to RAM instead of a DB file. Since used serials and orders are, as far as I know, the only “per transfer” DB writes, this should effectively cut DB writes in half. It could be enough to fix the issue.
Hmmm, I didn’t think about that… it would be a nice, easy fix…
But I don’t have much faith in SMR drives; they are just so terribly slow for random writes that nothing really fixes it… I mean, 700 kB/s.
I can barely keep from laughing every time I think of that number…
It’s absurd…
Have you tried tuning the new write cache/aggregation option introduced in Storj v1.4?
There was already write caching before, and since v1.4 it is available for user tuning via the config file or a command-line option. It was discussed about a month ago in the corresponding GitHub issue: https://github.com/storj/storj/issues/3854#issuecomment-624522307
But it looks like none of the SNOs with SMR HDDs has tested it yet. You could be the first one.
The default value for the write buffer is 128 KiB (per data piece/upload). It would be interesting to see if increasing it to, say, 1-2 MiB helps mitigate SMR performance issues:

filestore.write-buffer-size: 1MiB
Not like this. You can pass any option to the storagenode inside Docker if you put it as the last option in your docker run command, after the storagenode image name.
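For example, something like this (a sketch based on the standard setup command; the wallet, email, address, storage size, mount paths and image tag are placeholders to adapt to your own node, and only the last line is new):

docker run -d --restart unless-stopped -p 28967:28967 \
  -e WALLET="0x..." -e EMAIL="you@example.com" -e ADDRESS="yourhost.example.com:28967" -e STORAGE="1.8TB" \
  --mount type=bind,source=/mnt/storj/identity,destination=/app/identity \
  --mount type=bind,source=/mnt/storj/data,destination=/app/config \
  --name storagenode storjlabs/storagenode:latest \
  --filestore.write-buffer-size="1MiB"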
It can consume significant amounts of RAM, because this buffer is allocated per data piece/upload, so RAM usage gets multiplied by the number of unfinished uploads. It is NOT the total size of a write cache.
And it can be a vector for DoS attacks on nodes (initiate a LOT of slow concurrent uploads to a node and it will crash, running out of RAM while it tries to allocate buffers for all the incoming uploads).
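To put a rough number on it (my own back-of-the-envelope estimate, not a measured figure): with a 4 MiB buffer and 1,000 slow uploads in flight at once, that is about 4 GiB of RAM for write buffers alone, more than an RPi has. If I remember correctly, the storage2.max-concurrent-requests option can cap the number of simultaneous transfers, which would bound the worst case.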
And it does not make sense to set it above 3 MiB (or 4 MiB, if it has to be a power of 2; I am not sure whether arbitrary values are allowed), since the current maximum size of a data piece is about 2.2 MiB. So 3 MiB should already be enough to buffer ANY data piece on the current network and write whole pieces to disk at once (in a single disk write request).
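For reference, and if I recall the network parameters correctly, that ~2.2 MiB figure comes from the maximum segment size of 64 MiB divided across the 29 erasure-coded pieces needed to reconstruct it: 64 MiB / 29 ≈ 2.2 MiB per piece.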
Do we know whether the storagenode handles the “MB” and “MiB” suffixes correctly? I realize I set it to “2MB”, whereas maybe I should have put “2MiB” in this case, as it’s quite a technical value?
I think it recognizes both variants correctly. At least, I set the allocated space in “GB” and “TB” units in my config files and it works just fine.
You can check the logs of a node startup though. If the node cannot interpret one of the provided options, it usually throws corresponding errors into the log.
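For instance, on a Docker node (assuming the container is named storagenode; adjust the name and the grep pattern to taste), something like

docker logs storagenode 2>&1 | grep -iE "error|invalid"

right after a restart should surface any option the node failed to parse.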
Hm. Doesn’t look great. The node is still responding, but the load average is getting worse and worse. When this happens, it usually ends up freezing the node within the following hour or so.
Retrying now with --filestore.write-buffer-size="4MiB" before my disk gets full (there is only 190GB left on it; I won’t be able to run any more tests once it’s full).
Is the disk in a healthy condition? If I read it right, the disk is doing just 1.5 requests per second (1 write + 0.5 read) and 340 KB/s of writes on average, and it is still choking?
That looks too bad even for an SMR drive with a full CMR cache zone, and more like a disk with hardware issues (like some unstable sectors about to go bad).
P.S.
Although it could be that the monitoring interval was too short. Those “round” numbers (0.50 and 1.00) look suspicious, as if there were just 2 read and 4 write requests during a 4-second interval.
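If that was a short snapshot, it may be worth averaging over a longer window before drawing conclusions, e.g. (assuming the node disk is /dev/sda and the sysstat package is installed; adjust the device name):

iostat -dxm 60 /dev/sda

which prints extended per-device statistics averaged over 60-second intervals (the first report covers the time since boot and can be ignored).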