Machine freezing for short periods

@Mad_Max I’m not sure how to correctly check the disk’s health. If you have any pointers, I’d be happy to run some commands on it.

It handles a lot more requests/s when it’s not choking.
For instance, right now (I’m not actually running any test yet, because I’m waiting for the node to calm down: just after starting up, I don’t know what it’s doing but it always reads like crazy for quite a while before settling down, probably checking files; I’ll run the “4MiB” test when it’s back to normal), here are the figures:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.65    0.00    3.91   37.58    0.00   55.86

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
mmcblk0          0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda            226.50    0.00   4332.00      0.00   861.00     0.00  79.17   0.00    8.37    0.00   0.97    19.13     0.00   4.30  97.50
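
(These are iostat’s extended device statistics; for anyone wanting to watch them live, something like this works, with an arbitrary refresh interval and the sysstat package installed:)

# extended per-device statistics, refreshed every 5 seconds
iostat -x 5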

dunno what it does, but recently when rebooting my node it will do some sort of work on the disk that greatly increases iowait… it can take a pretty long time to complete…
for 1-hdd nodes i would suspect this could take 3-4 hours… but that’s just a rough guess… so you will need to let the node run for like 24 hours before you can really evaluate the new settings.

iowait will drop sharply when it’s done… with whatever it is doing… and it will restart and run for the same amount of time every time you reboot the node.

It took roughly 3 hours on my node (1.8TB allocated space).
It must be validating something, transactions or whatnot, because while this massive read operation is taking place, the “remaining space” on the web dashboard is wrong: it was telling me that 3.3GB were available, but now that it has calmed down, it tells me -8.8GB are available :confused:

Anyways, it’s over now and everything is quiet again on my device: load average: 0.05, 0.04, 0.10

Let’s start it with some space available, with --filestore.write-buffer-size="4MiB" to see if it changes anything.
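
(For context, the option just gets appended after the image name in the docker run command, so it is passed through to the storagenode process. A rough sketch of what that looks like; apart from the flag itself, the values and paths below are placeholders rather than my actual command:)

docker run -d --restart unless-stopped --stop-timeout 3600 \
    -p 28967:28967 \
    -e WALLET="0x..." -e EMAIL="..." -e ADDRESS="mynode.example.org:28967" \
    -e STORAGE="1.8TB" \
    --mount type=bind,source="/path/to/identity",destination=/app/identity \
    --mount type=bind,source="/path/to/storage",destination=/app/config \
    --name storj_node_1 storjlabs/storagenode:latest \
    --filestore.write-buffer-size="4MiB"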

After roughly 1h15 with the --filestore.write-buffer-size="4MiB" option, it seems the node is dying again: load average: 299.51, 282.63, 210.63.

Besides, something really suspicious happens from time to time (not sure if it was the case with previous configurations): little by little it takes up a very large amount of RAM (1000MB+), up to the point where it suddenly drops back to almost nothing, and then it rises again (I saw it above 1580MB once).
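
(One simple way to watch that kind of RAM behaviour is docker stats on the container:)

# live CPU / memory usage of the container
docker stats storj_node_1

# or a one-shot, script-friendly snapshot
docker stats --no-stream --format "{{.Name}}: {{.MemUsage}}" storj_node_1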

When checking the logs, it appears that the start-up routine is triggered several times, not only at start-up. For instance, here is an excerpt from the middle of this test “session”:

[START OF THE LOG FILE]
2020-06-10T13:22:10.975Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
2020-06-10T13:22:10.980Z        INFO    Operator email  {"Address": "..."}
2020-06-10T13:22:10.981Z        INFO    Operator wallet {"Address": "..."}
2020-06-10T13:22:11.842Z        INFO    Telemetry enabled
2020-06-10T13:22:11.946Z        INFO    db.migration    Database Version        {"version": 39}
2020-06-10T13:22:12.988Z        INFO    preflight:localtime     start checking local system clock with trusted satellites' system clock.
2020-06-10T13:22:13.870Z        INFO    preflight:localtime     local system clock is in sync with trusted satellites' system clock.
2020-06-10T13:22:13.871Z        INFO    trust   Scheduling next refresh {"after": "5h41m38.286636482s"}
2020-06-10T13:22:13.871Z        INFO    bandwidth       Performing bandwidth usage rollups
2020-06-10T13:22:13.871Z        INFO    Node 12VHJRHeqGmnHsD3bcV2bkpfbv8jwER8NtX7h2wpgED8qDD6oHS started
2020-06-10T13:22:13.871Z        INFO    Public server started on [::]:28967
2020-06-10T13:22:13.872Z        INFO    Private server started on 127.0.0.1:7778
2020-06-10T13:22:15.446Z        INFO    piecestore      download started        {"Piece ID": "O4PHHU64KAJOZINH255MGEWIUCG7XLY2YYKDF2OWN2ZR6ZA2FYIQ", "Satellite ID": "12EayR...
2020-06-10T13:22:16.326Z        INFO    piecestore      downloaded      {"Piece ID": "O4PHHU64KAJOZINH255MGEWIUCG7XLY2YYKDF2OWN2ZR6ZA2FYIQ", "Satellite ID": "12EayRS2V1kEsW...
[...]
2020-06-10T14:22:39.917Z        INFO    piecestore      upload started  {"Piece ID": "KZCZESNGC7CAXUKYTHUEQDN2DPOWEAG7ZA5ZSM6UNVTO6QIHFN2A", "Satellite ID": "12rfG3sh9NCWiX...
2020-06-10T14:22:40.271Z        INFO    piecestore      upload started  {"Piece ID": "BZW7E3WAVNX6QNH4EZI2W46ZEJBFV5RI47LS7WXGLT7HVFKPPETQ", "Satellite ID": "121RTSDpyNZVcE...
2020-06-10T14:23:34.760Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
2020-06-10T14:23:34.763Z        INFO    Operator email  {"Address": "..."}
2020-06-10T14:23:34.763Z        INFO    Operator wallet {"Address": "..."}
2020-06-10T14:23:35.792Z        INFO    Telemetry enabled
2020-06-10T14:23:37.745Z        INFO    db.migration    Database Version        {"version": 39}
2020-06-10T14:23:43.470Z        INFO    preflight:localtime     start checking local system clock with trusted satellites' system clock.
2020-06-10T14:23:44.382Z        INFO    preflight:localtime     local system clock is in sync with trusted satellites' system clock.
2020-06-10T14:23:44.382Z        INFO    bandwidth       Performing bandwidth usage rollups
2020-06-10T14:23:44.388Z        INFO    trust   Scheduling next refresh {"after": "4h56m45.640427281s"}
2020-06-10T14:23:44.390Z        INFO    Node 12VHJRHeqGmnHsD3bcV2bkpfbv8jwER8NtX7h2wpgED8qDD6oHS started
2020-06-10T14:23:44.390Z        INFO    Public server started on [::]:28967
2020-06-10T14:23:44.390Z        INFO    Private server started on 127.0.0.1:7778
2020-06-10T14:23:44.603Z        INFO    piecestore      upload started  {"Piece ID": "ANB5SA53KLJYDQIS3XXIM5OOTLONEEEJ7KQPGYJUPVXNIMNIQHUA", "Satellite ID": "12L9ZFwhzVpuEK...
2020-06-10T14:23:44.662Z        INFO    piecestore      upload started  {"Piece ID": "BHQCQJ4IJSAK6CXMKHSB7YAVWPUZ2LBLGADZ7K7CQM76BPS4XIQQ", "Satellite ID": "12rfG3sh9NCWiX...
[...]

Is the storagenode software crashing, or something? (It’s still running v1.5.2 as it hasn’t auto-updated yet; watchtower is running.)


Maybe my disk is not in good shape. Or… SMR disks really are bad, except for casual home users.

the log is persistent, so reboots will also appear in the logs… also, if it took 3 hours the first time, then it will most likely take the same 3 hours after each reboot before whatever the node does post-reboot is done.

i find extremes are usually the best way to find balance points.
so i think something like setting the write buffer to 64mb or even 128mb would be perfectly okay… tho keep in mind that is something like 1000 times the default, and since we don’t know exactly how it works, it could make it all act rather weird…

however, taking into account that it’s supposedly a write buffer, then 5-10 sec worth of upload should be what you want to aim for…
i’m usually at 3-4mb/s, so 5-10 seconds worth is roughly a 16-32mb write buffer…
see how it reacts, and if it’s being all weird, try something at the lower end, say 1mb… do a test run and see how it turns out…
then you switch between highs and lows to try and gauge where the balance point is…

random guessing i find is mostly a waste of time… and gives me bad results when trying to optimize such unknown variables.

but really, with a write buffer you should at worst just be wasting whatever isn’t used of it… in ram… so there should be minimal effect from setting it too high… if you’ve got like 4gb or so.


Are they? If that were the case, it would be surprising for storagenode not to log shutdown operations or something similar (I very rarely kill a node; most of the time I let it stop gracefully with a maximum time of 3600s, which is more than enough even when the disk is stalling like crazy).

Besides, whenever I start the node and run docker logs immediately after, I do not get any log history. For instance, I just tried this (after restarting the 4MiB buffer-size test node):

pi@raspberrypi:~/storj $ sudo docker logs storj_node_1 2>&1
2020-06-10T15:16:37.185Z	INFO	Configuration loaded	{"Location": "/app/config/config.yaml"}
2020-06-10T15:16:37.188Z	INFO	Operator email	{"Address": "..."}
2020-06-10T15:16:37.189Z	INFO	Operator wallet	{"Address": "..."}
pi@raspberrypi:~/storj $ 

Logs are deleted unless you redirect them to a file.
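
For example, either a one-off export before the container gets removed, or letting the node write its own log file via config.yaml (the paths below are just examples):

# one-off export of the current container log
docker logs storj_node_1 >& /home/pi/storj/node.log

# or have the node write its own log file by adding this line to config.yaml
# (the storage directory, mounted as /app/config inside the container):
#   log.output: "/app/config/node.log"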

I don’t. So something funny is going on I think ^^

logs are persistent in docker until watchtower updates the node, or until one uses docker rm storagenode and then reinitializes it with the run command.

the log will say something along the lines of a shutdown command being received upon a system restart…
and i’m sure a docker restart or stop should do the same… but i cannot remember having seen a particular log entry for that… it might just be the same as when i do a server reboot.

ofc if you are running logs that have max-mb settings or something like that, it might vary a bit…
but stock, it’s like described above
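
(The “max-mb settings” mentioned above would be Docker’s json-file log-driver options; one can check whether any are set on a container, and cap the log size via extra run flags, roughly like this, with arbitrary example sizes:)

# check whether the container has any log-size limits configured
docker inspect --format '{{.HostConfig.LogConfig}}' storj_node_1

# to cap the container log, the docker run command would include flags such as:
#   --log-opt max-size=50m --log-opt max-file=5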

Aaah right. Well I always remove the container after stopping a node, because when I stop a node it’s usually because I want to change the startup command. That explains why my logs always start from scratch when starting a node.

But then, it also means that the logs I posted a few posts back are from a single run, with no stopping of the container in the middle.

If I’m not mistaken… when doing many tests, sometimes we cross our wires… ^^’

i don’t think you have to run the docker rm command to reinitialize with different run parameters, if you want to keep the logs…

else, if you are on linux, i made a script that exports the logs while keeping the docker logs, but it does take a bit of system resources if the logs in docker get extensive…

you can find it here, if you expand the post or click the link
this will also keep your logs across basically whatever you are doing… because it exports the previous whole minute to an external log file.
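
(Not the actual script from that post, just a minimal sketch of the idea: a cron job firing every minute that appends the previous whole minute of docker logs to a file. It assumes GNU date, and the container name and output path are placeholders:)

#!/bin/bash
# export the previous whole minute of container logs to an external file
# intended to be run from cron every minute, e.g.:  * * * * * /home/pi/storj/export-logs.sh
NODE=storj_node_1                        # container name (placeholder)
OUT=/home/pi/storj/node.log              # destination file (placeholder)
SINCE=$(date -u -d '1 minute ago' +%Y-%m-%dT%H:%M:00Z)
UNTIL=$(date -u +%Y-%m-%dT%H:%M:00Z)
docker logs --since "$SINCE" --until "$UNTIL" "$NODE" >> "$OUT" 2>&1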

It’s garbage collection and the occupied-space calculation (it counts the volume of all files and folders to calculate the total size of the storage, which produces a LOT of small read ops, as there can be a few million files!). Also stat collection for display in the dashboard. Although for me it usually takes only about 3 - 10 minutes after start on my 2-4 TB nodes.

Probably it heavily depends on the disk cache provided by the OS. I run them on full-size home servers (4- and 8-core x86 CPUs with 8 - 16 GB of RAM), not a Raspberry Pi/NAS or other weak hardware.

P.S.
This post-startup activity runs in the “background”. It does not block regular uploads/downloads during this time. But of course it puts significant additional load on the hardware until it is finished.
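
(To get a rough idea of how many small reads that walk implies, you can simply count the stored pieces; the mount point below is just an example:)

# count the piece files the startup walk has to touch (adjust the path to your mount)
find /mnt/storagenode/storage/blobs -type f | wc -l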

Only if the node was shut down “gracefully” or exited due to a properly handled internal error.
If it just crashes unexpectedly, or the process is killed by the OS, then there will be no entries about it in the logs. The log will look exactly like @Pac’s example, with the start-up routine suddenly repeating in the middle of normal work. I had a few crashes long ago and have already seen this happen; the logs looked the same way as in this example.

So I think the node was crashing, or being killed by the container software, too. The jumps of memory usage back to almost nothing also suggest this. And using too much RAM is a possible trigger for it. It’s an RPi host, so I guess just 2 GB of RAM total? These could be “out of RAM” crashes, as I warned in my previous post about tuning the write cache option.
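
(A quick way to check for that on the host is to look for the kernel OOM killer in the logs, for example:)

# look for out-of-memory kills in the kernel log
dmesg -T | grep -iE 'out of memory|oom-killer|killed process'
# or, on a systemd host:
journalctl -k | grep -iE 'out of memory|oom-killer|killed process'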

Yes, a single run of the container, but multiple runs (presumably due to crashes) of the Storj node app inside this container.

An internal SMART status dump is enough for a start. If there is nothing suspicious there, then it’s better to use the diagnostic tool provided by the HDD manufacturer.

I usually use Victoria HDD if I need to check the health of my HDDs, but I think it has versions for DOS/Windows only. Not sure about a good equivalent for Linux.

I’m using the 4GB version. It’s a 32-bit OS though; dunno if that could explain a process getting killed when it goes beyond 2GB, maybe.

I tried checking SMART data with smartctl, but the USB enclosure the disk is in isn’t recognized :frowning:

pi@raspberrypi:~ $ sudo smartctl -a /dev/sda
smartctl 6.6 2017-11-05 r4594 [armv7l-linux-4.19.97-v7l+] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/sda: Unknown USB bridge [0x0480:0x0901 (0x000)]
Please specify device type with the -d option.
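
(A common workaround for unrecognized USB-SATA bridges is to force the SAT pass-through device type, although not every enclosure supports it:)

# force SAT (SCSI-to-ATA translation) pass-through; support depends on the bridge
sudo smartctl -d sat -a /dev/sda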

Really?? I mean, how can a mechanical disk be that fast? You can have as many CPUs as you want, checking millions of file entries on a regular HDD cannot be done in 3 minutes! :face_with_raised_eyebrow:

What’s your secret?! :smiley:

the node rebooting can be down to lack of internet connection i think… or if ports aren’t open or something like that… then it will simply reboot… not sure if it actually does that if the internet drops for a while… i suppose that might also depend on whether you are using ddns and if that changes ip… there are many things that could cause a reboot / crash, tho it should write why in the logs… if it’s not a crash ofc… xD

my system has crashed more than my node thus far… and usually it’s that which pulls down the node with it…

My IOPS might not be the greatest, but i registered doing 112k reads per sec…
I run a 2x raidz1 pool with a dedicated 600gb L2ARC and an OS / SLOG SSD, 48GB RAM and two cpus for a total of 16 threads.
and i’m pretty sure it takes me like 90 minutes to run the post-reboot node activity.
granted my system is currently running a bit weird… i cannot get my slog to do more than 8000 IOPS, but still, that is like 40x what a regular hdd can do… and that applies to my writes…
so i have flat write iops of 400 because it’s split over two raidz1’s, and on top of that i’ve got an ssd that will take random writes and turn them into sequential writes.
then my ARC and L2ARC will remember data after it’s been reused… meaning that when i’m reading stuff that has already been accessed once before, it will remember it for like months…
and if it’s reused it will start to stick in either ram or l2arc depending on how often it’s used…

there isn’t any way to compete with that using 1 disk… if your system finishes the node start-up in 3-10 min then it’s down to something else… maybe because you are on windows it’s slightly different in its programming, or windows does something that fixes the issue that linux has with the storj programming…
windows is pretty good at making stuff just run smooth…

because my pool is still running kinda poorly (i mean, 112k reads per sec is kinda weak), i’m migrating my node to a new pool to optimize, so i can get it up to full speed… it should really be 5x faster in many of the iops tasks.

got to reboot my node soon, i’ll check and see exactly how long my node’s post-boot activity takes… now, my server has been up for 7 days, and the node has been rebooted a few times i think… hmmm, or not at all… no matter, i’ll try to reboot it twice… that would make it 3 times the system has seen the boot, and then it should all be cached and ready to go.

but my gut kinda tells me it’s a windows thing…
would be kinda interesting if it is, because that would mean there is a way around it…

You can figure out if it was killed by system:

journalctl | grep "shim reaped"

@Alexey Thx!

I’ve got plenty of entries over the couple of days I ran tests on my node:

pi@raspberrypi:~ $ journalctl | grep "shim reaped"
Jun 09 10:50:11 raspberrypi containerd[564]: time="2020-06-09T10:50:11.368010169Z" level=info msg="shim reaped" id=683058c8eec348ea59733a4425426854bfdb4e49289450556e14093c6acf39e9
Jun 09 11:46:30 raspberrypi containerd[564]: time="2020-06-09T11:46:30.496174686Z" level=info msg="shim reaped" id=683058c8eec348ea59733a4425426854bfdb4e49289450556e14093c6acf39e9
Jun 09 11:51:46 raspberrypi containerd[564]: time="2020-06-09T11:51:46.611062321Z" level=info msg="shim reaped" id=25a9b7d459a791d218426e3e77b2615525188d0fd07a03186bdd1fdfa1bead87

[... many more ...]

Jun 10 14:40:40 raspberrypi containerd[564]: time="2020-06-10T14:40:40.383645471Z" level=info msg="shim reaped" id=7eb5f85cac40522af729b68f1d815a6869fb295f24bf15f3eabee8ceefd9a1c3
Jun 10 14:58:15 raspberrypi containerd[564]: time="2020-06-10T14:58:15.904045345Z" level=info msg="shim reaped" id=7eb5f85cac40522af729b68f1d815a6869fb295f24bf15f3eabee8ceefd9a1c3
Jun 10 15:17:43 raspberrypi containerd[564]: time="2020-06-10T15:17:43.863690174Z" level=info msg="shim reaped" id=818e95c15d2b1f6afd9ef3283462d896f5a67feaeb7d18cba2adbf6a3d3285d1

So that confirms what we saw in the logs: The storagenode process gets repeatedly killed when the disk is stalling, at least with a high write-buffer-size.

Just ran a PRAGMA integrity_check; on copies of all the *.db files to check their health: they all seem fine, apparently. Pheeww… :sweat_smile:
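
For reference, on the copies that boils down to something like this (assuming the sqlite3 CLI is installed and the copied databases sit in the current directory):

# run the integrity check on every copied database
for db in *.db; do
    echo "== $db =="
    sqlite3 "$db" "PRAGMA integrity_check;"
done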
