High load due to high disk usage

From time to time I get very high CPU/disk usage that drags the server down (8-core server).

Other services are working fine and the load is about 2; then the « used-space-filewalker » kicks in and boom, the load skyrockets to 20-30.

This started to happen a few weeks (months?) back, and I don’t know why it’s so aggressive.


(screenshot: process monitor showing the « Load average » spike and processes in D state)

I’ll add some logs below.

2023-11-01T15:02:09Z    INFO    Downloading versions.   {"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}
2023-11-01T15:02:10Z    INFO    Current binary version  {"Process": "storagenode-updater", "Service": "storagenode", "Version": "v1.90.2"}
2023-11-01T15:02:10Z    INFO    Version is up to date   {"Process": "storagenode-updater", "Service": "storagenode"}
2023-11-01T15:02:10Z    INFO    Current binary version  {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.90.2"}
2023-11-01T15:02:10Z    INFO    Version is up to date   {"Process": "storagenode-updater", "Service": "storagenode-updater"}
[...]
2023-11-01T15:03:13Z    ERROR   services        unexpected shutdown of a runner {"process": "storagenode", "name": "piecestore:monitor", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:176\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:165\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
2023-11-01T15:03:28Z    WARN    servers service takes long to shutdown  {"process": "storagenode", "name": "server"}
2023-11-01T15:03:28Z    WARN    services        service takes long to shutdown  {"process": "storagenode", "name": "retain"}
2023-11-01T15:03:28Z    INFO    services        slow shutdown   {"process": "storagenode", "stack": "goroutine 1027 [running]:\nstorj.io/storj/private/lifecycle.(*Group).logStackTrace.func1()\n\t/go/src/storj.io/storj/private/lifecycle/[...]
(big long description of stack trace removed)
Error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory
2023-11-01 15:03:48,444 INFO exited: storagenode (exit status 1; not expected)
2023-11-01 15:03:49,877 INFO spawned: 'storagenode' with pid 5598
2023-11-01 15:03:49,878 WARN received SIGQUIT indicating exit request
2023-11-01 15:03:50,029 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2023-11-01T15:03:50Z    INFO    Got a signal from the OS: "terminated"  {"Process": "storagenode-updater"}
2023-11-01 15:03:50,030 INFO stopped: storagenode-updater (exit status 0)
2023-11-01T15:03:50Z    INFO    Configuration loaded    {"process": "storagenode", "Location": "/app/config/config.yaml"}
2023-11-01T15:03:50Z    INFO    Anonymized tracing enabled      {"process": "storagenode"}
2023-11-01T15:03:50Z    INFO    Operator email  {"process": "storagenode", /* removed */}
2023-11-01T15:03:50Z    INFO    Operator wallet {"process": "storagenode", /* removed */}
2023-11-01T15:03:50Z    INFO    server  kernel support for server-side tcp fast open remains disabled.  {"process": "storagenode"}
2023-11-01T15:03:50Z    INFO    server  enable with: sysctl -w net.ipv4.tcp_fastopen=3  {"process": "storagenode"}
2023-11-01T15:03:50Z    INFO    Telemetry enabled       {"process": "storagenode", /* removed */}
2023-11-01T15:03:50Z    INFO    Event collection enabled        {"process": "storagenode", /* removed */}
2023-11-01T15:03:51Z    INFO    db.migration    Database Version        {"process": "storagenode", "version": 54}
2023-11-01 15:03:53,068 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-11-01T15:03:54Z    INFO    preflight:localtime     start checking local system clock with trusted satellites' system clock.        {"process": "storagenode"}
2023-11-01T15:03:55Z    INFO    preflight:localtime     local system clock is in sync with trusted satellites' system clock.    {"process": "storagenode"}
2023-11-01T15:03:55Z    INFO    Node xxx started        {"process": "storagenode"}
2023-11-01T15:03:55Z    INFO    Public server started on [::]:28967     {"process": "storagenode"}
2023-11-01T15:03:55Z    INFO    Private server started on 127.0.0.1:7778        {"process": "storagenode"}
2023-11-01T15:03:55Z    INFO    pieces:trash    emptying trash started  {"process": "storagenode", /* removed */}
2023-11-01T15:03:55Z    INFO    bandwidth       Performing bandwidth usage rollups      {"process": "storagenode"}
2023-11-01T15:03:55Z    INFO    lazyfilewalker.used-space-filewalker    starting subprocess     {"process": "storagenode", /* removed */}
2023-11-01T15:03:55Z    INFO    trust   Scheduling next refresh {"process": "storagenode", "after": "8h1m55.523334006s"}
2023-11-01T15:03:55Z    INFO    lazyfilewalker.used-space-filewalker    subprocess started      {"process": "storagenode", /* removed */}
2023-11-01T15:03:55Z    INFO    collector       deleted expired piece   {"process": "storagenode", /* removed */, "Piece ID": "LGVFZBREIDM75GI62CV2Z3JZDVH4JPIKLVEZP2I4QRQA3JUZPLLA"}
2023-11-01T15:03:55Z    INFO    collector       deleted expired piece   {"process": "storagenode", /* removed */, "Piece ID": "3XBDSNYCFQ2OCLZVBOYZ4RXANVHFRVUYRCATLQZSF2PY5JHDVYYA"}
2023-11-01T15:03:55Z    INFO    collector       deleted expired piece   {"process": "storagenode", /* removed */, "Piece ID": "5J5IRTBXP7CAUDZPKMRJKA3Z3AMOZC2XQOUPP5TSHL52GQK5LKRA"}
2023-11-01T15:03:55Z    INFO    collector       deleted expired piece   {"process": "storagenode", /* removed */, "Piece ID": "CEEMO7ZLMFRLY4QMADBDHY5E7BEJB5727P6DYIEVR5GDI4TUDFUA"}
2023-11-01T15:03:55Z    INFO    collector       deleted expired piece   {"process": "storagenode", /* removed */, "Piece ID": "QZDNFHBNEMGXS5YLFQHRISU5PSMZDG2MZGGUMNEZK4JDQC35777A"}
2023-11-01T15:03:55Z    INFO    collector       deleted expired piece   {"process": "storagenode", /* removed */, "Piece ID": "LIWL6YJIO6UOUZTI7BFSU7OEYVBKLZRXU4XYNQ4CPYOO43RVFVHA"}
2023-11-01T15:03:55Z    INFO    collector       deleted expired piece   {"process": "storagenode", /* removed */, "Piece ID": "PULS5CSPD2QF3TMK72MV6ZC5YU3HNY4CM5DCVDKZPNGIFOCUULQA"}
2023-11-01T15:03:55Z    INFO    collector       deleted expired piece   {"process": "storagenode", /* removed */, "Piece ID": "V4J3JE7JO476TAVPKENSL5OPXHGDSTLYFFQJWA54VDRYGT7OWDUQ"}
2023-11-01T15:03:55Z    INFO    collector       collect {"process": "storagenode", "count": 8}
2023-11-01T15:03:55Z    INFO    lazyfilewalker.used-space-filewalker.subprocess Database started        {"process": "storagenode", /* removed */, "process": "storagenode"}
2023-11-01T15:03:55Z    INFO    lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker started   {"process": "storagenode", /* removed */, "process": "storagenode"}
2023-11-01 15:03:57,061 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-11-01 15:04:00,064 WARN killing 'storagenode' (5598) with SIGKILL
2023-11-01 15:04:00,064 INFO waiting for storagenode, processes-exit-eventlistener to die
2023-11-01 15:04:00,690 INFO stopped: storagenode (terminated by SIGKILL)
2023-11-01 15:04:00,691 INFO stopped: processes-exit-eventlistener (terminated by SIGTERM)
2023-11-01 15:06:14,516 INFO Set uid to user 0 succeeded
2023-11-01 15:06:14,612 INFO RPC interface 'supervisor' initialized
2023-11-01 15:06:14,613 INFO supervisord started with pid 1
2023-11-01 15:06:15,617 INFO spawned: 'processes-exit-eventlistener' with pid 10
2023-11-01 15:06:15,623 INFO spawned: 'storagenode' with pid 11
2023-11-01 15:06:15,628 INFO spawned: 'storagenode-updater' with pid 12
2023-11-01T15:06:15Z    INFO    Configuration loaded    {"Process": "storagenode-updater", "Location": "/app/config/config.yaml"}
2023-11-01T15:06:15Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "console.address"}
2023-11-01T15:06:15Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "storage.allocated-disk-space"}
2023-11-01T15:06:15Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "server.private-address"}
2023-11-01T15:06:15Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "healthcheck.details"}
2023-11-01T15:06:15Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "operator.email"}
2023-11-01T15:06:15Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "contact.external-address"}
2023-11-01T15:06:15Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "storage.allocated-bandwidth"}
2023-11-01T15:06:15Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "operator.wallet-features"}
2023-11-01T15:06:15Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "server.address"}
2023-11-01T15:06:15Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "operator.wallet"}
2023-11-01T15:06:15Z    INFO    Invalid configuration file key  {"Process": "storagenode-updater", "Key": "healthcheck.enabled"}
2023-11-01T15:06:15Z    INFO    Anonymized tracing enabled      {"Process": "storagenode-updater"}
2023-11-01T15:06:15Z    INFO    Configuration loaded    {"process": "storagenode", "Location": "/app/config/config.yaml"}
2023-11-01T15:06:15Z    INFO    Anonymized tracing enabled      {"process": "storagenode"}
2023-11-01T15:06:15Z    INFO    Operator email  {"process": "storagenode", /* removed */}
2023-11-01T15:06:15Z    INFO    Operator wallet {"process": "storagenode", /* removed */}
2023-11-01T15:06:15Z    INFO    Running on version      {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.90.2"}
2023-11-01T15:06:15Z    INFO    Downloading versions.   {"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}
2023-11-01T15:06:15Z    INFO    server  kernel support for server-side tcp fast open remains disabled.  {"process": "storagenode"}
2023-11-01T15:06:15Z    INFO    server  enable with: sysctl -w net.ipv4.tcp_fastopen=3  {"process": "storagenode"}
2023-11-01T15:06:16Z    INFO    Current binary version  {"Process": "storagenode-updater", "Service": "storagenode", "Version": "v1.90.2"}
2023-11-01T15:06:16Z    INFO    Version is up to date   {"Process": "storagenode-updater", "Service": "storagenode"}
2023-11-01T15:06:16Z    INFO    Current binary version  {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.90.2"}
2023-11-01T15:06:16Z    INFO    Version is up to date   {"Process": "storagenode-updater", "Service": "storagenode-updater"}
2023-11-01T15:06:16Z    INFO    Telemetry enabled       {"process": "storagenode", /* removed */}
2023-11-01T15:06:16Z    INFO    Event collection enabled        {"process": "storagenode", /* removed */}
2023-11-01 15:06:17,287 INFO success: processes-exit-eventlistener entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-11-01 15:06:17,287 INFO success: storagenode entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-11-01 15:06:17,287 INFO success: storagenode-updater entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-11-01T15:06:17Z    INFO    db.migration    Database Version        {"process": "storagenode", "version": 54}
2023-11-01T15:06:18Z    INFO    preflight:localtime     start checking local system clock with trusted satellites' system clock.        {"process": "storagenode"}
2023-11-01T15:06:19Z    INFO    preflight:localtime     local system clock is in sync with trusted satellites' system clock.    {"process": "storagenode"}
2023-11-01T15:06:19Z    INFO    bandwidth       Performing bandwidth usage rollups      {"process": "storagenode"}
2023-11-01T15:06:19Z    INFO    Node xxx started        {"process": "storagenode"}
2023-11-01T15:06:19Z    INFO    Public server started on [::]:28967     {"process": "storagenode"}
2023-11-01T15:06:19Z    INFO    Private server started on 127.0.0.1:7778        {"process": "storagenode"}
2023-11-01T15:06:19Z    INFO    pieces:trash    emptying trash started  {"process": "storagenode", /* removed */}
2023-11-01T15:06:19Z    INFO    trust   Scheduling next refresh {"process": "storagenode", "after": "3h59m29.796946439s"}
2023-11-01T15:06:19Z    INFO    lazyfilewalker.used-space-filewalker    starting subprocess     {"process": "storagenode", /* removed */}
2023-11-01T15:06:19Z    INFO    lazyfilewalker.used-space-filewalker    subprocess started      {"process": "storagenode", /* removed */}
2023-11-01T15:06:19Z    INFO    lazyfilewalker.used-space-filewalker.subprocess Database started        {"process": "storagenode", /* removed */, "process": "storagenode"}
2023-11-01T15:06:19Z    INFO    lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker started   {"process": "storagenode", /* removed */, "process": "storagenode"}
2023-11-01T15:06:19Z    INFO    lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker completed {"process": "storagenode", /* removed */, "process": "storagenode", "piecesTotal": 47153346816, "piecesContentSize": 47141446912}
2023-11-01T15:06:19Z    INFO    lazyfilewalker.used-space-filewalker    subprocess finished successfully        {"process": "storagenode", /* removed */}
2023-11-01T15:06:19Z    INFO    lazyfilewalker.used-space-filewalker    starting subprocess     {"process": "storagenode", /* removed */}
2023-11-01T15:06:19Z    INFO    lazyfilewalker.used-space-filewalker    subprocess started      {"process": "storagenode", /* removed */}
2023-11-01T15:06:19Z    INFO    lazyfilewalker.used-space-filewalker.subprocess Database started        {"process": "storagenode", /* removed */, "process": "storagenode"}
2023-11-01T15:06:19Z    INFO    lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker started   {"process": "storagenode", /* removed */, "process": "storagenode"}
2023-11-01T15:06:20Z    INFO    pieces:trash    emptying trash started  {"process": "storagenode", /* removed */}
2023-11-01T15:06:20Z    INFO    pieces:trash    emptying trash started  {"process": "storagenode", /* removed */}
2023-11-01T15:06:20Z    INFO    pieces:trash    emptying trash started  {"process": "storagenode", /* removed */}

I’m wondering why the process is run by root… I thought it would run as the local user it’s started from. (OK, found out there’s a new --user option, great!)

Any option to « nice » the whole process? It was never that violent with disk I/O before. It did use a lot of I/O, but it never pushed the load higher than 10, and the server stayed responsive; now it’s just a nightmare for 10 minutes straight.
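To illustrate what I mean, here’s a rough sketch; the process name and the effect of the idle I/O class are assumptions that depend on the setup and the disk scheduler:

# Find the node's PID on the host (assuming the binary shows up as "storagenode run"):
PID=$(pgrep -f "storagenode run" | head -n1)
# Lower its CPU priority and move its disk I/O to the idle class,
# so other services get served first:
sudo renice -n 10 -p "$PID"
sudo ionice -c 3 -p "$PID"

(The idle I/O class only has an effect with schedulers that honour I/O priorities, e.g. BFQ.)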

You can reduce the I/O priority of the filewalker here: Enabling the Lazyfilewalker
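For reference, a minimal sketch of enabling it, assuming the option key is pieces.enable-lazy-filewalker (verify against the linked guide or the node’s --help output before relying on it):

# Option 1: add the (assumed) key to config.yaml and restart the node:
#   pieces.enable-lazy-filewalker: true
# Option 2: pass it as a flag when recreating the Docker container
# (existing mounts and flags omitted):
docker run -d --name storagenode ... storjlabs/storagenode:latest \
  --pieces.enable-lazy-filewalker=true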


Slow or fragmented subsystem: check here


Filesystem? Cluster size? More hardware info? Databases on SSD?

Thanks for the suggested threads (and thanks @Ambifacient for the lazy option, I will set this up).

I opened this thread because I’ve been running Storj for many years now, and it only started impacting my servers a few weeks ago. So to me it’s a major change that’s difficult to pinpoint.

Filesystem is ext4, with noatime/nodiratime.
Cluster size, I think you mean the space allocated to Storj, is 2 TB (it may be lower on other servers).
Hardware is modern servers, most of them with 8 CPUs such as « Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80GHz » or the like.
Disks are HDD, not SSD. All servers are using RAID 1 (mirroring).

Note that the issue happens on all my servers, not just one (which would have pointed to a hardware issue).

I’ll test the lazy filewalker and post back if I don’t get issues anymore. That may take a while, as the filewalker doesn’t run often (which is a good thing in this situation!).

Thanks :slight_smile:


RAID1 is slow on writes. Did you check that 10% of the disk space is free? Defragmentation?
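Both are quick to check with standard tools (the path and device name below are placeholders):

# Free space on the Storj mount point:
df -h /mnt/storj
# Free-space fragmentation of the underlying array (read-only report):
sudo e2freefrag /dev/md0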

Again, the issue wasn’t there « before » (i.e. for the last 3-4 years).

There’s more than 10% free space. As for defragmentation, I don’t know: since I’m on Linux I don’t defrag anymore.

To me the issue is clearly linked to the (possible) crash and the filewalker restarting along with the node starting up. But I don’t have the same issue when I simply restart the node (for instance when I need to update the Linux kernel, about 4-6 times a year).

This timeout was implemented months ago.

Fragmentation builds up over time, especially in the databases, and especially when all the I/O and the databases sit on the same disks (are yours CMR?).

The currently high usage of the storage network makes it doubly worse.

Filewalkers take hours to complete. They walk the files while the node reads and writes at the same time all over the disk, so the writability check fails.

Increase the timeout and investigate.

Test: how long does the dashboard take to load?

And check fragmentation.
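The timeout keys I have in mind look like this; the exact names are from memory, so verify them against the node’s --help output before using them:

# Assumed config.yaml keys (the current default is 1m0s, matching the error message):
#   storage2.monitor.verify-dir-writable-timeout: 1m30s
#   storage2.monitor.verify-dir-readable-timeout: 1m30s
# Or as a flag on the Docker container (existing mounts and flags omitted):
docker run -d --name storagenode ... storjlabs/storagenode:latest \
  --storage2.monitor.verify-dir-writable-timeout=1m30s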


It is not. It just writes at the normal speed of a single disk.

If these servers are used only for STORJ, it’s pretty useless to use RAID1 for that; besides, you should only be using otherwise-unused hardware anyway.


OK, now we agree that it’s as fast as the slowest disk in the set. Also, it’s not the slowest-writing RAID level. :nerd_face:

“His” raid 1 is (temporarily) slow on writes…it’s stated in the logs.


Since you use ext4, you should. Of course it may be better than NTFS, but after 4 years of Storj it will be fragmented too.


Customer access patterns change, that’s a natural thing to expect.

The only recent software change that might be relevant is that part of the file walker was moved to a separate process, so that its I/O priority can be tuned separately. This is what you see as the used-space-filewalker process. But it was there before, just not as a separate process.

On the other hand, we do now observe more data churn.

With ext4 and 64 GB of RAM for barely 2 TB of a node, the file walker should take maybe half an hour. If it takes longer, then that’s a reason to worry.
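One way to measure it is to compare the start/completed timestamps the node already logs (container name assumed to be storagenode):

# The subprocess logs "used-space-filewalker started" and "... completed";
# the gap between the two timestamps is the walk duration.
docker logs storagenode 2>&1 | grep used-space-filewalker | grep -E "started|completed|finished"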


I don’t use it. I just make a few calls to uptime and the like on the API. Can I see that information in an API call?

I’ve got the same issue on recent installations (~5 months old) with brand new disks, always when the filewalker runs. But they are all running on RAID 1, because the main purpose of these servers is to serve as web servers with redundancy, and STORJ uses the free space (1-2 TB).

For example, Syncthing on the same hardware is slow to index all files (1M+), but the load doesn’t spike to 20 and the server stays responsive. My STORJ node with 4M+ files may well be slow to index (filewalker), but it shouldn’t add that much load.

Do you recommend any particular tool for this matter?

I do agree. But it should get better, not worse. I’d expect improvements of 5-10%, possibly the opposite in some cases. Here the system is unresponsive for a few minutes; that’s not natural :slight_smile:

I wouldn’t be concerned if it took 4 hours (I’m thinking of Syncthing’s resync, or md RAID’s resync). The problem is just the spike, as you can see in my first message.

I’ll try different settings (in a few days); with all the help provided I should be able to pinpoint the issue.

Thanks to all :slight_smile:

No, I don’t use Linux.
But there are tools to defrag without unmounting (with a check-only option included).

one is mentioned here
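On ext4 the usual candidate is e4defrag from e2fsprogs, which works on a mounted filesystem and has a check-only mode (the path is a placeholder; a real defrag run generates a lot of extra I/O, so it’s better done with the node stopped):

# Report the fragmentation score without changing anything:
sudo e4defrag -c /mnt/storj/storage/blobs
# Actually defragment, online:
sudo e4defrag /mnt/storj/storage/blobs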


I fail to see how defrag could help in this case.

@dugwood’s problem is the high CPU usage, right?
Which he gets during the filewalker?
How is that connected to the disks being fragmented?
His graph shows the high CPU usage for the storagenode container, not for mdadm or whatever RAID 1 setup he uses.


Right, CPU is used to handle I/O (as I’ve learned from atop’s code), but it’s more a case of high I/O usage than high CPU usage.

RAID1 (md) may impact this if the filewalker does a lot of writes. If it’s only reads, RAID1 should give roughly a 90% improvement, since we can read from 2 drives.

Perhaps it’s linked to another issue, such as a read/write bottleneck: I’ve seen a lot of processes in D state (same as in my first screenshot) during a chown of the entire STORJ directory (I was switching the owner of all the files; of course STORJ was stopped at the time). If the filewalker uses more than one walker at the same time, that could explain the very high load: the load goes up by 1 for each process waiting for CPU or disk. If it went to 40, that could mean STORJ was running 30-35 concurrent reads/writes.
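To confirm that hypothesis, this is the quick check I use for processes stuck in uninterruptible I/O wait (nothing node-specific here):

# List processes in D state (uninterruptible sleep, usually disk I/O) and what they wait on:
ps -eo pid,state,wchan:32,cmd | awk '$2 == "D"'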

For instance, with Syncthing, when there’s 1 thread per CPU the load goes above 8 (which is the number of CPUs on my servers), but if I limit it to 2 threads, the load stays around 2-3 as expected.

I haven’t looked at STORJ’s code (perhaps I should), but I will definitely look for concurrent workers/writers/readers in the configuration file.

Still, check that. If the dashboard takes longer than a few seconds to load, it’s a strong indication that you’ve run out of IOPS on the disk where the databases are located.
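A quick way to measure that without a browser, assuming the console listens on the default 127.0.0.1:14002 and serves its data under /api/sno/ (check console.address in your config):

# Time the dashboard API call; it is backed by the databases, so a slow
# response usually means the DB disk is starved for IOPS.
time curl -s http://127.0.0.1:14002/api/sno/ > /dev/null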

Why? It will keep getting worse. The amount of data you store increases, the random I/O hitting the disk increases, but the maximum IOPS your disk can sustain stays constant. Of course it will keep getting worse.

You need to offload random I/O from the disks using one or more of the many solutions discussed earlier on this forum.

20-30 of what? What are the units? What is on the axis of your graphs?

If the filewalker consumes 100% CPU that’s great; it means most of the I/O is going to cache. But then you say your disk I/O also increases?

10 of what?

Describe your storage setup.

For reference, my ZFS array with a special device on a 10+ TB node completes the filewalker in a few minutes, with a few thousand IOPS hitting the special device and 100% CPU consumption.

Not sure it’s enabled by default. I don’t want to use it if it consumes more resources. If it doesn’t, and only consumes resources when opened (in the « web » interface), then I might have a look.

I meant comparing to older versions. I’ve always had about 1TB of data, with high bandwidth usage (no issue there), but never such load spikes.

Linux « load average ». It represents the number of processes waiting for resources, be it CPU, I/O…
The filewalker isn’t using 100% CPU but 100% I/O, which creates a queue of processes waiting to read from or write to the disk (so the « load » increases). See my first screenshot; you’ll see the « Load average ».
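In case a more disk-centric indicator helps, this is what I plan to watch next to the load average (iostat comes from the sysstat package):

# Extended per-device statistics every 5 seconds; %util close to 100 and high
# await values mean the disks are saturated, whatever the CPU is doing.
iostat -x 5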

If you have a better indicator, please tell me. From what I read in your post and others, perhaps the dashboard will help by giving specific values that we can all talk about.

2 HDDs assembled with mdadm in RAID1, formatted with ext4. Linux kernel is 6.x.
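For completeness, the layout can be confirmed like this (it also shows whether an md resync is running, which would add load on its own):

# RAID state and any ongoing resync/recovery:
cat /proc/mdstat
# Block device layout, filesystems and mount points:
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT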