StorJ consumes a lot of IOPS, affects other loads on the discs

SlavikCA · September 6, 2024, 1:19am

This is just me sharing my experience from running a StorJ node.

You know that StorJ promotes the idea that you can put an unused resource (disc space) to use and even get paid for it.

It works great for unused disk space. But you also need to think about IOPS.

Recently I had the idea to use a 16TB HDD for video surveillance (Frigate) running in an Ubuntu VM, which really needs around 1TB, and give the rest to StorJ. Currently I have about 5TB used by StorJ.

Well, I found that StorJ sometimes uses so many IOPS on that HDD that disk access latency in the Ubuntu VM gets really bad. And when the latency exceeds 8 seconds (!), the VM crashes.

I filed a bug for this issue against the hypervisor:

github.com/harvester/harvester

[BUG] VM crashing when disc can't handle the IOPS load

opened 08:42PM - 05 Sep 24 UTC

SlavikCA

kind/bug reproduce/needed severity/needed

**Describe the bug**

I have Ubuntu 22 VM, which runs on SSD. I created PVC, which uses 200GB of single HDD disc, formatted as ext4 and mounted it to VM. I'm running Frigate (video surveillance), which writes data to that PVC (HDD disc) ~~and should delete old data, when free storage gets low. I'm not sure, if disc space completely got to 0, or just to low value, but VM crashed few times.~~

Host HDD has plenty of free space. I also have another container, which heavily writes to the same host HDD disc (different PVC). And I think that makes some disk operation to time out and VM to crash:

**Host T7920 dmesg output:**

```
[Thu Sep 5 19:46:11 2024] scsi_io_completion_action: 9 callbacks suppressed
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#13 Sense Key : Medium Error [current]
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#13 Add. Sense: Unrecovered read error
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#13 CDB: Write(10) 2a 00 17 40 08 48 00 00 08 00
[Thu Sep 5 19:46:11 2024] print_req_error: 9 callbacks suppressed
[Thu Sep 5 19:46:11 2024] blk_update_request: critical medium error, dev sdg, sector 390072392 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#12 Sense Key : Medium Error [current]
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#12 Add. Sense: Unrecovered read error
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#12 CDB: Write(10) 2a 00 17 40 08 30 00 00 08 00
[Thu Sep 5 19:46:11 2024] blk_update_request: critical medium error, dev sdg, sector 390072368 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#89 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#89 Sense Key : Medium Error [current]
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#89 Add. Sense: Unrecovered read error
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#89 CDB: Write(10) 2a 00 15 40 08 20 00 00 08 00
[Thu Sep 5 19:46:11 2024] blk_update_request: critical medium error, dev sdg, sector 356517920 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#11 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#11 Sense Key : Medium Error [current]
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#11 Add. Sense: Unrecovered read error
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#11 CDB: Write(10) 2a 00 15 c0 08 00 00 00 18 00
[Thu Sep 5 19:46:11 2024] blk_update_request: critical medium error, dev sdg, sector 364906496 op 0x1:(WRITE) flags 0x8800 phys_seg 3 prio class 0
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#68 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#68 Sense Key : Medium Error [current]
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#68 Add. Sense: Unrecovered read error
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#68 CDB: Write(10) 2a 00 0c 4a d0 80 00 00 68 00
[Thu Sep 5 19:46:11 2024] blk_update_request: critical medium error, dev sdg, sector 206229632 op 0x1:(WRITE) flags 0x8800 phys_seg 13 prio class 0
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#53 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#53 Sense Key : Medium Error [current]
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#53 Add. Sense: Unrecovered read error
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#53 CDB: Write(10) 2a 00 16 00 08 28 00 00 08 00
[Thu Sep 5 19:46:11 2024] blk_update_request: critical medium error, dev sdg, sector 369100840 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#39 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#39 Sense Key : Medium Error [current]
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#39 Add. Sense: Unrecovered read error
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#39 CDB: Write(10) 2a 00 15 40 08 38 00 00 48 00
[Thu Sep 5 19:46:11 2024] blk_update_request: critical medium error, dev sdg, sector 356517944 op 0x1:(WRITE) flags 0x8800 phys_seg 9 prio class 0
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#10 Sense Key : Medium Error [current]
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#10 Add. Sense: Unrecovered read error
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#10 CDB: Write(10) 2a 00 15 80 08 00 00 00 80 00
[Thu Sep 5 19:46:11 2024] blk_update_request: critical medium error, dev sdg, sector 360712192 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#67 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#67 Sense Key : Medium Error [current]
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#67 Add. Sense: Unrecovered read error
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#67 CDB: Write(10) 2a 00 17 80 08 50 00 00 08 00
[Thu Sep 5 19:46:11 2024] blk_update_request: critical medium error, dev sdg, sector 394266704 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#69 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#69 Sense Key : Medium Error [current]
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#69 Add. Sense: Unrecovered read error
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#69 CDB: Write(10) 2a 00 0c e1 90 b0 00 08 a8 00
[Thu Sep 5 19:46:11 2024] blk_update_request: critical medium error, dev sdg, sector 216109232 op 0x1:(WRITE) flags 0x8800 phys_seg 277 prio class 0
[Thu Sep 5 19:46:24 2024] scsi host18: iSCSI Initiator over TCP/IP
[Thu Sep 5 19:46:24 2024] scsi 18:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
[Thu Sep 5 19:46:24 2024] scsi 18:0:0:0: Attached scsi generic sg0 type 12
[Thu Sep 5 19:46:24 2024] scsi 18:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
[Thu Sep 5 19:46:24 2024] sd 18:0:0:1: Attached scsi generic sg16 type 0
[Thu Sep 5 19:46:24 2024] sd 18:0:0:1: Power-on or device reset occurred
[Thu Sep 5 19:46:24 2024] sd 18:0:0:1: [sda] 419430400 512-byte logical blocks: (215 GB/200 GiB)
[Thu Sep 5 19:46:24 2024] sd 18:0:0:1: [sda] Write Protect is off
[Thu Sep 5 19:46:24 2024] sd 18:0:0:1: [sda] Mode Sense: 69 00 10 08
[Thu Sep 5 19:46:24 2024] sd 18:0:0:1: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
[Thu Sep 5 19:46:24 2024] sda: sda1
[Thu Sep 5 19:46:24 2024] sd 18:0:0:1: [sda] Attached SCSI disk
[Thu Sep 5 19:46:56 2024] k6t-37a8eec1ce1: port 2(tap37a8eec1ce1) entered disabled state
[Thu Sep 5 19:46:58 2024] device tap37a8eec1ce1 left promiscuous mode
[Thu Sep 5 19:46:58 2024] k6t-37a8eec1ce1: port 2(tap37a8eec1ce1) entered disabled state
[Thu Sep 5 19:46:59 2024] device 37a8eec1ce1-nic left promiscuous mode
[Thu Sep 5 19:46:59 2024] k6t-37a8eec1ce1: port 1(37a8eec1ce1-nic) entered disabled state
[Thu Sep 5 19:46:59 2024] mgmt-br: port 2(vethe7101a4c) entered disabled state
[Thu Sep 5 19:46:59 2024] device vethe7101a4c left promiscuous mode
[Thu Sep 5 19:46:59 2024] mgmt-br: port 2(vethe7101a4c) entered disabled state
[Thu Sep 5 19:58:37 2024] scsi host12: iSCSI Initiator over TCP/IP
[Thu Sep 5 19:58:37 2024] scsi 12:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
[Thu Sep 5 19:58:37 2024] scsi 12:0:0:0: Attached scsi generic sg0 type 12
[Thu Sep 5 19:58:37 2024] scsi 12:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
[Thu Sep 5 19:58:37 2024] sd 12:0:0:1: Attached scsi generic sg16 type 0
[Thu Sep 5 19:58:37 2024] sd 12:0:0:1: Power-on or device reset occurred
[Thu Sep 5 19:58:37 2024] sd 12:0:0:1: [sda] 419430400 512-byte logical blocks: (215 GB/200 GiB)
[Thu Sep 5 19:58:37 2024] sd 12:0:0:1: [sda] Write Protect is off
[Thu Sep 5 19:58:37 2024] sd 12:0:0:1: [sda] Mode Sense: 69 00 10 08
[Thu Sep 5 19:58:37 2024] sd 12:0:0:1: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
[Thu Sep 5 19:58:38 2024] sda: sda1
[Thu Sep 5 19:58:38 2024] sd 12:0:0:1: [sda] Attached SCSI disk
[Thu Sep 5 19:58:39 2024] scsi host17: iSCSI Initiator over TCP/IP
[Thu Sep 5 19:58:39 2024] scsi 17:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
[Thu Sep 5 19:58:39 2024] scsi 17:0:0:0: Attached scsi generic sg17 type 12
[Thu Sep 5 19:58:39 2024] scsi 17:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
[Thu Sep 5 19:58:39 2024] sd 17:0:0:1: Attached scsi generic sg18 type 0
[Thu Sep 5 19:58:39 2024] sd 17:0:0:1: Power-on or device reset occurred
[Thu Sep 5 19:58:39 2024] sd 17:0:0:1: [sdg] 268435456 512-byte logical blocks: (137 GB/128 GiB)
[Thu Sep 5 19:58:39 2024] sd 17:0:0:1: [sdg] Write Protect is off
[Thu Sep 5 19:58:39 2024] sd 17:0:0:1: [sdg] Mode Sense: 69 00 10 08
[Thu Sep 5 19:58:39 2024] sd 17:0:0:1: [sdg] Write cache: disabled, read cache: enabled, supports DPO and FUA
[Thu Sep 5 19:58:40 2024] sdg: sdg1 sdg14 sdg15
[Thu Sep 5 19:58:40 2024] sd 17:0:0:1: [sdg] Attached SCSI disk
[Thu Sep 5 19:58:46 2024] loop1: detected capacity change from 0 to 419430400
[Thu Sep 5 19:58:48 2024] loop2: detected capacity change from 0 to 268435456
[Thu Sep 5 19:58:49 2024] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[Thu Sep 5 19:58:49 2024] mgmt-br: port 2(vethfc53da7e) entered blocking state
[Thu Sep 5 19:58:49 2024] mgmt-br: port 2(vethfc53da7e) entered disabled state
[Thu Sep 5 19:58:49 2024] device vethfc53da7e entered promiscuous mode
[Thu Sep 5 19:58:49 2024] mgmt-br: port 2(vethfc53da7e) entered blocking state
[Thu Sep 5 19:58:49 2024] mgmt-br: port 2(vethfc53da7e) entered forwarding state
[Thu Sep 5 19:58:50 2024] mgmt-br: port 2(vethfc53da7e) entered disabled state
[Thu Sep 5 19:58:50 2024] 37a8eec1ce1-nic: renamed from pod37a8eec1ce1
[Thu Sep 5 19:58:50 2024] k6t-37a8eec1ce1: port 1(37a8eec1ce1-nic) entered blocking state
[Thu Sep 5 19:58:50 2024] k6t-37a8eec1ce1: port 1(37a8eec1ce1-nic) entered disabled state
[Thu Sep 5 19:58:50 2024] device 37a8eec1ce1-nic entered promiscuous mode
[Thu Sep 5 19:58:50 2024] IPv6: ADDRCONF(NETDEV_CHANGE): 37a8eec1ce1-nic: link becomes ready
[Thu Sep 5 19:58:50 2024] k6t-37a8eec1ce1: port 1(37a8eec1ce1-nic) entered blocking state
[Thu Sep 5 19:58:50 2024] k6t-37a8eec1ce1: port 1(37a8eec1ce1-nic) entered forwarding state
[Thu Sep 5 19:58:50 2024] mgmt-br: port 2(vethfc53da7e) entered blocking state
[Thu Sep 5 19:58:50 2024] mgmt-br: port 2(vethfc53da7e) entered forwarding state
[Thu Sep 5 19:58:50 2024] k6t-37a8eec1ce1: port 2(tap37a8eec1ce1) entered blocking state
[Thu Sep 5 19:58:50 2024] k6t-37a8eec1ce1: port 2(tap37a8eec1ce1) entered disabled state
[Thu Sep 5 19:58:50 2024] device tap37a8eec1ce1 entered promiscuous mode
[Thu Sep 5 19:58:50 2024] k6t-37a8eec1ce1: port 2(tap37a8eec1ce1) entered blocking state
[Thu Sep 5 19:58:50 2024] k6t-37a8eec1ce1: port 2(tap37a8eec1ce1) entered forwarding state
[Thu Sep 5 19:58:51 2024] k6t-37a8eec1ce1: port 2(tap37a8eec1ce1) entered disabled state
[Thu Sep 5 19:58:52 2024] IPv6: ADDRCONF(NETDEV_CHANGE): tap37a8eec1ce1: link becomes ready
[Thu Sep 5 19:58:52 2024] k6t-37a8eec1ce1: port 2(tap37a8eec1ce1) entered blocking state
[Thu Sep 5 19:58:52 2024] k6t-37a8eec1ce1: port 2(tap37a8eec1ce1) entered forwarding state
[Thu Sep 5 19:59:10 2024] vfio-pci 0000:d5:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
```

**Environment**

- Harvester 1.3.1 on 2 master&worker nodes + witness

Master&worker nodes are:

- Baremetal Dell Precision desktop tower
- Xeon Gold 5280
- 128GB RAM
- NVM disc (used as OS disc for VMs)
- Seagate HDD (used for additional PVCs, mounted to VM)

**Support bundle**

- https://s3.fursov.family/shares/supportbundle_9ae6faf0-dfdf-48ab-ad8f-8a3686a01fac_2024-09-05T20-43-57Z.zip

From inside VM:

```
slavik@t7920-gpu:~$ df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/vdb1 ext4 196G 177G 9.7G 95% /mnt/hdd
...

# mounted in /etc/fstab
UUID=3cc755d7-218b-4eda-9a7c-4539af659983 /mnt/hdd ext4 defaults 0 1
```

Can anything be done about it? Or will a slow disk inevitably result in a crashing VM?
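
For anyone digging through a similar log: each `FAILED Result` line in the dmesg output above carries a `cmd_age` field, which is how long the command had been outstanding. A minimal Python sketch for summarizing those lines per device follows. The regex is written against the line format shown in this log only; the function name `summarize_failed_commands` is made up for illustration, and other kernel versions may print the line differently.

```python
import re
from collections import defaultdict

# Matches kernel lines of the form seen in the log above, e.g.
#   sd 12:0:0:1: [sdg] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
# Assumption: the format is taken from this one log, not from a kernel spec.
FAILED_RE = re.compile(
    r"\[(?P<dev>sd[a-z]+)\] tag#\d+ FAILED Result: .*?cmd_age=(?P<age>\d+)s"
)


def summarize_failed_commands(dmesg_text):
    """Return {device: (number_of_failed_commands, max_cmd_age_seconds)}."""
    ages = defaultdict(list)
    for match in FAILED_RE.finditer(dmesg_text):
        ages[match.group("dev")].append(int(match.group("age")))
    return {dev: (len(values), max(values)) for dev, values in ages.items()}


# Three lines copied from the log above; only two are FAILED lines.
sample = """\
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#13 Sense Key : Medium Error [current]
[Thu Sep 5 19:46:11 2024] sd 12:0:0:1: [sdg] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
"""

print(summarize_failed_commands(sample))  # -> {'sdg': (2, 9)}
```

On a real host the input could be the saved output of `dmesg -T`; the sketch only reads text and does not touch the device.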

Because high latency should not result in a VM crash.
But as an SNO you should know that StorJ can put so much IOPS load on an HDD that the idea of StorJ using only “unused” resources is not really practical.

With StorJ you need to think about what other loads are on the same disc.

A valid scenario is to store backups on the same discs as StorJ, because saving backups is not latency-sensitive.
But any latency-sensitive application would have issues running on the same HDD as StorJ.
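
One way to check how busy a disc already is before putting a second load on it: on Linux, `/proc/diskstats` exposes per-device counters for completed reads/writes and the milliseconds spent on them, and the average latency over an interval is the difference in time divided by the difference in completed IOs. Below is a minimal Python sketch of that calculation. The helper names and the synthetic counter values are made up for illustration; only the device name `sdg` comes from the log above.

```python
def parse_diskstats(text):
    """Map device name -> (completed IOs, milliseconds spent on IO).

    Field layout of /proc/diskstats (whitespace separated):
    major minor name reads reads_merged sectors_read ms_reading
    writes writes_merged sectors_written ms_writing ...
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or malformed lines
        completed = int(fields[3]) + int(fields[7])
        spent_ms = int(fields[6]) + int(fields[10])
        stats[fields[2]] = (completed, spent_ms)
    return stats


def average_latency_ms(before, after, device):
    """Average time per completed IO between two snapshots, in ms."""
    ios = after[device][0] - before[device][0]
    spent_ms = after[device][1] - before[device][1]
    return spent_ms / ios if ios else 0.0


# Synthetic snapshots (made-up numbers, not taken from a real host).
BEFORE = "8 96 sdg 1000 0 8000 5000 2000 0 16000 20000 0 9000 25000"
AFTER = "8 96 sdg 1100 0 8800 7000 2400 0 19200 45000 3 12000 52000"

latency = average_latency_ms(parse_diskstats(BEFORE), parse_diskstats(AFTER), "sdg")
print(latency)  # -> 54.0
```

On a real host, read `/proc/diskstats` twice a few seconds apart and pass both snapshots in. This is roughly the figure that `iostat -x` reports as await; if it is already in the tens of milliseconds while the node is busy, a latency-sensitive workload on the same HDD will probably suffer.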

arrogantrabbit · September 6, 2024, 1:43am

This is the worst possible scenario. The requirements for video and storj are opposite.

You have sequential IO with a few massive files versus random IO with hundreds of millions of small files. A well-optimized surveillance system won’t support that.

You need fast access to the metadata of those millions of files. That is not a requirement for video.
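
To put rough numbers on the sequential versus random difference, here is a back-of-the-envelope Python sketch. All drive figures are assumptions for a generic 7200 rpm HDD (8.5 ms average seek, 4 KiB random IO), not measurements from the disc in this thread. Under those assumptions a single HDD manages on the order of 80 random IOPS.

```python
def random_iops(rpm, avg_seek_ms):
    """Rough random IOPS of one HDD: one seek plus half a rotation per IO."""
    rotational_latency_ms = 0.5 * 60_000 / rpm  # half a revolution, in ms
    return 1000 / (avg_seek_ms + rotational_latency_ms)


def random_throughput_mb_s(iops, io_size_bytes):
    """Throughput in MB/s when every IO needs its own head movement."""
    return iops * io_size_bytes / 1e6


# Assumed drive parameters (hypothetical, typical spec-sheet style values).
RPM = 7200
AVG_SEEK_MS = 8.5
IO_SIZE_BYTES = 4096

iops = random_iops(RPM, AVG_SEEK_MS)
throughput = random_throughput_mb_s(iops, IO_SIZE_BYTES)
print(f"random: about {iops:.0f} IOPS, about {throughput:.2f} MB/s")
```

The same drive can stream large files at a far higher rate because the head barely moves, which is why a video recorder is happy on its own disc and starts timing out once a random small-file load competes for the same head.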

In the end — don’t run storagenode on an NVR

Alexey · September 6, 2024, 2:35am

You may run a node on an NVR, just not on the same disk.
But yes, the load is very similar to a NAS.

EasyRhino · September 6, 2024, 4:04pm

Good anecdote. My storj disks are too busy filling up or deleting files to put much persistent data on.

(although sometimes I will fill in unused space with chia plots. chia farming uses almost zero iops).

But yeah, while all my storj disks are in my home server, my home data is on separate disks.

Alexey · September 7, 2024, 7:21am

Not necessarily, it just depends on the load type.
For example, my nodes share disks with the hypervisor and several VMs (not as many as in the past, but still), and they all seem happy.
Those VMs are not really about storage, only their VHDX files live there; they are all about networking and stuff (a k3s cluster with Longhorn and a small Nomad cluster on three nodes + Consul and Vault in particular).