100% HDD usage normal?

Hi Everyone!

I noticed that my HDD that I am using as the data directory is at a constant 100% usage with it all being accessed from the “Blob” directory. I’m concerned 100% usage is going to drastically shorten the drive life. Is this normal?

Thanks!

Storage nodes are meant to be accessed continuously for small-file reads and writes. What drive is it? I'm guessing your node is running on Windows, correct? If it's an SMR drive, it is struggling with IOPS.

1 Like

Any drive you use with Storj is going to be used constantly. Don’t use any drives you don’t want to run into the ground, or that aren’t rated for 24/7 activity.

Right now I’m just using spare drives I have hanging around (I knew I’d have a use for them some day!) so I’m not out anything if they die from the stress.

1 Like

I’m not sure if this would produce significant hard drive activity but I just noticed my node has logged about 12,000 deletes since yesterday. How full is your storage node drive? I wonder if fragmentation is an issue.
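If you want to put a number on that for your own node, something along these lines would tally log entries — purely a sketch: the log path is the usual Windows GUI install location, and the matched keywords are only a guess at the log wording, so adjust both to your setup:

```python
# Rough tally of delete/upload/download entries in the storagenode log.
# LOG_PATH and the keywords are assumptions -- adjust to your own install.
from collections import Counter

LOG_PATH = r"C:\Program Files\Storj\Storage Node\storagenode.log"

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="ignore") as log:
    for line in log:
        lower = line.lower()
        for keyword in ("delete", "upload", "download"):
            if keyword in lower:
                counts[keyword] += 1
                break  # count each line once

print(counts)
```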

1 Like

This is partly why I optimized my own raidz setup for the highest possible raw IOPS.
Currently it seems to be around 4% activity across 9 HDDs.

@Teelow
You could try creating another node on another HDD… activity would then be split between the two drives, which should reduce the load on each of them.
100% seems high, so you might have an SMR drive. Either way, running multiple nodes should help a lot… the first couple of extra nodes are the most beneficial… after that the improvement per added drive goes down… obviously.
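Just to put toy numbers on the split (back-of-the-envelope only, assuming traffic for one IP is shared roughly evenly between the nodes behind it):

```python
# Back-of-the-envelope: each extra drive behind the same IP carries a smaller
# share of the traffic, and the marginal relief shrinks as you add more.
for drives in range(1, 6):
    share = 1 / drives
    gain_vs_previous = (1 / (drives - 1) - share) if drives > 1 else 0.0
    print(f"{drives} drive(s): ~{share:.0%} of the load each "
          f"(relief vs previous: {gain_vs_previous:.0%})")
```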

@Mark
120k deletes in the last 40 hours and still going… and ingress is still basically non-existent… sadly.

1 Like

I have a similar issue, but it differs in how often it happens. From what I read above, those explanations don't seem to apply to my case.

  • I'm not talking about constant HDD utilization, but about spikes close to 100% once every couple of days, lasting for a couple of hours.
  • the HDD is not SMR [WD 12TB EFAX, WD120EFAX-68UNTN0]
  • no RAID
  • no significant uploads/downloads/deletes shown at that moments in Logs
  • no other user/app is using the HDD
  • neither the PC nor the app was restarted (I have seen that the app always scans the whole disk after a restart; I would understand that)

My setup: Windows 10; the HDD sits in a Synology and is connected over a 1 Gbit LAN via iSCSI.

See the various graphs below. Besides that, when I copy files to/from the HDD myself, it does not show such high activity.

I want to know what the Storj app is doing, why it keeps scanning the whole storage every now and then, again and again, and how to prevent it. This is giving the disk a hard time.
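To give a sense of scale, something like the sketch below (the path is just an example from my layout) would show how many tiny files any full scan has to touch — although running it of course produces the same kind of load on the disk:

```python
# Count pieces and total size under the blobs directory.
# BLOBS is an assumed example path; point it at your own storage location.
# Careful: this walk itself generates the metadata-heavy load being discussed.
import os

BLOBS = r"D:\storagenode\storage\blobs"

files = 0
total_bytes = 0
for root, _dirs, names in os.walk(BLOBS):
    for name in names:
        files += 1
        try:
            total_bytes += os.path.getsize(os.path.join(root, name))
        except OSError:
            pass  # a piece can disappear while we walk

print(f"{files:,} files, {total_bytes / 1e12:.2f} TB")
```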


Screenshots (if they don't show on mobile phones, click them and they will open):

Taskmanager on the PC during the scan:

Taskmanager / processes - it is the storj node:

PC Resource monitor - all traffic points to \storage\blobs… What is it looking for, again and again?

Synology - Realtime Utilization during the storj scan

Synology - realtime IOPS

Synology - utilization 1 day history - you see the extreme spikes

Synology - utilization 1 week

Synology - month

I reckon the charts are self-explanatory…

It's probably garbage collection. The issue is likely made worse because you are accessing the drive over iSCSI. You should really be running the node on the Synology itself. Or do you have a model that doesn't support Docker?

On the Dashboard I quite consistently see a Garbage amount under 100 MB. Is it cleaning almost every day? And does it need to scan/read every single one of the millions of tiny files? It is really giving the HDD a hard time.
Yes, my Syno DS216play does not support Docker. I wish Storj would create a native app for Synology…

I've been running a Storj node for one year and did not notice such a heavy load until two or three months ago. It puts a heavy load on the disk while there are no or very low transfers to/from the Storj network. I need to know if this is expected behaviour. If yes, I would consider leaving the program, because from my experience an HDD will not survive long under such load…

1 Like

Clearly sounds like it's some sort of program cycle; nevertheless it's a problem…

The larger the node gets, the more activity… is the disk full?
Of course, if the disk was full it would slow down a lot in some cases… I believe a regular HDD has about half the speed and IOPS at the part of the platter closest to the spindle,
and since an HDD will normally fill from the fastest to the slowest parts, once it goes past the 50% capacity mark you will start to feel a significant decrease in performance…
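As a rough illustration only (made-up numbers, real drives vary, and the fall-off isn't exactly linear):

```python
# Toy model, not a measurement: assume the outer tracks do ~200 MB/s and the
# innermost ~100 MB/s, and that speed at the region being filled drops roughly
# linearly as the drive fills from the fast outer tracks inward.
OUTER_MBPS = 200.0  # assumed outer-track speed
INNER_MBPS = 100.0  # assumed inner-track speed (~half)

def speed_at_fill(fill_fraction: float) -> float:
    """Approximate throughput at the write point when the disk is this full."""
    return OUTER_MBPS + (INNER_MBPS - OUTER_MBPS) * fill_fraction

for fill in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{fill:.0%} full: ~{speed_at_fill(fill):.0f} MB/s at the write point")
```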

Of course, this might not be what you are seeing, because it sounds very much like a programming thing… however, the reason you haven't seen it in the past may be that the disk was faster when it had more free capacity.

What can you do… well, if your disk is full or close to full, just add another node so the load is spread over two disks… having 1 node on 1 HDD is the maximum-load setup.
Storj has been working on getting nodes to run fine on just 1 disk per IP,
but just adding another node will put you at 2 disks per IP, giving you a 50/50 split in load in most cases…

It's not really a fix, more like a patch that doubles your HDD resources…

I know it's a blanket, slightly gamble-ish solution… but it's easy to spend a lot of time trying to fix a problem that might not be fixable… this way you make progress…

Of course, if you've only got 1 TB on the 12 TB drive it wouldn't be my choice, but if you've got 8 or 10… then I wouldn't even think twice before going that route.

Anyway, I'm sure there are as many ideas as there are people…
Oh, and none of the images seem to post…

If it's metadata / garbage collection… some SSD caching might help… if that's an option…

I noticed the images do not show on mobile phones, but if you click them they will show.
In any case, they are all stored at http://brite.cz/data/

I agree. But my HDD is 12 TB marketing = ~10 TB real, of which 6 TB is Storj, 2 TB my own data, and around 2 TB free.

Well, the load is not generated by uploads/downloads, so adding another node will not influence that behaviour, I believe.

From the behaviour, I believe it is a programmatic issue within storagenode.exe, either planned or unplanned.

My Synology cannot do that; it is pretty old, nearly a basic model.

I would like to get a response from Storj development on whether or not this is a built-in function, and how to avoid such frequent scanning of the entire disk.

1 Like

Please have a look at this topic, where responses from Storj are already available.

Even if you don't consider your HDD to be working much because it only gets a couple of MB of egress and ingress… it might mean a lot more than you think…
The IOPS load can be quite significant, so splitting the load over two nodes should greatly reduce any issues, simply by cutting the Storj workload on each drive by 50%.
On top of that, the garbage on each would be halved… so if you are seeing 100% utilization for an hour, adding another drive might cut that down to half just by the load being split… however, you also double your overall IOPS, so the disks might spend less time seeking, which can be a large share of the overall workload / latency when running at 100%.

Thus 1 hour a day might go down to 10 minutes across both… or it may remove the problem altogether…

Sure, it's not a perfect solution… and in regard to Storj development… BrightSilence is very knowledgeable on the inner workings of Storj.

So listen to him and see if you can't resolve the problem with a software fix, but it's most likely some sort of IO bottleneck…

That is, if you are sure it's Storj that's causing it… and not Windows trying to index the blobs folder, or your antivirus trying to scan the blobs folder…

There are a few pitfalls that will crush performance, because things like virus-scanning the blobs folder are… demanding, to say the least.
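One crude way to check for that kind of interference (just a sketch — the path and sample size are made up) is to time small reads on a sample of pieces; if an antivirus or the indexer inspects every file you open, per-file latency will sit far above plain disk latency:

```python
# Time small reads from a random sample of pieces in the blobs folder.
# BLOBS and SAMPLE are assumptions -- adjust to your own setup.
import os, random, time

BLOBS = r"D:\storagenode\storage\blobs"
SAMPLE = 200

paths = []
for root, _dirs, names in os.walk(BLOBS):
    paths.extend(os.path.join(root, n) for n in names)
    if len(paths) > 50_000:   # no need to walk the whole store for a sample
        break

sample = random.sample(paths, min(SAMPLE, len(paths)))
start = time.perf_counter()
for p in sample:
    try:
        with open(p, "rb") as f:
            f.read(4096)      # read the first 4 KiB of each piece
    except OSError:
        pass
elapsed = time.perf_counter() - start
print(f"{len(sample)} files in {elapsed:.2f}s "
      f"({1000 * elapsed / max(len(sample), 1):.1f} ms/file)")
```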

It's very common to run multiple nodes, so single-node setups take a serious punishment…
I think my average on the pool that is mainly used by Storj is 200-400 IOPS,
which is about 50-100% of what most HDDs will do, but my node is also 14 TB… so it would be a bit more active than yours…

Thank you for the link, I will check that.