Question about changing a hard drive

I have two nodes on the same computer.

The larger node is 8.6 TB. It’s on a 14 TB hard drive.

The smaller node is 1.26 TB. It’s on an 8 TB hard drive.

The hard drives are shared to store other things as well, but there is still space available for the nodes to grow.

The large 8.6 TB node had responsiveness issues. I set “storage2.monitor.verify-dir-writable-timeout: 1m30s” and “storage2.monitor.verify-dir-readable-timeout: 1m30s”, and with this configuration the responsiveness problem was solved.

What do you think would be best? What do you recommend?

1. Leave it as is.

2. Delete the 1.26 TB node and keep the 8.6 TB node on the 14 TB hard drive, then use the 8 TB HDD for something else.

3. Put both nodes on the same 14 TB hard drive.

4. Move the 8.6 TB node to another 8 TB hard drive that I have, leaving the small 1.26 TB node on its current 8 TB disk. The large node would have to be shrunk to under 8 TB to make the disk change.

5. Figure out what bottleneck you’re hitting on the larger node, and then remove it.


I think the bottleneck on the biggest node is the hard drive. The hard drive can’t keep up with so many data input/output requests.

I would:

  1. Check whether any of the three drives uses SMR technology, and check the SMART data (a quick way to list the drive models is sketched after this list).

  2. Defragment the big node with UltraDefrag set to full (set the allocation in the .yaml to 8.6 TB and restart).

  3. Buy a small SSD (512 GB?) and move the logs, orders and databases of both nodes to the SSD.

  4. Wait ~14 months and set up a third node for vetting on the spare 8 TB drive (or move the non-Storj stuff there).

  5. Profit :grimacing:
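Not part of the original advice, but a quick way to do step 1 from PowerShell is to list the drive models (so you can look them up and see whether they are SMR) and read the SMART-derived reliability counters. These cmdlets are built into modern Windows; treat this as a sketch only, and note that not every drive reports every field:

```powershell
# List physical disks with model, media type, size and health status
Get-PhysicalDisk | Select-Object FriendlyName, MediaType, Size, HealthStatus

# SMART-derived reliability counters (fields vary by drive)
Get-PhysicalDisk | Get-StorageReliabilityCounter |
    Select-Object DeviceId, Temperature, PowerOnHours, ReadErrorsTotal, WriteErrorsTotal
```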

p.s.:
(2 nodes on one disk is against the ToS)

1. The drives are not SMR.
2. I was told not to defragment the drive, that it’s not good for Storj.
3. I could move the registry and database to the system SSD instead.

I have been looking into moving the logs and database to the SSD.
If the records and the database are lost, is that a problem?
Last year the computer’s SSD stopped being detected. I sent it in under warranty and they refunded my money. The data on the SSD was lost, but the important data was saved on another disk, so I did not lose it.

4. I can wait a few months. I don’t think the nodes will finish filling the disks even in 14 months; I think I still have a lot of capacity to fill on the same IP address.

Would moving the non-Storj data to another disk (or deleting it) improve access to the Storj disk?

Loss of statistics for the dashboard.
The node may have to be restarted with the new location for the DBs in case of an SSD failure.
Nothing too bad. The node can survive this if it’s detected by UptimeRobot.

If they are moved away, the data is probably safer. The data disk will fail anyway (hopefully far in the future).

And yes, probably the best effect comes from defragmenting after the DBs and the non-Storj data have been moved.

On Windows, defrag is recommended; on Linux it’s optional. But especially with the free version of UltraDefrag there is a gain in filesystem speed. It will run for days/weeks in the background without disturbing the node much.
Even if you cancel it after a week, most files are defragmented. And it prevents further fragmentation.

Greetings


The registry and database: are those all the files in the root of the disk where the node is? Are there hidden files?

Does the Windows defragmenter work fine?

On this forum I’ve been advised not to defragment the disk.

What do you mean by registry?

For moving the DBs there are topics here on the forum.

No hidden files

Basically (sketched below):
- stop the node
- move the DBs
- set the new path for the DBs in config.yaml with Notepad++
- start the node
- check the logs and dashboard to see if it’s working.
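A rough sketch of those steps for a Windows GUI/service node, assuming the default install paths; the drive letters, the D:\storj-dbs folder and the log path are placeholders for your own setup, and storage2.database-dir is the config option meant by “set the new path”:

```powershell
# Stop the storagenode service before touching the databases
Stop-Service storagenode

# Create the new location on the SSD and move the SQLite databases there
New-Item -ItemType Directory -Path 'D:\storj-dbs' -Force
Move-Item 'E:\storagenode\storage\*.db' 'D:\storj-dbs\'

# In config.yaml (open it with Notepad++ as administrator) add or change:
#   storage2.database-dir: "D:\storj-dbs"

# Start the node again and check that the log looks healthy
Start-Service storagenode
Get-Content "$env:ProgramFiles\Storj\Storage Node\storagenode.log" -Tail 50
```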

Windows defrag works fine; it does a basic defrag.
UltraDefrag is way better and free.

Who? Where?

Please stop defragmenting NTFS. There are no benefits. Small files are already stored within the MFT, and the MFT does not get fragmented as long as you don’t fill the disk to the brim (you should not do that with any FS). Defragmenting is a waste of time (and, if you have snapshots, space). It’s “feel good watching blocks get moved around” snake oil.


As you can see, opinions differ here :smile:


But the beauty of the technology is that opinions don’t matter, and facts are verifiable. I did not see any verifiable data in those posts beyond the placebo effect – “the node feels faster”.

Logic, on the other hand, does not support any defragmentation benefits.

Defragmentation helps in two ways, both of which aim to minimize the need for seeks and eliminate seek latency:

  • Consolidating the MFT – this should never get fragmented if you keep 10% of the disk free.
  • Consolidating the chunks of a single file together so that the disk does not need to seek mid-read, allowing you to achieve the maximum datasheet-promised sequential throughput on large files located on the outer disk tracks.

The latter point is irrelevant for Storj: the vast majority of files are smaller than 4 KB – i.e. they cannot get fragmented by definition – and most of the rest are smaller than 16 KB; provided that file allocation is quantized by the sector size and you have sufficient free space, only a fraction of that fraction ends up fragmented.

Moreover, in node operation sequential read is never a bottleneck – the node does so little of it that it’s not even worth the bytes this message consists of. Random access, driven by customer requests, is the bottleneck here, in addition to the usual database file locking and updates.

Hence, defragmenting the disk may have a small positive effect (you still have the metadata seek and only eliminate the mid-read seek) on a few thousand files, and no effect on the many millions of smaller files. At the same time, the probability that those few files will be accessed is just as tiny, so whatever small benefit could be gained likely won’t be.

These are verifiable facts: you can build a histogram of file sizes for your node yourself, compare it with the sector size, and do the same mental experiment. BTW, some defragmenting tools report how many files are fragmented, and you can check how often those files are actually accessed by the node – that would be another way to confirm that running defragmentation is pointless for any reason except seeing nice blue cubes neatly aligned in the window (I recommend Tetris as a replacement app to satisfy that craving).
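For example, a crude size histogram of the blob files can be built in PowerShell; the blobs path below is a placeholder for your node’s data location, and counting millions of files will take a while:

```powershell
# Count node blobs by size bucket to see how many files could even be fragmented
$buckets = [ordered]@{ '<= 4 KB' = 0; '4-16 KB' = 0; '16 KB - 1 MB' = 0; '> 1 MB' = 0 }

Get-ChildItem 'E:\storagenode\storage\blobs' -Recurse -File | ForEach-Object {
    if     ($_.Length -le 4KB)  { $buckets['<= 4 KB']++ }
    elseif ($_.Length -le 16KB) { $buckets['4-16 KB']++ }
    elseif ($_.Length -le 1MB)  { $buckets['16 KB - 1 MB']++ }
    else                        { $buckets['> 1 MB']++ }
}

$buckets
```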

The drawbacks of defragmentation are real: if your filesystem supports and contains snapshots, you are (sometimes dramatically) wasting space by defragmenting. And during defragmentation your disk is IO-saturated, hindering the performance of everything else that needs it.

How to improve random IO performance then? It has been discussed before: from tiered storage solutions, or zfs with a special device, where the MFT or small files end up on SSD, to filesystem tweaking to eliminate random IO, such as turning off SYNC and increasing filesystem buffer sizes (including carefully tweaking the transaction group size for array performance on ZFS).

Various vendors will be happy to sell you defragmentation tools, but that is snake oil: even if it helped, the benefit is so minuscule that it’s not worth doing. Moreover, modern filesystems don’t fragment the way old ones used to. The key, however, is to keep 10–15% of the space unoccupied. To be clear, even if you don’t do that for the storagenode, it won’t matter much, as the vast majority of the node’s files won’t be fragmented in the first place – partly because they are stored in the MFT, in the case of NTFS.

In a Windows GUI setup you likely have your identity on the system drive in the %AppData%\Storj\Identity\storagenode folder; by default the wizard will set up the node on your system drive too, so the orders folder will be in %ProgramFiles%\Storj\Storage Node\orders.
We usually recommend copying/moving your identity to the data location, updating the path in the config.yaml, and then restarting the service.
You may also move the orders to the data location, but the node can survive without them and will recreate the folder if it’s lost. The only issue with losing the orders folder is that you may lose some unsent orders, and the payment for them.
Databases can be recreated if they are lost; they contain the stats for the dashboard. So if they are lost you will lose the history, but payments will be safe – they are calculated on the satellite.
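For illustration, moving the identity to the data drive could look roughly like this; E:\identity\storagenode is a placeholder, and identity.cert-path / identity.key-path are the config.yaml keys to point at the new location:

```powershell
# Stop the service before moving the identity
Stop-Service storagenode

# Copy the identity folder from its default location to the data drive
Copy-Item "$env:APPDATA\Storj\Identity\storagenode" 'E:\identity\storagenode' -Recurse

# Then update config.yaml to point at the new location, e.g.:
#   identity.cert-path: "E:\identity\storagenode\identity.cert"
#   identity.key-path: "E:\identity\storagenode\identity.key"

Start-Service storagenode
```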

Defragmentation on NTFS not only places the clusters of a single file close together, it also defragments the free space, so most new pieces will be written contiguously; thus seek times are shorter and uploads and downloads are a little bit faster. Usually this is enough to give the disk a chance to finish a write or read within the 1m timeout, even during high load like a garbage collection combined with a filewalker and customer activity.

So, I would repeat: defragmentation for the data location on NTFS should not be disabled.
For ext4 the picture is different: it suits the node’s access pattern a little better and doesn’t require defragmentation.
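For reference, the fragmentation level can be checked, and the built-in Windows defragmenter run, from PowerShell; E is a placeholder for the data drive letter:

```powershell
# Report the fragmentation of the data volume without changing anything
Optimize-Volume -DriveLetter E -Analyze -Verbose

# Run the standard Windows defragmentation on that volume
Optimize-Volume -DriveLetter E -Defrag -Verbose
```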

NTFS doesn’t support them, unlike zfs or BTRFS. There is ReFS, but it’s not durable at the moment.
Everything else uses underlying management like LVM. Unfortunately, under Windows you can have a snapshot only for virtual disks, or in the VM management system if your Windows runs in a VM. I do not consider File History a snapshot in the same sense as in zfs or LVM; it’s more like a backup. The only close option is to use shadow copies.

This would be the case if the node read data sequentially. It does not, so the location of the files does not matter.

Furthermore, seek time is dominated by rotational latency (waiting for the sector to fly by) rather than positional latency (moving the arm with the heads). In this context, whether the next file is right next to the previous one is irrelevant unless it is read right away – which for a storage node is virtually never the case.

And remember, the Storj median file size is very close to (on the same order of magnitude as) the sector size, so the potential for file fragmentation is minimal to start with.

It’s easy to verify – use Filemon and see how many fragmented files per second the node actually reads.

Yes, I was referring to shadow copies. See the vssadmin create shadow command – it creates an actual snapshot, which can even be made persistent.
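For reference, the command looks like this, run from an elevated prompt; note that the create verb is only available on the Server editions of Windows (on client editions a WMI call is needed instead), and D: is just a placeholder volume:

```powershell
# Create a shadow copy (snapshot) of volume D:
vssadmin create shadow /for=D:

# List the existing shadow copies
vssadmin list shadows
```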

I would just suggest running your own node on Windows with defragmentation disabled. When it grows beyond 1 TB, you will likely start to see timeouts from time to time.
This is not theory, this is simple practice.


Fragmentation matters a lot for database files. From my observation, I/O related to database writes actually overshadows I/O related to the pieces themselves. Though a vacuum would still be preferable to defragmenting, as a vacuum both rewrites the database files from scratch (likely defragmenting them) and removes unused parts of the database files (which defragmentation cannot do).

I think the main point here is that both defragmentation and vacuuming provide a marginal improvement at best, compared to more drastic measures like accelerating access with an SSD, or tweaking other parameters, like sync writes.

In other words, if optimizing disk IO is the goal, the solutions that provide a massive impact at low cost should be exhausted first, before dealing with changes that provide a small incremental improvement at significant cost. And in most cases, after implementing the former (sync off, SSD cache, etc.), the latter (defragmentation, vacuuming) are no longer relevant (or applicable, or worth doing, let alone necessary).

I can’t agree with this statement. Defragmentation and vacuum are tools not to improve metrics like latency, but to prevent their regression in long-term operation. As such, they have different use cases. You cannot operate a large node on a single I/O-limited device, but fragmentation will also block you from operating a small node on a single I/O-limited device that would nominally be enough.

In my own case, implementing regular vacuums allowed me to operate a node on just an HDD for a long time without the cost of additional hardware. My belief is that a vacuum should be run by default on node startup, like the file walker.
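As an illustration of a manual vacuum (not an official procedure): with the node stopped and the sqlite3 command-line tool available, something like this would rewrite each database file; the DB path is a placeholder for your data location:

```powershell
# Stop the node so the databases are not in use
Stop-Service storagenode

# Run VACUUM on every storagenode database (requires sqlite3.exe in PATH)
Get-ChildItem 'E:\storagenode\storage\*.db' | ForEach-Object {
    sqlite3 $_.FullName 'VACUUM;'
}

Start-Service storagenode
```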


That’s fair. I would not operate a node on such a device to begin with… but people do do that.

I am going to enable Windows defragmentation.
When the big node empties enough, I’ll switch it to the 8 TB disk.
Thank you so much.