Downtime for increasing disk space

Hello,

I was wondering if it’s a problem for a storage node to be down for about 4-6 hours while I increase the size of the partition used for Storj?

And if so, how damaging would it be for the node’s reputation and about how long would it take for that reputation to recover from it?

From what I recall, at launch you’ll only get about 5 hours of monthly downtime before your node is quarantined/disqualified.

At this point in the beta this is not enforced, so it won’t be an issue right now. However, you should look into ways to reduce or eliminate this downtime, as 4-6 hours to grow a partition is not reasonable in a production system.

I’ve done all kinds of migrations on Linux that include adding RAID/encryption to existing volumes or moving volumes to different storage devices with no downtime. Perhaps if you tell us what process you will perform, we could suggest a no-downtime alternative – or at least explain how to use this downtime to prepare your system for no-downtime storage changes in the future.

One way to eliminate downtime when increasing storage space would be an option to specify multiple data directories.
I currently use VMware ESXi on my dedicated server and allocate storage as needed. If I wanted to grow an already-existing partition with something like GParted, it would have to go through all of the data, which can unfortunately take quite a while. With an option to specify multiple data directories, I could instead create another partition without downtime and add it to the VM without issues.

If it’s an ext2/3/4 volume then this can all be done online. VMware should support online growing of virtual disks, and the VM should then notice the disk capacity change.

Inside the VM (presumably this is where storagenode is running?) you can increase the size of the partition in the partition table and then run resize2fs on the partition. It will tell you that online resizing is required because the volume is mounted. I don’t use (g)parted myself, but it may even do all of this automatically for you if you ask it to resize the partition.
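For example, a rough sketch of the online-grow steps, assuming the data disk is /dev/sdb with a single ext4 partition and a hypothetical mount point of /mnt/storj (adjust names for your system):

```bash
# Make the VM notice the grown virtual disk (SCSI rescan)
echo 1 | sudo tee /sys/class/block/sdb/device/rescan
# Grow partition 1 to fill the new disk size (growpart is in cloud-guest-utils)
sudo growpart /dev/sdb 1
# Online-resize the mounted ext4 filesystem to fill the partition
sudo resize2fs /dev/sdb1
# Confirm the new capacity
df -h /mnt/storj
```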

Once resizing is done, the new space will be immediately available on the mounted volume. Then you just have to recreate the storagenode container with an increased STORAGE capacity value.
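Roughly like this, assuming the standard docker run from the Storj docs; the wallet, address, and paths below are placeholders for your existing values:

```bash
# Stop and remove the old container, then recreate it with a larger STORAGE value
docker stop -t 300 storagenode
docker rm storagenode
docker run -d --restart unless-stopped --name storagenode \
  -p 28967:28967 \
  -e WALLET="0xYOURWALLET" \
  -e EMAIL="you@example.com" \
  -e ADDRESS="your.host.example:28967" \
  -e STORAGE="2TB" \
  --mount type=bind,source=/mnt/storj/identity,destination=/app/identity \
  --mount type=bind,source=/mnt/storj/data,destination=/app/config \
  storjlabs/storagenode:latest
```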

Ah, I didn’t know there was a way to grow a partition without taking it offline.
Will give it a try later this weekend. Thanks!

No problem. Note also that using LVM for all volumes makes things just as easy even in much more complicated setups. With LVM you can move an in-use volume to different underlying storage. See pvmove.
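A sketch of the idea, with vg0 and /dev/sdb1 as placeholder names for the volume group and the PV being vacated:

```bash
# Move all allocated extents off /dev/sdb1 onto free space elsewhere in the VG,
# while the logical volumes stay mounted and in use
sudo pvmove /dev/sdb1
# Then drop the emptied PV from the volume group and wipe its label
sudo vgreduce vg0 /dev/sdb1
sudo pvremove /dev/sdb1
```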

Yeah, I think I chose not to use LVM on this VM for some reason…
Will see if I can convert to it by using https://github.com/g2p/blocks. I’m only afraid that I might run into data corruption or some other failure, and I currently don’t have enough storage to do a backup of this disk.
Think I’ll wait for Black Friday and get a server with bigger storage first, as I was planning to do that anyway; then I can migrate to that, do a full backup, and convert to LVM.

As in other threads, I highly recommend that LVM volumes be limited to a single physical drive. There’s a temptation to simply add another drive to the volume group in order to grow a single logical volume. It’s very important to remember that if a logical volume spans multiple physical drives, the failure of any one of those drives loses the entire logical volume.

So…

  • RAID applied across multiple drives first and then space divided via LVM is OK (see the sketch after this list).

But

  • LVM applied directly across multiple drives is not OK.
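A minimal sketch of the “OK” layering, assuming three disks and placeholder names (md0, vg0):

```bash
# RAID5 across the drives first, so a single-disk failure is survivable
sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 \
  /dev/sda1 /dev/sdb1 /dev/sdc1
# Then divide the space via LVM on top of the array
sudo pvcreate /dev/md0
sudo vgcreate vg0 /dev/md0
sudo lvcreate -n storj -l 100%FREE vg0
sudo mkfs.ext4 /dev/vg0/storj
```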

Yep, this is what I do.

Right. I always set the VG allocation policy to cling for this reason, and double-check the output of lvs -o +devices after making any changes to ensure that volumes do not span PVs.
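For example, with vg0 as a placeholder VG name:

```bash
# Persist the cling allocation policy on the volume group
sudo vgchange --alloc cling vg0
# Show which PVs each logical volume actually occupies
sudo lvs -o +devices
```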

I don’t doubt your method or expertise. I’m just attempting to clarify best practices and possible problems for individuals reading along who may not already understand how to utilize LVM effectively.

I’d be reluctant to use “blocks”; it still talks about Python 3.3, and the last commit was five years ago.

I’m doing the same thing right now, but why multiple hours of downtime? Just set up the new disk, rsync onto the new disk, do it again to pick up changes; shut down your node, do it yet again with --delete, restart with the new disk, done.
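A sketch of that process, with /mnt/old and /mnt/new as placeholder mount points:

```bash
# First pass while the node is still running
rsync -aP /mnt/old/ /mnt/new/
# Second pass to pick up whatever changed during the first
rsync -aP /mnt/old/ /mnt/new/
# Stop the node, then a final pass that also removes deleted pieces
docker stop -t 300 storagenode
rsync -aP --delete /mnt/old/ /mnt/new/
# Then recreate the container pointing at /mnt/new
```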

Yes a storage node that balances across multiple volumes would be cool, but if you need multiple disks you can easily use LVM or btrfs.

I’d caution against using multiple disks for a storage node. The official advice is to run a second storage node instance with a new identity for each physical disk. (You can even run them on the same physical host easily using Docker.)
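For example, a second container on the same host just needs its own identity, data directory, container name, and host port; everything below is a placeholder:

```bash
docker run -d --restart unless-stopped --name storagenode2 \
  -p 28968:28967 \
  -e WALLET="0xYOURWALLET" \
  -e EMAIL="you@example.com" \
  -e ADDRESS="your.host.example:28968" \
  -e STORAGE="4TB" \
  --mount type=bind,source=/mnt/disk2/identity,destination=/app/identity \
  --mount type=bind,source=/mnt/disk2/data,destination=/app/config \
  storjlabs/storagenode:latest
```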

If you absolutely want to use multiple disks for a single node, RAID5 makes the most sense as only one disk is “wasted” providing redundancy.

RAID0 appears at first to be the most cost-effective approach, but each additional disk dramatically increases the risk of total volume failure. Any time a disk fails you would have to start over from scratch with a new identity and no data stored, and go through the vetting process again. RAID5 allows you to avoid that scenario at the cost of one disk’s worth of data. (Note that an LVM logical volume spread across multiple PVs has the same failure risk as a RAID0.)

But going back to my first point, running a node per physical disk has several advantages:

  • You can fully utilize each disk, wasting no space on redundancy, which is already built into the network.
  • If one disk fails, only that node is affected. The other nodes don’t lose their storage or escrow.
  • You can use disks of different sizes without wasting space. (RAID uses the smallest member disk to determine the size of the array. LVM in linear mode can use multiple disks of different sizes, but provides no redundancy.)

Interesting. I thought it was actually not recommended to run multiple identities on the same server, as some might use that to “game” the system to get more traffic and such.

It was not allowed during the early stages, but that has changed. You can’t really game the system, as the IP filtering in place prevents multiple nodes behind one IP from getting more data than a single node would, so there is no advantage. The current advice is to run one node per HDD, but only start a new node when the ones you already have are starting to fill up.