High read load and upload cancelled on a full system

Symlinks are not supported, you may destroy your node. The docker paths approach may work, but you need a complete copy of the blobs folder, which is usually 99% of the data, and it’s still too dangerous - it’s very easy to make a mistake with any of the content and your node will be disqualified pretty fast.
Going offline is less dangerous - you have 12 days before suspension, and then you will have a week to bring it back online.


Just to wrap this thread up…

  1. fsck was much faster than it seemed to me, and copying data was much slower. Seems like I got the numbers switched, but who knows. In the end the copy was no faster than 1-4 MB/s, around 2 MB/s on average I believe. But I also tried different ways to do the copy, and had lockups/reboots, so I had to restart the process a few times.

  2. I ended up using rclone to copy the data. I copied from the destination host, as I had lots of issues with the source.

This is the way I did it. The first part after "sync" in the command below is my rclone remote, so create that first using rclone config.

This is my config:

--------------------
[Storj1]
type = sftp
host = 192.168.2.10
user = root
pass = *** ENCRYPTED ***
disable_hashcheck = true
--------------------
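
If you prefer not to click through the interactive prompts, the same remote can also be created from the command line. This is only a sketch based on the config above, so double-check rclone config create --help for your rclone version:

--------------------
# create the sftp remote non-interactively
rclone config create Storj1 sftp host 192.168.2.10 user root disable_hashcheck true
# set the ssh password separately (placeholder shown); rclone stores it obscured
rclone config password Storj1 pass 'your-ssh-password'
--------------------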

This is the command I used.
rclone sync Storj1:/Storj /mnt/Pool1/TempPool/Storj -P --log-file=/mnt/Pool1/TempPool/rclone-sync.log --log-level DEBUG --log-format=pid --ignore-checksum --cutoff-mode soft --max-backlog=999999 --checkers=10 --exclude='**/trash/**' --exclude='**/archive/**' --exclude='**/temp/**'

I liked the logging, as the rclone “gui” sometimes seems to lock up and at least display wrong performance data. I disabled checksums everywhere I could/was asked to. Not sure if it did anything, but better than nothing I guess (and Storj is in general responsible for making sure the data is valid). Setting a large backlog gives me a more realistic idea of when rclone is done. Checkers worked when I used rclone copy, but might be broken when doing sync; at least I only got 4 checkers running and not the 10. And lastly I excluded data which in some form or sense is garbage - no need to copy that.
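
To follow progress outside the rclone “gui”, tailing the log works too. A rough sketch (the exact log messages can differ between rclone versions):

tail -f /mnt/Pool1/TempPool/rclone-sync.log
grep -c 'Copied (new)' /mnt/Pool1/TempPool/rclone-sync.log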

I switched from rclone copy to rclone sync and the copying seemed faster; it works sort of like robocopy. There is no resuming of transfers, but running the command again just syncs whatever has changed since the last run.

So once everything was copied over, I shut down docker, checked that the storagenode wasn’t running, and started rclone again for a last run. The downtime was 3-4 hours.
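
Roughly, the final pass looked something like the sketch below; the container name storagenode is the usual default from the Storj docs and may differ on your setup:

docker stop -t 300 storagenode    # give the node time to finish in-flight transfers
docker ps                         # confirm the storagenode container is gone
rclone sync Storj1:/Storj /mnt/Pool1/TempPool/Storj -P    # same sync command as above, for the last run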

  3. The config for running under docker and running on freebsd/jails is a bit different, so I had to copy the certificate/identity files manually as they are placed elsewhere than the rest of the files (see the sketch below).
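
The identity folder can be copied the same way as the data. The paths below are hypothetical examples only, not my actual layout - adjust them to wherever your identity actually lives:

rclone copy Storj1:/root/.local/share/storj/identity/storagenode /mnt/Pool1/TempPool/identity/storagenode -P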

I did the switch over today and the node is running again :smiley: But it also took around 3 months, plus 5 min to change the network config :slight_smile:. I wondered a few times if it wouldn’t be easier/cheaper/faster to just buy a new disk and do an offline clone of the disk. It would have been faster at least(!), and had I needed a new disk I would likely have gone that route. I “lost” around 1/4 of the data, for reasons that are unknown to me. Could just be that someone deleted a lot of data and it was on my node, or something else (doesn’t really matter, as I saved most of it and will likely have a full drive again within a few months).

Now all that is left is to clean the “old” drive and start copying data back, and eventually do a switch over back to this disk again, this time with only data on it and no OS.

Sorry… But why rclone and not rsync? Did you copy the data to external storage, like, for example, Storj?

Why not? rclone seems faster given the vast amount of small files and can do multiple tasks in parallel, both transfers and checkers. As the config says, it’s SFTP, so just ssh from one node to the other, nothing fancy :slight_smile:
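
The parallelism is the main difference from rsync; rclone’s defaults (4 transfers, 8 checkers, as far as I recall) can simply be raised on the command line:

rclone sync Storj1:/Storj /mnt/Pool1/TempPool/Storj -P --transfers 8 --checkers 16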
