Assuming ext4, and assuming that the source and target directory pages are already in cache, which I think is reasonable here, that’s two directory writes plus one journal write per file.
Multiplying 3 writes by 439561 files and dividing by 250 IOPS gives about 5275 seconds, so roughly 1.5 hours of busywork for the HDD.
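For anyone who wants to plug in their own numbers, here is a minimal sketch of that back-of-the-envelope estimate. The 3 writes per file, the 439561 files and the 250 IOPS are the figures from above; the function and constant names are just mine for illustration, and the model assumes every write costs one full random I/O with no batching or caching.

```python
# Rough estimate only: assumes each write is one random I/O on the HDD.
FILES = 439_561          # number of files, from the post
WRITES_PER_FILE = 3      # two directory writes + one journal write (ext4 assumption)
HDD_IOPS = 250           # assumed random IOPS budget of the drive


def busy_hours(files: int, writes_per_file: int, iops: float) -> float:
    """Return the hours the disk spends if every write costs one I/O."""
    total_ios = files * writes_per_file
    seconds = total_ios / iops
    return seconds / 3600


print(f"{busy_hours(FILES, WRITES_PER_FILE, HDD_IOPS):.2f} hours")  # prints "1.47 hours"
```

Changing `WRITES_PER_FILE` is an easy way to see how much batching would help, e.g. if several files end up sharing one journal write.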
Immediate removal is worse: a write to the source directory, removal of an inode, marking the data blocks as free in the bitmap, and maybe removal of a separate extent tree (though that is unlikely for storage nodes, as files are usually not fragmented much).
GC makes it better because some of these writes can be batched, so a single write covers many files. It is also done in a low-priority thread, as it should be. However, the current implementation of that low-priority thread is suboptimal, so I'm not sure whether these gains are actually realized.