It will only write that file on a successful run. You don’t have to go to extreme values. A short test of at least 30 seconds is all we need. If you can zip the trace file from that run, we can look into it and see if there is anything left to optimize. Based on your dd output, I would guess there is not much room for improvement, but if you like I will pass it on to the developer team.
And one more detail. The benchmark tool has some overhead of its own. It needs to sign orders and so on. That part might consume some extra memory and cause crashes. The benchmark isn’t designed for workloads that run for hours. It is meant for runs of minutes, maybe up to a full hour.