This is a good point. We’ll need to make sure that requests must start within the hour, and that we don’t submit orders for a window until all of the requests that started within that hour have finished.
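To make that rule concrete, here is a minimal sketch of the check, not the actual storage node code. `WindowTracker`, `window_start`, and `can_submit` are hypothetical names invented for illustration, and the sketch assumes windows are aligned to the top of the hour.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=1)  # assumption: one-hour windows aligned to the hour


def window_start(t: datetime) -> datetime:
    """Return the start of the hour window that contains t."""
    return t.replace(minute=0, second=0, microsecond=0)


class WindowTracker:
    """Hypothetical helper: counts in-flight requests per start window."""

    def __init__(self) -> None:
        self._in_flight = defaultdict(int)

    def begin(self, started_at: datetime) -> datetime:
        # A request belongs to the window in which it started,
        # no matter how long it takes to finish.
        window = window_start(started_at)
        self._in_flight[window] += 1
        return window

    def finish(self, window: datetime) -> None:
        if self._in_flight.get(window, 0) <= 0:
            raise ValueError("no in-flight request for this window")
        self._in_flight[window] -= 1
        if self._in_flight[window] == 0:
            del self._in_flight[window]

    def can_submit(self, window: datetime, now: datetime) -> bool:
        # Submit only once the hour is over (no new request can start in it)
        # and every request that started in it has finished.
        window_closed = now >= window + WINDOW
        return window_closed and self._in_flight.get(window, 0) == 0


tracker = WindowTracker()
w = tracker.begin(datetime(2021, 1, 1, 10, 59, tzinfo=timezone.utc))
later = datetime(2021, 1, 1, 11, 5, tzinfo=timezone.utc)
blocked = tracker.can_submit(w, later)   # request from 10:59 still running
tracker.finish(w)
allowed = tracker.can_submit(w, later)   # window closed and drained
```

The point of keying on the start time is that a slow request that began at 10:59 keeps the 10:00 window open past 11:00, so its order can’t be left behind by an early submission.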
The challenge with a system this large is that, as I remember it, we have at multiple times had different reasons both for having this window and then for shortening it. I don’t currently remember all of them; perhaps @littleskunk has some to add. The main reason IMO is our usage limit implementation.
We have two bandwidth usage counters: allocated and used. Whenever a download is requested, we allocate some bandwidth measure on the satellite. The uplink doesn’t necessarily need to use this bandwidth (it picks which nodes it wants to use, canceling some due to long tails), so we need some way of clearly invalidating allocated but unused bandwidth. The best way to do this is to have a deadline after which allocated bandwidth expires. That way, for anything older than a certain age, the used bandwidth measure is an accurate figure to check against the bandwidth limit. If we were to allow orders to be submitted at any time indefinitely, we would never be able to say that usage past a certain age is a complete accounting. This would complicate not only the realtime project bandwidth limits, but also our invoices, payouts, etc.
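As a rough illustration of the two counters and why a deadline helps, here is a small sketch. It is not the satellite’s implementation: `BandwidthLedger`, `Allocation`, and all method names are hypothetical, and the 48-hour default is only an illustrative value for the expiration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

ORDER_EXPIRATION = timedelta(hours=48)  # assumption: illustrative deadline


@dataclass
class Allocation:
    allocated_at: datetime
    allocated: int               # bytes reserved when the download was requested
    used: Optional[int] = None   # bytes settled by a submitted order, if any


class BandwidthLedger:
    """Hypothetical ledger tracking allocated vs. used bandwidth."""

    def __init__(self, expiration: timedelta = ORDER_EXPIRATION) -> None:
        self.expiration = expiration
        self.allocations: Dict[str, Allocation] = {}

    def allocate(self, order_id: str, amount: int, at: datetime) -> None:
        self.allocations[order_id] = Allocation(at, amount)

    def settle(self, order_id: str, used: int, at: datetime) -> bool:
        """Record used bandwidth; reject orders submitted after the deadline."""
        alloc = self.allocations[order_id]
        if at - alloc.allocated_at > self.expiration:
            return False
        alloc.used = min(used, alloc.allocated)
        return True

    def charged(self, now: datetime) -> int:
        """Bandwidth counted against the limit as of `now`."""
        total = 0
        for alloc in self.allocations.values():
            if alloc.used is not None:
                total += alloc.used        # settled: count what was used
            elif now - alloc.allocated_at <= self.expiration:
                total += alloc.allocated   # still pending: count the reservation
            # expired and never settled: the allocation no longer counts
        return total

    def complete_before(self, now: datetime) -> datetime:
        """Usage allocated before this instant can no longer change."""
        return now - self.expiration


t0 = datetime(2021, 1, 1, tzinfo=timezone.utc)
ledger = BandwidthLedger()
ledger.allocate("a", 100, t0)
ledger.allocate("b", 100, t0)   # e.g. a node the uplink canceled as a long tail
ledger.settle("a", 60, t0 + timedelta(hours=1))
pending_total = ledger.charged(t0 + timedelta(hours=1))
final_total = ledger.charged(t0 + timedelta(hours=49))
late_ok = ledger.settle("b", 100, t0 + timedelta(hours=49))
```

Without the deadline, `complete_before` could never be defined: any old allocation might still turn into usage, so no past period would ever be final.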
So, that’s why we need orders to expire. Then there’s the question of how soon they should expire. There are tradeoffs here. We used to have it set to a week, but a week-long window had implications for payouts, invoices, etc., and we discovered that almost all SNOs were able to submit orders well within 48 hours, so 48 hours seemed like a more than adequate grace period. We ask SNOs to target 99.5% availability, so downtime of more than 48 hours is already something we anticipate disqualifying a node for.
All of the above plan is independent of bugs though! If you’re hitting bugs, we should totally fix those.
One goal we have with this project is to improve the situation here, both in the handling on the storage node side of expired orders, and in the monitoring (we do definitely have satellite-side monitoring on this, but it doesn’t currently make it easy to group by affected storage nodes).
@nerdatwork - is this an ongoing issue for you? Or does your database have old, expired orders in it? Are new orders still getting submitted after they’ve expired?