Blueprint: Sparse Order Storage

We’ve recently published a really exciting blueprint here: https://review.dev.storj.io/c/storj/storj/+/1732/3/docs/blueprints/sparse-order-storage.md

This blueprint will make a huge impact on Satellite performance, cost, and operational simplicity. The current naive system of keeping track of serial numbers, used or unused, puts a massive amount of load on our Satellite databases, to the point that it currently accounts for 99% of all of the data in the Satellite DB. The plan below seeks to eliminate terabytes of data storage in used-serial tracking in Postgres in exchange for a single hash per node.

The core idea for this blueprint actually comes from the field of certificate revocation, and we think it’s pretty neat. This will make Satellites much easier to run, which is important as we start to think about making community-hosted Satellites a broad reality.

What do people think about this plan?

11 Likes

I had to sit down with a good cup of coffee for this one.

We don’t charge or pay for uploads! Uploads cost and pay out $0. So let’s simply
stop tracking bandwidth usage for uploads.

Would this mean upload bandwidth will no longer be available in the node DBs, dashboard, and API as well? Or is this tracked separately from orders?

Windows have a defined expiration time for new submissions based on the window id, but the window state per node will be kept in the Satellite database instead of the storage node bandwidth rollups table.

I guess a similar question, does this apply only to window state or will this impact reporting data available to the node in bandwidth rollups?

Storage node bandwidth rollups are as simple as querying the bandwidth totals
out of the signed state structures per node per window.
Bucket bandwidth rollups will continue to work the usual way.

I think that answers my question. I should have kept reading first, but a slight adjustment of the question then: will this have a possible impact on the timing of when these rollups become available, which could lead to delays in reporting?

This looks like a solid solution to the issue of a growing database and delays in processing! Most important bit of text: O(log n)! Gotta love seeing that.

So, if you have a plan to move back to Postgres, I can recommend this guide for performance tuning/improvements. I think you already know most of what is described there, but maybe it will be helpful.

Yeah, upload orders may stay around for reasons like this, or other issues w.r.t. DoS stuff. The goal is to remove them while keeping all of the features they enable, so hopefully that can happen. They won’t need to be submitted, though.

If it helps, you can think of every window as already existing with a value of 0, and they get incremented when orders are submitted. We’re just changing which window orders get put into: from using submission time to using order creation time.

The reason this is happening is that we need the satellite and storage nodes to agree on which window an order goes in, so that the appropriate proof can be generated and verified. Using the order creation timestamp to pick the window the bandwidth is accrued to solves this.
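If it helps to see it concretely, here’s a minimal sketch of the idea in Go. The hour-aligned windows and the names below are my own illustration of the concept; the blueprint’s actual window id format may differ.

```go
package main

import (
	"fmt"
	"time"
)

// windowID maps an order's creation time to the window it belongs to.
// Assumption for this sketch: windows are one hour wide and identified by
// the Unix timestamp of the start of that hour.
func windowID(orderCreatedAt time.Time) int64 {
	return orderCreatedAt.Truncate(time.Hour).Unix()
}

func main() {
	created := time.Date(2020, time.May, 14, 10, 42, 7, 0, time.UTC)
	submitted := time.Date(2020, time.May, 14, 11, 55, 0, 0, time.UTC)

	// Both the storage node and the satellite derive the same window from
	// the order's creation time, no matter when the order is submitted.
	fmt.Println("window keyed by creation time:     ", windowID(created))
	fmt.Println("window if keyed by submission time:", windowID(submitted))
}
```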

2 Likes

Got it! Makes complete sense. Thanks for the reply.

Just an update that we realized there is a massive simplification possible to this whole thing that eliminates the need for complicated sparse Merkle tree data structures entirely.

We’ve updated the design doc. https://review.dev.storj.io/c/storj/storj/+/1732 (readable version at: https://review.dev.storj.io/plugins/gitiles/storj/storj/+/42b52511f120dd39ab94470945ddc74aa53af89e/docs/blueprints/sparse-order-storage.md)

5 Likes

That does seem a lot simpler!

What wasn’t entirely clear to me, though, is how the node determines when to submit the orders for a specific window. My question is mostly: how does the node prevent delays in order submission while also ensuring all orders for that window are included (including some possibly long-running transfers)?

1 Like

The Storage Node will stream to the Satellite all of its Orders for that hour, and the submission will result in one of three outcomes: the orders for that window are accepted, the window was already submitted for, or an unexpected error occurred and the storage node should try again.

What would happen if the node started submitting orders and then crashed/lost connection/etc? Would that make the window “used-up” or would the partial submission be disregarded?

I don’t think partial submission is allowed, since :arrow_down:

Submitted Orders for an hour is an all-or-nothing process.

It will be retried in the next window, and the node has 48 hours to submit that order (before it’s dropped?)

@jtolio I would like to know how the situation is handled where an SN repeatedly fails to submit orders and it’s past 48 hours.

In the current proposal it seems that orders past 48 hours will be dropped, rather than having a sliding window mechanism to make sure those pending orders are submitted in rare exceptional cases.

So, the node is not allowed to crash or lose connection?

The question of what happens in such situations is valid. There would need to be something built in that verifies the order batch is complete.

I am not invalidating anything, merely stating my understanding of the document, which anyone is welcome to confirm or refute.

Storage nodes already reject requests from uplinks that have orders that are over an hour or so old, so once a window is over an hour old, it is safe to submit it to the satellite, because no new orders created in that hour will be accepted afterwards.

Yep, as @nerdatwork said, partial submissions will be disregarded; specifically, only complete order batches (where, at the end of the batch, the storage node indicates it is done sending) will trigger processing on the Satellite.

It is already the case that if an order is older than 48 hours, the satellite will not accept it. We do this to avoid a number of other security issues, so it’s not as simple as doing a sliding window. So, those orders will simply be rejected, and the storage node should stop retrying.
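For illustration, here is a rough sketch of what the storage node side of this could look like. The types and the SubmitWindow call are hypothetical names invented for this sketch, not the actual storj.io/storj API.

```go
package ordersubmit

import (
	"context"
	"fmt"
	"time"
)

// Order stands in for a signed order; hypothetical for this sketch.
type Order struct{ Amount int64 }

// SubmitOutcome models the three outcomes described above.
type SubmitOutcome int

const (
	OutcomeAccepted         SubmitOutcome = iota // orders for the window were accepted
	OutcomeAlreadySubmitted                      // the window was already submitted for
	OutcomeRetry                                 // unexpected error; the node should try again
)

// SatelliteClient is a stand-in for whatever client streams order batches.
type SatelliteClient interface {
	SubmitWindow(ctx context.Context, window int64, orders []Order) (SubmitOutcome, error)
}

func submitWindow(ctx context.Context, sat SatelliteClient, window int64, orders []Order) error {
	// Orders older than 48 hours are rejected by the satellite, so the node
	// should stop retrying and drop the window.
	if time.Since(time.Unix(window, 0)) > 48*time.Hour {
		return fmt.Errorf("window %d expired; dropping unsubmitted orders", window)
	}

	// The whole batch is streamed; only a batch the node marks as complete
	// is processed, so a crash mid-stream just means the partial submission
	// is discarded and the node can retry the full window later.
	outcome, err := sat.SubmitWindow(ctx, window, orders)
	if err != nil || outcome == OutcomeRetry {
		return fmt.Errorf("submission failed, will retry: %v", err)
	}
	// OutcomeAccepted or OutcomeAlreadySubmitted: the window is settled.
	return nil
}
```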

1 Like

Thanks for the responses!

Looks great and I hope it will help relieve the processing delays on saltlake.

What happens if the customer has a slow connection and takes more than an hour to upload/download the file?

Could you please elaborate on those issues, if possible?

As far as I know, there is currently no mechanism to monitor how many orders are dropped in the network, how many storage nodes are affected by these dropped orders, or how many orders are dropped per satellite.

Dropping an order means not paying the SNO for a legitimate order. I am the only SNO in the network who has this issue (as per support) and would like to know if any solution could be implemented to fix this. I understand miscreants will try to take advantage of any loophole, but this shouldn’t be used as a means to drop legitimate orders, imo.

There is a status field in the orders.db file, but without a decode of the status numbers I can’t tell whether that could show us orders that were rejected or expired before sending. For my nodes this status is 1 for all orders, which I assume means they were sent and accepted by the satellite. If someone feels like it, they can dig through the code to find out what this status means, but unfortunately I don’t have time for that myself right now.

This is a good point - we’ll need to make sure we specifically account for the fact that requests must start within the hour, and then not submit orders for a window until all of the requests that started within that hour have finished. Good point!
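To make that concrete, the bookkeeping could look something like the sketch below (hypothetical names; the real implementation may track this differently):

```go
package windowtracker

import (
	"sync"
	"time"
)

// Tracker counts in-flight requests per window so a window is only
// submitted once every request that started in that hour has finished.
// This is an illustrative sketch, not actual storage node code.
type Tracker struct {
	mu       sync.Mutex
	inflight map[int64]int // window id -> number of active requests
}

func NewTracker() *Tracker {
	return &Tracker{inflight: make(map[int64]int)}
}

// Begin records a request that started at the given time and returns its window.
func (t *Tracker) Begin(start time.Time) int64 {
	w := start.Truncate(time.Hour).Unix()
	t.mu.Lock()
	t.inflight[w]++
	t.mu.Unlock()
	return w
}

// Done marks a request in the given window as finished.
func (t *Tracker) Done(window int64) {
	t.mu.Lock()
	t.inflight[window]--
	t.mu.Unlock()
}

// ReadyToSubmit reports whether the window has closed (no new requests can
// start in it) and all requests that did start in it have completed.
func (t *Tracker) ReadyToSubmit(window int64, now time.Time) bool {
	windowEnd := time.Unix(window, 0).Add(time.Hour)
	t.mu.Lock()
	defer t.mu.Unlock()
	return now.After(windowEnd) && t.inflight[window] == 0
}
```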

The challenge with a system this large is that I remember we have, at multiple times, had different reasons both for having this window and for shortening it. I don’t currently remember all of them; perhaps @littleskunk has some to add. The main reason, IMO, is our usage limit implementation.

We have two bandwidth usage counters - allocated and used. Whenever a download is requested, we allocate some bandwidth measure on the satellite. The uplink doesn’t necessarily need to use this bandwidth (it picks which nodes it wants to use, canceling some due to long tails), so we need some way of clearly invalidating allocated but unused bandwidth. The best way to do this is to have a time deadline after which allocated bandwidth expires. This allows the used bandwidth measure, for anything older than a certain age, to be an accurate accounting against the bandwidth limit. If we were to allow orders to be submitted at any time indefinitely, we would never be able to say that usage past a certain age is a complete accounting. This would complicate not only the realtime project bandwidth limits, but also our invoices, payouts, etc.
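As a rough illustration of why that deadline matters for the accounting (the type and field names here are made up for the sketch, not the satellite’s actual schema):

```go
package bandwidth

import "time"

// Allocation pairs the two counters described above for a single order limit.
type Allocation struct {
	Allocated int64     // bandwidth handed out with the order limit
	Used      int64     // bandwidth actually settled via a submitted order
	CreatedAt time.Time // when the order limit was created
}

// SettledUsage returns the usage that can be counted as final against a
// project's bandwidth limit. Before the expiry deadline an allocation might
// still be settled for up to its allocated amount; after the deadline only
// the used amount counts, because late orders are rejected.
func SettledUsage(a Allocation, now time.Time, expiry time.Duration) int64 {
	if now.Sub(a.CreatedAt) < expiry {
		// Still inside the grace period: count the full allocation against
		// the limit, since it may yet be used.
		return a.Allocated
	}
	// Past the deadline: no more orders can be submitted for this
	// allocation, so the accounting for it is complete.
	return a.Used
}
```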

So, that’s why we need orders to expire. Then there’s the question of how soon they should expire. There are tradeoffs here, and we used to have it set to a week. Having it be a week had implications for payouts, invoices, etc., and we discovered that almost all SNOs were able to submit orders well within 48 hours, so 48 hours seemed like a more than adequate grace period. We ask SNOs to try and target 99.5% availability, so downtime of more than 48 hours is already something we anticipate disqualifying a node for.

All of the above plan is independent of bugs though! If we have bugs you’re hitting we should totally fix those.

One goal we have with this project is to improve the situation here, both in how the storage node handles expired orders and in the monitoring (we definitely do have satellite-side monitoring on this, but it doesn’t currently make it easy to group by affected storage nodes).

@nerdatwork - is this an ongoing issue for you? or does your database have old, expired orders in it? Are new orders still getting submitted after they’ve expired?

1 Like

Just to add, for non-alpha SNOs this was set at 45 days, then 7 days, and now 2 days.

This shows there are SNOs that are dropping orders for some reason. How many SNOs are there in the network that face this issue?

Is the satellite keeping track of failed orders historically? Are these expired orders purged from the databases once accounting (invoices/payouts) is done? I am trying to understand whether Storj can count the number of orders expired per hour/week/month/quarter to see if this is a large-scale issue.

A few orders getting dropped for an SNO with a few TBs would mean little, but smaller SNOs with 500 GB would lose a lot. This will force them to quit the network due to lower payouts owing to dropped orders. Personally, I am against non-payment of any legitimate order, but I understand we are trying to gather more info about this and find our way to a solution (knock on wood).

Yes. I had a support ticket for this which was open for 6 months and closed without a solution. I was told to make a forum post which will help gather more info on the issue from other SNOs.

Yes. I have taken a backup of orders.db, which had 1423 orders dropped, but there is no way of grouping them by satellite.

I am sorry, but I don’t understand this question.
I think you mean: are new orders getting submitted after old orders have expired? In that case, yes. Thankfully my power company is smart enough to cut my power at random times of the day, thereby forcing my node to resubmit those orders before their hour is up.

There is a satellite id in the order_archive_ table. To make it semi-readable you should use the hex() function. You can group by that.

Here’s the decode for which satellite is which that I use in the earnings calculator.

CASE hex(satellite_id)
   WHEN 'A28B4F04E10BAE85D67F4C6CB82BF8D4C0F0F47A8EA72627524DEB6EC0000000' THEN 'us-central-1'
   WHEN 'AF2C42003EFC826AB4361F73F9D890942146FE0EBE806786F8E7190800000000' THEN 'europe-west-1'
   WHEN 'F474535A19DB00DB4F8071A1BE6C2551F4DED6A6E38F0818C68C68D000000000' THEN 'europe-north-1'
   WHEN '84A74C2CD43C5BA76535E1F42F5DF7C287ED68D33522782F4AFABFDB40000000' THEN 'asia-east-1'
   WHEN '7B2DE9D72C2E935F1918C058CAAF8ED00F0581639008707317FF1BD000000000' THEN 'saltlake'
   WHEN '004AE89E970E703DF42BA4AB1416A3B30B7E1D8E14AA0E558F7EE26800000000' THEN 'stefan-benten'
   ELSE '-UNKNOWN-'
END satellite_name