Important change in the traffic accounting system (customer and storage nodes)

Let's start with some basics. How does the satellite track download traffic? How does it know what the customer has to pay and what the storage node payout should be? The accounting for used space has not changed, and I will not talk about that part. Let's focus on traffic accounting: how it works, what has changed, and what the consequences are.

How does it work
An uplink would like to download a file. It requests an order limit, including a signature, from the satellite. The uplink then creates multiple orders of increasing size, signs them, and sends them one by one to the storage node together with the order limit. The storage node has to deliver the data covered by the current order before it gets the next, bigger one. The order limit from the satellite is the guarantee that this download was authorized and will be paid for by the satellite.
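To make the download flow more concrete, here is a toy Python sketch of it. It is not the real protocol: the real satellite and uplink use public-key signatures, while this sketch swaps in HMAC with two made-up keys (`SATELLITE_KEY`, `UPLINK_KEY`), and all class and function names are hypothetical. It only shows the idea that the node hands out data one order at a time and ends up holding the largest signed order.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional

# Hypothetical shared secrets. The real system uses public-key signatures;
# HMAC only keeps this sketch self-contained.
SATELLITE_KEY = b"satellite-secret"
UPLINK_KEY = b"uplink-secret"


def sign(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode(), hashlib.sha256).digest()


def verify(key: bytes, message: str, signature: bytes) -> bool:
    return hmac.compare_digest(sign(key, message), signature)


@dataclass(frozen=True)
class OrderLimit:
    serial: str        # unique serial number issued by the satellite
    max_bytes: int     # most the uplink is allowed to download
    signature: bytes   # satellite signature: proof the download is authorized


@dataclass(frozen=True)
class Order:
    serial: str
    amount: int        # total bytes the uplink agrees to pay for so far
    signature: bytes   # uplink signature


def satellite_issue_limit(serial: str, max_bytes: int) -> OrderLimit:
    return OrderLimit(serial, max_bytes, sign(SATELLITE_KEY, f"{serial}:{max_bytes}"))


def uplink_order(limit: OrderLimit, amount: int) -> Order:
    return Order(limit.serial, amount, sign(UPLINK_KEY, f"{limit.serial}:{amount}"))


def download(data: bytes, limit: OrderLimit, step: int) -> tuple[bytes, Optional[Order]]:
    """Simulates both sides: the node hands out data only as far as the latest order covers."""
    if not verify(SATELLITE_KEY, f"{limit.serial}:{limit.max_bytes}", limit.signature):
        raise ValueError("order limit was not signed by the satellite")
    sent, last_order = 0, None
    while sent < len(data):
        # The uplink asks for the next chunk by sending a bigger order.
        order = uplink_order(limit, min(sent + step, len(data)))
        if order.serial != limit.serial or order.amount > limit.max_bytes:
            raise ValueError("order exceeds the order limit")
        if not verify(UPLINK_KEY, f"{order.serial}:{order.amount}", order.signature):
            raise ValueError("order was not signed by the uplink")
        sent, last_order = order.amount, order  # deliver up to order.amount
    # The node keeps only the largest order and later submits it to the satellite.
    return data[:sent], last_order
```

For example, `download(b"x" * 2500, satellite_issue_limit("serial-1", 10_000), step=1000)` walks through orders for 1000, 2000 and 2500 bytes, and the node keeps the 2500-byte order.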

What has changed
What happens if a storage node sends the same order more than once? The order limit contains a unique serial number. In the past (v0.29) the satellite kept track of the submitted serial numbers and rejected duplicates. Those database queries turned out to be far too expensive, for several reasons, so we changed the way the satellite processes orders. Instead of rejecting duplicates, we now do a blind insert into the database: if the serial number already exists, it gets overwritten. The performance improvement is significant.

The problem is that orders are valid for 7 days. A storage node could submit the same orders every day, and they would be accounted every day. To solve that problem, the accounting ignores every order that hasn't expired yet. In effect, accounting is delayed (shifted) by 7 days. Download traffic during the last week of a month will not be included in that month; instead, it will be shifted into the next month.
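A minimal sketch of why the blind insert forces the delay, using an in-memory SQLite table. Two assumptions are not stated in the post: the rollup job removes orders once it has accounted them, and the satellite refuses orders submitted after they expire. Table, column, and function names are hypothetical, and days are plain integers.

```python
import sqlite3

ORDER_EXPIRY_DAYS = 7  # orders were valid for 7 days in v0.30


def new_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (serial TEXT PRIMARY KEY, amount INTEGER, expires INTEGER)")
    return db


def submit(db, serial, amount, created, today):
    """Blind insert: no duplicate lookup; an existing serial is simply overwritten."""
    expires = created + ORDER_EXPIRY_DAYS
    if today > expires:
        return False  # assumption: the satellite refuses expired orders
    db.execute("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", (serial, amount, expires))
    return True


def rollup(db, today, only_expired):
    """Account and remove stored orders; returns the bytes accounted on `today`."""
    if only_expired:
        # v0.30 behaviour: skip orders that could still be resubmitted.
        rows = db.execute("SELECT serial, amount FROM orders WHERE expires < ?", (today,)).fetchall()
    else:
        rows = db.execute("SELECT serial, amount FROM orders").fetchall()
    db.executemany("DELETE FROM orders WHERE serial = ?", [(serial,) for serial, _ in rows])
    return sum(amount for _, amount in rows)


def simulate(only_expired, days=10):
    """A node resubmits the same 1000-byte order every day; a rollup runs daily."""
    db = new_db()
    total = 0
    for today in range(days):
        submit(db, "serial-1", 1000, created=0, today=today)
        total += rollup(db, today, only_expired)
    return total
```

With a rollup that counts everything, `simulate(only_expired=False)` accounts the same order once for every day it is still valid (8000 bytes in total). With the v0.30 rule, `simulate(only_expired=True)` accounts it exactly once (1000 bytes), but only on day 8, after it has expired, which is the 7-day shift.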

What are the consequences
With v0.30 the traffic accounting for customers and storage nodes is delayed by 7 days. v0.30 is already deployed on Stefan's satellite but not yet on the other 3 satellites. You will notice the 7-day shift on both the storage node web UI and the satellite web UI.

Open questions
Finishing graceful exit on the last day of the month will result in an incorrect held-back amount: the payout for the last 7 days will be missing from the held-back calculation. An easy way to fix that would be to increase the graceful exit requirement to 9 months. The missing 7 days would then be paid in the next month with a 0% held-back amount. Should we increase the graceful exit requirement, or accept that the held-back calculation might be a few cents off?
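A rough illustration of the graceful exit question. The held-back schedule below (75% / 50% / 25% for months 1-3 / 4-6 / 7-9, 0% afterwards) is an assumption for illustration only; check the current storage node operator terms. The function names are hypothetical, and the sketch only models the point made above: the last 7 days are paid in the month after the exit, at that month's rate.

```python
# Assumed held-back schedule: (last node-age month of the tier, share withheld).
# Check the current storage node operator terms before relying on it.
HELD_BACK_SCHEDULE = [(3, 0.75), (6, 0.50), (9, 0.25)]


def held_back_rate(node_age_months: int) -> float:
    for last_month, rate in HELD_BACK_SCHEDULE:
        if node_age_months <= last_month:
            return rate
    return 0.0  # after month 9 nothing is withheld


def missed_held_back(exit_month: int, shifted_payout: float) -> float:
    """Held-back amount on the last 7 days of the exit month that the graceful
    exit calculation never sees, because that payout is shifted into the next
    month and paid at that month's rate."""
    return shifted_payout * held_back_rate(exit_month + 1)
```

With an exit after month 9, the shifted payout lands in month 10, where nothing is withheld, so the exit calculation misses nothing. With an exit after month 6, it lands in month 7, and a quarter of it would have been withheld.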

Thank you for this informative and well-written post, @littleskunk!
I will pin it to the category, so people can see this news quickly!

I know this is probably a little too late, but have you considered simply delaying payouts and customer charges by 7 days so that duplicates can be taken into account? This would keep accounting simple, because all payouts would still relate to exact calendar months, and it would also fix the graceful exit problem. Essentially, that's what you're doing for bandwidth already with this solution, except that instead of delaying the payout you shift the accounting period. To me it would make much more sense to get the payout a little later but stick with the same accounting period. That way you don't have to decide how the dashboards should display bandwidth use with awkward shifts, etc. I fear it might be hard to explain to both SNOs and customers.
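The proposal above (keep calendar months, delay payout) boils down to a simple date calculation. A minimal sketch, assuming the payout and billing job may run once every order created in the month has expired; `settlement_date` is a hypothetical name, and it only handles expiration windows in whole days, because `date` arithmetic ignores the hours part of a `timedelta`.

```python
from datetime import date, timedelta


def settlement_date(year: int, month: int, expiry: timedelta = timedelta(days=7)) -> date:
    """Earliest day a whole calendar month can be paid out and billed: by then
    every order created during that month has expired and can no longer be
    resubmitted. Whole days only: date + timedelta ignores hours."""
    first_of_next_month = date(year + month // 12, month % 12 + 1, 1)
    return first_of_next_month + expiry
```

For December 2019 with a 7-day window this gives January 8, 2020; with a 24h window it would be the 2nd of the following month.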

Just change the reporting period.

Sounds a bit twisted… I foresee tons of SNOs asking what’s going on with payouts… ^^’

I think that whichever path you go down, it's very important that the node dashboard reflects precisely what will be paid. If not, everyone's gonna get confused…

If the last week isn’t included, maybe the dashboard should start its charts on the 3rd week of each month.

… Hmm. That doesn’t sound like a good idea TBH… :confused:

Is that correct? (months or days?)

9 months would take it out of the period for which escrow is held back. The last 7 days of month 9 will no longer have any held-back amount, so all of it will be paid out in the next month.

It sounds like an uplink could use the same order any number of times (and receive the same data), and the operator is only paid for the first use of that order.

I can’t follow you. How does a change on the satellite side affect which requests storage nodes will allow or reject?

I re-read the description of the change and realized I misunderstood it the first time around. I was wondering how nodes handle duplicate requests from uplinks, but this change in the satellite looks like it has no effect on accounting (insert and reject dupes, versus always insert with overwrites).

thanks for letting us know you re-read it @donpdonp – sounds like everything is copacetic now? :slight_smile:

I also interpreted the text above the same way as @donpdonp. It sounds as if some part of our egress used to be accounted for in one way and is now accounted for in another.

That’s not the case. The exact same thing is taken into account. This change just changes when the satellite checks that nodes are not claiming credit for a transfer more than once. This is the situation before and after.

Before
Every time a node sends orders, the satellite checks whether each order has already been sent before. This requires a database lookup every time a node sends orders, which uses a lot of resources on the satellite.

After
Rather than performing the lookup, the satellite writes all orders to the database, overwriting any order with the same serial number. If your node sent an order on December 31st, it would be counted for payout in December. But because the order won't expire for another 7 days, the node could send the order for the same transfer again in January, and it would also be counted for payout in January. Nodes could use this to cheat the system, hence the change to ignore orders that haven't expired yet. These orders will now be paid out in the month in which they expire.
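The December 31st example can be written as a one-line rule: the delayed rollup counts an order in the month it expires in. A small Python sketch of that rule; `accounting_month` is a hypothetical helper and ignores exactly when the rollup job runs.

```python
from datetime import datetime, timedelta


def accounting_month(created: datetime, expiry: timedelta = timedelta(days=7)) -> tuple[int, int]:
    """(year, month) in which the delayed rollup counts an order: the month it expires in."""
    expires = created + expiry
    return expires.year, expires.month
```

With a 7-day window, every order created from December 25th on lands in January; with a 24h window, only orders created on December 31st would.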

So the only real change is when orders are paid out, not which transfers are or aren’t paid. It’s only a change on the satellite side. This wasn’t specifically said, but I’m pretty sure nodes already check that serial numbers for orders haven’t been used before and will still do so after this change is implemented. If an uplink would request a download using the same serial number as used before, the node will reject it.
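For the node side, here is a sketch of what such a serial check could look like. It only illustrates the idea that a node remembers serial numbers until their order limits expire and refuses to serve the same serial twice; `UsedSerials` and its methods are hypothetical and not the storage node's actual code.

```python
from datetime import datetime, timedelta


class UsedSerials:
    """Hypothetical node-side memory of serial numbers it has already served."""

    def __init__(self) -> None:
        self._expires: dict[str, datetime] = {}  # serial -> order limit expiration

    def accept(self, serial: str, expires: datetime, now: datetime) -> bool:
        """Return True if a download with this order limit may start."""
        self._forget_expired(now)
        if expires <= now:
            return False  # expired order limits are rejected outright
        if serial in self._expires:
            return False  # serial already used: reject the replay
        self._expires[serial] = expires
        return True

    def _forget_expired(self, now: datetime) -> None:
        # An expired serial is rejected by the expiry check anyway,
        # so it no longer needs to be remembered.
        for serial in [s for s, e in self._expires.items() if e <= now]:
            del self._expires[serial]
```

Pruning expired serials keeps the store bounded by the number of downloads within one expiration window.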

That said, I still maintain that it would be a lot easier for accounting if instead of not paying for traffic in the month it happened in, payout would simply wait for the orders to reach their expiration date before calculating and processing payouts and charging customers.

@BrightSilence thanks for the explanation. I absolutely agree with you that payments should be made somewhere around the 8th or 9th of the month, so that all of this can be taken into account.

Short update:

  1. Short term: Decrease the order expiration time to 24h. This will reduce the delay to 2 days. It will still be annoying in the billing UI. I can get this change into v0.31, and hopefully we can deploy it next week.
  2. Mid term: Some storage nodes have the wrong timezone, which makes it impossible for us to set any expiration time shorter than 24h. With the v0.30.5 release tomorrow, we want to collect some feedback about that problem, and hopefully we can find a way to get all storage nodes in sync. That would allow us to reduce the expiration time to 6h, which will reduce the delay to 1 day (+6 hours). Even if we are quick, we are talking about 2 changes, and I don't like to deploy them all at once. So more likely we will fix the timezone bug with v0.31 and reduce the expiration time with v0.32 or later. (This only works if the timezone bug can be fixed by storage node operators and we support the transition with good documentation.)
  3. Long term: Let the rollup job produce a traffic preview and a final result. For 7 days (assuming the other fixes weren't in place) the result will be a preview. After 1h the customer would see a good result, but it might still change during the next 7 days. If a storage node submits the same order twice, the rollup job will decrease the preview result from yesterday and increase the result from today, so the sum stays the same. I don't have an ETA.
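The long-term idea (item 3) can be sketched as a rollup that moves a resubmitted order from the day it was first counted to today instead of counting it again. This is a hypothetical illustration, not the planned implementation: names are made up, days are integers, expired orders are assumed to be refused, and a real version would also prune old serials.

```python
from collections import defaultdict


class PreviewRollup:
    """Hypothetical sketch: daily totals are a preview until no order counted
    on that day can be resubmitted any more."""

    def __init__(self, expiry_days: int = 7) -> None:
        self.expiry_days = expiry_days
        self.preview: defaultdict[int, int] = defaultdict(int)  # day -> bytes
        self.counted: dict[str, tuple[int, int]] = {}         # serial -> (day, amount)

    def submit(self, serial: str, amount: int, created: int, today: int) -> bool:
        if today > created + self.expiry_days:
            return False  # assumption: expired orders are refused
        if serial in self.counted:
            # Same order again: move it instead of counting it twice.
            old_day, old_amount = self.counted[serial]
            self.preview[old_day] -= old_amount  # decrease the earlier day's preview
        self.preview[today] += amount            # increase today's preview
        self.counted[serial] = (today, amount)
        return True

    def is_final(self, day: int, today: int) -> bool:
        # An order counted on `day` was created no later than `day`,
        # so it expires by day + expiry_days and cannot move after that.
        return today > day + self.expiry_days
```

Resubmitting an order only moves its bytes between days, so the sum over all days stays the same, while each day's number becomes final once the expiration window has passed.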

What, if anything, can SNOs do on our end as far as timezone sync goes? My node's timezone is set to UTC.

And because we are talking about a performance-critical part here, that also means number 7 only works after 1 is implemented. I would even expect the developers to force number 2 as well; otherwise the load on the database will be too high.

I have a forum post prepared for that. We can start as soon as v0.30.5 is deployed tomorrow.

Thanks for the update!

I assume you still mean that the last two days of the month are paid out / billed in the next month? Or is it now a payout / billing delay after the month's end?

I don’t think I’m following the long term solution you outlined. It sounds like rollups for the past 7 days could still change, so it would not be possible to pay out / bill those last 7 days in the same month again. Perhaps I’m missing something.

7? Is that supposed to be 3?