Changelog v0.14.11

This storage node update is optional; storage nodes will keep functioning without it. For Satellites, the update is required.

Storage node:

  • Drop certificate table - The storage node used to keep a table tracking the uplink IDs it had contacted. The implementation has changed so that table is no longer needed, and we expect a performance improvement on uploads and downloads. Dropping the table will take about an hour; during this time, your node will be unresponsive and the dashboard won't open.
    https://github.com/storj/storj/pull/2498
  • Remove database locking - We’ve changed how database locking works. For write operations, the database still needs to be locked, but only for a short time. Read operations are now possible even while the database is locked (see the first sketch after this list).
    https://github.com/storj/storj/pull/2410
  • Limit concurrent uploads - Slower nodes like a Raspberry Pi 3 were having a hard time getting any data: they accepted too many concurrent uploads and could not finish them in time. In this release we added a config option, storage2.max-concurrent-requests: 10. This lets a slow node focus on a smaller number of uploads and finish them as fast as possible, while refusing the uploads it could not have completed anyway (see the sketch after this list).
    https://github.com/storj/storj/pull/2397
  • In memory used space/bandwidth tracking - The storage node needs to know how much bandwidth and disk space it has left. Instead of querying the database every time, we now calculate used bandwidth and disk space once at startup and then keep the totals in memory (see the sketch after this list).
    https://github.com/storj/storj/pull/2469
  • Change voucher log message - A storage node can get a signed voucher from the Satellite only if it’s already vetted and not disqualified. We corrected the log message to eliminate the previous confusing message, which led some SNOs to believe their node was disqualified when it was really only in the vetting stage.
    https://github.com/storj/storj/pull/2362
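
For the database locking change above, here is a minimal Go sketch of the pattern (not the actual storagenode code, just the idea described): writes serialize behind a short-lived lock and publish a new snapshot, while reads load the latest snapshot and never wait for the lock.

```go
package sketch

import (
	"sync"
	"sync/atomic"
)

// snapshotDB is a minimal sketch, not the real storagenode code:
// writes serialize behind a short-lived lock and publish a new
// snapshot; reads load the latest snapshot and never wait.
type snapshotDB struct {
	writeMu  sync.Mutex   // held only for the duration of a single write
	snapshot atomic.Value // always holds a map[string]int64
}

// newSnapshotDB publishes an empty snapshot so readers never see nil.
func newSnapshotDB() *snapshotDB {
	db := &snapshotDB{}
	db.snapshot.Store(map[string]int64{})
	return db
}

// Write locks briefly, copies the current snapshot, applies the change,
// and publishes the result for readers.
func (db *snapshotDB) Write(key string, value int64) {
	db.writeMu.Lock()
	defer db.writeMu.Unlock()

	old := db.snapshot.Load().(map[string]int64)
	next := make(map[string]int64, len(old)+1)
	for k, v := range old {
		next[k] = v
	}
	next[key] = value
	db.snapshot.Store(next)
}

// Read never takes the write lock, so it works even while a write
// operation is in progress.
func (db *snapshotDB) Read(key string) int64 {
	return db.snapshot.Load().(map[string]int64)[key]
}
```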
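
For the concurrent upload limit, a rough sketch of the behaviour (the names and the error text are illustrative, not the real implementation): once the configured maximum from storage2.max-concurrent-requests is reached, additional uploads are rejected immediately instead of being queued.

```go
package sketch

import (
	"errors"
	"sync/atomic"
)

// errTooManyRequests is illustrative; it is not necessarily the error
// the real storagenode returns.
var errTooManyRequests = errors.New("storage node overloaded: too many concurrent requests")

// uploadLimiter sketches the behaviour behind
// storage2.max-concurrent-requests: uploads over the limit are rejected
// outright instead of queued, so a slow node can finish the transfers
// it already accepted.
type uploadLimiter struct {
	max     int32 // e.g. 10, from the config
	current int32 // uploads currently in flight
}

// begin reserves a slot for a new upload, or rejects it when the node
// is already at its limit.
func (l *uploadLimiter) begin() error {
	if atomic.AddInt32(&l.current, 1) > l.max {
		atomic.AddInt32(&l.current, -1)
		return errTooManyRequests
	}
	return nil
}

// done releases the slot once the upload has finished or failed.
func (l *uploadLimiter) done() {
	atomic.AddInt32(&l.current, -1)
}
```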
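
And for the in-memory used space/bandwidth tracking, a minimal sketch assuming a hypothetical scanDisk helper for the one-time startup calculation; after startup, uploads and deletes only adjust the counter, and reads never hit the database.

```go
package sketch

import "sync/atomic"

// spaceUsage sketches the in-memory tracking: the expensive calculation
// happens once at startup, and every upload or delete afterwards only
// adjusts the counter, so reads never hit the database.
type spaceUsage struct {
	used int64 // bytes currently stored, updated atomically
}

// initAtStartup runs the expensive calculation exactly once. scanDisk
// is a hypothetical helper standing in for whatever sums the sizes of
// all stored pieces (or reads the totals from the database).
func (s *spaceUsage) initAtStartup(scanDisk func() int64) {
	atomic.StoreInt64(&s.used, scanDisk())
}

// add is called with a positive size after a piece is stored and a
// negative size after a piece is deleted.
func (s *spaceUsage) add(size int64) {
	atomic.AddInt64(&s.used, size)
}

// usedSpace returns the cached value without any database query.
func (s *spaceUsage) usedSpace() int64 {
	return atomic.LoadInt64(&s.used)
}
```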

Satellite:

  • Repair checker uses reliability cache - Instead of killing the database with too many requests, the repair checker is now caching the storage node reputation. This will speed up the repair checker and reduce the performance impact on the database (see the sketch after this list).
    https://github.com/storj/storj/pull/1976
  • Faster uptime checks - The discovery service was pinging all storage nodes and then requesting additional data, like the wallet address. We removed the separate ping, because the data request by itself already tells us whether the node is online. This allows us to check the uptime of all storage nodes more frequently.
    https://github.com/storj/storj/pull/2491
  • Fix repair trigger - The repair checker was too nervous and added too many segments to the repair queue even if a storage node was offline for only a minute. The Reed-Solomon numbers allow us to be more tolerant. We now trigger repair only if a node is offline for more than one hour (see the example after this list).
    https://github.com/storj/storj/pull/2490
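
For the reliability cache above, a rough Go sketch of the idea (the queryDB callback and the staleness window are assumptions, not the actual satellite code): the repair checker refreshes a snapshot of reliable nodes at most once per staleness interval instead of querying the database for every segment.

```go
package sketch

import (
	"sync"
	"time"
)

// reliabilityCache sketches the idea of the change: instead of asking
// the database about node reputation for every segment, the repair
// checker keeps a snapshot of reliable node IDs and refreshes it at
// most once per staleness window.
type reliabilityCache struct {
	mu        sync.Mutex
	staleness time.Duration   // how old the snapshot may get
	refreshed time.Time       // when the snapshot was last loaded
	reliable  map[string]bool // node ID -> currently reliable

	// queryDB is a hypothetical callback standing in for the real
	// reputation query against the satellite database.
	queryDB func() map[string]bool
}

// isReliable answers from the cached snapshot, reloading it from the
// database only when it has become stale.
func (c *reliabilityCache) isReliable(nodeID string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.reliable == nil || time.Since(c.refreshed) > c.staleness {
		c.reliable = c.queryDB()
		c.refreshed = time.Now()
	}
	return c.reliable[nodeID]
}
```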
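
And to make the repair trigger concrete, a toy example with made-up Reed-Solomon numbers (not the production settings): a segment only goes to the repair queue once its healthy piece count drops to the repair threshold, so a single node going offline does not by itself cause repair.

```go
package main

import "fmt"

// needsRepair is a toy illustration of how the Reed-Solomon numbers
// drive the repair decision: a segment is queued for repair only when
// the count of healthy pieces drops to the repair threshold, not just
// because a single node went offline for a while.
func needsRepair(healthyPieces, repairThreshold int) bool {
	return healthyPieces <= repairThreshold
}

func main() {
	// Illustrative numbers only, not the production RS settings:
	// say 80 pieces were stored, repair starts at 35, and the segment
	// would be lost below 20.
	const repairThreshold = 35

	fmt.Println(needsRepair(79, repairThreshold)) // false: one node offline, nothing to do
	fmt.Println(needsRepair(35, repairThreshold)) // true: enough pieces lost, queue repair
}
```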

Thanks for including PR links! Love that!

Just so I don’t misread it: if my node is offline for more than an hour, do I begin losing data? Hypothetically, if I am offline for two hours with the smallest allowed node of 500 GB, could all my data be gone, because it is theoretically possible that this amount could be rebuilt in that time?

  1. What happens with my withheld amount?
  2. Do I get disqualified? I am not offline for 5 hours.

This would need to be confirmed, but I think this is what happens.

  1. All pieces that drop below the RS repair threshold get put on the repair queue (so that’s a relatively small subset of all your data)
  2. The data will remain on your node until garbage collection cleans it up.
  3. You won’t be paid for that stored data until cleanup.
  4. I see no mention of the DQ criteria changing, so your node should be fine.

I don’t know who is paying for these repairs though, but I don’t think your escrow would be used in this scenario. Keep in mind that many other nodes had to fail for the piece to reach the repair threshold, many of which likely won’t return. So their escrow would be used for the repair.

This is mostly speculation based on what I’ve read. So please someone from storj confirm. :slight_smile:

Theoretically yes. Practically, you might lose a fraction of your data for each hour you are offline.

In the whitepaper model the plan was to disqualify storage nodes early in order to avoid any file loss. Allowing a downtime of 3 days would be impossible because we would lose files.
The current implementation has a big advantage: the downtime limits for repair and for disqualification are disconnected. A storage node can go offline for 3 days without getting disqualified.

Note: The 3 days are a target I am fighting for. Don’t take that as a fact. We might end up with the same 5-hour window again.


Is the node informed which pieces it can delete after coming back online?
Or do they just accumulate?

That’s why we need clear and understandable rules for both sides. I am not fixed on one or five hours or even three days!

But it must be clear to everyone that there will be outages, and they can happen while I sleep. If that results in a large financial loss for an SNO, it would be bad for anyone whose system just needs a reboot.

100% correct. The part about the repair cost is a bit tricky. For our model we need to keep the repair cost low. If the storage node gets disqualified, we need to repair everything, and the price for that will be the held-back amount. If the storage node is offline for only a few hours, we don’t need to repair everything and are happy that the node is coming back online. That bill goes on us.


Ok, then your changelog was misleading me. You wrote that repair is triggered if a node is offline for more than an hour, but the real condition is: (offline > 1 hour) AND below the RS repair threshold.

This means that under normal circumstances, nothing will happen during a short offline period.

It would be great to get this offline allowance to a higher value, but have a system in place that maybe only allows a certain percentage of nodes to go offline before a repair is immediately triggered or other repercussions apply. The point would be to allow a small number of nodes to be offline as long as the network has enough redundancy online to support it.