Changelog v1.6.4

> For docker nodes? For update?

To a short question asked without understanding the background, I will give a short answer: Yes.

The long answer would be: We haven't published v1.6.3 on Docker, so there is no need to push out v1.6.4 right now. The number of Windows nodes that have to deal with the v1.6.3 bug is relatively low. It could get worse with v1.6.4. → Slow rollout even for the bugfix. Maybe not as slow as usual, but in the end I don't see a reason to update the v1.6.3 nodes to v1.6.4 within a few minutes. We shouldn't rush it; we should make sure the bugfix is working as expected.


For the phased rollout it might be good to stick with the same seed and just reset the cursor, so affected nodes on 1.6.3 get this version first.

I take it, it will randomly start dropping "used" serial numbers, rather than new serial numbers that haven't expired yet within the 1 hr.

Ideally these serial numbers should be dropped oldest-timestamp-first, rather than at random.

Nope, for two reasons:

  1. Tracking the timestamp requires a lot of additional memory.
  2. If you drop the oldest timestamp first, then I can cheat you and download my file twice without having to pay for it. If you drop a random serial number, that might be a serial number from 10 minutes ago, but from my perspective it is very unlikely that 29 storage nodes dropped the same serial number. Even if you dropped it, I can't abuse it.
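The random-drop behavior described above can be sketched as a capped in-memory set that evicts an arbitrary entry when full. This is a minimal illustration only, not Storj's actual implementation; all names here are made up:

```go
package main

import "fmt"

// SerialTracker is a hypothetical sketch of a seen-serials set with a fixed
// capacity. It stores no timestamps, keeping per-serial memory minimal.
type SerialTracker struct {
	capacity int
	seen     map[string]struct{}
}

func NewSerialTracker(capacity int) *SerialTracker {
	return &SerialTracker{capacity: capacity, seen: make(map[string]struct{})}
}

// Add records a serial; it returns false if the serial was already seen,
// i.e. a replayed (double-spend) attempt.
func (t *SerialTracker) Add(serial string) bool {
	if _, ok := t.seen[serial]; ok {
		return false
	}
	if len(t.seen) >= t.capacity {
		// Evict an arbitrary entry. Go randomizes map iteration order, so
		// taking the first key of a range behaves like a random drop.
		for k := range t.seen {
			delete(t.seen, k)
			break
		}
	}
	t.seen[serial] = struct{}{}
	return true
}

func main() {
	t := NewSerialTracker(3)
	fmt.Println(t.Add("a")) // true
	fmt.Println(t.Add("a")) // false: duplicate rejected
	t.Add("b")
	t.Add("c")
	t.Add("d")               // capacity exceeded: one arbitrary serial evicted
	fmt.Println(len(t.seen)) // 3
}
```

Because the evicted serial is unpredictable, an attacker replaying an order cannot know which nodes have forgotten it.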

I'm in the middle of a graceful exit on one of my Windows nodes. I stopped the update service until the process ends. Right move?

You should be able to update just fine. GE will resume after the update is done. If you're on 1.6.3 I think it's probably better to get that update, as there is a known issue in that one. If you're on 1.5.2 you may as well finish the GE on that version.


I'm on 1.6.3 :frowning:
So, should I just keep the update service running?
Are we sure it won't break the GE process? I have quite an amount in held…

I'm sure that Storjlings have said in the past that a restart is fine and the process will proceed as long as you don't stay offline too long. That's good enough for me, but I haven't done a code review on this. You'll have to decide for yourself whether you want to rely on that. But the alternative is sticking with a version that has caused fatal errors on some nodes and took them offline. So that doesn't sound like a great idea either. I would take the update when it's available, but perhaps @littleskunk can advise on this.

  1. Wouldn't a ring buffer automatically drop the oldest orders without tracking any timestamps?
  2. If the oldest timestamp is dropped, in what situation can you cheat me and download files twice that isn't covered by the unlikelihood of 29 storagenodes all dropping the same orders? (Because then there must be a situation where all 29 storagenodes can't submit the orders and all drop the same ones. The only possibility I can think of is a satellite downtime of more than 1 hour.)
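For what it's worth, point 1 is true in the narrow sense: a ring buffer does drop the oldest entry without storing timestamps, because age is implicit in the write position. A minimal sketch of that idea (purely illustrative, not the storagenode's code):

```go
package main

import "fmt"

// ringBuffer overwrites the oldest serial when full; no timestamps are
// stored, since the slot about to be overwritten is always the oldest.
type ringBuffer struct {
	buf  []string
	next int
	full bool
}

func newRingBuffer(size int) *ringBuffer {
	return &ringBuffer{buf: make([]string, size)}
}

// add stores a serial and returns the serial it evicted, if any.
func (r *ringBuffer) add(serial string) (evicted string) {
	if r.full {
		evicted = r.buf[r.next] // the oldest entry is overwritten
	}
	r.buf[r.next] = serial
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
	return evicted
}

func main() {
	r := newRingBuffer(3)
	for _, s := range []string{"s1", "s2", "s3", "s4"} {
		if old := r.add(s); old != "" {
			fmt.Println("evicted oldest:", old) // prints "evicted oldest: s1"
		}
	}
}
```

The catch is that this eviction order is fully predictable, which is exactly the property that makes oldest-first exploitable for a replay.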

serial numbers != orders

OK, I guess I may not have understood what serial numbers and orders are about in that context. So I'll just leave it at that.

Should I just leave the updater service running on my Windows node that's running a GE?

Could you please explain what serial numbers are and how they work?
I think a lot of the questions in this thread would be gone after that…

My understanding is that:

  1. Serial Numbers are used by the customer to retrieve stored files.
  2. Orders are submitted to the satellite for purposes of tracking payment for SN actions.

it's voodoo i tell ye… all voodoo
i think the whole idea with the serial vs timestamp is that each serial is unique while timestamps can be the same… thus you could reuse a timestamp across multiple hosts, while a one-time unique serial can only ever be used once…

but i dunno… it was just what i thought made sense when i read it… maybe wrong… not that i really need to know the details :smiley: interesting subject… but really understanding the storj programming… just think about how complex your local storage solution is… this is just online, using much more unreliable storage and higher latencies…

i can imagine the programming must be… complicated, to put it mildly… might even end up putting zfs to shame lol… eventually… so understanding it without months of deep dives into the subject is unlikely…

I believe it's like this. When a customer wants to download a file, it gets a signed download order from the satellite for the nodes it wants to download from. That download order has a serial number. The signature is checked by the node to make sure the satellite knows about this download and will pay the node for it. The serial is used to make sure that download can't be requested more than once. Without a serial, the customer could use the same download order over and over again and the node would only be paid once.

The reason why removing a random serial from memory is better is mostly that there is never a moment when you can be sure the serial is no longer in memory on the nodes you want to download from. If, after an hour or several hours, you always remove the oldest one, you could determine a moment after which it is likely that all nodes have dropped that serial number and you can try again. With it being random, there is always a high likelihood that several nodes still know about it and your download would fail, even if a long time has gone by.


How could the double spend / cheat occur if the node still retains orders for 48 hours as per the above?
Did this new change create the possibility of a random used serial being double spent? Maybe this needs further review so it can't occur, and it should be in line with the highlighted statement above about retaining orders for 48 hours. Randomly removing them creates the possibility of a double spend, and I don't understand why this was accepted when SNOs wouldn't be compensated correctly in that circumstance.

Surely instead of randomly deleting a used serial (which now allows that serial to be double-spent), the better way would be to write the data from memory to the SQLite serials DB and free up the memory pool for the next batch. Process memory → SQLite DB as a regular dump, rather than deleting from memory without recording in SQLite that the serial was used.

Do we now count chickens before they hatch? I have added the monkit information and the config value to the changelog for a reason. If you are concerned about double spends, you can adjust them and even verify the impact. I don't see a reason to talk about hypothetical double spends. We have added the monkit data for a reason.


I don't think it should be up to the SNO to decide the appropriate value for the mempool for these serial numbers. Storj should set this so that the mempool is never full and a used serial is never deleted, because if it is, that serial could be used again for a double spend. Instead of a delete, it should be written to the SQLite DB and be included in the 48-hour window where orders are kept.