Changelog v1.17.4

For Storage Nodes

Graceful Exit Fix
The satellite keeps track of the last IP address of each storage node. When an uplink requests a list of storage nodes, the satellite returns that last IP address so that the uplink doesn’t need to resolve any DNS entries. We will now use the same trick for graceful exit, so the exiting node doesn’t need to resolve any DNS entries either. We believe this will reduce the number of connection issues caused by overloaded routers or other network hardware.
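
To make the idea concrete, here is a minimal Go sketch (with illustrative types and names, not the actual storagenode code): dial the last-known IP the satellite handed out, and only fall back to resolving the DNS name if that dial fails.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// NodeAddress is a hypothetical view of what the satellite hands out:
// the DNS name a node announced plus the last IP the satellite saw it on.
type NodeAddress struct {
	DNSName string // e.g. "mynode.example.com:28967"
	LastIP  string // e.g. "203.0.113.7:28967"
}

// dialNode prefers the last-known IP so no DNS lookup is needed,
// and only falls back to resolving the DNS name if that fails.
func dialNode(ctx context.Context, addr NodeAddress) (net.Conn, error) {
	d := net.Dialer{Timeout: 10 * time.Second}

	if addr.LastIP != "" {
		conn, err := d.DialContext(ctx, "tcp", addr.LastIP)
		if err == nil {
			return conn, nil
		}
		fmt.Printf("dial via last IP failed (%v), falling back to DNS\n", err)
	}
	// Fallback: this path resolves the DNS name and may hit an
	// overloaded router's DNS forwarder.
	return d.DialContext(ctx, "tcp", addr.DNSName)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
	defer cancel()

	conn, err := dialNode(ctx, NodeAddress{
		DNSName: "node.example.com:28967",
		LastIP:  "203.0.113.7:28967",
	})
	if err != nil {
		fmt.Println("could not reach node:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected to", conn.RemoteAddr())
}
```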

Handling Corrupted Order Files
The storage node will now detect and handle corrupted orders and corrupted order files. In the worst case it will skip an entire order file and simply continue with the next one, to make sure the storage node still gets paid for every valid order it is holding.
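
Roughly what that behaviour looks like, as a minimal Go sketch with a made-up record format (the real on-disk order file format is different): decode records one by one and, on the first corrupted record, stop reading that file and keep the valid orders collected so far.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"os"
)

// Order is a stand-in for a signed order; the real on-disk format is
// different, this only illustrates the skip-on-corruption behaviour.
type Order struct {
	SerialNumber string `json:"serial_number"`
	Amount       int64  `json:"amount"`
}

// readOrders decodes length-prefixed JSON records. On the first record
// that fails to decode, it stops reading this file and returns whatever
// valid orders it collected so far, so those can still be settled.
func readOrders(path string) ([]Order, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var orders []Order
	for {
		var size uint32
		if err := binary.Read(f, binary.BigEndian, &size); err != nil {
			if errors.Is(err, io.EOF) {
				return orders, nil // clean end of file
			}
			fmt.Printf("%s: corrupted length prefix, skipping rest of file: %v\n", path, err)
			return orders, nil
		}
		buf := make([]byte, size)
		if _, err := io.ReadFull(f, buf); err != nil {
			fmt.Printf("%s: truncated record, skipping rest of file: %v\n", path, err)
			return orders, nil
		}
		var o Order
		if err := json.Unmarshal(buf, &o); err != nil {
			fmt.Printf("%s: corrupted record, skipping rest of file: %v\n", path, err)
			return orders, nil
		}
		orders = append(orders, o)
	}
}

func main() {
	for _, path := range os.Args[1:] {
		orders, err := readOrders(path)
		if err != nil {
			fmt.Printf("%s: cannot open, continuing with next file: %v\n", path, err)
			continue
		}
		fmt.Printf("%s: %d valid orders recovered\n", path, len(orders))
	}
}
```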

Untrusted Satellites in Payout History
With the removal of Stefan’s satellite, the corresponding line was missing from the payout history. The storage node dashboard will now display the payout, but with an empty satellite name. We hope this minimal solution will work for now.

Order Submission Phase 3
Over the last few releases, we have transitioned to a new accounting system. In the old accounting system, the satellite kept track of submitted and unsubmitted serial numbers. This was needed to reject double submissions, but it was an expensive validation. In the new accounting system, storage nodes group their orders by creation hour and submit each batch to the satellite once. The satellite writes the sum into the accounting tables. If a storage node tries to submit an order twice, the satellite will notice that it already has an entry in the accounting table and reject the double submission. In the future (not in the current release) we expect better performance and better scaling. First, we need to remove some leftovers from the old accounting system.
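
A minimal Go sketch of the windowed scheme, with invented types rather than the actual satellite code: orders are bucketed by creation hour, each window is submitted once, and a second submission of the same window is rejected because an accounting entry already exists.

```go
package main

import (
	"fmt"
	"time"
)

// Order is a minimal stand-in for a signed order held by a storage node.
type Order struct {
	SerialNumber string
	Amount       int64
	CreatedAt    time.Time
}

// groupByWindow buckets orders by the hour they were created in,
// which is the unit the node submits to the satellite.
func groupByWindow(orders []Order) map[time.Time][]Order {
	windows := make(map[time.Time][]Order)
	for _, o := range orders {
		w := o.CreatedAt.Truncate(time.Hour)
		windows[w] = append(windows[w], o)
	}
	return windows
}

// accounting mimics the satellite side: one summed entry per window;
// a second submission for the same window is rejected.
type accounting struct {
	settled map[time.Time]int64
}

func (a *accounting) submitWindow(window time.Time, orders []Order) error {
	if _, ok := a.settled[window]; ok {
		return fmt.Errorf("window %s already settled, rejecting double submission", window)
	}
	var sum int64
	for _, o := range orders {
		sum += o.Amount
	}
	a.settled[window] = sum
	return nil
}

func main() {
	now := time.Now()
	orders := []Order{
		{SerialNumber: "a", Amount: 100, CreatedAt: now},
		{SerialNumber: "b", Amount: 250, CreatedAt: now.Add(10 * time.Minute)},
		{SerialNumber: "c", Amount: 75, CreatedAt: now.Add(-2 * time.Hour)},
	}

	sat := &accounting{settled: make(map[time.Time]int64)}
	for window, batch := range groupByWindow(orders) {
		if err := sat.submitWindow(window, batch); err != nil {
			fmt.Println("submission rejected:", err)
			continue
		}
		fmt.Printf("settled window %s: %d orders\n", window.Format("2006-01-02 15:00"), len(batch))
	}

	// A retry of an already-settled window is rejected by the satellite.
	for window, batch := range groupByWindow(orders) {
		if err := sat.submitWindow(window, batch); err != nil {
			fmt.Println("submission rejected:", err)
		}
	}
}
```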

21 Likes

"satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "error": "order: unable to connect to the satellite: rpc: dial tcp 35.236.51.151:7777: connectex: No connection could be made because the target machine actively refused it.", "errorVerbose": "order: unable to connect to the satellite: rpc: dial tcp 35.236.51.151:7777: connectex: No connection could be made because the target machine actively refused it.\n\tstorj.io/storj/storagenode/orders.(*Service).settleWindow:454\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrdersFromFileStore.func1:412\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-11-23T14:25:56.414-0500 ERROR contact:service ping satellite failed {"Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 6, "error": "ping satellite error: rpc: dial tcp 35.236.51.151:7777: connectex: No connection could be made because the target machine actively refused it.", "errorVerbose": "ping satellite error: rpc: dial tcp 35.236.51.151:7777: connectex: No connection could be made because the target machine actively refused it.\n\tstorj.io/common/rpc.TCPConnector.DialContextUnencrypted:107\n\tstorj.io/common/rpc.TCPConnector.DialContext:71\n\tstorj.io/common/rpc.Dialer.dialTransport:146\n\tstorj.io/common/rpc.Dialer.dial:116\n\tstorj.io/common/rpc.Dialer.DialNodeURL:80\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:124\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

looks like saltlake is down

We are migrating Kubernetes Clusters. What you are facing is a temporary DNS issue. This problem should disappear for you soon.

2 Likes

What happens if the storage node code keeps producing malformed order files? For example, if some future bug corrupts every new order file?

Then the storage node would skip everything it can’t read.

Wouldn’t this lead to another wave of forum posts of “I haven’t been paid this month”? Just asking.

Yes, that would be the consequence of your hypothetical future bug.

I do not know what you would have to do to corrupt all order files.

Bugs happen. Just that. Don’t worry, it’s just a hypothetical for now. Besides, if Storj engineers believe that the risk of having a systematic error in order file generation is smaller than an occasional error, then it’s fine.

I think it is a much better system. Today, if there is corruption, we usually only discover it next month, when the payout is smaller and it is too late to submit those orders. With the new system we will lose only the corrupted part.

2 Likes

Handling Corrupted Order Files sounds great, thanks. Though does anyone know why the order files get corrupted in the first place? This is too common to be a one-off hardware issue, and it would be good to understand the cause. If SNOs stored data files with this error rate, we would all be disqualified by now.

A hard drive write cache and/or a file system without journaling are common mistakes.

2 Likes

Was there any evidence of this being the cause?

I had corrupted orders on a few of my nodes, all of which have 100% audit/uptime, and I’m using ext4 with Filesystem features: has_journal. What would I check to confirm the drive cache issue?

1 Like

I don’t know your specific setup. I am just saying what the common mistakes are.

Docker kill timeout would be another easy one.

Or power interruptions or other unsafe shutdowns, or unplugging an external disk unexpectedly. There are many things that can cause data corruption.

1 Like

Will this version be released automatically? Meaning, will my node update automatically, or do I need to stop the node, remove the Docker image, and relaunch it? Is this already in production? Because my node is still on 1.16.1.

Rollout of version 1.17 isn’t done yet. Give it a few more days.

As long as you have watchtower set up (or the equivalent if you’re on Windows… I’m not sure how it works on Windows), it should update automatically in the coming days.

For the GUI install, Windows has a storagenode-updater executable that updates automatically.

1 Like

got it on docker this morning

1 Like