Order: unexpected EOF

sorry2xs · September 10, 2020, 11:42pm

@moby I still have the old unsent order files, saved on my desktop.

BrightSilence · September 11, 2020, 12:14am

That was quick! Nice job!

moby · September 11, 2020, 12:22pm

Another thing that would be helpful for us is if anyone could provide me with an example of a corrupted file that failed to send and is causing this issue. If you would like to help, please send a corrupted orders file to me at moby@storj.io
@donald.m.motsinger do you still have the corrupted file?

donald.m.motsinger · September 11, 2020, 1:46pm

I just sent you all 4 corrupted files I have.

moby · September 11, 2020, 2:14pm

Received, thank you. They should help a lot in my investigation.

Darthsaiber · September 11, 2020, 2:25pm

It is not clear to me how to solve this error in the log.
What should I do?

Windows 10 installation

moby · September 11, 2020, 3:18pm

Unfortunately the logs do not provide enough information yet to be able to tell you precisely what to do, but hopefully we can make them clearer in the future. Basically, one or more of your unsent orders files has been corrupted. There is one unsent orders file for every hour, but presumably the corrupted one(s) are before the log you posted above (2020-09-10T11:53:51.382-0300) - maybe you can narrow down which files it could be based on that.

An easier solution might be to move all the files out of the directory (don’t delete, just move), then put them back in one at a time or in batches to send them out - the node should be configured to attempt sending every 5 minutes by default, but it can be reconfigured.

By the way, since you have timestamps for the logs and can figure out when you first saw this error, you should be able to make the process a little easier by only messing with files before that point. You can do that by taking the last portion of the unsent file which looks like “1599692400000000000” (representing the timestamp of the order creation hour) and plugging that in here: https://play.golang.org/p/0HBazvcfYPa

Darthsaiber · September 11, 2020, 4:21pm

Once the unsent files causing the error have been found, what should be done with those files? Do I leave them there or is it fixed just by moving and returning?

sorry2xs · September 11, 2020, 4:23pm

What was the cause of the corruption?

Darthsaiber · September 11, 2020, 4:53pm

I sent my corrupted files.

moby · September 11, 2020, 9:44pm

By the time the fix is rolled out, those files will have expired, so I do not think there is much benefit in keeping them around. But the fix should submit any orders at the beginning of the file that have not been corrupted. Thank you for sending your files.

@sorry2xs as far as I understand, corruption is always going to be a possibility when it comes to writing files like this, even if there are no bugs in the code. However, we are discussing ways of improving the order saving system so that the consequences related to this type of corruption will be minimal.

nerdatwork · September 12, 2020, 6:48am

Even after moving all unsent files to another folder, I still get:

ERROR orders listing orders {"error": "order: unexpected EOF", "errorVerbose": "order: unexpected EOF\n\tstorj.io/storj/storagenode/orders.readLimit:515\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite.func1:238\n\tpath/filepath.walk:360\n\tpath/filepath.walk:384\n\tpath/filepath.walk:384\n\tpath/filepath.Walk:406\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite:196\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrdersFromFileStore:398\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders:192\n\tstorj.io/storj/storagenode/orders.(*Service).Run.func1:139\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

I stopped the node, moved ALL unsent files to another folder, then restarted the node, but it didn’t work.

moby · September 12, 2020, 10:50am

I am really confused about how that could happen. That code path will only get run when there are order files > 1hr old inside the orders/unsent folder. How long after you restarted did this error occur?

sorry2xs · September 12, 2020, 10:51am

Did you leave that other folder inside the original unsent folder? If so, take the new folder out.

nerdatwork · September 12, 2020, 11:05am

@moby The error shows up every 5 minutes.

It’s on a different drive altogether.

Update: Since nothing was working, I checked the databases and found that bandwidth.db had the wrong number of indexes, so I fixed it. After that the node did send orders, but it still gave the error above.

moby · September 12, 2020, 11:28am

Yes, once you see it for the first time, you will see it every 5 minutes. But I am curious how much time passed between starting the node after moving all the files out and seeing the error for the first time. It should have been at least an hour.

nerdatwork · September 12, 2020, 11:39am

moby · September 12, 2020, 3:37pm

This shouldn’t have anything to do with the bandwidth.db file. You will only see it if there are no more orders in the database to send. At that point, it will try to send any filestore order limits. So if you do ls <storagedir>/orders/unsent, how many files do you see?

nerdatwork · September 12, 2020, 3:45pm

I know, but I followed the steps that SNOs before me performed and it didn’t work, hence my attempt to check everything.

As of now I see 10 files. I assume those are 2 files for each of the 5 satellites, for the current and previous hour. I haven’t copied any of the older files back, since the node starts giving that error again when I do.

cyjambo · September 13, 2020, 10:01am

Had the same issue and this solved it. Thanks guys!