Node restarting - Error malformed cache

Hello,

I get a weird error from a storagenode running in a Docker container. The container keeps restarting. The drive seems to be OK: there are no SMART errors, and I checked the file system with fsck, which found no errors. Here is the error message from the log:

ERROR failure during run {"Process": "storagenode", "error": "Failed to create storage node peer: trust: malformed cache: unexpected end of JSON input\n\tstorj.io/storj/storagenode/trust.LoadCacheData:110\n\tstorj.io/storj/storagenode/trust.LoadCache:36\n\tstorj.io/storj/storagenode/trust.NewPool:91\n\tstorj.io/storj/storagenode.New:425\n\tmain.cmdRun:82\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:267", "errorVerbose": "Failed to create storage node peer: trust: malformed cache: unexpected end of JSON input\n\tstorj.io/storj/storagenode/trust.LoadCacheData:110\n\tstorj.io/storj/storagenode/trust.LoadCache:36\n\tstorj.io/storj/storagenode/trust.NewPool:91\n\tstorj.io/storj/storagenode.New:425\n\tmain.cmdRun:82\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:267\n\tmain.cmdRun:84\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:393\n\tstorj.io/common/process.cleanup.func1:411\n\tgithub.com/spf13/cobra.(*Command).execute:983\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1115\n\tgithub.com/spf13/cobra.(*Command).Execute:1039\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:267"}
Error: Failed to create storage node peer: trust: malformed cache: unexpected end of JSON input

Any suggestions would be welcome.

You need to find the trust cache file and delete it; it got damaged.

Thank you. Deleting the trust-cache.json file helped, and the node is running now. However, I am getting another error:

WARN ordersfilestore Corrupted order detected in orders file {"Process": "storagenode", "error": "ordersfile corrupt entry: ordersfile: checksum does not match", "errorVerbose": "ordersfile corrupt entry: ordersfile: checksum does not match\n\tstorj.io/storj/storagenode/orders/ordersfile.(*fileV1).ReadOne:215\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite.func1:275\n\tpath/filepath.walk:492\n\tpath/filepath.walk:516\n\tpath/filepath.Walk:587\n\tstorj.io/storj/storagenode/orders.(*FileStore).ListUnsentBySatellite:229\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders:194\n\tstorj.io/storj/storagenode/orders.(*Service).Run.func1:137\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

and then a bunch of failed upload errors. Could I simply delete the files in the active orders folder?

Yes, you can delete them, but some usage would be unpaid as a result.
Failed uploads are normal if they failed due to long-tail cancellation and not because of problems with blobs, I/O, or the database.

Hi,

I’ve got the same error today on my node. Deleting the trust-cache.json helped my node start working normally. Do you know why this happens, and is there some way to fully prevent or auto solve it?

For me it happens so rarely that it is not worth thinking about solving it automatically. It can happen when the cache is being renewed and you have a sudden restart; that breaks the file.
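
As a general illustration of that failure mode: a file that is rewritten in place can be left truncated if the machine restarts mid-write, which is what "unexpected end of JSON input" suggests. A common guard is to write to a temporary file and rename it over the old one. The sketch below is generic Python, not storagenode's actual code (which is Go), and the function name `write_json_atomically` is hypothetical.

```python
import json
import os
import tempfile


def write_json_atomically(path, data):
    """Write data as JSON so a reader sees either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the contents to disk before renaming
        os.replace(tmp_path, path)  # atomic swap on POSIX filesystems
    except BaseException:
        # Remove the partial temp file; the original file is left untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this pattern an interrupted write leaves at worst a stray `.tmp` file next to an intact old file, rather than a half-written cache.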

Hello @andrijajepro,
Welcome back!

What is more interesting is why it happens orders of magnitude more often on Windows than on any Linux (1-2 times over the last several years) or FreeBSD (none).

Hi @Alexey,

It has happened once more on that node, but not on any other nodes of mine. Do you have any recommendation on how to make sure this doesn’t happen again, or just be solved automatically when it happens?

Please check the cables, the power supply, and the disk to exclude hardware issues, and make sure to always perform a clean shutdown.
E.g., use a UPS if you have power cuts, do not reset the PC, etc.

Hi @Alexey,
Thank you for your response!

This node is hosted on a VPS which also runs some experimental containers and other testing software, so it can happen that it has up to 10 restarts per day, but it’s still in the allowable uptime range. It would be difficult to prevent these restarts, so that’s how it is on that node currently.

Do you have any recommendations for some kind of automatic fix, or should I keep manually deleting trust-cache.json when it gets damaged? I can’t check it right after the cache gets damaged, so sometimes the node hangs for multiple days, significantly lowering the payout. If it becomes that frequent, I will sadly have to turn off the node.

Running rm -rf trust-cache.json (and any other files that inevitably get corrupted) on boot would be horrible and obnoxious, but it will work.
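
A slightly less blunt variant of the same workaround is to remove the file only when it actually fails to parse. The sketch below is a minimal Python example with several assumptions: it runs before the node starts, the node recreates the cache on startup (as the earlier replies report), and the function name `quarantine_if_malformed` and the `.corrupt` suffix are hypothetical.

```python
import json
import os


def quarantine_if_malformed(path):
    """Move the file aside if it is not valid JSON. Return True if it was moved."""
    if not os.path.exists(path):
        return False  # nothing to check
    try:
        with open(path, "r", encoding="utf-8") as handle:
            json.load(handle)
    except ValueError:
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses,
        # so this covers empty, truncated, and garbled files.
        os.replace(path, path + ".corrupt")
        return True
    return False  # file parses fine, leave it alone
```

It would be called with the real location of the cache, for example `quarantine_if_malformed("/path/to/trust-cache.json")`, where the path is a placeholder. Keeping the damaged copy makes it possible to inspect what was written when the restart hit.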

All of this can be avoided if you send the node SIGTERM and wait a few seconds before power cycling. If you do all of that but still get corruption, you are not mounting the storage correctly.
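
For a node in a Docker container, the "SIGTERM and wait" step can be sketched as a small wrapper around `docker stop`, which sends SIGTERM and only falls back to SIGKILL once the timeout expires. This is a sketch under assumptions: the container name `storagenode` and the 300-second timeout are not taken from this thread, and the helper names are hypothetical.

```python
import subprocess


def build_stop_command(container="storagenode", timeout_seconds=300):
    """Build a `docker stop` command: SIGTERM first, SIGKILL only after the timeout."""
    return ["docker", "stop", "-t", str(timeout_seconds), container]


def stop_gracefully(container="storagenode", timeout_seconds=300, run=subprocess.run):
    """Stop the container and return True if docker reported success."""
    # `run` is injectable so the function can be exercised without Docker installed.
    result = run(build_stop_command(container, timeout_seconds), check=False)
    return result.returncode == 0
```

A reboot script would call `stop_gracefully()` and restart the host only after it returns.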

Obviously, if you force-kick the instance even once, let alone 10 times a day, it’s not suitable for running storagenode.

Why don’t you start a separate instance for storagenode, that you won’t abuse?

I still don’t see how experimental software and containers can cause the host to restart 10 times a day, let alone cause a forceful power cycle.

Basically, you need to find the actual problem, and not attempt to apply a band-aid to a bear attack victim.

Hi,

Some of these restarts are caused by frequent DDoS attacks on the hosting provider, which are neither my responsibility nor something I am able to deal with in this case.

I will try running rm -rf trust-cache.json on boot of the VPS; currently that seems to be the best solution.

Of course it is! You can vote with your wallet and switch to a provider that knows what they are doing. The hosting gets DDoSed and customer instances restart, let alone abruptly, as a result? Lol. They are either incompetent or liars. I don’t know what’s worse. But what’s for sure is that you should stop giving them your money and/or time.