Suspension for piecestore download failures on a 1-week-old node

fa91986743e69e22bc92 · October 21, 2021, 4:25pm

Looks similar to the last 2 entries on the pinned “error codes” thread. I'm not really sure how to interpret the log. I have 3 other nodes on the same network with no issues, all showing 100% audit/suspension values.

SGC · October 21, 2021, 4:56pm

"order created too long ago" sounds like the audit request wasn't processed by the storagenode in time… I think the limit is something like 5+ minutes, if not 30…

So that seems really odd. Have you had downtime recently?

But really, having audit errors on a week-old node is extreme and weird. You didn't accidentally use the same identity for the new node as for one of the older ones, perhaps? That would be my gut guess.

fa91986743e69e22bc92 · October 21, 2021, 5:07pm

I recall having to wait for the identity key to generate and then verifying it with the commands that output 2 and 3, respectively, for each node I've set up. I should also note that the nodes were set up weeks to a month apart from each other. Maybe the response lag comes from my hypervisor's backups? I'm using Proxmox and having it back up the Storj VMs (just the OS, not the storage drives) to a Proxmox Backup Server hosted on a Synology VM. I'm unsure how the VMs behave while they're being backed up, but I assume the first backup will cause the longest downtime on the VM, with subsequent snapshots being brief since they only look for changes. If that's the case, will this problem fix itself? I haven't noticed an increase in the audit/suspension % yet for the affected node, but it's only been about 12 hours since I got the notice.

Alexey · October 21, 2021, 6:49pm

Hello @fa91986743e69e22bc92 ,
Welcome to the forum!

Set up a time sync tool, or enable the guest tools for your hypervisor so the VM clock syncs with the host.
By the way, do not use snapshots to restore a node: it will be disqualified, because any snapshot is already outdated the moment it is created and will be missing pieces.

fa91986743e69e22bc92 · October 21, 2021, 8:53pm

I set up the guest tools using this guide:
https://pve.proxmox.com/wiki/Qemu-guest-agent
What sort of timeline should I expect for my audit scores to improve?

As for restoring a node, would it be possible for me to restore the VM > delete all Storj data except for the identity file > then install/reconfigure Storj with the identity file? I'm mostly backing up the VM to save some time configuring it in the hypervisor. I have all the identity files for my nodes backed up on my NAS as well. I guess I just don't understand how a migration/reinstall would work. I could also make a generic Storj template that has all the VirtIO drivers and guest agents ready but no Storj data.

Thanks for the help!

Alexey · October 21, 2021, 9:34pm

The time offset ideally should be close to zero.

The identity without its data is useless. The data without its identity is useless too.
So if you are going to remove the data, you must remove the identity too.
Lost data will never be recovered to the same node.
As soon as you start the identity without its data, it will be disqualified.
So if you lose the data or the identity, your only option is to start a new node, with a newly generated identity and clean storage.

To migrate an existing identity and its data, you can use this guide: How do I migrate my node to a new device? - Storj Docs

fa91986743e69e22bc92 · October 21, 2021, 9:42pm

Thanks for the great replies.

To clarify my question about the timeline for audit improvement: I meant the time frame for my audit score % to improve on my dashboard so my node can return to normal status. I assume your response was referring to the time sync offset between the hypervisor and the VM, ideally being close to 0 ms?

> The identity without its data is useless. The data without its identity is useless too. So if you are going to remove the data, you must remove the identity too. Lost data will never be recovered to the same node. As soon as you start the identity without its data, it will be disqualified. So if you lose the data or the identity, your only option is to start a new node, with a newly generated identity and clean storage.

I don't think I was clear enough in my explanation. When I said deleting the Storj data, I was talking about the OS portion. My Storj node's data storage is configured as separate ZFS pools by my hypervisor and attached to the VM running Storj as another virtual disk. Would it be possible, in this case, to restore the OS via a backup or a fresh OS install, then reattach the "old" ZFS storage to the new OS VM along with the identity, and be good to go? I do understand that if I lose the ZFS pools with the actual Storj node data on them, I would have to start fresh regardless of the OS/VM/identity status.

Is this the right guide? How to migrate the Windows GUI node from one physical location to another? - Storj Docs

Alexey · October 21, 2021, 10:09pm

There is no specific recovery timeline, unless you are talking about the online score. The online score will recover after the next 30 days online.
The audit score may never recover, especially if your node lost data.
The suspension score may recover, but the timeline is unknown: it depends on whether you have fixed the issue.

I meant the time offset between the time servers and your VMs. If your hypervisor host is not synced with the time servers, the VMs will not be synced either.
Since you use a Windows VM, you can install this tool to sync the time: https://www.timesynctool.com/

Yes, as long as you use the identity that belongs to that data. I would recommend moving the identity to the disk with the data to avoid confusion.
Just make sure you never restore the node's data from a backup.

fa91986743e69e22bc92 · October 21, 2021, 10:21pm

It looks like I'm just in trouble with the suspension status; my audits seem OK. So if it is a timing issue, should I see the suspension lift?

My original post showed 5 audit starts with 3 downloads.
I just rechecked and I'm now showing 6 to 4, so have I had a successful audit since troubleshooting?

Thank you for the program suggestion and all the help.

Alexey · October 21, 2021, 10:31pm

Only for the online score.
For the suspension score, the recovery time is unknown. You should fix the problem first.
As soon as your node starts passing audits, the suspension score will slowly recover.
If the problem is not fixed, your node will be disqualified.
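Why recovery is gradual can be illustrated with a generic beta-reputation score that has a forgetting factor: each audit result nudges two counters, and old results decay rather than disappear. This is a sketch under assumptions: the `reputation` type, `lambda`, `weight`, and the starting values are illustrative, and the satellites' actual scoring formula and parameters may differ.

```go
package main

import "fmt"

// reputation is a generic beta-reputation score with a forgetting factor.
// It is an illustrative model, not the satellites' actual implementation.
type reputation struct {
	alpha, beta float64
}

const (
	lambda = 0.95 // how strongly past results are remembered (assumed)
	weight = 1.0  // weight of one new result (assumed)
)

// update records one audit result: a success adds to alpha, a failure adds
// to beta, and both counters decay by lambda so older results matter less.
func (r *reputation) update(success bool) {
	v := -1.0
	if success {
		v = 1.0
	}
	r.alpha = lambda*r.alpha + weight*(1+v)/2
	r.beta = lambda*r.beta + weight*(1-v)/2
}

// score is the share of (decayed) evidence that is positive.
func (r reputation) score() float64 {
	return r.alpha / (r.alpha + r.beta)
}

func main() {
	r := reputation{alpha: 20, beta: 0} // assumed starting point: perfect score
	r.update(false)
	fmt.Printf("after 1 failure: %.4f\n", r.score())
	for i := 1; i <= 5; i++ {
		r.update(true)
		fmt.Printf("after %d successes: %.4f\n", i, r.score())
	}
}
```

With these assumed numbers the score drops by five points after one failure and then climbs back only a fraction of a point per successful audit.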

fa91986743e69e22bc92 · October 21, 2021, 10:33pm

It looks like I've had a single successful start/download since your help, compared to my first screenshot? I can't attach two images to a single post, so this is in reference to my post above.

Alexey · October 21, 2021, 10:35pm

Unfortunately, the count doesn't tell me anything.
The log text is much more useful than an image.
Please use the copy button and paste the log between two new lines containing three backticks (these: ```), like this:

```
log text
```

fa91986743e69e22bc92 · October 21, 2021, 10:43pm

C:\Program Files\Storj\Storage Node\storagenode.log:81499:2021-10-20T23:49:57.981-0700  ERROR   piecestore      download failed               {"Piece ID": "DTKTPPWAIHJXH33WAL2QED2SJDKTBRWCX5APZJMJK5ZK46B3NZ5Q", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_AUDIT", "error": "order created too long ago: OrderCreation 2021-10-20 23:49:58.330921593 +0000 UTC < SystemClock 2021-10-20 23:49:57.9818483 -0700 PDT m=+473316.290201001", "errorVerbose": "order created too long ago: OrderCreation 2021-10-20 23:49:58.330921593 +0000 UTC < SystemClock 2021-10-20 23:49:57.9818483 -0700 PDT m=+473316.290201001\n\tstorj.io/common/rpc/rpcstatus.Errorf:87\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:45\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:490\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:104\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:60\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:97\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
C:\Program Files\Storj\Storage Node\storagenode.log:81543:2021-10-21T00:29:52.489-0700  ERROR   piecestore      download failed               {"Piece ID": "VPHDDHGWW2G3IGTRVQBLIAB34NRIHM3S5X45F3KSG272I3JCKIGQ", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "GET_AUDIT", "error": "order created too long ago: OrderCreation 2021-10-21 00:29:52.471923958 +0000 UTC < SystemClock 2021-10-21 00:29:52.4891865 -0700 PDT m=+475710.797539201", "errorVerbose": "order created too long ago: OrderCreation 2021-10-21 00:29:52.471923958 +0000 UTC < SystemClock 2021-10-21 00:29:52.4891865 -0700 PDT m=+475710.797539201\n\tstorj.io/common/rpc/rpcstatus.Errorf:87\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).verifyOrderLimit:45\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:490\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:104\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:60\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:97\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}

Alexey · October 21, 2021, 11:00pm

Your clock still does not match the order creation time: it is about 7 hours ahead.
Have you installed NetTime?
If yes, please sync your time. The order was created at 2021-10-21 00:29:52.471923958 +0000 UTC, but your node's clock read 2021-10-21 00:29:52.4891865 -0700 PDT, which is 07:29:52 UTC.
The date should be 2021-10-20 17:29:52.4891865 -0700 PDT

fa91986743e69e22bc92 · October 21, 2021, 11:03pm

Alexey · October 21, 2021, 11:06pm

Please try to restart the storagenode service.

fa91986743e69e22bc92 · October 21, 2021, 11:15pm

I shut down and restarted the VM after restarting Storj from PowerShell, and got the same output with the same offset times.

Alexey · October 22, 2021, 6:06am

And the same timestamps.
It seems these are old records. The time on your VM is 4:12 PM, and the timezone is presumably UTC-7.

So there are no new records so far. Now just keep it running; the suspension score should recover a little with every successful audit.