Restarted Server And Node Got Suspended?

So I had to restart my server because of an update and now I see this…


[image]

It was OK before the restart, which took only a few minutes. Is it just temporary? How do I solve this? The node is brand new and has earned only $0.14 so far. What do I do? Start a new node instead? :crazy_face:

Check your logs to get an idea why.

The logs don't say. I see some errors, though.

  WARN        contact:service        Your node is still considered to be online but encountered an error.        {"Satellite ID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "Error": "contact: failed to dial storage node (ID: XXXX) at address nodex.domain.com:28967 using QUIC: rpc: quic: timeout: no recent network activity"}

  ERROR        contact:service        ping satellite failed         {"Satellite ID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "attempts": 1, "error": "ping satellite: rpc: dial tcp: lookup us2.storj.io: Try again", "errorVerbose": "ping satellite: rpc: dial tcp: lookup us2.storj.io: Try again\n\tstorj.io/common/rpc.TCPConnector.DialContextUnencrypted:114\n\tstorj.io/common/rpc.TCPConnector.DialContext:78\n\tstorj.io/common/rpc.Dialer.dialEncryptedConn:220\n\tstorj.io/common/rpc.Dialer.DialNodeURL.func1:110\n\tstorj.io/common/rpc/rpcpool.(*Pool).get:105\n\tstorj.io/common/rpc/rpcpool.(*Pool).Get:128\n\tstorj.io/common/rpc.Dialer.dialPool:186\n\tstorj.io/common/rpc.Dialer.DialNodeURL:109\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:124\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
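The second one looks like a plain DNS lookup failure for us2.storj.io right after the restart. A quick check that name resolution works again (assuming dig or nslookup is available on the host):

dig +short us2.storj.io
# or, if dig is not installed:
nslookup us2.storj.io

If that resolves now, the error was most likely transient.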

Oh, just realized it's a new node. The suspension score also drops much faster on new nodes than it does on older nodes. It will probably recover if no other satellites have dropped.

You can recover from suspension, although it is concerning that your score dropped that quickly after a simple restart. Try

grep -i GET_AUDIT storagenode.log | grep -i failed

replacing storagenode.log with the path to your log file. This assumes you are still running the node as a native binary, which is the last setup I read about.

Yes. It is a new node. Most probably it just failed one or two audits or something. Should I wait it out or create a new one?

Yeah, I would just wait rather than create a new node; it's not that big of a deal. Maybe it was a permission issue?

OK. Will check. Thanks! I run it as a service but can grep the journalctl.
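Something like this should pull the audit failures out of the journal (assuming the systemd unit is named storagenode; adjust to match the actual service name):

journalctl -u storagenode | grep -i GET_AUDIT | grep -i failed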

Should not be. The node had been running fine for the 9 days since it started.

Does the node not output to a log file when run as a service? I would expect there to be a log file on the node drive, although admittedly I have not tried to run a node as a service myself.

No. Only if you tell it to. I’d rather use journalctl and vacuum it.
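For example, something along these lines keeps it from growing without bound (the size and age limits are just placeholders):

sudo journalctl --vacuum-size=500M
# or drop entries older than two weeks:
sudo journalctl --vacuum-time=2weeks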

Yeah, you can set it to write a log file, but you can also leave logging unset in the config…


Yes. IMHO it is pointless to write both to the journal and to a log file.

Well, if you use Loki to keep track of it, it's not useless. :slight_smile:

No. I do not, so I guess I am fine. :slight_smile:

Anyway, I will let the node run and monitor it to see if it recovers. Would it get more audits from the satellite now that it is suspended there?

But it's easier to grep a log file than a journal, though.

Nah, your node is so new that audits happen a lot less often.

Nah.

journalctl | grep foo > bar

Doesn't a journal get deleted after a reboot?
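A quick way to check on a given host, assuming systemd's journald: if this lists more than the current boot, the journal is being kept across reboots (typically because /var/log/journal exists or Storage=persistent is set in journald.conf).

journalctl --list-boots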

What do you suggest? Ditch the 23 GB it stored, generate a new identity and start a new one? :smiley: