My node isn't working anymore! Need help debugging

Hi,

My one-year-old node has not been working since this morning.
The container is running but the web interface is not.

Here are the very last logs I have from the ‘docker logs’ command:

2020-09-11T05:01:01.502Z INFO orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs sending {"count": 325}
2020-09-11T05:01:01.503Z INFO orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs finished
2020-09-11T05:01:01.503Z ERROR orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs failed to settle orders for satellite {"satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "error": "order: unable to connect to the satellite: rpccompat: context canceled", "errorVerbose": "order: unable to connect to the satellite: rpccompat: context canceled\n\tstorj.io/storj/storagenode/orders.(*Service).settleWindow:464\n\tstorj.io/storj/storagenode/orders.(*Service).sendOrdersFromFileStore.func1:422\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-09-11T05:01:01.682Z INFO piecestore downloaded {"Piece ID": "B6NTLV7P5IMFADCU73ESZOG2XOQ7AFOFMUL56DBFLTJXBKM7NC5A", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_REPAIR"}
2020-09-11T05:01:01.733Z INFO piecestore downloaded {"Piece ID": "67YLEAW4WLRRH4JWPWRFQAKQPWCEAXK6LYEQLUWYCXS2UKBT7QRQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "GET"}
Error: debug: http: Server closed

Could you help me understand what's going on?
The disk attached to it is still mounted and seems to be OK (‘# df -h’ says 82 GB available on the attached disk).

Thanks a lot!

Please try rebooting your router, then check the WAN IP on the main page of your router and compare it with the IP shown by the Open Port Check Tool (Test Port Forwarding on Your Router).
If they are different, your ISP has placed you behind NAT. You can call them and ask them to fix the problem; you can say that you want external access to your IP webcam (support usually understands that; do not try to explain that you need a public IP :) ).
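
As a rough illustration of the "behind NAT" case: if the WAN IP shown by the router falls in a private (RFC 1918) or carrier-grade NAT (RFC 6598) range, it cannot be your public address. The sketch below only tests those ranges; the function name is_nat_address is made up for this example and it handles plain IPv4 only. A WAN IP outside these ranges can still differ from what the port check tool reports, so the comparison described above remains the real test.

```shell
#!/bin/sh
# is_nat_address (hypothetical helper): exit 0 if the IPv4 address lies in a
# private (RFC 1918) or carrier-grade NAT (RFC 6598) range, 1 if it does not,
# and 2 if the argument is not four dot-separated numbers.
is_nat_address() {
    case $1 in
        "" | *[!0-9.]*) return 2 ;;   # empty input or non-numeric characters
    esac
    old_ifs=$IFS
    IFS=.
    set -- $1                         # split the address into its octets
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 2
    for octet in "$@"; do
        [ -n "$octet" ] && [ "$octet" -le 255 ] || return 2
    done
    # 10.0.0.0/8
    if [ "$1" -eq 10 ]; then return 0; fi
    # 172.16.0.0/12
    if [ "$1" -eq 172 ] && [ "$2" -ge 16 ] && [ "$2" -le 31 ]; then return 0; fi
    # 192.168.0.0/16
    if [ "$1" -eq 192 ] && [ "$2" -eq 168 ]; then return 0; fi
    # 100.64.0.0/10 (carrier-grade NAT)
    if [ "$1" -eq 100 ] && [ "$2" -ge 64 ] && [ "$2" -le 127 ]; then return 0; fi
    return 1
}

# Example call, with the WAN IP typed in by hand:
#   is_nat_address 100.64.3.7 && echo "WAN IP is a NAT address, call your ISP"
```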

Make sure that your ADDRESS parameter contains an external address with a port, e.g. -e ADDRESS=external.address.tld:28967
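
To make the expected host:port shape concrete, here is a minimal sketch of a check on the ADDRESS value. The function name address_has_port and the container name "storagenode" in the comment are assumptions for illustration; it does not resolve the host or test the port, and it does not handle IPv6 literals.

```shell
#!/bin/sh
# address_has_port (hypothetical helper): exit 0 if the value looks like
# host:port with a numeric port, 1 otherwise.
address_has_port() {
    case $1 in
        :* | *: | *:*[!0-9]*) return 1 ;;  # empty host, empty port, or non-numeric port
        *:*) return 0 ;;                   # host followed by a numeric port
        *) return 1 ;;                     # no port at all
    esac
}

# To see which ADDRESS a running container was started with
# (the container name "storagenode" is an assumption):
#   docker inspect -f '{{range .Config.Env}}{{println .}}{{end}}' storagenode | grep '^ADDRESS='
```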

If you use a public IP as your external address, make sure that it is up to date, or use a DDNS address with an updater instead.

Also, check your local IP: is it unique in your network? And make sure that the port forwarding rule still points to the right local IP of the PC running storagenode.

The next check is your firewall. Make sure that it has an inbound rule for the storagenode port and no outgoing rules. If it does have outgoing rules, add another one to allow any traffic from the PC running storagenode to any port and any host.

Thanks for your reply.
Actually, the problem was related to my Docker engine. The container had an issue and its status was “unknown”; it wasn’t possible to stop or delete it (even with the docker kill command).
I restarted the whole server (a Debian VM) and it works now.

But I had another issue: the disks mapped to my VM changed (e.g. /dev/sdd1 changed to /dev/sde1). Two different Storj storage disks were swapped, but one of the containers kept its original identity file…
You know what that means: one of my nodes started with the wrong set of data. It hasn’t been disqualified so far, but I think it will be in the next few minutes or hours…

What lessons can be learned from this?
1. When you map your storage devices, always use the UUID in fstab (e.g. “f1ca2fca-895f-4066-91c4-8a33197e1284” instead of “/dev/sdd1”).
2. Always point to an identity directory located on the disk attached to your Storj node (i.e. avoid mapping to a local identity directory).
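
Both lessons can be checked with a few lines of shell. This is only a sketch: the helper names find_unstable_mounts and same_filesystem are made up, and the mount point, filesystem type and options in the sample fstab line are assumptions, not taken from my setup. The first helper lists fstab entries that still mount by a /dev/sdXN name; the second reports whether two directories (for example identity and storage) sit on the same mounted filesystem.

```shell
#!/bin/sh
# Look up the UUID of a partition with:  blkid /dev/sdd1   (or: lsblk -f)
# An fstab entry by UUID then looks like this (mount point, filesystem type
# and options are assumptions):
#   UUID=f1ca2fca-895f-4066-91c4-8a33197e1284  /mnt/storj1  ext4  defaults  0  2

# find_unstable_mounts (hypothetical helper): print "device mountpoint" for
# every fstab entry whose first field is a /dev/sdXN-style name.
# Defaults to /etc/fstab; pass another file to check a copy.
find_unstable_mounts() {
    awk '$1 ~ /^\/dev\/sd[a-z]+[0-9]*$/ { print $1, $2 }' "${1:-/etc/fstab}"
}

# same_filesystem (hypothetical helper): exit 0 if both directories live on
# the same mounted filesystem, comparing the device and mount point that
# df -P reports for each of them.
same_filesystem() {
    fs_a=$(df -P "$1" | awk 'NR == 2 { print $1, $NF }')
    fs_b=$(df -P "$2" | awk 'NR == 2 { print $1, $NF }')
    [ -n "$fs_a" ] && [ "$fs_a" = "$fs_b" ]
}

# Example calls (the paths are assumptions):
#   find_unstable_mounts
#   same_filesystem /mnt/storj1/identity /mnt/storj1/storage || echo "identity is on another disk"
```

If find_unstable_mounts prints nothing, every entry already uses a stable identifier such as UUID= or LABEL=.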
